Year in Review, 2014

As 2015 dawns upon us, here’s a quick look back at what I got up to in 2014.


I went to a few conferences and gave some talks:


I took part in the F# Advent Calendar project.

Seven ineffective coding habits many F# programmers don’t have

F# Deep Dives, a book I co-authored with Tomas, Phil, and co, finally went on sale!



I technical-reviewed a book.



I worked on some projects.


I learnt Elm.


I finally got my Myo armband!


I spent some more time with Dart.


I briefly resumed my interest in Project Euler and solved a few more problems with F#.


And finally, here are my top 10 posts of 2014:

Top 10 most memorable talks in 2014

2014 has been a good year, and it ended with a bang! With so many great talks to choose from, here are my top 10 most memorable talks that I attended or watched this year (in no particular order, since they have all taught me something different).


Kevlin Henney – Seven Ineffective Coding Habits of Many Programmers



James Mickens @ Monitorama PDX 2014



Edwin Brady – Verifying Stateful and Side-effecting Programs using Dependent Types


Adam Tornhill – Code that fits your brain



Andreas Stefik – The Programming Language Wars


Erik Meijer – Duality and the End of Reactive


Bret Victor – Seeing Spaces



Simon Wardley – Introduction to Value Chain Mapping


Luke Wroblewski – It’s a Write/Read (Mobile) Web



Scott Wlaschin – Functional Programming Patterns

Seven ineffective coding habits many F# programmers don’t have

This post is part of the F# Advent Calendar in English 2014 project. Check out all the other great posts there!

Special thanks to Sergey Tihon for organizing this.

A good coding habit is an incredibly powerful tool: it allows us to make good decisions with minimal cognitive effort, and can be the difference between being a good programmer and a great one.

“I’m not a great programmer; I’m just a good programmer with great habits.”

- Kent Beck

A bad habit, on the other hand, can forever condemn us to repeat the same mistakes, and is difficult to correct.


I attended Kevlin Henney’s “Seven Ineffective Coding Habits of Many Programmers” talk at the recent BuildStuff conference. As I sat in the audience and reflected on the times I had exhibited many of the bad habits he identified and challenged, something hit me – in pretty much every case I had been coding in C#, even though I also spend a lot of time in F#.

Am I just a bad C# developer who’s better at F#, or did the language I use make a bigger difference than I realized at the time?

With this question in mind, I revisited the seven ineffective coding habits that Kevlin identified and pondered why they are habits that many F# programmers don’t have.


Noisy Code

Signal-to-noise ratio refers to the ratio of useful information to false or irrelevant data in a conversation or exchange. Noisy code requires greater cognitive effort on the reader’s part, and is more expensive to maintain.

We have a lot of habits which – without us realising it – add noise to our code. But often the language we use has a big impact on how much noise is added, and C-style syntax is a big culprit here.

Objective comparisons between languages are usually difficult because comparing different language implementations across different projects introduces too many other variables. Comparing different language implementations of a small project is achievable, but cannot answer how well the solutions scale up to bigger projects.

Fortunately, Simon Cousins was able to provide a comprehensive analysis of two code-bases written in different languages – C# and F# – implementing the same application.

The application was non-trivial (~350k lines of C# code) and the numbers speak for themselves:




Not only is the F# implementation shorter and generally more useful (i.e. it has a higher signal-to-noise ratio); according to Simon’s post it also took a fraction of the man-hours to provide a more complete implementation of the requirements:

“The C# project took five years and peaked at ~8 devs. It never fully implemented all of the contracts.

The F# project took less than a year and peaked at three devs (only one had prior experience with F#). All of the contracts were fully implemented.”

In summary, by removing the need for { } as a core part of the language structure, and by not having nulls, F# removes a lot of the noise that is usually found in C# code.
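To illustrate with a small sketch of my own (a hypothetical example, not taken from Simon’s code-bases): an F# record gives you construction, immutability and structural equality in a single line, where the idiomatic C# equivalent would be a class with a constructor, properties and Equals/GetHashCode overrides – dozens of lines that are all noise.

```fsharp
// a complete, immutable type with structural equality, in one line
type Person = { Name : string; Age : int }

let alice = { Name = "Alice"; Age = 30 }
let older = { alice with Age = 31 } // copy-and-update; no mutation, no nulls
```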


Visual Dishonesty

“…a clean design is one that supports visual thinking so people can meet their informational needs with a minimum of conscious effort.”

- Daniel Higginbotham

When it comes to code, visual honesty is about laying out your code so that its visual relationships are obvious and accurate.

For example, when you put things above each other, it signifies hierarchy. This is important, because you’re showing your reader how to process the information you’re giving them. However, you may not be aware that’s what you’re doing, which is why we end up with problems.

“You convey information by the way you arrange a design’s elements in relation to each other. This information is understood immediately, if not consciously, by the people viewing your designs. This is great if the visual relationships are obvious and accurate, but if they’re not, your audience is going to get confused. They’ll have to examine your work carefully, going back and forth between the different parts to make sure they understand.”

- Daniel Higginbotham

Take the simple matter of how nested method calls are laid out in C#: they betray everything we have been taught about reading, because the order in which information needs to be processed has been reversed.

(code screenshots comparing the two layouts)

Fortunately, F# introduced the pipeline operator |>, which allows us to restore visual honesty in the way we lay out nested function calls.
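For example (a made-up snippet to illustrate the point), compare the inside-out reading order of nested calls with the left-to-right order the pipeline gives you:

```fsharp
// nested calls: you have to read from the innermost expression outwards
let total = List.sum (List.map (fun x -> x * x) (List.filter (fun x -> x % 2 = 0) [1..10]))

// with |> the data flows in the order you read it: filter, then map, then sum
let total' =
    [1..10]
    |> List.filter (fun x -> x % 2 = 0)
    |> List.map    (fun x -> x * x)
    |> List.sum
```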


In his talk Kevlin also touched on the placement of { } and how it affects readability, using a rather simple technique:

(code screenshots illustrating the technique)

and by doing so, it reveals interesting properties about the structure of the above code which we might not have noticed:

  1. we can’t tell where the argument list ends and the method body starts
  2. we can’t tell where the if condition ends and the if body starts

These tell us that even though we are aligning our code, its structure and hierarchy are still not immediately clear without the aid of the curly braces. Remember, if the visual relationship between the elements is not accurate, it’ll cause confusion for your readers, and they’ll need to examine your code carefully to ensure they understand it correctly. In other words, you create an additional cognitive burden on your readers when the layout of your code does not match the program structure.

Now contrast that with the following:

(code screenshot of the same code with different brace placement)

where the structure and hierarchy of the code are much more evident. So it turns out that the placement style of { } is not just a matter of personal preference: it plays an important role in conveying the structure of your code, and there’s a right way to do it.

“It turns out that style matters in programming for the same reason that it matters in writing.

It makes for better reading.”

- Douglas Crockford

In hindsight this seems obvious, but why do we still get it wrong? How can so many of us miss something so obvious?

I think part of the problem is that you have two competing rules for structuring your code in C-style languages – one for the compiler and one for humans. { and } are used to convey the structure of your code to the compiler, but to convey structure information to humans you use both { } and indentation.

This, coupled with the eagerness to superficially reduce the line count, or to adhere to guidelines such as “methods shouldn’t be more than 60 lines long”, makes for the perfect storm that results in us sacrificing readability in the name of readability.


So what if you use whitespace to convey structure information to both the compiler and humans? Then you remove the ambiguity, and people can stop fighting about where to place their curly braces!

The above example can be written in F# as the following, using conventions common in the F# community:

(screenshot of the F# version)

Notice how the code is not only much shorter, but also structurally very clear.


In summary, F#’s pipes allow you to restore visual honesty with regard to the way nested function calls are arranged, so that the flow of information matches the way we read. In addition, whitespace provides a consistent way to depict hierarchy information to both the compiler and humans. It removes the need to argue over { } placement strategies, whilst making the structure of your code clear to see at the same time.


Lego naming

Naming is hard, and as Kevlin points out, so often we resort to lego naming: gluing common words such as ‘create’, ‘process’, ‘validate’ and ‘factory’ together in an attempt to create meaning.

This is not naming, it is labelling.

Adding more words is not the same as adding meaning. In fact, more often than not, it can have the opposite effect of diluting the meaning of the thing we’re trying to name. This is how we end up with gems such as controllerFactoryFactory, where the meaning of the whole is less than the sum of its parts.



Naming is hard, and having to give names to everything – every class, method and variable – makes it even harder. In fact, trying to give everything a meaningful name is so hard that eventually most of us simply stop caring, and lego naming seems like the lazy way out.


In F#, and in functional programming in general, it’s common practice to use anonymous functions, or lambdas. Straight away you remove the need to come up with good names for a whole bunch of things. Often the meaning of these lambdas is conveyed by the higher-order functions that use them – Array.map, Array.filter, Array.iter, etc. – e.g. the function passed into Array.map is used to, surprise surprise, map values in an array!

(Before you say it: yes, you can use anonymous delegates in C# too, especially when working with LINQ. However, when you use LINQ you are doing functional programming, and the use of lambdas is much more common in F# and other functional-first languages.)
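A small illustrative sketch: the higher-order function carries the meaning, so the lambda itself needs no name:

```fsharp
// Array.map tells the reader everything; the lambda stays anonymous
let doubled = [| 1; 2; 3 |] |> Array.map (fun x -> x * 2)

// likewise Array.filter: no DoublingFunction or EvenNumberPredicate in sight
let evens = [| 1; 2; 3; 4 |] |> Array.filter (fun x -> x % 2 = 0)
```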


Lego naming can also be a symptom of a failure to identify the right level of abstraction.

Just like naming, coming up with the right abstractions can be hard. And when the right abstraction is a piece of pure functionality, we don’t have a way to represent it effectively in OOP (note: I’m not talking about the object-orientation that Alan Kay had in mind when he coined the term objects).

In situations like this, the common practice in OOP is to wrap the functionality inside a class or interface. So you end up with both the thing that provides the desired functionality, and the functionality itself. That’s two things to name instead of one; this is hard, so let’s be lazy and combine some common words together and see if they make sense…

public interface ConditionChecker
{
    bool CheckCondition();
}

The problem here is that the right level of abstraction is smaller than an “object”, so we have to introduce another layer of abstraction just to make it fit into our world view.


In F#, and in functional programming in general, no abstraction is too small, and functions are so ubiquitous that all the OO patterns we’re so fond of can be represented as functions.

Take the ConditionChecker example above: the essence of what we’re looking for is a condition that is evaluated without input and returns a boolean value. This can be represented in F# as the following:

type Condition = unit -> bool

Much more concise, wouldn’t you agree? And any function that matches the type signature can be treated as a Condition without having to explicitly implement some interface.
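For instance (a hypothetical example), any ordinary function of the right shape simply is a Condition, and can be passed around as one:

```fsharp
type Condition = unit -> bool

// an ordinary function matching the signature; no interface to implement
let isWeekend : Condition =
    fun () ->
        let today = System.DateTime.Now.DayOfWeek
        today = System.DayOfWeek.Saturday || today = System.DayOfWeek.Sunday

// consumers simply take a Condition as an argument
let runIf (cond : Condition) action = if cond () then action ()
```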


Another common practice in C# and Java is to label exception types with an Exception suffix, i.e. CastException, ArgumentException, etc. This is another symptom of our tendency to label things rather than name them, out of laziness (and not the good kind).

If we had put more thought into them, then maybe we could have come up with more meaningful names, for instance:



In F#, the common practice is to define errors using the lightweight exception syntax, and the convention here is to not use the Exception suffix, since the leading exception keyword already provides a sufficient clue as to what the type represents.


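A sketch of what that looks like (the error names here are made up for illustration):

```fsharp
// the exception keyword already says what these are; no Exception suffix needed
exception InvalidCardNumber of string
exception TournamentNotFound of int

let validateCardNumber cardNo =
    if String.length cardNo <> 16 then raise (InvalidCardNumber cardNo)
```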

In summary, whilst F# doesn’t stop you from lego naming things, it helps because:

  • the use of anonymous functions significantly reduces the number of things you have to name;
  • being able to model your application at the right level of abstraction removes unnecessary layers of abstraction, and therefore reduces the number of things you have to name even further;
  • it’s easier to name things when they’re at the right level of abstraction;
  • the convention in F# is to use the lightweight exception syntax to define exception types without the Exception suffix.



In his presentation, Kevlin showed an interesting technique of using a tag cloud to see what pops out from your code:

(tag clouds generated from the two code-bases)

Comparing these two examples, you can see the domain of the second example surfacing through the tag cloud – paper, picture, printingDevice, etc. – whereas the first example shows only raw strings and lists.

When we under-abstract, we often find ourselves with a long list of arguments to our methods/functions. When that list gets long enough, adding another one or two arguments no longer seems significant.

“If you have a procedure with ten parameters, you probably missed some.”

- Alan Perlis

Unfortunately, F# can’t stop you from under-abstracting, but it has a powerful type system that provides all the necessary tools for you to create the right abstractions for your domain with minimal effort. Have a look at Scott Wlaschin’s talk on DDD with the F# type system for inspiration on how you might do that:


Unencapsulated State

If under-abstraction is like going to a job interview in your pyjamas, then having unencapsulated state is akin to wearing your underwear on the outside (which, incidentally, puts you in some rather famous circles…).


The example that Kevlin used to illustrate this habit is dangerous because the internal state that has been exposed to the outside world is mutable. So not only is your underwear worn on the outside for all to see, everyone is able to modify it without your consent… now that’s a scary thought!

In this example, what we should have done is encapsulate the mutable list and expose only the properties that are relevant, for instance:

open System.Collections.Generic

type RecentlyUsedList () =
    let items = new List<string>()

    member this.Items = items.ToArray() // now the outside world can’t mutate our internal state
    member this.Count = items.Count


Whilst F# has no way to stop you from exposing the items list publicly, functional programmers are very conscious of maintaining the immutability facade, so even if an F#’er were using a mutable list internally, he would not allow it to leak outside.

In fact, an F#’er would probably have implemented a RecentlyUsedList differently, for instance:

type RecentlyUsedList (?items : string list) =
    let items = defaultArg items []

    member this.Items = List.toArray items
    member this.Count = List.length items

    member this.Add newItem =
        let newItems = newItem::(items |> List.filter ((<>) newItem))
        RecentlyUsedList newItems


But there’s more.

Kevlin also touched on encapsulation in general, and its relation to a usability concept called affordance.

“An affordance is a quality of an object, or an environment, which allows an individual to perform an action. For example, a knob affords twisting, and perhaps pushing, whilst a cord affords pulling.”

- Wikipedia

If you want the user to push, then don’t give them something that they can pull; that’d be bad usability design. The same principle applies to code: your abstractions should afford the right behaviours whilst making it impossible to do the wrong thing.


When modelling your domain with F#, since there are no nulls, the most common illegal state you have to look out for is immediately eliminated. And since types are immutable by default, once they are validated at construction time you don’t have to worry about them entering an invalid state later.

To make invalid states un-representable in your model, a common practice is to create a finite, closed set of possible valid states as a discriminated union. As a simple example:

type PaymentMethod =
    | Cash
    | Cheque of ChequeNumber
    | Card   of CardType * CardNumber

Compared to a class hierarchy, a discriminated union type cannot be extended, and therefore invalid states cannot be introduced at a later date by abusing/exploiting inheritance.
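For example (with placeholder definitions for the payload types, since they aren’t shown here), pattern matching over the union is exhaustive, so the compiler warns you whenever a case goes unhandled:

```fsharp
// placeholder payload types, assumed for illustration
type ChequeNumber = int
type CardType     = Visa | MasterCard
type CardNumber   = string

type PaymentMethod =
    | Cash
    | Cheque of ChequeNumber
    | Card   of CardType * CardNumber

// the match must cover every case; adding a new case to the union
// produces a compiler warning here until it is handled
let describe payment =
    match payment with
    | Cash             -> "paid in cash"
    | Cheque number    -> sprintf "paid by cheque #%d" number
    | Card (_, number) -> sprintf "paid by card %s" number
```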


In summary, F# programmers are very conscious of immutability, so even if they are using mutable types to represent internal state, it’s highly unlikely for them to expose that mutability and break the immutability facade they hold so dearly.

And because types are immutable by default, and there are no nulls in F#, it’s also easy for you to ensure that invalid states are simply un-representable when modelling a domain.


Getters and Setters

Kevlin challenged the naming of getters and setters, since in English ‘get’ usually implies side effects:

“I get married”

“I get money from the ATM”

“I get from point A to point B”

Yet in programming, get implies a query with no side effects.

Also, getters and setters are opposites in programming, but in English the opposite of set is reset or unset.


Secondly, Kevlin challenged the habit of always creating setters whenever we create getters. This habit is even enforced and encouraged by many modern IDEs that give you shortcuts to automatically create these getters and setters in pairs.

“That’s great, now we have shortcuts to do the wrong thing.

We used to have to type lots to do the wrong thing, not anymore.”

- Kevlin Henney

And he talked about how we need to be more cautious and conscious about what can change and what cannot.

“When it is not necessary to change, it is necessary not to change.”

- Lucius Cary


With F#, immutability is the default, so when you define a new record or value it is immutable unless you explicitly say otherwise (with the mutable keyword). So to do the wrong thing – i.e. to define a corresponding setter for every getter – you have to do lots of extra work.

Every time you have to type the mutable keyword is another chance for you to ask yourself “is it really necessary for this field to change?”. In my experience it has provided sufficient friction and forced me to make very conscious decisions about what can change, and under what conditions.
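A quick sketch of that friction in action (a hypothetical type):

```fsharp
// immutable by default: only Status opts in to mutation, and you must say so
type Order = { Id : int; Total : decimal; mutable Status : string }

let order = { Id = 1; Total = 42.0M; Status = "pending" }
order.Status <- "shipped"   // allowed, because Status is explicitly mutable
// order.Total <- 0M        // compile error: Total is not mutable
```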


Uncohesive Tests

Many of us have the habit of testing methods – that is, for every method Foo we have a TestFoo that invokes Foo and inspects its behaviour. This type of testing covers only the surface area of your code, and although you can achieve a high code-coverage percentage this way (and keep the managers happy), that coverage number is only superficial.

Methods are usually called in different combinations to achieve some desired functionality, and many of the complexities and potential bugs lie in the way they work together. This is particularly true where state is concerned, as the order the state is updated in might be significant, and you also bring concurrency into the equation.

Kevlin calls for an end to this practice, and for us to focus on testing specific functionalities instead, using our tests as specifications for those functionalities.

“For tests to drive development they must do more than just test that code performs its required functionality: they must clearly express that required functionality to the reader. That is, they must be clear specification of the required functionality.”

- Nat Pryce and Steve Freeman

This is in line with Gojko Adzic’s Specification by Example, which advocates the use of tests as a form of specification for your application that is executable and always up-to-date.


But even as we improve what we test, we still need a sufficient number of tests to give us a reasonable degree of confidence. To put it into context, an exhaustive test suite for a function with the signature Int -> Int would need 4,294,967,296 test cases – one per possible 32-bit input. Of course, you don’t need an exhaustive test suite to reach a reasonable degree of confidence, but there’s a limit to the number of tests that we are able to write by hand because:

  • writing and maintaining a large number of tests is expensive
  • we might not think of all the edge cases

This is where property-based automated testing comes in, and that’s where the F# community (along with other QuickCheck-enabled languages such as Haskell and Erlang) is at, with the widespread adoption of FsCheck. If you’re new to FsCheck, or property-based testing in general, check out Scott Wlaschin’s detailed introductory post on property-based testing, part of the F# Advent Calendar in English 2014.
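As a minimal sketch of what a property looks like (assuming the FsCheck package is referenced):

```fsharp
open FsCheck

// a property: reversing a list twice yields the original list
let revTwiceIsOriginal (xs : int list) =
    List.rev (List.rev xs) = xs

// FsCheck generates the test cases, including edge cases such as []
Check.Quick revTwiceIsOriginal
```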



We pick up habits – good and bad – over time and with practice. Since we know that practice doesn’t make perfect, it makes permanent (only perfect practice makes perfect), it is important for us to acquire good practice in order to form and nurture good habits.

The programming language we use day-to-day plays an important role in this regard.

“Programming languages have a devious influence: they shape our thinking habits.”

- Dijkstra

As another year draws to a close, let’s hope the year ahead is filled with the good practice we need to make perfect, and to ensure it, let’s all write more F# :-P


Wish you all a merry xmas!



F# Advent Calendar in English 2014

Kevlin Henney – Seven ineffective coding habits of many programmers

Andreas Stefik – The programming language wars

Simon Cousins – Does the language you use make a difference (revisited)

Coding Horror – The best code is no code at all

F# for Fun and Profit – Cycles and modularity in the wild

Being visually honest with F#

Ian Barber – Naming things

Joshua Bloch – How to design a good API and why it matters

Null References: the Billion dollar mistake

Neo4j talk at CodeMesh 2014

A look at Microsoft Orleans through Erlang-tinted glasses

Some time ago, Microsoft announced Orleans, an implementation of the actor model in .Net, designed for cloud environments where instances are ephemeral.

We’re currently working on a number of projects in Erlang, and have run into some assumptions in distributed Erlang which don’t hold true in a cloud-hosted environment where nodes are ephemeral and entire topologies are constantly in flux. Also, as most of our backend code for Gamesys Social is in .Net, being able to work with languages that we’re already familiar with is a win for the team (more people being able to work on the codebase, for instance).

As such, I have been taking an interest in Orleans to see if it represents a good fit, and whether or not it holds up to some of its lofty promises around scalability, performance and reliability. Below is an account of my personal views, having read the paper, downloaded the SDK, looked through the samples, and followed Richard Astbury’s Pluralsight course.



Update 2014/12/08:

Since I posted this the other day, there has been some great feedback from the Orleans team, which clarified several places where I had previously misunderstood things based on the information available at the time of writing. Some of my other concerns still remain, but at least two of the biggest sticking points – single point of failure and at-least-once message delivery – have been disproved.

As such, I have updated this post in several places to incorporate the new information that the Orleans team have provided via the comments.

I’ve left what was previously written untouched, but look out for the impacted sections (* followed by a paragraph that is underlined) throughout the post to see the relevant new information. In these callout sections I have focused on the correct behaviour that you should expect based on corrections from Sergey and Gabriel; if you’re interested in the background and rationale behind these decisions, please read their comments in full.





When I first read about Orleans, I was hesitant because of the use of code-gen (reminiscent of WCF there), and because the underlying message-passing mechanism is hidden from you, so you end up with an RPC mechanism (again, reminiscent of WCF…).

However, after spending some time with Orleans, I can definitely see its appeal – convenience and productivity. I was able to get something up and running quickly and with ease. My original concerns about code-gen and RPC didn’t get in the way of me getting stuff done.

As I dug deeper into how Orleans works, though, a number of more worrying concerns surfaced regarding some of its core design decisions.

For starters, *1 it’s not tolerant of partitions to the data store used for its Silo management. Should the data store be partitioned or suffer an outage, it’ll result in a full outage of your service. These are not the traits of a masterless and partition-tolerant system that are desirable when you have strict uptime requirements.

When everything is working, Orleans guarantees that there is only one instance of a virtual actor running in the system, but when a node is lost the cluster’s knowledge of nodes will diverge, and during this time the single-activation guarantee becomes eventually consistent. However, you can provide stronger guarantees yourself (see the Silo Membership section below).

*2 Orleans uses at-least-once message delivery, which means it’s possible for the same message to be sent twice when the receiving node is under load, or simply fails to acknowledge the first message in a timely fashion. This, again, is something that you can mitigate yourself (see the Message Delivery Guarantees section below).

Finally, its task scheduling mechanism appears to be identical to that of a naive event loop, and exhibits all the fallacies of an event loop (see the Task Scheduling section below).



*1 As Gabriel and Sergey both explained in the comments, the membership management works quite a bit differently to what I first thought. Once connected, all heartbeats are sent between pairs of nodes using a linear algorithm, and the backend data store is only used for reaching agreement on which nodes are dead, and for disseminating the new membership view to all other nodes.

In this case, losing connection to the backend data store would not impact existing, connected clusters, making it partition tolerant. If the backend data store becomes unavailable at the same time as your cluster topology is changing, then it will hinder updates to the membership and stop new nodes from being able to join the cluster.

Hopefully the implementation details of the membership management will be discussed in more detail in the Orleans team’s future posts. Also, since Orleans will be open sourced in early 2015, we’ll be able to take a closer look at exactly how this behaves when the source code is available.


*2 Gabriel pointed out that by default Orleans does not resend messages that have timed out. So by default it uses at-most-once delivery, but it can be configured to automatically resend upon timeout if you want at-least-once delivery instead.




In Orleans, a Grain represents an actor, and each node has a Silo which manages the lifetime of the grains running inside it. A Grain is activated when it receives a request, and can be deactivated after it has been idle for a while. The Silo will also remove the deactivated Grains from memory to free up resources.

Orleans’ Grains are referred to as virtual actors. They are virtual because the state of a Grain can be persisted to a storage service, and then reinstated as the Grain is reactivated (after being deactivated due to idleness) on another node. This is a nice abstraction from a developer’s point of view, and to enable this level of indirection, Orleans introduces the mechanism of storage providers.


Storage Providers

To use storage providers, you first need to define an interface that represents the state of your Grain, and have it inherit from Orleans’ IGrainState interface. For example:


Then in your Grain implementation class, you provide this ITournamentGrainState interface as a generic type parameter to the Orleans.Grains base class, as below. You also need to specify the storage provider you want to use via the [StorageProvider] attribute. The ProviderName specified here points to a storage provider you define in the configuration file for Orleans.


When a Grain is activated in a Silo, the Orleans runtime will go and fetch the state of the Grain for us and put it in an instance member called State. For instance, when the ActivateAsync method is called, the state of our TournamentGrain would have been populated from the DynamoDBStorage provider we created:


You can modify the state by modifying its members, but the changes will not be persisted to the backend storage service until you call the State.WriteStateAsync method. Un-persisted changes will be lost when the Grain is deactivated by the Silo, or if the node itself is lost.


Finally, there are a number of built-in storage providers, such as the Azure Table Storage one, but it’s trivial to implement your own. To implement a custom storage provider, you just need to implement the IStorageProvider interface.


Stor­age providers make it very easy to cre­ate actors that can be eas­ily resumed after deac­ti­va­tion, but you need to be mind­ful of a num­ber of things:

  • how often you per­sist state is a trade-off between dura­bil­ity and per­for­mance + cost
  • if mul­ti­ple parts of the state need to be mod­i­fied in one call, you need to have a roll­back strat­egy in place in case of excep­tions or risk leav­ing dirty writes in your state (see Not let­ting it crash sec­tion below)
  • you need to han­dle the case when per­sis­tence fails – since you’ve mutated the in-memory state, if per­sis­tence failed do you roll­back or con­tinue and hope that you get another chance at per­sist­ing the state before the actor is deac­ti­vated through idle­ness or the node crashing


In Erlang, there is no built-in mechanism for storage providers, but there is also nothing stopping you from implementing this yourself. Have a look at Bryan Hunter's CQRS with Erlang talk at NDC Oslo 2014 for inspiration.


Silo Membership

Silos use a backend store to manage Silo membership; by default this is Azure Table Storage. Here is what the MSR paper has to say about Silo membership:

“Servers automatically detect failures via periodic heartbeats and reach an agreement on the membership view. For a short period of time after a failure, membership views on different servers may diverge, but it is guaranteed that eventually all servers will learn about the failed server and have identical membership views….if a server was declared dead by the membership service, it will shut itself down even if the failure was just a temporary network issue.”

Furthermore, on the guarantee that an actor (a Grain with a specific ID) is only activated on one node:

“In failure-free times, Orleans guarantees that an actor only has a single activation. However, when failures occur, this is only guaranteed eventually.

Membership is in flux after a server has failed but before its failure has been communicated to all survivors. During this period, a register-activation request may be misrouted if the sender has a stale membership view….However, it may be that two activations of the same actor are registered in two different directory partitions, resulting in two activations of a single-activation actor. In this case, once the membership has settled, one of the activations is dropped from the directory and a message is sent to its server to deactivate it.”

There are a couple of things to note about Silo membership management from the above:

  • *3 the way servers shut themselves down when they lose connectivity to the storage service means the cluster is not partition-tolerant – if the storage service is partitioned from the cluster, even for a relatively short amount of time, then every node running a Silo could self-terminate;
  • *3 the storage service used to track Silo membership is a single point of failure – any outage to the storage service results in an outage to your Orleans service too (this happened to Halo 4);
  • it offers strong consistency during the good times, but falls back to eventual consistency during failures;
  • whilst it's not mentioned, I speculate that, depending on the size of the cluster and the time it takes to converge on the Silo membership view across the cluster, it's possible to have more than two activations of the same Grain in the cluster;
  • the conflict resolution approach above suggests that one activation is chosen at random and the rest are discarded – this seems rather naive and means losing any intermediate changes recorded on the discarded Grain activations;
  • since each activation persists its state independently, it's possible for the surviving Grain activation's internal state to be out of sync with what has been persisted;
  • these failure scenarios can happen a lot more often than you might think – nodes can be lost due to failure, but also as a result of planned/automatic scaling-down events throughout the day as traffic patterns change (Orleans is designed for the cloud and all its elastic scalability, after all).


During failures, it should be possible to provide a stronger guarantee on single activation by using optimistic concurrency around the Grain's state. For instance:

1. node A fails, and the cluster's view of Silo membership diverges

2a. the Grain receives a request on node B, and is activated with state v1

2b. the Grain receives a request on node C, and is activated with state v1

3a. the Grain on node B finishes processing its request, and succeeds in saving state v2

3b. the Grain on node C finishes processing its request, but fails to save state v2 (optimistic concurrency at work here)

4. the Grain on node C fails the request and triggers its own deactivation

5. the cluster is left with one activation of the Grain, on node B

Enforcing a stronger single-activation guarantee in this fashion should also remove the need for better conflict resolution. For this approach to work, you need to be able to detect persistence failures caused by optimistic concurrency. In DynamoDB, these can be identified by a conditional check error.
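The scheme above can be sketched in a few lines of Python (the store and its conditional write are hypothetical stand-ins for DynamoDB's conditional put, not real Orleans or AWS APIs):

```python
class VersionedStore:
    """Hypothetical store offering a conditional (compare-and-set) write,
    mimicking DynamoDB's conditional check."""
    def __init__(self):
        self.version = 0
        self.state = None

    def conditional_write(self, expected_version, new_state):
        # Only accept the write if the caller saw the latest version.
        if self.version != expected_version:
            raise RuntimeError("conditional check failed")
        self.version += 1
        self.state = new_state

# Two duplicate activations both loaded state at version 0.
store = VersionedStore()
store.conditional_write(0, "v2 from node B")      # node B wins the race
node_c_deactivates = False
try:
    store.conditional_write(0, "v2 from node C")  # node C loses...
except RuntimeError:
    node_c_deactivates = True                     # ...and deactivates itself
```

The losing activation deactivates itself, leaving a single activation without waiting for membership views to converge.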



*3 Again, as per Gabriel and Sergey's comments below, this is not true and there's no single point of failure in this case. See *1 above, or read the comments for more detail.



Distributed Erlang employs a different approach to forming clusters. Nodes form a mesh network where every node is connected to every other node, and they use a gossip protocol to inform each other when nodes join or leave the cluster.


This approach has a scalability limitation: it doesn't scale well to thousands, or even hundreds, of nodes (depending on the capabilities of each node), because the overhead of forming and maintaining the cluster grows quadratically with the number of nodes. The effect is particularly evident if you require frequent inter-node communication, or need to use functions in the global built-in module.

In this space, SD (scalable distributed) Erlang is attempting to address this shortcoming by allowing you to create groups of sub-clusters amongst your nodes, so that the size of each mesh network is limited to the size of a sub-cluster.


Random Actor Placement

Another interesting choice Orleans made is that, instead of using consistent hashing for actor placement (a technique that has been used successfully in a number of key-value stores such as Couchbase and Riak), Orleans introduces another layer of indirection: the Orleans directory.

“The Orleans directory is implemented as a one-hop distributed hash table (DHT). Each server in the cluster holds a partition of the directory, and actors are assigned to the partitions using consistent hashing. Each record in the directory maps an actor id to the location(s) of its activations.”


The rationale for this decision is that *4 random placement of actors helps avoid the creation of hotspots in your cluster, which might otherwise result from poorly chosen IDs, or bad luck. But it means that, to retain correctness, every request to an actor now requires an additional hop to the directory partition first. To address this performance concern, each node uses a local cache to store where each actor is.

I think this is a well-meaning attempt at a problem, but I'm not convinced that it's a problem that deserves:

  1. the additional layer of indirection, and
  2. the subsequent performance problem, and
  3. the subsequent use of a local cache, and
  4. *5 the problem of cache invalidation that comes with it (which, as we know, is one of the two hard problems in CS)

Is it really worth it? Especially when the IDs are GUIDs by default, which hash well. Would it not be better to solve it with a better hashing algorithm?

From my personal experience of working with a number of different key-value stores, actor placement has never been an issue significant enough to deserve the special treatment Orleans has given it. I'd really like to see the results of any empirical study that shows this to be a big enough issue in real-world key-value store usage.
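For comparison, here is roughly what the alternative looks like: a minimal, illustrative consistent-hash ring with virtual nodes (node names and vnode count are made up), of the kind Couchbase and Riak use, where placement is computed locally with no directory hop:

```python
import hashlib
from bisect import bisect

def _hash(key):
    # Stable hash so every node computes the same placement.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=64):
        # Each physical node gets `vnodes` points on the ring,
        # which smooths out the distribution of keys across nodes.
        self.ring = sorted(
            (_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    def node_for(self, actor_id):
        # Walk clockwise to the first vnode at or after the key's hash.
        idx = bisect(self.keys, _hash(actor_id)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["silo-a", "silo-b", "silo-c"])
placement = ring.node_for("tournament-grain-42")
```

Placement is deterministic and requires no lookup traffic; the trade-off is that keys must be rebalanced when the topology changes, which is exactly what the Orleans directory avoids.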



*4 As Sergey mentioned in the comments, you can do a few more things, such as using the PreferLocalPlacement strategy to “instruct the runtime to place an activation of a grain of that type local to the first caller (another grain) to it. That is how the app can hint about optimizing placement.” This appears similar to spawning a new process in Erlang. It would require further clarification from Sergey or Gabriel, but I'd imagine the placement strategy probably applies at the type level for each type of grain.

The additional layer of abstraction does buy you some more flexibility, and not having to move grains around when the topology changes probably simplifies things from the implementation point of view too (I imagine moving the directory information around is cheaper and easier than moving the grains themselves).

In the Erlang space, RiakCore provides a set of tools to help you build distributed systems, and its approach gives you more control over the behaviour of your system. You do, however, have to implement a few more things yourself, such as how to move data around when the cluster topology changes (though the vnode behaviour gives you the basic template for doing this) and how to deal with collisions, etc.


*5 Hitting a stale cache is not much of a problem in this case: Orleans would do a new lookup, forward the message to the right destination, and update the cache.



As an aside, here's how Riak does consistent hashing and read/write replication:

(Image: Riak NRW – the consistent-hash ring and N/R/W replication)


Message Delivery Guarantees

“Orleans provides at-least-once message delivery, by resending messages that were not acknowledged after a configurable timeout. Exactly-once semantics could be added by persisting the identifiers of delivered messages, but we felt that the cost would be prohibitive and most applications do not need it. This can still be implemented at the application level.”

The decision to use at-least-once message delivery as the default is a contentious one in my view. *6 A slow node will cause messages to be sent twice, and handled twice, which is probably not what you want most of the time.

Whilst the rationale regarding cost is valid, it seems to me that allowing the message delivery to time out and letting the caller handle timeout cases is the more sensible choice here. It'd make the handling of timeouts an explicit decision on the application developer's part, probably on a per-call basis, since some calls are more important than others.
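The caller-decides approach suggested above might look like this sketch (a hypothetical helper, not the Orleans API): the caller picks the timeout and how many resends, per call, so the choice between at-most-once and at-least-once is explicit:

```python
import queue
import threading

def call_with_timeout(handler, msg, timeout, retries=0):
    """Deliver msg to handler on a worker thread and wait for a reply.
    retries=0 gives at-most-once behaviour; retries>0 resends on
    timeout, i.e. at-least-once (and the handler may run twice)."""
    for attempt in range(retries + 1):
        result = queue.Queue()
        threading.Thread(
            target=lambda: result.put(handler(msg)), daemon=True
        ).start()
        try:
            return result.get(timeout=timeout)
        except queue.Empty:
            continue  # timed out; resend only if the caller asked for it
    raise TimeoutError(f"no reply after {retries + 1} attempt(s)")
```

Important calls can opt into retries; everything else surfaces the timeout to the caller, which is the behaviour I'd want by default.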



*6 The default behaviour is not to resend on timeout, so by default Orleans actually uses at-most-once delivery. But you can configure it to automatically resend upon timeout, i.e. at-least-once.



Not letting it crash

Erlang's mantra has always been to Let It Crash (except when you shouldn't). When a process crashes, it can be restarted by its supervisor, and using techniques such as event sourcing it's easy to return to the state it was in just before the crash.

When an exception is thrown inside an Orleans Grain, the exception does not crash the Grain itself; it is simply reported back to the caller instead. This simplifies things, but runs the risk of leaving behind dirty writes (and hence corrupting the state) in the wake of an exception. For example:


The choice of not crashing the grain in the event of exceptions offers convenience at the cost of breaking the atomicity of operations, and personally it's not a choice that I agree with.
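To illustrate the dirty-write risk in a language-neutral way, here is a small Python analogue (a hypothetical grain-like class, not Orleans code): an exception mid-update is reported to the caller, but the half-applied state survives because the object is not crashed and restarted:

```python
class TournamentGrain:
    """Hypothetical grain-like object whose state outlives exceptions."""
    def __init__(self):
        self.players = []
        self.player_count = 0

    def add_players(self, names):
        # Invariant we'd like to keep: player_count == len(players).
        for name in names:
            if not name:
                raise ValueError("invalid player name")  # fails mid-update
            self.players.append(name)
        self.player_count += len(names)

grain = TournamentGrain()
try:
    grain.add_players(["alice", "", "bob"])  # fails on the second name
except ValueError:
    pass  # caller sees the error, but the grain keeps its partial state
```

After the failed call, `players` contains "alice" while `player_count` is still 0 – the invariant is broken, and the corrupted grain keeps serving requests.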


Reentrant Grains

In a discussion on the Actor model with Erik Meijer and Clemens Szyperski, Carl Hewitt (father of the Actor model) said:

“Conceptually, messages are processed one at a time, but the implementation can allow for concurrent processing of messages.”

In Orleans, grains normally process messages one at a time. To allow concurrent processing of messages, you can mark your grain implementation with the [Reentrant] attribute.

Reentrant grains can be used as an optimization technique to remove bottlenecks in your network of grains.

“One actor is no actor, they come in systems, and they have to have addresses so that one actor can send messages to another actor.”

– Carl Hewitt

However, using reentrant grains means you lose the guarantee that state is accessed sequentially, and you open yourself up to potential race conditions. You should use reentrant grains with great care and consideration.


In Erlang, concurrent processing of messages is not allowed. But you don't have to block your actor whilst it waits for some expensive computation to complete: you can spawn another actor and ask the child actor to carry on with the expensive work whilst the parent actor processes the next message. This is possible because the overhead and cost of spawning a process in Erlang is very low, and the runtime can easily handle tens of thousands of concurrent processes, load-balancing across the available CPU resources via its schedulers.

If necessary, once the child actor has finished its work it can send a message back to the parent actor, which can then perform any subsequent computation as required.

Using this simple technique, you acquire the same capability that reentrant grains offer. You can also control which messages can be processed concurrently, rather than accepting the all-or-nothing approach that reentrant grains use.
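The Erlang pattern described above – hand expensive work to a spawned child so the parent keeps draining its mailbox – can be sketched with threads in Python (all names here are illustrative, not any actor library's API):

```python
import queue
import threading

def parent(inbox, results):
    """Actor-style loop: expensive requests are handed to a spawned
    worker so the parent can keep processing the next message."""
    workers = []
    while True:
        msg = inbox.get()
        if msg == "stop":
            break
        if isinstance(msg, tuple) and msg[0] == "expensive":
            # Child does the slow work; parent moves on immediately.
            t = threading.Thread(target=lambda n=msg[1]: results.put(n * n))
            t.start()
            workers.append(t)
        else:
            results.put(msg)  # cheap messages are handled inline
    for t in workers:
        t.join()

inbox, results = queue.Queue(), queue.Queue()
for m in [("expensive", 6), "fast", "stop"]:
    inbox.put(m)
parent(inbox, results)
```

Only the messages you choose to offload are processed concurrently, which is the fine-grained control that the [Reentrant] attribute's all-or-nothing switch doesn't give you.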


Immutable Messages

*7 Messages sent between Grains are usually serialized and then deserialized, which is an expensive overhead when both Grains are running on the same machine. You can optionally mark messages as immutable so that they won't be serialized when passed between Grains on the same machine, but this immutability promise is not enforced at all – it's entirely down to you to apply the due diligence of not mutating the messages.



*7 A clarification here: messages are only serialized and deserialized when they are sent across nodes; messages sent between grains on the same node are deep-copied instead, which is cheaper than serialization. Marking the type as immutable skips the deep-copying process too.

But it's still your responsibility to enforce the immutability guarantee.
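The deep-copy-versus-immutable trade-off can be shown with a tiny Python sketch (a hypothetical local-delivery helper, not the Orleans implementation):

```python
import copy

def send_local(msg, immutable=False):
    """Local (same-node) delivery: deep-copy by default so the receiver
    can't observe sender-side mutation; skip the copy when the message
    is promised to be immutable."""
    return msg if immutable else copy.deepcopy(msg)

m = {"scores": [1, 2]}
received = send_local(m)            # copied: receiver is isolated
m["scores"].append(3)               # sender mutates after sending
shared = send_local(m, immutable=True)  # no copy: same object is shared
```

The immutable path is cheaper precisely because it shares the object; nothing stops the sender from mutating it afterwards, which is the unenforced promise the post warns about.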



In Erlang, variables are immutable, so there is no need to do anything explicit.


Task Scheduling

“Orleans schedules application turns using cooperative multitasking. That means that once started, an application turn runs to completion, without interruption. The Orleans scheduler uses a small number of compute threads that it controls, usually equal to the number of CPU cores, to execute all application actor code.”

This is another point of concern for me.

Here we're exhibiting the same vulnerabilities as an event-loop system (e.g. Node.js, Tornado), where a single poisoned message can cripple your entire system. Even without poisoned messages, you are still left with the problem of not distributing CPU resources evenly across actors, allowing slow or misbehaving actors to badly impact your latency for other pending requests.

Even having multiple cores, with one thread per core (which is a sane choice), is not going to save you here. All you need is one slow-running actor on each processor-affined execution thread to halt the entire system.


Erlang's approach to scheduling makes much more sense – one scheduler per core, and an actor is allowed to execute 2,000 reductions (think of one reduction as one function call to do something) before it has to yield the CPU so that another actor can get a slice of the CPU time. The original actor then waits for its turn to run again.

This CPU-sharing policy is no different from what the OS does with threads, and there's a good reason for that.


Ease of use

I think this is the big winner for Orleans, and the focus of its design goals.

I have to admit, I was pleasantly surprised by how easily and quickly I was able to put a service together and have it running locally. Based on what I have seen of the samples and Richard's Pluralsight course, deploying to the cloud is pretty straightforward too.


Cloud Friendliness

Another win for Orleans here, as it was designed from the start to deal with cluster topologies that can change dynamically, with ephemeral instances. Distributed Erlang, on the other hand – at least distributed OTP – assumes a fixed topology where nodes have well-defined roles from the start. There are also challenges around getting the built-in distributed database, Mnesia, to work well in a dynamically changing topology.



In many ways, I think Orleans is true to its original design goal of optimizing for developer productivity. But by shielding developers from the decisions and considerations that usually come with building distributed systems, it has also deprived them of the opportunity to build systems that have to be resilient to failures and meet stringent uptime requirements.

But not every distributed system is critical, and not every distributed system needs five nines of uptime. As long as you're informed about the trade-offs Orleans has made and what they mean to you as a developer, you can at least make an informed choice about if and when to adopt Orleans.

I hope this post helps you make those informed decisions. If I have been misinformed and am incorrect about how any part of Orleans works, please do leave a comment below.