Bit Syntax in Erlang

One of the often under-appreciated features of Erlang is its Bit Syntax for parsing and pattern matching binary data at a bit level. For instance, to parse TCP segments you can write something along the lines of:
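A simplified sketch covering only the core TCP header fields (the reserved/flags split is abbreviated for illustration):

```erlang
%% Pattern match a TCP segment at the bit level: each field is read
%% with an explicit bit size, and the rest is captured as a binary.
parse_tcp_segment(<<SourcePort:16, DestinationPort:16,
                    SequenceNumber:32, AckNumber:32,
                    DataOffset:4, _Reserved:4, Flags:8, WindowSize:16,
                    Checksum:16, UrgentPointer:16,
                    Payload/binary>>) ->
    {SourcePort, DestinationPort, SequenceNumber, AckNumber,
     DataOffset, Flags, WindowSize, Checksum, UrgentPointer, Payload}.
```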



The same capability can be applied to any binary protocol, such as video encoding, imaging, or UDP.


Imitating with F# Computation Expressions



Whilst this capability is not built into F# (or any other language that I know of, for that matter), we do have a very powerful and robust feature in F#: Computation Expressions.

With computation expressions, I was able to create a small library that allows you to write and read data to and from a stream at a bit level. With the bitWriter and bitReader workflows you will be able to write and parse TCP headers with code like the following:
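To give a flavour of how computation expressions make this possible, here is a minimal, self-contained bit-reader workflow (an illustrative sketch of my own, not the library's actual API):

```fsharp
// The reader state: a byte array plus the current bit offset.
type BitReaderState = { Bytes : byte[]; mutable Pos : int }

// Read n bits (most significant bit first) from the current position.
let readBits n (state : BitReaderState) =
    let mutable value = 0
    for _ in 1 .. n do
        let byteIdx = state.Pos / 8
        let bitIdx  = 7 - (state.Pos % 8)
        let bit     = (int state.Bytes.[byteIdx] >>> bitIdx) &&& 1
        value <- (value <<< 1) ||| bit
        state.Pos <- state.Pos + 1
    value

// A tiny computation expression: let! reads the given number of bits.
type BitReaderBuilder(state : BitReaderState) =
    member _.Bind(bitCount : int, f) = f (readBits bitCount state)
    member _.Return(x) = x

let bitReader bytes = BitReaderBuilder({ Bytes = bytes; Pos = 0 })

// Parse the first two fields of a TCP header (ports are 16 bits each).
let parsePorts (segment : byte[]) =
    let reader = bitReader segment
    reader {
        let! srcPort = 16
        let! dstPort = 16
        return srcPort, dstPort
    }
```

For example, `parsePorts [| 0x00uy; 0x50uy; 0x01uy; 0xBBuy |]` yields `(80, 443)`.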


The library is available via NuGet:

Please give it a try, and let me know here if you find any bugs.


P.S. There is still much work to be done on the library. For instance, it's allocating a new buffer array for each workflow rather than using a buffer pool. If you find this library useful and in need of greater performance, please feel free to contribute and help improve its performance.


It's been a little while since I last spent time with Elm, and with Elm 0.13 recently announced, what better time to get my Elm hat back on and see what's new.

There's a new-look online debugger which looks prettier than before:


More important than that is the new Elm Reactor command line tool, which powers the online debugger. With Elm Reactor you can run your own time-travelling debugger locally, and as you edit your Elm source files you can watch your application update in real time while retaining the ability to go back in time by playing back previous events.

There's also a new command line package manager – Elm Get – which gives you the ability to easily add community Elm libraries to your project, or to publish your own libraries to the Elm Public Library. Overall it works very similarly to how Dart's pub works, and whilst I haven't published any libraries myself it seems a straightforward affair.

There are a couple of small breaking changes in the core language, and it's great to see that F#'s function composition operators (<< and >>) have been adopted and released in this version!
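For those unfamiliar with them, here's what the two composition operators do in F# (the newly adopted Elm versions behave the same way):

```fsharp
// >> composes left-to-right, << composes right-to-left.
let double x = x * 2
let increment x = x + 1

let doubleThenIncrement = double >> increment   // double first, then add 1
let incrementThenDouble = double << increment   // add 1 first, then double

// doubleThenIncrement 5 = 11, incrementThenDouble 5 = 12
```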


Now that I've caught up on the changes, I put together a simple implementation of Snake, and to my pleasant surprise the whole thing came in at less than 100 LOC, although admittedly not the easiest 100 LOC I've ever written. I had to really think about what I was doing (which is a good thing), the lack of IDE support occasionally gets in the way, and I find the error messages hard to read sometimes (although they're much better formatted when you work against Elm Reactor running locally).

If you've got a few minutes to kill, why not give it a go:


and feel free to check out the source code on GitHub.



Elm Reactor – Time Travel made Easy

Elm 0.13 – Architecture Improvements

Elm startup project

Elm-Snake project page


On application monitoring

In the Gamesys social team, our view on application monitoring is that anything that runs in production needs to be monitored extensively, all the time – every service entry point, IO operation, and CPU-intensive task. Sure, it comes at the cost of a few CPU cycles, which might mean that you have to run a few more servers at scale, but that's a small cost to pay compared to:

  • lack of visibility of how your application is performing in production; or
  • inability to spot issues occurring on individual nodes amongst a large number of servers; or
  • longer time to discovery of production issues, which results in
    • longer time to recovery (i.e. longer downtime)
    • loss of customers (immediate result of downtime)
    • loss of customer confidence (longer term impact)

Services such as StackDriver and Amazon CloudWatch also allow you to set up alarms around your metrics so that you can be notified, or some automated actions can be triggered, in response to changing conditions in production.

In Michael T. Nygard's Release It!: Design and Deploy Production-Ready Software (a great read, by the way) he discusses at length how unfavourable conditions in production can cause cracks to appear in your system, and how, through tight coupling and other anti-patterns, these cracks can accelerate and spread across your entire application and eventually bring it to its knees.


In applying extensive monitoring to our services we are able to:

  • see cracks appearing in production early; and
  • collect an extensive amount of data for the post-mortem; and
  • use knowledge gained during post-mortems to identify early warning signs and set up alarms accordingly


Introducing Metricano

With our emphasis on monitoring, it should come as no surprise that we have long sought to make it easy for our developers to monitor different aspects of their service.


Now, we've made it easy for you to do the same with Metricano, an agent-based framework for collecting, aggregating and publishing metrics from your application. At a high level, the MetricsAgent class collects metrics and aggregates them into second-by-second summaries. These summaries are then published to all the publishers you have configured.
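To illustrate what "second-by-second summaries" means, here is a small sketch of such an aggregation step (illustrative only, not Metricano's actual implementation):

```fsharp
open System

// A raw data point recorded by the agent.
type DataPoint = { Timestamp : DateTime; Value : float }

// Aggregate raw data points into per-second summaries of
// (count, average, max), keyed by the second they fall into.
let summarize (points : DataPoint list) =
    points
    |> List.groupBy (fun p -> p.Timestamp.Ticks / TimeSpan.TicksPerSecond)
    |> List.map (fun (second, pts) ->
        let values = pts |> List.map (fun p -> p.Value)
        second, (List.length values, List.average values, List.max values))
```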


Recording Metrics

There are a number of options for you to record metrics with MetricsAgent:


You can call the IncrementCountMetric or RecordTimeMetric methods on an instance of IMetricsAgent (you can use MetricsAgent.Default or create a new instance with MetricsAgent.Create), for example:


F# Workflows

From F#, you can also use the built-in timeMetric and countMetric workflows:


PostSharp Aspects

Lastly, you can also use the CountExecution and LogExecutionTime attributes from the Metricano.PostSharpAspects NuGet package, which can be applied at method, class and even assembly level.

The CountExecution attribute records a count metric with the fully qualified name of the method, whereas the LogExecutionTime attribute records execution times into a time metric with the fully qualified name of the method. When applied at class and assembly level, the attributes are multicast to all encompassed methods, whether private, public, instance or static. It's possible to target specific methods by name or visibility, etc.; please refer to PostSharp's documentation for details.


Publishing Metrics

All the metrics you record won't do you much good if they just stay inside the memory of your application server.

To get metrics out of your application server and into a monitoring service or dashboard, you need to tell Metricano to publish metrics with a set of publishers. There is a ready-made publisher for the Amazon CloudWatch service in the Metricano.CloudWatch NuGet package.

To add a publisher to the pipeline, use the Publish.With static method, see example here.

Since the lowest granularity on Amazon CloudWatch is one minute, as an optimization to cut down on the number of web requests (which also has a cost impact), CloudWatchPublisher will aggregate metrics locally and only publish the aggregates on a per-minute basis.

If you want to publish your metric data to another service (StackDriver or New Relic, for instance), you can create your own publisher by implementing the very simple IMetricsPublisher interface. This simple ConsolePublisher, for instance, will calculate the 95th percentile execution times and print them:
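A percentile calculation along those lines might look like the following (an illustrative sketch, not the actual ConsolePublisher code):

```fsharp
// Nearest-rank percentile: the value below which p% of the sorted
// data points fall.
let percentile (p : float) (times : float list) =
    let sorted = List.sort times
    let rank   = int (ceil (p / 100.0 * float sorted.Length)) - 1
    sorted.[max 0 rank]

// e.g. print the 95th percentile execution time for a metric
let print95th name times =
    printfn "%s : 95th percentile = %.1fms" name (percentile 95.0 times)
```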


In general I find the 95th/97th/99th percentile time metrics much more informative than simple averages, since averages are so easily biased by even a small number of outlying data points.



I hope you have enjoyed this post and that you'll find Metricano a useful addition to your application.

I highly recommend reading Release It!; many of the patterns and anti-patterns discussed in the book are becoming more and more relevant in today's world where we're building smaller, more granular microservices. Even the simplest of applications have multiple integration points – social networks, cloud services, etc. – and these are the places where cracks are likely to occur before they spread out to the rest of your application, unless you have taken measures to guard against such cascading failures.

If you decide to buy the book from Amazon, please use the link I provide below or add the query string parameter tag=theburningmon-21 to the URL, so that I can get a small referral fee and use it towards buying more books and finding more interesting things to write about here :)



Metricano project page

Release It!: Design and Deploy Production-Ready Software

Metricano NuGet package

Metricano.CloudWatch NuGet package

Metricano.PostSharpAspects NuGet package

Red-White Push – Continuous Delivery at Gamesys Social


The monster trapping mechanics in Here Be Monsters are fairly straightforward:

  • Monsters have a type and a set of stats – Strength, Speed and IQ
  • They have a rarity value which determines the likelihood of an encounter
  • They have a set of baits they like, which can increase the likelihood of an encounter
  • Traps can catch monsters of matching types
  • Traps also have a set of stats – Strength, Speed and Tech
  • The chance of catching a monster is determined by the trap's stats vs the monster's stats


It's as simple as it sounds. Unless, of course, you're the game designer responsible for setting the stats for the trap so that:

a. you achieve the intended catch rate % against each of the monsters, and

b. the distribution of the stats 'makes sense', i.e. a low-tech box trap should have a higher Strength stat than Tech


The naive approach would be to start with a guesstimate and then use trial and error until you converge upon an answer, or an approximation to the answer that is considered good enough. This would be laborious and error-prone, and unlikely to yield the optimal result (barring the Herculean effort of a persistent game designer).


To automate this process and aid our game designers, we designed and implemented a simple genetic algorithm in F# that would search for the optimal solution based on:

  • the intended % catch rate for each monster
  • an error margin
  • an initial set of stats that defines the ideal distribution of stats

The game designers can use our custom web tool to run the algorithm, for example:



In simple terms, a genetic algorithm starts with a set of potential solutions and iteratively generates new generations of solutions using a selection and a mutation process such that:

  • the selection process chooses which of the solutions survive (survival of the fittest and all that) based on a fitness function
  • the mutation process generates new solutions using the surviving solutions

The iteration continues until one of the termination conditions has been met, for example, when a solution is found or we've reached the maximum number of generations allowed.
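The overall loop can be sketched like this (a deliberately generic skeleton, not our production implementation; the survivor count of 10 is arbitrary):

```fsharp
// A minimal genetic-algorithm loop: score the population, keep the
// fittest survivors, and mutate them into the next generation.
let rec evolve fitness mutate isSolution maxGen population =
    let survivors =
        population
        |> List.sortBy fitness         // lower score = fitter
        |> List.truncate 10            // survival of the fittest
    match survivors |> List.tryFind isSolution with
    | Some solution        -> Some solution   // termination: solution found
    | None when maxGen = 0 -> None            // termination: out of generations
    | None ->
        evolve fitness mutate isSolution (maxGen - 1)
               (survivors |> List.collect mutate)
```

As a toy example, `evolve (fun x -> abs (x - 42)) (fun x -> [x - 1; x + 1; x - 5; x + 5]) (fun x -> x = 42) 50 [0]` converges on `Some 42`.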



In our algorithm, each solution is a set of stats for the trap, and the selection process calculates the catch rate for each of the monsters using the solution, and keeps the solution if it's better than the solution it was mutated from.

The mutation process then takes each of the surviving solutions and generates new solutions by tweaking the stats in a number of ways:

  • +/- a small amount from each of Strength/Speed/Tech (generates better solutions when we're close to optimal solutions)
  • +/- a large amount from each of Strength/Speed/Tech (generates noticeably different solutions when we're far from optimal solutions)

So from an initial solution of Strength:100, Speed:100 and Tech:200, you can end up with a number of different solutions for the next generation:
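Sketched in code, the mutation step might look like this (the delta values here are made up for illustration; the real ranges are game-specific):

```fsharp
type TrapStats = { Strength : int; Speed : int; Tech : int }

// Nudge each stat up or down by a small amount (fine-tuning near an
// optimum) and by a large amount (exploring far from an optimum).
let mutate stats =
    let deltas = [ -5; 5; -50; 50 ]
    [ for d in deltas do
        yield { stats with Strength = stats.Strength + d }
        yield { stats with Speed    = stats.Speed    + d }
        yield { stats with Tech     = stats.Tech     + d } ]

let seed    = { Strength = 100; Speed = 100; Tech = 200 }
let nextGen = mutate seed   // 12 candidate solutions for the next generation
```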


This process continues until either:

  • the max number of generations has been exhausted, or
  • none of the new solutions survive the selection process

The final survivors are then filtered using the error margin specified by the game designer, and sorted by how close they are to the specified target catch rates, e.g.



We have also applied the same technique and implemented genetic algorithms to:

  • find stats for a monster that will give it the intended catch rate against a number of traps (the inverse of the above)
  • find configurations for baits so that we can achieve the desired encounter rate with a monster when using the bait



So there you go, I hope you enjoyed reading about another place where a bit of F# magic has come in handy.

The code for the genetic algorithms is not very complicated (or very long) but it is incredibly specific to our domain, hence I haven't included our actual code in this post. But hopefully I've managed to give you at least a flavour of what genetic algorithms are and how you might be able to apply them (with F# if possible!) in your own solution.


In our MMORPG title Here Be Monsters, we offer the players a virtual world to explore where they can visit towns and spots; forage fruits and gather insects and flowers; tend to farms and animals in their homesteads; make in-game buddies and help each other out; craft new items using things they find in their travels; catch and cure monsters corrupted by the plague; help out troubled NPCs and aid the Ministry of Monsters in its struggle against the corruption; and much more!

All in all, there are close to a hundred distinct actions that can be performed in the game, and more are added as the game expands. At the very centre of everything you do in the game is a quest and achievements system that can tap into all these actions and reward you once you've completed a series of requirements.


The Challenge

However, such a system is complicated by the snowball effect that can occur following any number of actions. The following animated GIF paints an accurate picture of a cyclic set of chain reactions that can occur following a simple action:


In this instance,

  1. catching a Gnome awards EXP, gold and occasionally loot drops, in addition to fulfilling any requirement for catching a gnome;
  2. getting the item as loot fulfils any requirements for you to acquire that item;
  3. the EXP and gold awarded to the player can fulfil requirements for acquiring certain amounts of EXP or gold respectively;
  4. the EXP can allow the player to level up;
  5. levelling up can then fulfil a requirement for reaching a certain level, as well as unlocking new quests that were previously level-locked;
  6. levelling up can also award you with items and gold, and the cycle continues;
  7. if all the requirements for a quest are fulfilled then the quest is complete;
  8. completing a quest will in turn yield further rewards of EXP, gold and items, and restart the cycle;
  9. completing a quest can also unlock follow-up quests, as well as fulfilling quest-completion requirements.


The same requirements system is also in place for achievements, which represent longer-term goals for players to play for (e.g. catch 500 spirit monsters). The achievement and quest systems are co-dependent and feed into each other; many of the milestone achievements we currently have in the game depend upon quests being completed:


Technically there is a 'remote' possibility of deadlocks, but right now it exists only as a possibility, since new quest/achievement content is generally played through many, many times by the many people involved in the content generation process, to ensure that it is fun and achievable and that at no point will players be left in a state of limbo.


This cycle of chain reactions introduces some interesting implementation challenges.

For starters, the different events in the cycle (levelling up, catching a monster, completing a quest, etc.) are handled and triggered from different abstraction layers that are loosely coupled together, e.g.

  • Level controller encapsulates all logic related to awarding EXP and levelling up.
  • Trapping controller encapsulates all logic related to monster catching.
  • Quest controller encapsulates all logic related to quest triggering, progressing and completion.
  • Requirement controller encapsulates all logic related to managing the progress of requirements.
  • and many more.

Functionally, the controllers form a natural hierarchy whereby higher-order controllers (such as the trapping controller) depend upon lower-order controllers (such as the level controller) because they need to be able to award players with EXP, items, etc. However, in order to facilitate the desired flow, theoretically all controllers need to be able to listen and react to events triggered by all other controllers.


To make matters worse, there are also non-functional requirements which require the ability to tap into this rich and continuous stream of events, such as:

  • Analytics tracking – every action the player takes in the game is recorded along with the context in which it occurred (e.g. caught a gnome with trap X, acquired item Z, completed quest Q, etc.)
  • 3rd party reporting – notify ad partners of key milestones to help them track and monitor the effectiveness of different ad campaigns
  • etc.


For the components that process this stream of events, we also wanted to make sure that our implementation is:

  1. strongly cohesive – code that deals with a particular feature (quests, analytics tracking, community goals, etc.) is encapsulated within the same module
  2. loosely coupled – code that deals with different features should not be directly dependent on each other, and where possible should exist completely independently

Since the events are generated and processed within the context of one HTTP request (the initial action from the user), the stream also has a lifetime that is scoped to the HTTP request itself.


And finally, in terms of performance, whilst it's not a latency-critical system (generally a round-trip latency of sub-1s is acceptable) we generally aim for a response time (between the request reaching the server and the server sending back a response) of 50ms to ensure a good round-trip latency from the user's perspective.

In practice though, the last-mile latency (from your ISP to you) has proven to be the most significant factor in determining the round-trip latency.


The Solution

After considering several approaches:

  • Vanilla .NET events
  • Reactive Extensions (Rx)
  • CEP platforms such as Esper or StreamInsight

we decided to go with a tailor-made solution for the problem at hand.

In this solution we introduced two abstractions:

  • Facts – special events for the purpose of this particular system; we call them facts to distinguish them from the events we already record for analytics purposes. A fact contains information about an action or a state change, as well as the context in which it occurred; e.g. a CaughtMonster fact would contain information about the monster, the trap, the bait used, where in the world the action occurred, as well as the rewards the player received.
  • Fact Processor – a component which processes a fact.


As a request (e.g. to check our trap to see if we've caught a monster) comes in, the designated request handler will first perform all the relevant game logic for that particular request, accumulating facts along the way from the different abstraction layers that have to work together to process the request.

At the end of the core game logic, the accumulated facts are then forwarded to each of the configured fact processors in turn. The fact processors might choose to process or ignore each of the facts.

In choosing to process a fact, a fact processor can cause state changes or other interesting events to occur, which result in follow-up facts being added to the queue.
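The queue-driven flow can be sketched as follows (with made-up fact types and a toy processor, not our actual ones):

```fsharp
open System.Collections.Generic

type Fact =
    | CaughtMonster of monster:string
    | ReceivedExp   of amount:int
    | LevelUp       of newLevel:int

// A fact processor inspects each fact and may produce follow-up facts.
// Here, enough EXP triggers a level-up fact.
let levelProcessor fact =
    match fact with
    | ReceivedExp amount when amount >= 100 -> [ LevelUp 2 ]
    | _ -> []

// Drain the queue, giving every processor a chance to react to each
// fact; follow-up facts go back onto the queue and keep the cycle going.
let processFacts (processors : (Fact -> Fact list) list) (initialFacts : Fact list) =
    let queue = Queue<Fact>(initialFacts)
    let processed = ResizeArray<Fact>()
    while queue.Count > 0 do
        let fact = queue.Dequeue()
        processed.Add fact
        for processor in processors do
            for followUp in processor fact do
                queue.Enqueue followUp
    List.ofSeq processed
```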



The system described above has the benefits of being:

  • Simple – easy to understand and reason about, easy to modularise, no complex orchestration logic or spaghetti code
  • Flexible – easy to change the information captured by facts and the processing logic in fact processors
  • Extensible – easy to add new facts and/or fact processors into the system

The one big downside is that the system requires many types of facts, which could potentially add to your maintenance overhead and require lots of boilerplate class setup.


To address these potential issues, we chose F#'s discriminated unions over standard .NET classes for their succinctness. For a small number of facts you can have something as simple as the following:
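Something along these lines (an illustrative sketch, not our actual fact types):

```fsharp
// One discriminated union covers a whole family of facts - far more
// succinct than one .NET class per fact type.
type Fact =
    | CaughtMonster of monster:string * trap:string * bait:string
    | ReceivedExp   of amount:int
    | ReceivedGold  of amount:int
    | LevelUp       of oldLevel:int * newLevel:int
```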


However, as we mentioned earlier, there are a lot of different actions that can be performed in Here Be Monsters, and therefore many facts will be required to track those actions as well as the state changes that occur during them. The simple approach above is not a scalable solution in this case.

Instead, you could use a combination of a marker interface and pattern matching to split the facts into a number of specialized discriminated union types.
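For example (again a sketch with made-up fact types):

```fsharp
// The marker interface lets different specialized fact unions flow
// through the same pipeline, and keeps C# interop straightforward.
type IFact = interface end

type StateChangeFacts =
    | LevelUp     of oldLevel:int * newLevel:int
    | ReceivedExp of amount:int
    interface IFact

type TrappingFacts =
    | CaughtMonster of monster:string * trap:string
    interface IFact

// Pattern match on the concrete union type first, then on its cases.
let describe (fact : IFact) =
    match fact with
    | :? StateChangeFacts as f ->
        (match f with
         | LevelUp (oldLvl, newLvl) -> sprintf "levelled up from %d to %d" oldLvl newLvl
         | ReceivedExp amount       -> sprintf "received %d EXP" amount)
    | :? TrappingFacts as f ->
        (match f with
         | CaughtMonster (monster, trap) -> sprintf "caught %s with %s" monster trap)
    | _ -> "unknown fact"
```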


Update 2014/07/28: thanks to @johnazariah for bringing this up. The reason for choosing a marker interface rather than a hierarchical discriminated union in this case is that it makes interop with C# easier.

In C#, you can create the StateChangeFacts.LevelUp union clause above using the compiler-generated StateChangeFacts.NewLevelUp static method, but it's not as readable as the equivalent F# code.

With a hierarchical DU the code would be even less readable, e.g. Fact.NewStateChange(StateChangeFacts.NewLevelUp(…))


To wrap things up, once all the facts are processed and we have dealt with the request in full, we need to generate a response back to the client to report all the changes to the player's state as a result of the request. To simplify the process of tracking these state changes and to keep the codebase maintainable, we make use of a Context object for the current request (similar to HttpContext.Current) and make sure that each state change (e.g. EXP, energy, etc.) occurs in only one place in the codebase and is tracked at the point where it occurs.

At the end of each request, all the changes that have been collected are then copied from the current Context object onto the response object if it implements the relevant interface – for example, all the quest-related state changes are copied onto a response object if it implements the IHasQuestChanges interface.


Related Posts

F# – use Discriminated Unions instead of Classes

F# – extending Discriminated Unions using marker interfaces