Slides and videos for my Oredev talks on Neo4j and APL

Hello, here are the recordings of my two talks at Oredev with accompanying slides.


Modelling game economy with Neo4j

In Here Be Monsters, we have a MMORPG that is content-heavy with over:

  • 5000 items
  • 800 recipes
  • 500 locations
  • 1500 quests

and since the contents are highly connected, it makes balancing the game a rather interesting and challenging problem for our small team behind the project.

The Challenge

Consider a simple example involving the Camouflage Trap, which is one of the very first traps you’ll make. It’s made from a number of ingredients, each can be:

  • found in the certain parts of the world
  • purchased at shops
  • awarded for completing quests/achievements
  • crafted using other ingredients

Now, suppose you want to raise the price of basic ingredients such as Water, that increase needs to propagate all the way through the chain.

Furthermore, when you consider how many items are made from Water, and how many more items are made from those items.

There’s a huge knock-on effect here. Failing to address these knock-on effects will create arbitrage opportunities in your economy for players to exploit. They will be able to mine coins by buying and selling items and will therefore have less need to make real-money purchases with us.

As a game designer, whenever you want to make a change you are faced with this huge uncertainty because of the hidden knock-on effects that you don’t see.

It’s a complexity that we have to manage, but managing this complexity by hand is:

  • laborious – involving many iterations of trial and error, and likely repeated each time new contents are added
  • slow
  • error prone
  • subjective – what ‘feels’ right can vary greatly from person to person

Instead, we opted for a more automated process whereby every item’s intrinsic value in the game can be evaluated based on the values of its inputs (e.g. baits for monsters, ingredients for crafted items, etc.). The amount of time involved, and your chance of success is also taken into account. We can then use the intrinsic value of an item to drive or quantify other less tangible aspects of the game’s model.

This is where Neo4j comes in.

Hello, Neo

One of the main challenges that came up in our effort to automate the economic balancing was to understand the complex relationships between items, quests, achievements, as well as locations and activities that can be performed against/with specific items at specific locations.

Take Bigfoot as an example, from his almanac page in the game:

We can see that to catch the Bigfoot you need to consider:

  • location – he’s only available in certain parts of the world
  • bait – to lure him out you need the Alluring Goat which gives you a roughly 4 in 7 chance of seeing the monster
  • trap – you need a trap strong enough to hold him after you’ve managed to lure him out, the Musket-teer Trap has a 5 in 7 chance of success
  • loot – upon a successful catch, Bigfoot occasionally drops Bigfoot Toenail Clippings as loot, which you might need in future quests or as ingredient to make other items

We can model all the information we see on this screen as a graph in Neo4j:

Additionally, each node and edge can have an arbitrary set of properties.

For instance, Bigfoot and the Musket-teer Trap will have their stats. The lootsrelationship that exists between Bigfoot and his loot will also specify the chance of him dropping the loot.

Looking beyond the immediate connections to Bigfoot, the Alluring Goat and Musket-teer Trap each have numerous connections of their own.

To make the Alluring Goat, you need to gather:

  • Honey, which you need to build a Bee Hive in your homestead and harvest from it;
  • Goat, which you can buy from animal traders in cities around the world such asLondon and NanJing;
  • Golden Hair, which is a loot dropped by Blonde Mermaid, Dandelion Pixie and Blonde Selkie;

This diagram illustrates the highly connected and complex nature of the data we’re dealing with – the result of us making a game where everything you can do and every item you find has a purpose and can be used for something else.

It is also by no means an unconnected subset of the overall graph. For simplicity sake I have omitted many types of relationships and connected nodes.

Visualizing the 8,000+ nodes and around 40,000+ edges in Gephi, where the colour and size of the nodes represent the number of connections they have, this is what the internal data model for Here Be Monsters look like:

As you can see, the degree of connectedness varies greatly. For common low-level monsters such as Sylph, Spriggan, Sprite and Salamander, they are each connected to no less than 300 locations, traps and items.

Similarly, for common ingredients such as Salt, it can be found through many items (e.g. most fish drop Seaweed and Salt as loot when you catch them) and is used in many recipes – Pastry, Pizza Base, Ketchup to name a few.

With the internal data model captured as a graph in Neo4j, we were able to ask some interesting questions that would have been difficult or impossible to answer otherwise.

To give you a few examples.

Impact Analysis

In the earlier pricing example with Water and Camouflage Trap, the key challenge is to be able to understand the impact of change. This is a very similar problem to the ones you face in derivative pricing in Finance.

If you take White Bread as an example, to work out the blast radius of a price change, let’s look at the relationships that exist between an item and a recipe.

To find all the items that are made from White Bread, directly or indirectly, we can run the following Cypher (which is the built-in query language for Neo4j) query against our Neo4j database:

(wb:BaseItem { Name:”White Bread”})
-[rel:CRAFTS | IS_USED_IN*1..] ->(i:BaseItem)
RETURN i, rel, wb

Couple of things to note from this query.

  • notice how we are basically pattern matching against the graph using the pattern node-[relationship]->node?
  • we capture the nodes and relationships into variables wb, i and rel so we can return them from the query;
  • we can optionally filter the nodes by type, e.g. (i:BaseItem) will only match against nodes that are of type BaseItem;
  • to identify White Bread, we also filter one of the nodes by the value of its properties, in this case the node wb must have a Name property with the value of White Bread;
  • we can use OR semantic when filtering on relationship types, here we’re looking for relationships of type CRAFTS or IS_USED_IN;
  • for the pattern to work, there must exist 1 or more instance of such relationships between any two nodes, hence the cardinality of *1..

Running this query yields the following result, where the purple nodes are items and red nodes are recipes.

From this graph we can see that White Bread is used directly in 10 recipes, and then indirectly (virtue of being an ingredient for making Sausage) in a further 5. If we were to change the price of White Bread, all 15 of these items will need to have their prices adjusted based on the number of White Bread required to make them.

For example, if it takes 2 pieces of White Bread to make 1 Sausage and 2 Sausage to make a Full English Breakfast, then the change to White Bread‘s price would need to be multiplied by 4 when applied to Full English Breakfast.

Scarcity Analysis

Not all items can be priced as derivatives of others. Some need to be priced based on their scarcity in the world, such as the fruits that you can forage from fruit trees you find on your travels.

To find out how scarcely available Durian and Dragonfruit is, we can use the following Cypher query:

fruit.Name=‘Durian’ OR fruit.Name=‘Dragonfruit’
RETURN fruit, tree, spot

Again, we’re simply pattern matching against our graph using the pattern node<-[relationship1]-node-[relationship2]->node. The expressive power of Cypher lies in the ability to take our relationship diagram above and translate it like-for-like into an executable query.

You might also noticed that I’m not filtering any of the nodes by type here. This is because I know those relationships exist only between the specified types of nodes, hence it’s safe for me to omit them.

Immediately, you can see that Dragonfruit Tree is much more readily available in the world compared to Durian Tree. However, you still need to consider:

  • the number of trees at each location, which you can find out from the EXISTS_IN relationship
  • the number of fruits you get by foraging the tree, which you can find out from the FORAGES relationship

Taking all these factors into account, we can set prices for Durian and Dragonfruit which reflects their scarcity in the world.

Quest Progression

Some quests require specific items to complete. For instance, an NPC might ask you to fetch an item from Bob in Cambridge, or find some feature under a rock somewhere, or catch a Griffin and get a Griffin Egg as loot.

On the other hand, completing a quest can sometimes award you items as well. If the quest is part of a quest line then you will also unlock follow-up quests too, so there is a self-recursive relationship there.

To answer questions such as

What comes after the Year of the Horse quest?

you can use a simple Cypher query like the one below.

(q1:Quest { Name: “Year of the Horse” })
-[:UNLOCKS] ->(q2:Quest)
RETURN q1, q2

From the resulting graph, you can quickly see the quests that are unlocked by completing the Year of the Horse quest.

In fact, if you connect all the quests in the game then you’ll end up with the following.

No wonder our game designers need a hand working with the data!

But, just being able to work out how quests are connected to each other and visualize them is not all that exciting or useful. We can do much more.

With our price model in full swing, we are able to:

  1. price baseline items based on factors such as scarcity
  2. price items that are derived from the baseline items

Since the price of an item reflects the difficulty in obtaining it, we can make use of the relationships between quests and items to “price” quests the same way – i.e. the more expensive the items a quest require, the more difficult that quest is to complete.

From there, you can establish simple rules such as:

  • cheaper quests should come before more expensive ones, to ensure a sense of progression for the players
  • a quest should not reward items whose total price exceeds the quest’s price

Monster Hierarchy

Finally, monster trapping is a big part of the game as its name suggests. As mentioned earlier, to catch a monster you need the right combination of bait and trap.

To catch a monster, sometimes you have to first catch a lower level monster; get its loot; and use the loot to make the bait for the monster you want to catch. Using the following relationships you can place the monsters into a hierarchy.

Which can be translated into the following Cypher query:

-[r:IS_USED_IN | CRAFTS*0..] ->(bait)-[:CAN_ATTRACT]->(monster2)
RETURN monster1,  monster2

Again, see how it mirrors our diagram?

Suppose we are on a quest to catch Bigfoot, we can use this query to identify the monsters we have to catch first in order to get the ingredients to make the bait for Bigfoot. The query yields the following result where the purple nodes are monsters and the red node in the middle is the recipe for crafting the Alluring Goat.

This places Bigfoot at the peak of its hierarchy.

If you repeat the same exercise for every monster in the game and compose their hierarchies together, then you’ll end up with a more complete monster hierarchy covering most of the monsters that exist in the game.

Once we have both the quest hierarchy and monster hierarchy we can do some interesting analysis.

For instance, if completing Quest 1 unlocks Quest 2, and catching Monster 2 gives you the loot you need to make the bait for Monster 1:

then Quest 1 cannot ask the player to catch Monster 1 if Quest 2 asks the player to catch Monster 2.

This is to ensure that we do not break the sense of progression as the player progresses through the quests.

Otherwise, as a player, you have to catch Monster 2 multiple times to get the loot to make the bait for Monster 1, and then take several attempts to successfully catch Monster 1. Just as you had finished with that cycle, the very next quest (or shortly after) you are asked to catch the same monster again, which doesn’t make for a very satisfying playing experience.

Remember, the situation might be even worse if you have to first catch other monsters in order to make the bait for Monster 2.

Even more Impact Analysis

We looked at how impact analysis applies to item pricing earlier, but we have another interesting use for impact analysis with regards to the monster hierarchy.

When you successfully catch a monster, you receive a gold reward from the Ministry of Monsters and a chance to get the monster’s loot.

Successful Catch = Gold + (maybe)Loot

Therefore, for monster catching, there’s an equation that needs to be balanced:

Whenever you change the price of an item that is either a bait or a loot, it can have a profound impact on the monster hierarchy:

  • change to one side of the equation (item price, drop/attraction rate, gold reward) requires change to the other side to keep things in balance
  • change to the input side of the equation requires changes to all preceding monsters in the hierarchy
  • change to the output side requires changes to all subsequent monsters in the hierarchy

Thankfully, Neo4j makes this really easy, which is important because whenever we introduce a new monster (and it happens pretty regularly) it has an impact on all other monsters in the same region as there is a new competitor for food!


I hope I have given you a flavour of our use case with Neo4j. In general I find graph databases to be the most powerful and natural way to model a domain, especially for domains with complex and/or highly connected datasets.

Slides and Recording

Red-White Push – Continuous Delivery at Gamesys Social

Nowadays you see plenty of stories about Continuous Integration, Continuous Delivery and Continuous Deployment on the web, and it’s great to see that the industry is moving in this direction, with more and more focus on automation rather than hiring humans to do a job that machines are so much better at.

But, most of these stories are also not very interesting because they tend to revolve around MVC-based web sites that controls both the server and the client (since the client is just the server-generated HTML) and there’s really no synchronization or backward compatibility issues between the server and the client. It’s a great place to be to not have those problems, but they are real concerns for us for reasons we’ll go into shortly.


The Netflix Way

One notable exception is the continuous deployment story from Netflix, which Carl Quinn also talked about as part of an overview of the Netflix architecture in this presentation.

For me, there are a number of things that make the Netflix continuous deployment story interesting and worth studying:

  • Scale – more than 1000 different client devices and over a quarter of the internet traffic
  • Aminator – whilst most of us try to avoid creating new AMIs when we need to deploy new versions of our code, Netflix has decided to go the other way and instead automate away the painful, manual steps involved with creating new AMIs and in return get better start-up time as their VMs comes pre-baked


  • Use of Canary Deployment – dipping your toe in the water by routing a small fraction of your traffic to a canary cluster to test it out in the wild (it’s worth mentioning that this facility is also provided out-of-the-box by Google AppEngine)
  • Red/Black push – a clever word play (and reference to the Netflix colour I presume?) on the classic blue-green deployment, but also making use of AWS’s auto-scaling service as well as Netflix’s very own Zuul and Asgard services for routing and deployment.


I’ve not heard any updates yet, but I’m very interested to see how the Netflix deployment pipeline has changed over the last 12 months, especially now that Docker has become widely accepted in the DevOps community. I wonder if it’s a viable alternative to baking AMIs and instead Aminator can be adopted (and renamed since it’s no longer baking AMIs) to bake Docker images instead which can then be fetched and deployed from a private repository.

If you have see any recent talks/posts that provides more up-to-date information, please feel free to share in the comments.


Need for Backward Compatibility

One interesting omission from all the Netflix articles and talks I have found so far has been how they manage backward compatibility issues between their server and client. One would assume that it must be an issue that comes up regularly whenever you introduce a big new feature or breaking changes to your API and you are not able to do a synchronous, controlled update to all your clients.

To illustrate a simple scenario that we run into regularly, let’s suppose that in a client-server setup:

  • we have an iPhone/iPad client for our service which is currently version 1.0
  • we want to release a new version 1.1 with brand spanking new features
  • version 1.1 requires breaking changes to the service API


In the scenario outlined above, the server changes must be deployed before reviewers from Apple open up the submitted build or else they will find an unusable/unstable application that they’ll no doubt fail and put you back to square one.

Additionally, after the new version has been approved and you have marked it as available in the AppStore, it takes up to a further 4 hours before the change is propagated through the AppStore globally.

This means your new server code has to be backward compatible with the existing (version 1.0) client.


In our case, we currently operate a number of social games on Facebook and mobile (both iOS and Android devices) and each game has a complete and independent ecosystem of backend services that support all its client platforms.

Backward compatibility is an important issue for us because of scenarios such as the one above, which is further complicated by the involvement of other app stores and platforms such as Google Play and Amazon App Store.

We also found through experience that every time we force our players to update the game on their mobile devices we alienate and anger a fair chunk of our player base who will leave the game for good and occasionally leave harsh reviews along the way. Which is why even though we have the capability to force players to update, it’s a capability that we use only as a last resort. The implication being that in practice you can have many versions of clients all accessing the same backend service which has to maintain backward compatibility all the way through.


Deployment at Gamesys Social

Currently, most of our games follow this basic deployment flow:



The steps involved in releasing to production follow the basic principles of Blue-Green Deployment and although it helps eliminate downtime (since we are pushing out changes in the background whilst keeping the service running so there is no visible disruption from the client’s point-of-view) it does nothing to eliminate or reduce the need for maintaining backward compatibility.

Instead, we diligently manage backward compatibility via a combination of careful planning, communication, domain expertise and testing. Whilst it has served us well enough so far it’s hardly fool-proof, not to mention the amount of coordinated efforts required and the extra complexity it introduces to our codebase.


Having considered going down the API versioning route and the maintainability implications we decided to look for a different way, which is how we ended up with a variant of Netflix’s Red-Black deployment approach we internally refer to as..


Red-White Push

Our Red-White Push approach takes advantage of our existing discovery mechanism whereby the client authenticates itself against a client-specific endpoint along with the client build version.

Based on the client type and version the discovery service routes the client to the corresponding cluster of game servers.


With this new flow, the earlier example might look something like this instead:


The key differences are:

  • instead of deploying over existing service whilst maintaining backward compatibility, we deploy to a new cluster of nodes which will only be accessed by v1.1 clients, hence no need to support backward compatibility
  • existing v1.0 clients will continue to operate and will access the cluster of nodes running old (but compatible) server code
  • scale down the white cluster gradually as players update to v1.1 client
  • until such time that we decide to no longer support v1.0 clients then we can safely terminate the white cluster


Despite what the name suggests, you are not actually limited to only red and white clusters. Furthermore, you can still use the aforementioned Blue-Green Deployment for releases that doesn’t introduce breaking changes (and therefore require synchronized updates to both client and server).


We’re still a long way from where we want to be and there are still lots of things in our release process that need to be improved and automated, but we have come a long way from even 12 months ago.

As one of my ex-colleagues said:

“Releases are not exciting anymore”

– Will Knox-Walker

and that is the point – making releases non-events through automation.



Netflix – Deploying the Netflix API

Netflix – Preparing the Netflix API for Deployment

Netflix – Announcing Zuul : Edge Service in the Cloud

Netflix – How we use Zuul at Netflix

Netflix OSS Cloud Architecture (Parleys presentation)

Continuous Delivery at Netflix – From Code to the Monkeys

Continuous Delivery vs Continuous Deployment

Martin Fowler – Blue-Green Deployment

ThoughtWorks – Implementing Blue-Green Deployments with AWS

Martin Fowler – Microservices

Here Be Monsters – Message broker that links all things

In our MMORPG title Here Be Monsters, we offer the players a virtual world to explore where they can visit towns and spots; forage fruits and gather insects and flowers; tend to farms and animals in their homesteads; make in-game buddies and help each other out; craft new items using things they find in their travels; catch and cure monsters corrupted by the plague; help out troubled NPCs and aid the Ministry of Monsters in its struggle against the corruption, and much more!

All and all, there are close to a hundred distinct actions that can be performed in the game and more are added as the game expands. At the very centre of everything you do in the game, is a quest and achievements system that can tap into all these actions and reward you once you’ve completed a series of requirements.


The Challenge

However, such a system is complicated by the snowball effect that can occur following any number of actions. The following animated GIF paints an accurate picture of a cyclic set of chain reactions that can occurred following a simple action:


In this instance,

  1. catching a Gnome awards EXP, gold and occasionally loot drops, in addition to fulfilling any requirement for catching a gnome;
  2. getting the item as loot fulfils any requirements for you to acquire that item;
  3. the EXP and gold awarded to the player can fulfil requirements for acquiring certain amounts of EXP or gold respective;
  4. the EXP can allow the player to level up;
  5. levelling up can then fulfil a requirement for reaching a certain level as well as unlocking new quests that were previously level-locked;
  6. levelling up can also award you with items and gold and the cycle continues;
  7. if all the requirements for a quest are fulfilled then the quest is complete;
  8. completing a quest will in turn yield further rewards of EXP, gold and items and restarts the cycle;
  9. completing a quest can also unlock follow-up quests as well as fulfilling quest-completion requirements.


The same requirements system is also in place for achievements, which represent longer term goals for players to play for (e.g. catch 500 spirit monsters). The achievement and quest systems are co-dependent and feeds into each other, many of the milestone achievements we currently have in the game depend upon quests to be completed:


Technically there is a ‘remote’ possibility of deadlocks but right now it exists only as a possibility since new quest/achievement contents are generally played through many many times by many people involved in the content generation process to ensure that they are fun, achievable and that at no point will the players be left in a state of limbo.


This cycle of chain reactions introduces some interesting implementation challenges.

For starters, the different events in the cycle (levelling up, catching a monster, completing a quest, etc.) are handled and triggered from different abstraction layers that are loosely coupled together, e.g.

  • Level controller encapsulates all logic related to awarding EXP and levelling up.
  • Trapping controller encapsulates all logic related to monster catching.
  • Quest controller encapsulates all logic related to quest triggering, progressing and completions.
  • Requirement controller encapsulates all logic related to managing the progress of requirements.
  • and many more..

Functionally, the controllers form a natural hierarchy whereby higher-order controllers (such as the trapping controller) depend upon lower-order controllers (such as level controller) because they need to be able award players with EXP and items etc. However, in order to facilitate the desired flow, theoretically all controllers will need to be able to listen and react to events triggered by all other controllers..


To make matter worse, there are also non-functional requirements which also requires the ability to tap into this rich and continuous stream of events, such as:

  • Analytics tracking – every action the player takes in the game is recorded along with the context in which they occurred (e.g. caught a gnome with the trap X, acquired item Z, completed quest Q, etc.)
  • 3rd party reporting – notify ad partners on key milestones to help them track and monitor the effectiveness of different ad campaigns
  • etc..


For the components that process this stream of events, we also wanted to make sure that our implementation is:

  1. strongly cohesive – code that are dealing with a particular feature (quests, analytics tracking, community goals, etc.) are encapsulated within the same module
  2. loosely coupled – code that deals with different features should not be directly dependent on each other and where possible they should exist completely independently

Since the events are generated and processed within the context of one HTTP request (the initial action from the user), the stream also have a lifetime that is scoped to the HTTP request itself.


And finally, in terms of performance, whilst it’s not a latency critical system (generally a round-trip latency of sub-1s is acceptable) we generally aim for a response time (between request reaching the server and the server sending back a response) of 50ms to ensure a good round-trip latency from the user’s perspective.

In practice though, the last-mile latency (from your ISP to you) has proven to be the most significant factor in determining the round-trip latency.


The Solution

After considering several approaches:

  • Vanilla .Net events
  • Reactive Extensions (Rx)
  • CEP platforms such as Esper or StreamInsight

we decided to go with a tailor-made solution for the problem at hand.

In this solution we introduced two abstractions:

  • Facts – which are special events for the purpose of this particular system, we call them facts in order to distinguish them from the events we record for analytics purpose already. A fact contains information about an action or a state change as well as the context in which it occurred, e.g. a CaughtMonster fact would contain information about the monster, the trap, the bait used, where in the world the action occurred, as well as the rewards the player received.
  • Fact Processor – a component which processes a fact.


As a request (e.g. to check our trap to see if we’ve caught a monster) comes in the designated request handler will first perform all the relevant game logic for that particular request, accumulating facts along the way from the different abstraction layers that have to work together to process this request.

At the end of the core game logic, the accumulated facts is then forwarded to each of the configured fact processors in turn. The fact processors might choose to process or ignore each of the facts.

In choosing to process a fact the fact processors can cause state changes or other interesting events to occur which results in follow-up facts to be added to the queue.



The system described above has the benefits of being:

  • Simple – easy to understand and reason with, easy to modularise, no complex orchestration logic or spaghetti code.
  • Flexible – easy to change information captured by facts and processing logic in fact processors
  • Extensible – easy to add new facts and/or fact processors into the system

The one big downside being that for the system to work it requires many types of facts which means it could potentially add to your maintenance overhead and requires lots of boilerplate class setup.


To address these potential issues, we turned to F#’s discriminated unions over standard .Net classes for its succinctness. For a small number of facts you can have something as simple as the following:


However, as we mentioned earlier, there are a lot of different actions that can be performed in Here Be Monsters and therefore many facts will be required to track those actions as well as the state changes that occur during those actions. The simple approach above is not a scalable solution in this case.

Instead, you could use a combination of marker interface and pattern matching to split the facts into a number of specialized discriminated union types.


Update  2014/07/28 : thank you to @johnazariah for bringing this up, the reason for choosing to use a marker interface rather than a hierarchical discriminated union in this case is because it makes interop with C# easier.

In C#, you can create the StateChangeFacts.LevelUp union clause above using the compiler generated StateChangeFacts.NewLevelUp static method but it’s not as readable as the equivalent F# code.

With a hierarchical DU the code will be even less readable, e.g. Fact.NewStateChange(StateChangeFacts.NewLevelUp(…))


To wrap things up, once all the facts are processed and we have dealt with the request in full we need to generate a response back to the client to report all the changes to the player’s state as a result of this request. To simplify the process of tracking these state changes and to keep the codebase maintainable we make use of a Context object for the current request (similar to HttpContext.Current) and make sure that each state change (e.g. EXP, energy, etc.) occurs in only one place in the codebase and that change is tracked at the point where it occurs.

At the end of each request, all the changes that has been collected is then copied from the current Context object onto the response object if it implements the relevant interface – for example, all the quest-related state changes are copied onto a response object if it implements the IHasQuestChanges interface.


Related Posts

F# – use Discriminated Unions instead of Classes

F# – extending Discriminated Unions using marker interfaces

Recording for my F# in Social Gaming talk at CodeMesh is up

I gave a talk about our use of F# at last year’s CodeMesh event, and the recording is now up on Vimeo.

Yan Cui – F# in Social Gaming from Erlang Solutions on Vimeo.


You can also find the slides for the talk up on SlideShare: