Making flames with Elm

A friend of mine, Roger Engelber, pointed me to a nice article on doing functional programming in Lua. The article detailed the steps to generate a flame-like effect using a simple particle system.

Of course, it naturally led to me trying to do the same in Elm!

 

Translating the approach was really straightforward, though there are some minor differences, e.g. alpha values in Elm are between 0 and 1 whereas in Lua they’re between 0 and 255 (so a Lua alpha just needs to be divided by 255).

 

The code is available on GitHub, feel free to poke around.

Here are two variations in action:

 

Links

Source code

Live demo 1

Live demo 2

Live demo 3

Live demo 4

Solving the Stable Marriage problem in Erlang


Whilst talking with an ex-colleague, a question came up on how to implement the Stable Marriage problem using a message passing approach. Naturally, I wanted to answer that question with Erlang!

Let’s first dissect the problem and decide what processes we need and how they need to interact with one another.

The stable marriage problem is commonly stated as:

Given n men and n women, where each person has ranked all members of the opposite sex with a unique number between 1 and n in order of preference, marry the men and women together such that there are no two people of opposite sex who would both rather have each other than their current partners. If there are no such people, all the marriages are “stable”. (It is assumed that the participants are binary gendered and that marriages are not same-sex.)

From the problem description, we can see that we need:

  • a module for man
  • a module for woman
  • a module for orchestrating the experiment

In terms of interaction between the different modules, I imagined something along the lines of the following:

[diagram: interaction between the man, woman and orchestrator processes]

The proposal communication needs to be synchronous* as the man cannot proceed until he gets an answer to his proposal. But all other communications can be asynchronous.

(* Remember, a synchronous call in the OTP sense is not the same as a synchronous call in Java/C#, where the calling thread is blocked. In this case the communication still happens via asynchronous message passing, but the calling process asynchronously waits for a reply before moving on.)
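To make the distinction concrete, here’s a minimal sketch of the two halves of the proposal protocol. The message shapes and helper names are illustrative rather than lifted from my solution, and the orchestration (rounds, dumped men re-proposing) is elided:

    %% man's side: an asynchronous send followed by a selective receive -
    %% "synchronous" in the OTP sense, but no thread is ever blocked
    propose(MyName, Woman) ->
        Woman ! {propose, MyName, self()},
        receive
            {accepted, Woman} -> accepted;
            {rejected, Woman} -> rejected
        end.

    %% woman's side: accept if the new suitor ranks higher on her
    %% preference list than her current fiance, otherwise reject
    woman_loop(Prefs, Fiance) ->
        receive
            {propose, Name, From} ->
                case ranks_higher(Prefs, Name, Fiance) of
                    true ->
                        From ! {accepted, self()},
                        notify_dumped(Fiance),   % async, fire-and-forget
                        woman_loop(Prefs, {Name, From});
                    false ->
                        From ! {rejected, self()},
                        woman_loop(Prefs, Fiance)
                end
        end.

    ranks_higher(_Prefs, _Name, none) -> true;
    ranks_higher(Prefs, Name, {Current, _Pid}) ->
        position(Name, Prefs) < position(Current, Prefs).

    notify_dumped(none) -> ok;
    notify_dumped({_Name, Pid}) -> Pid ! {dumped, self()}.

    position(X, Xs) -> length(lists:takewhile(fun(Y) -> Y =/= X end, Xs)).

Note the man’s receive pattern matches on the Woman pid, so replies from other processes are left in the mailbox rather than mistaken for an answer.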

From here, the implementation itself is pretty straightforward.

And using the test data from Rosetta Code, you can have a module that sets up the test, which I’ve called stable_marriage here.

Compiling and running the code gives the following output:

[image: program output]

You can get the full source code for this solution on GitHub.

Enjoy!

Links

Repo for solution

Rosetta Code test data

QCon London 2015 – Takeaways from “The Bad Idea Terminator”


It’s very uncharacteristic of me, but I went to a session on the product management track at QCon London – Melissa Perri’s “The Bad Idea Terminator”. Having gone into the room with the expectation of coming out not much wiser, I was pleasantly surprised to find myself in one of the best talks at the conference.

 

Melissa used Firestone and FourSquare as examples of the “build trap”, whereby ailing companies try to counteract their decline by adding more features without really changing the way they do things.

We often start off doing things right – we test and iterate on our ideas before we hit the market, and we end up with something that people want to use. But then we just keep on building, without going back to finding those innovative ideas that people love.

The build trap stops us building things that people love because we lose touch with our customers. We stop testing ideas with the market, confine ourselves to our own bubble, and just put our heads down and keep on building.

 

We can fall into the build trap in a number of ways, including:

  • pressure from stakeholders to always release new features (Peter Higgs made similar criticisms about modern academia, where researchers are pressured to keep publishing papers rather than focusing on finding the next big idea)
  • arbitrary deadlines and failure to respond to change – setting deadlines that are too far out and not being flexible enough to adapt
  • the “building is working” mentality – which doesn’t allow time for us to step back and think about whether we’re building the right things

Building is the easy part.

Figuring out what to build is hard.

 

- Melissa Perri

 

Why don’t we take the time to think before we go and build something? Well, the endowment effect might have something to do with it – as you invest more and more into an idea, it starts to become part of your identity and it becomes hard for you to let go.

In behavioral economics, the endowment effect (also known as divestiture aversion) is the hypothesis that people ascribe more value to things merely because they own them. This is illustrated by the observation that people will tend to pay more to retain something they own than to obtain something owned by someone else—even when there is no cause for attachment, or even if the item was only obtained minutes ago.

 

One of the most important responsibilities of a product manager is to say NO to ideas until we’re able to back them up with tests that prove they can work, and I think the same goes for developers.

So how do you become the Bad Idea Terminator, i.e. the person who goes and destroys all the bad ideas so we can focus on the good ones? We can start by identifying some common mistakes we make.

 

Mistake 1: not recognizing bias

Product ideas suffer from several types of bias:

  • Causality – we attribute meaning, and why things happen, to the wrong cause

For example,

We built a mobile app before and it was successful, let’s do another mobile app.

Everyone has a mobile app, so we need one too.

We need to recognize the differences between customers and businesses; what worked under one set of circumstances is not guaranteed to work under another.

  • Curse of Knowledge – as experts, we cannot put ourselves in the shoes of someone who doesn’t know as much

You should be doing user research and user testing – bring your ideas to the customers and see if that’s what they really want.

  • Anchoring – we focus on insignificant data because it’s the first data we see

Whenever someone says something like

All my customers are asking for this!

you should always ask for data, and make people prove that what they’re saying is accurate.

 

Mistake 2: solutions with no problems

When people suggest new ideas, most of the time they come to the table with solutions. Instead, we need to start with the WHY, and focus on the problem that we’re trying to solve.

On the topic of starting with the why, I also find Simon Sinek’s TED talk inspirational, and he has a book on the same topic.

[image: Start with Why book cover]

There are always multiple ways to solve the same problem, and only by focusing on the problem will we be able to decide which solution is the best. Unfortunately, your idea is not always the best idea, and we should be conscious of the Not Invented Here syndrome and our propensity to fall under its influence (even if only at a subconscious level).

After we figure out the problem, we still need to align it with our business goals, and decide if it’s a problem we can solve and want to solve.

 

Mistake 3: building without testing

When we get stuck in the build trap we tend not to test our assumptions, as we commit to one solution too early. Instead, we should solicit many solutions at first, and get people off their fixation on the one idea.

We also tend to invest too much into the one idea and then have trouble letting go (the endowment effect again).

Instead, we should pick out a few ideas that are the most viable and then test them to find the ideas that:

  • have the most positive customer feedback, and
  • require the smallest investment

using small, data-driven experiments. Techniques such as A/B testing fall right into this (but remember, A/B testing doesn’t tell the whole story – you probably also want A/A testing to act as a blind test group). It could also be as simple as talking to a few customers to get their feedback.

There are 3 key experiments you should run:

  1. do the customers have this problem?
  2. are they interested in solution ideas?
  3. are they interested in our solution?

 

Mistake 4: no success metrics

Another common mistake is not setting success metrics when we run experiments, nor when we build new features.

Instead, we should set goals early. Doing so early is important, because if we set up goals in hindsight then we’ll just change the goals to make the feature look good…

We should be asking questions such as

How much value do we need to capture to make this feature worth building?

We also need to accept that learning on its own is also a form of success.

 

The risk of continuing with a bad idea is really great, so the earlier we can kill off these bad ideas, the lower our risk will be.

And the faster you kill the bad ideas, the more time you will have to devote to the good ones.

Fail fast, so you can succeed faster.

- Melissa Perri

And finally, an obligatory picture of the Terminator, of course!

[image: the Terminator]

 

I really enjoyed Melissa’s talk, and although I’m a developer, I believe everyone inside an organization has the responsibility to ask questions and help push the organization towards building better products that actually match its customers’ needs.

 

Having read a couple of Dan Ariely’s books in the past year, I find they provide a very insightful backdrop on many of the human irrationalities that underlie the common mistakes Melissa identified in her talk.

[images: book covers of Predictably Irrational and The Upside of Irrationality]

 

Links

Slides for the talk

Simon Sinek – Start with Why TED talk

Start with Why: how great leaders inspire everyone to take action

Predictably Irrational: The Hidden Forces That Shape Our Decisions

The Upside of Irrationality

This is why you need Composition over Inheritance


I was attempting to make some changes to some fairly old code in our codebase (something I probably wrote myself…) which hasn’t been touched for a while.

Naturally, my first step was to understand what the code does, so I started by looking at the class where I needed to make my changes.

Perhaps unsurprisingly, I couldn’t figure out how it works… The class contains only a handful of override methods, and I had no idea how they fit together.

So I started digging deeper, through several layers of abstract classes, each filling in parts of the puzzle, until I reached the base of the class hierarchy.

By this point, I was staring at a control flow full of strategically placed gaps. Going back and forth along the class hierarchy several times, I ended up with a vague sense of how the various pieces of logic scattered across the hierarchy fit together.

What’s more, where we needed to deviate from the control flow dictated by the base class, we had to reinvent a brand new control flow mid-hierarchy, making it even harder to understand what’s going on.

 

This is not where I want to be… I want to be able to reason about my code easily, and with confidence.

I suspect a great many of you have experienced similar pains in the past, but how do you go about applying Composition over Inheritance?

 

Wikipedia’s definition and example of Composition over Inheritance focus only on domain modelling, and I’m generally not a fan of conclusions such as:

To favor composition over inheritance is a design principle that gives the design higher flexibility, giving business-domain classes and a more stable business domain in the long term.

In other words, a HAS-A relationship can be better than an IS-A relationship.

What does this even mean?!? Can you back up these claims of “higher flexibility” and a “more stable business domain in the long term” with empirical evidence?

 

In my view, the real benefit of Composition over Inheritance is that it encourages better problem decomposition – if you don’t first break the problem up into smaller pieces (that are easier to tackle), you have nothing to compose with later on. Scott Wlaschin’s railway oriented programming approach is an excellent example of how to apply composition in a practical and elegant way.

[image: railway oriented programming]
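Scott’s examples are in F#, but the underlying idea – composing small functions that each return either a success or a failure, and short-circuiting on the first failure – works in any functional language. Here’s a rough sketch of the same shape in Erlang (the two validation steps are made up for illustration):

    %% thread a value through a list of functions that each return
    %% {ok, NewValue} or {error, Reason}, stopping at the first error
    pipeline(Fs, Input) ->
        lists:foldl(fun(F, {ok, Acc})     -> F(Acc);
                       (_F, {error, _}=E) -> E
                    end,
                    {ok, Input},
                    Fs).

    %% two illustrative steps to compose
    validate(#{name := ""}) -> {error, name_required};
    validate(Req)           -> {ok, Req}.

    canonicalize(#{email := Email} = Req) ->
        {ok, Req#{email := string:lowercase(Email)}}.

    %% pipeline([fun validate/1, fun canonicalize/1],
    %%          #{name => "monk", email => "Yan@Example.Com"})
    %% => {ok, #{name => "monk", email => "yan@example.com"}}

Each step knows nothing about the others – the error handling lives entirely in the composition.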

 

In his keynote The Old New Old New Things at BuildStuff 14, Greg Young described problem decomposition as the biggest problem we have in the software industry, because we’re just so bad at it…

And the challenge of problem decomposition is not limited to code organization. Microservices are all the rage right now, and the move from monolithic architectures to microservices is another example of problem decomposition, albeit one that happens at a higher level.

 

So I’ll be taking a knife to the aforementioned class hierarchy and replacing it with small, composable units, using a language that is a great fit for the job – F#!

Will you follow my lead?

 

Links

Railway oriented programming

Is your programming language unreasonable

Service architectures at scale, lessons from Google and eBay

Martin Fowler on Microservices

Greg Young – The Old New Old New Things

QCon London 2015 – Takeaways from “Scaling Uber’s realtime market platform”


On day three of QCon London, we were treated to some really insightful stories from the likes of Google, Atlas and Spotify. And for the first time in a while, Uber is talking publicly about what they’ve been up to.

 

The challenge for Uber’s platform is that both supply (Uber drivers) and demand (riders) are dynamic, and matching them efficiently in real-time is not easy.

Uber’s services are written in a mixture of Node.js, Python, Java and Go, whilst a whole mix of databases are used – PostgreSQL, Redis, MySQL and Riak.

 

From a high level, they have a number of backend components:

[diagram: Uber’s backend components]

They recently rewrote the Dispatch system, despite Joel Spolsky advising against complete rewrites. To that, Matt Ranney said there are a number of built-in assumptions in the current dispatch system that are so deep-rooted that a revolutionary step is more efficient and productive:

  • assumes 1 rider per vehicle, making it hard to support vehicle pooling
  • the idea of moving people is baked into the domain and code, making it hard to move into new markets (Matt didn’t elaborate, but I assume transportation of goods might be one such market)
  • shards by city, which is not a sustainable approach as Uber moves into more and more cities
  • multiple points of failure that can bring everything down

The dispatch system was hard to fix incrementally, and since everything runs as a service, it was feasible to replace the existing system outright.

 

The new dispatch system looks like this:

[diagram: the new dispatch system]

where DISCO stands for DISpatCh Optimization service.

 

For geo-indexing, the dispatch service needs to know not only the physical distance between supply and demand, but also the ETA based on historical travel data. The ETA calculation also needs to handle a number of special cases, including airports, where demand needs to be queued (i.e. first come, first served) to provide a fair service to everyone waiting at an airport.

The old system could only track available supplies (i.e. cars with no riders), which means there were missed optimization opportunities such as the following:

[diagram: in-flight supply meeting demand earlier than available supply]

where the demand (D1) can be met by an in-flight supply (S2) earlier than by an available supply (S1).

DISCO is able to consider supplies that are currently in-flight, project their routes into the future, and take that into account in the matching process. It also supports vehicle pooling (if both D1 and D2 agree to share a vehicle):

[diagram: vehicle pooling]

 

Uber breaks up the earth into tiny cells (like in Google Maps), and each is given a unique ID. Using the Google S2 library, you can identify the cells that will completely cover a shape you’ve supplied:

[image: S2 cells covering a circular search area]

Uber uses these cell IDs as the sharding key to update supply, and when DISCO needs to match supply to demand, it can use that information to find supplies in the matching cells.
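As a rough sketch of the idea – in Erlang for consistency with the earlier posts, with s2_cells standing in for a hypothetical binding to the S2 library (the real library is available in C++, Java and Go):

    %% index supply by the S2 cell that contains it; the cell ID is the
    %% shard key (removing the driver's previous cell entry is elided)
    init() ->
        ets:new(supply_index, [bag, named_table, public]).

    update_supply(DriverId, Lat, Lng) ->
        Cell = s2_cells:cell_id(Lat, Lng),    % hypothetical S2 binding
        ets:insert(supply_index, {Cell, DriverId, {Lat, Lng}}).

    %% to match, compute the cells that cover the search area, then look
    %% up the supplies recorded against each of those cells
    nearby_supplies(Lat, Lng, RadiusMeters) ->
        Cells = s2_cells:covering(Lat, Lng, RadiusMeters),
        lists:append([ets:lookup(supply_index, C) || C <- Cells]).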

A limitation with this approach is that the cells have a fixed size, so one would imagine the update activities are not well spread out through the key space. It’s natural for supply and demand to be concentrated around city centres where the night life is – central London being a prime example.

Nonetheless, the goal of the routing is to:

  • reduce wait time for riders
  • reduce extra driving for drivers
  • lower overall ETAs

 

In order to scale their services, Uber went with an approach of building stateful services using Node.js. In addition, they introduced a custom RPC protocol called ringpop, which is based on the SWIM paper. Ringpop also runs on its own TChannel multiplexing and framing protocol.

The goal of these projects is to provide:

  • performance across different languages
  • high-performance request forwarding
  • proper pipelining
  • support for checksums and tracing
  • encapsulation

 

At a high level, any node in a cluster is able to handle any request, and if the data is not available on that node then the request is forwarded to the correct node.

[diagram: a node forwarding a request to the node that owns the data]

This essentially removes the need to manage consistent hashing on the client.
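A minimal sketch of that forward-or-handle decision, with hash_ring and do_handle as hypothetical stand-ins for the ring-membership lookup and the actual business logic:

    %% any node accepts the request; the owner of the key does the work
    handle_request(Key, Req) ->
        case hash_ring:owner(Key) of          % hypothetical ownership lookup
            Owner when Owner =:= node() ->
                do_handle(Req);               % this node owns the key
            Owner ->
                rpc:call(Owner, ?MODULE, handle_request, [Key, Req])
        end.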

 

For Uber, availability is of paramount importance, as the cost of switching to a competitor is low. So they decided to:

  • make everything retryable, which means making every operation idempotent (something I suspect can be challenging in practice – see the sketch after these lists)
  • make everything killable (chaos monkey style); ringpop detects failed nodes and removes them from the cluster
  • crash only, no complicated graceful shutdowns
  • break things up into small pieces

which in turn required some cultural changes:

  • no pairs (I think he was talking about read-replica setups where there’s a potentially complicated failover process)
  • kill everything, even databases
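On the retryable point, a common way to make an operation idempotent is to deduplicate on a client-supplied request ID, so retrying an already-processed request returns the recorded result instead of executing it twice. This is just an illustration of the general idea (not how Uber implements it), and the handling of concurrent duplicates is elided:

    %% assumes a table created with: ets:new(dedup, [set, named_table, public])
    execute(ReqId, Fun) ->
        case ets:lookup(dedup, ReqId) of
            [{ReqId, Result}] ->
                Result;                          % seen before - retrying is a no-op
            [] ->
                Result = Fun(),
                ets:insert(dedup, {ReqId, Result}),
                Result
        end.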

 

Services talk to each other via load balancers, but you need to be able to kill load balancers too, so instead the load balancer logic is put in the service client (similar to Netflix Ribbon, from what I gathered). I didn’t buy Matt’s rationale here, since it’s possible to make load balancers highly available too, but he also mentioned the ability to do smarter routing – choosing the data centre with better latency in a globally deployed infrastructure, for example – which makes more sense.

 

Matt then went on to talk about some of the challenges with large fanout services, and in particular the challenge of getting predictable latency when a large number of services are involved.

He also referenced Google fellow Jeff Dean’s paper Achieving Rapid Response Times in Large Online Services, which is a great read; slides 39–70 describe the approach Uber has adopted.

[diagram: backup requests with cross-server cancellation]

In the example above, the following happened:

  1. service A sends req 1 to service B (1), informing it that the request will also be sent to service B (2)
  2. 5ms later, service A indeed sends the same request to service B (2), where it goes into the backlog; service B (2) also finds out that service B (1) got the same request
  3. meanwhile, service B (1) starts to process the request and sends a signal to service B (2) to cancel req 1 from its backlog
  4. service B (1) completes the request and replies to service A

If service B (1) was under load and couldn’t process the request fast enough, then service B (2) would have processed the request and replied to service A instead – unless, of course, service B (2) was also under load.
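The client-side shape of this is easy to sketch; below is the simpler variant without the cross-server cancellation described above – fire the request at a second replica if the first hasn’t replied within a small delay, then take whichever reply arrives first (call/2 is a hypothetical request function):

    hedged_call(Replica1, Replica2, Req, HedgeAfterMs) ->
        Ref  = make_ref(),
        Self = self(),
        Send = fun(Replica) ->
                   spawn(fun() -> Self ! {Ref, call(Replica, Req)} end)
               end,
        Send(Replica1),
        receive
            {Ref, Reply} -> Reply                % first replica answered in time
        after HedgeAfterMs ->
            Send(Replica2),                      % hedge: ask the second replica too
            receive {Ref, Reply} -> Reply end    % take whichever answers first
        end.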

In case you’re worried about the extra requests that would need to be processed with this approach, Jeff Dean’s paper (above) has the following results to show:

[table: results from Jeff Dean’s paper]

A more naive approach would be to always send the request to both service B (1) and service B (2) and just ignore the slower response. Based on a previous talk I watched, this is (or at least was) what Netflix does.

 

Finally, Matt touched on how Uber deals with datacentre outages. Their approach is quite simple and effective:

[diagram: datacentre failover using client-held state digests]

In this example, when the mobile app sends a location update, the service responds with an encrypted state digest. When datacentre 1 fails:

  1. the app sends its location updates to datacentre 2 instead
  2. since datacentre 2 doesn’t have the user’s state, it requests the last state digest the app received
  3. the app then sends the encrypted state digest to datacentre 2
  4. datacentre 2 decrypts the digest and initializes the user state
  5. the app can now converse with datacentre 2 normally
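Matt didn’t go into how the digest is produced, but a minimal sketch might look like the following: serialize the user’s state and encrypt it with a key shared across datacentres. The AES-CTR choice and the hard-coded key here are my assumptions, purely for illustration:

    -define(SHARED_KEY, <<0:256>>).   % illustrative only - never hard-code keys

    %% returned to the app with each location update response
    encrypt_digest(UserState) ->
        IV     = crypto:strong_rand_bytes(16),
        Plain  = term_to_binary(UserState),
        Cipher = crypto:crypto_one_time(aes_256_ctr, ?SHARED_KEY, IV, Plain, true),
        <<IV/binary, Cipher/binary>>.

    %% run by the failover datacentre to rebuild the user's state
    decrypt_digest(<<IV:16/binary, Cipher/binary>>) ->
        binary_to_term(crypto:crypto_one_time(aes_256_ctr, ?SHARED_KEY, IV, Cipher, false)).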

 

Links

Slides for the talk

Ringpop project page

TChannel project page

SWIM: Scalable Weakly-consistent Infection-style process group Membership protocol

Jeff Dean – Achieving rapid response times in large online services