In gen­eral, when you see async void in your code it’s bad news, because:

  1. you can’t wait for its com­ple­tion (as men­tioned in this post already)
  2. any unhan­dled excep­tions will ter­mi­nate your process (ouch!)

 

Sup­pose you have a timer event that fires every once in a while and you want to do some asyn­chro­nous pro­cess­ing inside the han­dler, so you pro­ceed to write some­thing along the lines of:

The fact that your timer event excepted might hurt, that it took your whole process with it is likely to hurt an awful lot more!

Well, what are your options? If you change the han­dler to return Task instead the code won’t com­pile because ElapsedE­ven­tHandler del­e­gate type spec­i­fies a void return type. Annoyed

Two options springs to mind here.

First, you can always just exe­cute the whole block of code syn­chro­nously instead.

This is not ideal because it’s going to block the thread whilst it’s wait­ing for the asyn­chro­nous oper­a­tion to come back, pre­vent­ing the thread from being used to do other use­ful work in the mean time.

If you wish to per­sist with using async-await and just wish to swal­low any excep­tions then you can con­sider option two:

In this case, we’ve sim­ply used a con­tin­u­a­tion (which fires regarded whether the pre­ced­ing task had faulted) to swal­low any excep­tion that had been thrown by the asyn­chro­nous operations.

 

Hope this helps, and remem­ber, when­ever you see async void in your code it should be trig­ger­ing off your spi­der senses!

Share

With the offi­cial release of .Net 4.5 and Visual Stu­dio 2012, I sus­pect many .Net devel­op­ers will be rush­ing to rewrite their data access or net­work lay­ers (amongst many many other things!) to take advan­tage of the new async-await (see the excel­lent 101 exam­ples here) lan­guage fea­ture in C#, which means you’ll likely be work­ing with the Task and Task<T> type an awful lot.

If you have F# code that needs to interop with C# that returns or awaits some task types then you’ve prob­a­bly already come across the Async.StartAsTask<T> and Async.AwaitTask<T> meth­ods for con­vert­ing between F#’s Async<T> and Task<T> types. Curi­ously, there are no equiv­a­lent meth­ods on the Async class for con­vert­ing between Async<unit> and Task types.

So, to fill in the gaps our­selves, here are two sim­ple func­tions to do just that:

 

Enjoy!

Share

In Erlang, we have the Super­vi­sor behav­iour which makes it very easy to pro­vide the means to mon­i­tor and restart a whole net­work of work­ers and other super­vi­sors based on some con­fig­ured strategy.

The Mail­box­Proces­sor (aka agents) in F# doesn’t come with the same higher-level abstrac­tions (such as Erlang’s gen_server, gen_fsm behav­iours) by default, but it’s very easy to mimic some of their capa­bil­i­ties by build­ing on top of the stan­dard F# agents.

Luca Bolognese’s light­weight LAgent frame­work is a per­fect exam­ple and even pro­vides inter­est­ing pos­si­bil­i­ties such as hot swap­ping of code (although still some way off of what’s pos­si­ble in Erlang in this regard).

Sim­i­larly, the fol­low­ing snip­pet shows how you can start an agent with super­vi­sion so that any escaped excep­tions (which will crash the agent) are trapped and restarts the agent:

Share

This is my list of take­aways from a very good talk by Theo Schloss­na­gle on things you should be think­ing about when design­ing a sys­tem for large scale.

I hope you enjoy the talk as much as I did, it cer­tainly leaves plenty of food for thought and per­son­ally it is encour­ag­ing to see that many of the things we are doing/thinking at GameSys are echoed by experts in the field!

 

Dis­claimer: this is by no means a com­pre­hen­sive cov­er­age of every­thing that was cov­ered in the talk, just the things that stood out for me, if there are top­ics miss­ing in the notes below that you feel is worth­while for oth­ers to see please feel free to get in touch with me.

 

Design vs Implementation

When we think about design­ing a sys­tem and trans­lat­ing it to code we tend to think of the code as the imple­men­ta­tion, which is weird when you really think about it, because unlike any other imple­men­ta­tions of a design in the real world the code has no wear and tear and no struc­tural stress factors.

If you design a bridge and build a bridge, years later the bridge does not work the same way due to wear and tear, because the bridge is a thing, it’s not the design but an implementation.

 

Our design for an appli­ca­tion is actu­ally the code, the imple­men­ta­tion of our code is the exe­cu­tion on the CPU, hence we don’t actu­ally do the imple­men­ta­tion of our design, we just let it happen.

 

Com­po­nen­ti­za­tion of a sys­tem allows us to be right once in a while and write pro­grams that don’t have bugs because they’re really small programs.

 

Archi­tec­ture vs Implementation

Archi­tec­ture is a set of com­plex and care­fully designed struc­ture of some­thing, in com­put­ing, it’s the orga­ni­za­tion of a com­puter or a computer-based systems.

  • Archi­tec­ture is with­out spec­i­fi­ca­tion of the ven­dor, make model of components.
  • Imple­men­ta­tion is the adap­ta­tion of an archi­tec­ture to embrace avail­able tech­nolo­gies (e.g. spe­cific lan­guages and frame­works or products).
  • They’re intrin­si­cally tied and there are lots argu­ments about which is which.

 

Archi­tec­ture

An archi­tec­ture is all encompassing.

  • space, power, cooling
  • servers, switches, routers
  • load bal­ancers, firewalls
  • data­bases, non-database storage
  • dynamic appli­ca­tions
  • the archi­tec­ture you export to the user (javascript, etc.)

 

If you have 50 mil­lion users, then you have 50 mil­lion nodes and all of them are your prob­lem because you’re send­ing them pre­sen­ta­tion as well as func­tion (code to execute).

 

No all peo­ple do all things, hence why not all peo­ple think of all those things.

 

How­ever:

  • lack of aware­ness of the other dis­ci­plines is bad
  • leads to iso­lated decisions
  • which leads to unrea­son­able require­ments elsewhere
  • which lead to over engi­neered products
  • stu­pid decisions
  • cat­a­strophic failures

 

Run­ning oper­a­tions is seri­ous stuff, it requires knowl­edge, tools, expe­ri­ence and discipline.

 

Good judg­ment comes from expe­ri­ence. Expe­ri­ence comes from bad judgment.”  

     — Proverb

 

Judge peo­ple on the poise and integrity with which they reme­di­ate their failures.” 

     — Theo Schlossnagle

 

Rule #1 — Every­thing must always be in ver­sion con­trol (includ­ing configurations)

Rule #2 — If it’s not mon­i­tored it’s not in pro­duc­tion (and documented)

 

Most sys­tem design­ers don’t under­stand maths, a typ­i­cal box today is 8 core 2.8GHz = 22.4 bil­lion instruc­tions per sec­ond, and given that web apps are CPU bound if you’re doing 25 request/s on that 8 core box you’re doing things wrong.

 

Here are some sim­ple steps to build a model for your system:

  • Draw it out
  • Take mea­sure­ments and walk through the model to ratio­nal­ize it
  • Map actions to consequences:
    • a user sign = 4 async DB Inserts, 1 AMQP durable, per­sis­tent mes­sage, 1 async DB read
  • Sim­u­late traf­fic in dev envi­ron­ment and ratio­nal­ize your model

 

There will always be empir­i­cal vari­ance from your model, and being able to ratio­nal­ize the model and explain the vari­ance adds a lot of value because it leads to ask­ing impor­tant ques­tions.

 

Rule #3 — Always ratio­nal­ize your inputs and out­puts (ques­tion­ing the real­i­ties and ratio­nal­iz­ing them)

 

Ser­vice decou­pling in com­plex sys­tems give:

  • Sim­pli­fied mod­el­ling and capac­ity planning
  • Slight inef­fi­cien­cies
  • pro­motes lower contention
  • Requires design of sys­tems with less coherency requirements
  • Each iso­lated ser­vice is sim­pler and safer
  • SCALES!

 

Opti­miza­tion

We should for­get about small effi­cien­cies, say about 97% of the time:

     pre­ma­ture opti­miza­tion is the root of all evil.

Yet we should not pass up our oppor­tu­ni­ties in that crit­i­cal 3%.

A good pro­gram­mer will not be lulled into com­pla­cency by such rea­son­ing, he will be wise to look care­fully at the crit­i­cal code; but only after that code has been identified.”

     — Don­ald Knuth

Know­ing when opti­miza­tion is pre­ma­ture defines the dif­fer­ence between the mas­ter engi­neer and the apprentice.”

     — Theo Schlossnagle

 

Opti­miza­tion comes down to a sim­ple concept:

don’t do work you don’t have to”

It can take the form of:

  • Com­pu­ta­tional reuse
  • Caching in a more gen­eral sense
  • Avoid the prob­lem, and do no work at all

 

Opti­miza­tion in dynamic con­tent sim­ply means:

  • Don’t pay to gen­er­ate the same con­tent twice (use caching)
  • Only gen­er­ate con­tent when things change
  • Break the sys­tem into com­po­nents so that you can iso­late the costs of things that change rapidly from those that change infrequently

 

There is a sim­ple truth:

     “Your con­tent isn’t as dynamic as you think it is”

  • Javascript, CSS and images are only ref­er­en­tially linked
  • They should all be con­sol­i­dated and optimized
  • They should be pub­licly cacheable and expire 10 years from now
  • Use cache bust­ing when you need to inval­i­date the cache
  • Use the cookie for per user info!

 

Ask­ing hard ques­tions of data­base can be “expen­sive”, instead you have two options:

  • cache the results — best when you can’t afford to be accurate
  • mate­ri­al­ize a view on the changes — best when you need to be accurate

 

Rule #4 — Never solve a prob­lem that you can oth­er­wise avoid

Rule #5 — Do not repeat work

 

Data­bases

  • rule 1 : shard your database
  • rule 2 : shoot yourself

Hor­i­zon­tally scal­ing your data­bases via sharding/federating requires that you make con­ces­sions that should make you cry.

 

Data­bases (other than MySQL) scale ver­ti­cally to a greater degree than most peo­ple admit. But at some point, when you’ve sat­u­rated your ver­ti­cal scal­a­bil­ity and you can’t scale up any­more, you cry really hard and you scale out.

 

The inter­est­ing part about scal­ing out (via Riak, etc.) is that they’re not shard­ing, instead they have dis­trib­uted data man­age­ment. Shard­ing is when you do it your­self, the dis­trib­uted data man­age­ment tech­niques used by these sys­tems are backed and proven by research papers, tested, stressed, and they work.

 

Many times rela­tional con­straints are not needed on data. If this is the case, a tra­di­tional rela­tional data­base is unnecessary.

There are cool tech­nolo­gies out there to do this:

  • files”
  • NoSQL
  • cook­ies

 

Clever use of cook­ies can help you scale out much eas­ier as data is stored on the customer’s com­puter, far away from your infra­struc­ture and you no longer have to find solu­tions to deal with a prob­lem you no longer have! There are many lim­i­ta­tions on what you can do with cook­ies, but for the right uses cases you should def­i­nitely con­sider them.

 

Don’t always use the same solu­tion you’ve used before, always evaluate.

 

Non-ACID data­bases can be eas­ier to scale.

 

Ver­ti­cal scal­ing is achieved via two mechanism:

  • doing only what is absolutely nec­es­sary in the database
  • run­ning a good data­base that can scale well vertically

If you really need to scale hor­i­zon­tally, make sure you under­stand the ques­tions you intend to ask (of your data).

Make sure that you par­ti­tion in a fash­ion that doesn’t require more than a sin­gle shard to answer OLTP-style ques­tions. If that’s not pos­si­ble, con­sider data dupli­ca­tion (store the same data mul­ti­ple times, once for each question).

 

There are some alter­na­tives to tra­di­tional RDBMS sys­tems, with­out an imposed rela­tional model federating/sharding is much eas­ier to bake in.

By relax­ing con­sis­tency require­ments, one can increase avail­abil­ity by adopt­ing a par­a­digm of even­tual con­sis­tency.

  • Mon­goDB
  • Cas­san­dra
  • Volde­mort
  • Redis
  • Riak

BUT, NoSQL sys­tems aren’t a cure-all, a lot of data has rela­tion­ships that are important.

 

Ref­er­en­tial integrity is quite impor­tant in many sit­u­a­tions, and a lot of datasets do not need to scale past a sin­gle instance.

Also, not every com­po­nent of the archi­tec­ture needs to scale past the lim­its of ver­ti­cal scaling.

 

If you can seg­re­gate your com­po­nents, you can adhere to a right tool for the job paradigm.

Use SQL where it is the best tool for the job and use dis­trib­uted key-value stores and doc­u­ment data­bases in sit­u­a­tions where they shine.

 

NoSQL does not need no DBAs”

Even when you’re oper­at­ing in an envi­ron­ment where there are no peo­ple with DBA job titles, there’s still a job being done to con­trol and con­straint the data model. When you put some data into a data­base, that’s your data model and it needs to be con­trolled, if not by DBA then by appli­ca­tion devel­op­ers, oth­er­wise all hell will break loose!

Regard­less of whether you’re using schema or semi-schema data, you still have a job to keep the data back­ward and for­ward com­pat­i­ble with your soft­ware. DBA have a lot of expe­ri­ence doing that! So it’s worth learn­ing from their expe­ri­ences even if we’re not going to use their data­bases anymore ;-)

 

If you break the prob­lems down into small pieces then you can start using dif­fer­ent data­bases for dif­fer­ent prob­lems. Deter­mine how big the prob­lems is and can grow and use the appro­pri­ate data­base to han­dle it, so that you fit the solu­tion to the prob­lem.

 

Avoid:

  • shiny is good”
  • over engi­neer­ing”

Embrace:

  • K.I.S.S”
  • good is good”

 

NoSQL real­i­ties

NoSQL sys­tems are built to han­dle sys­tem failures.

 

NoSQL sys­tem per­for­mance num­bers and sta­bil­ity reviews are never derived dur­ing fail­ure conditions.

 

NoSQL sys­tems tend to behave very badly dur­ing fail­ure sce­nar­ios, because their oper­a­tors assume unal­tered operations.

 

Think about the per­for­mance degra­da­tion of doing a filesys­tem backup of a tra­di­tional RDBMS dur­ing peak usage.

In fail­ure sce­nario of NoSQL, sim­i­lar such taxes exist, but:

  • peo­ple tend to oper­ate them under heavy load with no headroom
  • the head­room for node recov­ery and degraded oper­a­tion are quite large

The per­for­mance met­rics NEEDS to reflect fail­ure recov­ery sce­nar­ios, because that’s when you’re going see your peak load as per Murphy’s law!

 

Rule #6 — Appro­pri­ate­ness is both com­pre­hen­sive and objective

 

When eval­u­at­ing the per­sis­tence solu­tion for your data, match the require­ments with the solution’s fea­tures and characteristics.

 

Net­work­ing

The net­work is part of the architecture.

So often for­got­ten by the data­base engi­neers and the appli­ca­tion coders and the front-end devel­op­ers and the designers.

 

The net­work doesn’t work with bytes, it doesn’t send bytes, every­thing revolves around packets.

 

Many appli­ca­tions today are so poorly designed that net­work issues never become scal­a­bil­ity con­cerns, this is for the appli­ca­tion archi­tec­tures that have high traf­fic rates.

 

How do you push 10/20 GigE? Buy­ing a really expen­sive load bal­ancer is one option, but most com­pa­nies don’t, because there are cheaper ways to do it — use rout­ing.

Rout­ing sup­ports extremely naive load bal­anc­ing, you can run rout­ing pro­to­col on your edge machines to route traf­fic to a set of sta­tic addresses. (e.g. CDNs)

 

This adds fault-tolerance and dis­trib­utes net­work load.

 

Rule #7 — Solu­tions should be as close to the cus­tomer as possible

 

Ser­vice Decoupling

One of the most fun­da­men­tal tech­niques for build­ing scal­able systems.

 

Asyn­chrony

  • Break down the user trans­ac­tion into parts.
  • Iso­late those that could occur asynchronously
  • Queue the infor­ma­tion needed to com­plete the task
  • Process the queues “behind the scenes”

 

If I don’t want to do some­thing now, I must tell some­one to do it later. This is “mes­sag­ing”.

There are a lot of solutions:

  • JMS (Java mes­sage service)
  • Spread (extended vir­tual syn­chrony mes­sag­ing bus)
  • AMQP (advanced mes­sage queue protocol)
  • ZeroMQ (“Fast” messaging)

 

There are some fun­da­men­tal prob­lems in mes­sag­ing, such as ‘how do you know your mes­sage was received?’, many peo­ple are send­ing lots of mes­sages to a mes­sag­ing ser­vice and just assumed that they got there.

Again, you need to under­stand the fail­ure cases, and under­stand the con­se­quences of these fail­ure cases. In a lot of cases, so long you can limit the like­li­hood of that hap­pen­ing you might be able to say that you don’t really care about the occa­sion­ally dropped messages.

 

You can adopt this men­tal­ity for the 99.99% mes­sages that aren’t really impor­tant, but you need to make sure the 0.01% of mes­sages that are are guar­an­teed deliv­ery and works reli­ably and fully persistent.

Again, you iden­tify the com­po­nents where these oper­a­tional con­straints are really impor­tant and solve the prob­lems only in those places.

(most) Asyn­chro­nously (and, even more so, dis­trib­uted) sys­tems are:

  • Com­plex
  • Non-sequential
  • Self-inconsistent
  • Under-engineered
  • Under-instrumented
  • Unnec­es­sary
  • Scale very very well

Yes you need dis­trib­uted sys­tems, but all things in moderation.

 

Rule #8 — Com­plex­ity is the devil

Rule #9 — Deal with the devil only when necessary

 

Don’t be an idiot

Most acute scal­a­bil­ity dis­as­ters are due to idiots.

 

Scal­ing is hard, per­for­mance is easier.

Per­for­mance is very impor­tant, if you can reduce the num­ber of servers you run, then you reduce the oper­a­tional com­plex­ity and be able to get your head around net­work­ing because:

     Net­work­ing is HARD, it’s really really HARD to get tens of thou­sands of servers to talk to each other.

 

Extremely high-performance sys­tems tend to be eas­ier to scale because they don’t have to scale as much.

 

Rule #10 — Don’t be a **** idiot

 

Idiocy is really bad, and contagious.

Share