This is my list of take­aways from a very good talk by Theo Schloss­na­gle on things you should be think­ing about when design­ing a sys­tem for large scale.

I hope you enjoy the talk as much as I did. It certainly leaves plenty of food for thought, and personally I find it encouraging that many of the things we are doing and thinking about at GameSys are echoed by experts in the field!

 

Disclaimer: this is by no means comprehensive coverage of everything in the talk, just the things that stood out for me. If there are topics missing from the notes below that you feel are worthwhile for others to see, please feel free to get in touch with me.

 

Design vs Implementation

When we think about designing a system and translating it to code, we tend to think of the code as the implementation. That is weird when you really think about it, because unlike implementations of a design in the real world, code has no wear and tear and no structural stress factors.

If you design a bridge and build a bridge, years later the bridge does not work the same way due to wear and tear, because the bridge is a thing, it’s not the design but an implementation.

 

Our design for an appli­ca­tion is actu­ally the code, the imple­men­ta­tion of our code is the exe­cu­tion on the CPU, hence we don’t actu­ally do the imple­men­ta­tion of our design, we just let it happen.

 

Com­po­nen­ti­za­tion of a sys­tem allows us to be right once in a while and write pro­grams that don’t have bugs because they’re really small programs.

 

Archi­tec­ture vs Implementation

Architecture is the complex and carefully designed structure of something. In computing, it’s the organization of a computer or a computer-based system.

  • Architecture is without specification of the vendor, make, or model of components.
  • Imple­men­ta­tion is the adap­ta­tion of an archi­tec­ture to embrace avail­able tech­nolo­gies (e.g. spe­cific lan­guages and frame­works or products).
  • They’re intrinsically tied and there are lots of arguments about which is which.

 

Archi­tec­ture

An archi­tec­ture is all encompassing.

  • space, power, cooling
  • servers, switches, routers
  • load bal­ancers, firewalls
  • data­bases, non-database storage
  • dynamic appli­ca­tions
  • the architecture you export to the user (JavaScript, etc.)

 

If you have 50 mil­lion users, then you have 50 mil­lion nodes and all of them are your prob­lem because you’re send­ing them pre­sen­ta­tion as well as func­tion (code to execute).

 

Not all people do all things, which is why not all people think about all of those things.

 

How­ever:

  • lack of aware­ness of the other dis­ci­plines is bad
  • leads to iso­lated decisions
  • which leads to unrea­son­able require­ments elsewhere
  • which leads to over-engineered products
  • stu­pid decisions
  • cat­a­strophic failures

 

Run­ning oper­a­tions is seri­ous stuff, it requires knowl­edge, tools, expe­ri­ence and discipline.

 

“Good judgment comes from experience. Experience comes from bad judgment.”

     — Proverb

 

“Judge people on the poise and integrity with which they remediate their failures.”

     — Theo Schlossnagle

 

Rule #1 — Every­thing must always be in ver­sion con­trol (includ­ing configurations)

Rule #2 — If it’s not mon­i­tored it’s not in pro­duc­tion (and documented)

 

Most system designers don’t understand maths. A typical box today has 8 cores at 2.8GHz, which is roughly 22.4 billion instructions per second. Given that web apps are CPU bound, if you’re doing 25 requests/s on that 8-core box you’re doing things wrong.
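To make the arithmetic concrete, here is a quick back-of-the-envelope sketch. It assumes roughly one instruction per clock cycle per core, which is my simplification rather than something stated in the talk.

```python
CORES = 8
CLOCK_HZ = 2.8e9            # 2.8GHz per core
REQUESTS_PER_SECOND = 25    # the "doing things wrong" rate from the notes

# Assumption: about one instruction per clock cycle per core.
instructions_per_second = CORES * CLOCK_HZ
budget_per_request = instructions_per_second / REQUESTS_PER_SECOND

print(f"{instructions_per_second / 1e9:.1f} billion instructions/s")
print(f"{budget_per_request / 1e6:.0f} million instructions per request")
```

At 25 requests/s each request has a budget of around 896 million instructions, which is a lot of work for rendering a single page.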

 

Here are some sim­ple steps to build a model for your system:

  • Draw it out
  • Take mea­sure­ments and walk through the model to ratio­nal­ize it
  • Map actions to consequences:
    • a user sign = 4 async DB Inserts, 1 AMQP durable, per­sis­tent mes­sage, 1 async DB read
  • Sim­u­late traf­fic in dev envi­ron­ment and ratio­nal­ize your model
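The "map actions to consequences" step can be sketched in a few lines. The operation names, the rate of 50 actions/s and the measured numbers below are all made up for illustration; only the 4 inserts / 1 message / 1 read breakdown comes from the notes.

```python
from collections import Counter

# Action -> backend operations, using the breakdown from the notes.
# The key names are hypothetical.
CONSEQUENCES = {
    "user_sign": Counter(db_insert=4, amqp_message=1, db_read=1),
}

def expected_load(action_rates):
    """Turn actions/second into expected backend operations/second."""
    total = Counter()
    for action, rate in action_rates.items():
        for op, count in CONSEQUENCES[action].items():
            total[op] += count * rate
    return total

def variance(expected, measured):
    """Relative gap between the model and the measurement, per operation."""
    return {op: (measured.get(op, 0) - exp) / exp for op, exp in expected.items()}

model = expected_load({"user_sign": 50})                          # made-up rate
measured = {"db_insert": 230, "amqp_message": 50, "db_read": 50}  # made-up data
print(model)
print(variance(model, measured))
```

Here the model predicts 200 inserts/s but 230 were measured, so the interesting question becomes where the extra 15% is coming from.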

 

There will always be empir­i­cal vari­ance from your model, and being able to ratio­nal­ize the model and explain the vari­ance adds a lot of value because it leads to ask­ing impor­tant ques­tions.

 

Rule #3 — Always ratio­nal­ize your inputs and out­puts (ques­tion­ing the real­i­ties and ratio­nal­iz­ing them)

 

Service decoupling in complex systems gives:

  • Sim­pli­fied mod­el­ling and capac­ity planning
  • Slight inef­fi­cien­cies
  • Promotes lower contention
  • Requires design of sys­tems with less coherency requirements
  • Each iso­lated ser­vice is sim­pler and safer
  • SCALES!

 

Opti­miza­tion

“We should forget about small efficiencies, say about 97% of the time:

     pre­ma­ture opti­miza­tion is the root of all evil.

Yet we should not pass up our oppor­tu­ni­ties in that crit­i­cal 3%.

A good pro­gram­mer will not be lulled into com­pla­cency by such rea­son­ing, he will be wise to look care­fully at the crit­i­cal code; but only after that code has been identified.”

     — Don­ald Knuth

“Knowing when optimization is premature defines the difference between the master engineer and the apprentice.”

     — Theo Schlossnagle

 

Opti­miza­tion comes down to a sim­ple concept:

“don’t do work you don’t have to”

It can take the form of:

  • Com­pu­ta­tional reuse
  • Caching in a more gen­eral sense
  • Avoid the prob­lem, and do no work at all
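As a small illustration of computational reuse, Python's standard `functools.lru_cache` remembers the result of a call so the work is only done once per distinct input. The function name and its body are placeholders of mine.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive work actually runs

@lru_cache(maxsize=1024)
def render_price_table(currency):
    """Stand-in for an expensive computation (hypothetical example)."""
    global calls
    calls += 1
    return f"price table in {currency}"

render_price_table("GBP")
render_price_table("GBP")   # served from the cache, no work done
render_price_table("USD")
print(calls)
```

Three requests, but the expensive work only ran twice.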

 

Opti­miza­tion in dynamic con­tent sim­ply means:

  • Don’t pay to gen­er­ate the same con­tent twice (use caching)
  • Only gen­er­ate con­tent when things change
  • Break the sys­tem into com­po­nents so that you can iso­late the costs of things that change rapidly from those that change infrequently

 

There is a sim­ple truth:

     “Your con­tent isn’t as dynamic as you think it is”

  • JavaScript, CSS and images are only referentially linked
  • They should all be con­sol­i­dated and optimized
  • They should be pub­licly cacheable and expire 10 years from now
  • Use cache bust­ing when you need to inval­i­date the cache
  • Use the cookie for per user info!
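One common way to combine a far-future expiry with cache busting is to put a hash of the file's content into its name, so a changed file simply gets a new URL. This is a minimal sketch of that idea, not necessarily what the talk had in mind; the file names are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath

TEN_YEARS = 10 * 365 * 24 * 60 * 60  # seconds

def busted_name(path, content):
    """Embed a content hash in the file name so a changed file gets a new URL."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))

def static_headers():
    """Let browsers and proxies cache the asset publicly for ten years."""
    return {"Cache-Control": f"public, max-age={TEN_YEARS}"}

v1 = busted_name("static/app.js", b"console.log(1);")
v2 = busted_name("static/app.js", b"console.log(2);")
print(v1, v2, static_headers())
```

Because the two versions have different names, the old one never needs to be invalidated; pages just start linking to the new one.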

 

Asking hard questions of a database can be “expensive”. Instead, you have two options:

  • cache the results — best when you can’t afford to be accurate
  • mate­ri­al­ize a view on the changes — best when you need to be accurate

 

Rule #4 — Never solve a prob­lem that you can oth­er­wise avoid

Rule #5 — Do not repeat work

 

Data­bases

  • rule 1 : shard your database
  • rule 2 : shoot yourself

Hor­i­zon­tally scal­ing your data­bases via sharding/federating requires that you make con­ces­sions that should make you cry.

 

Data­bases (other than MySQL) scale ver­ti­cally to a greater degree than most peo­ple admit. But at some point, when you’ve sat­u­rated your ver­ti­cal scal­a­bil­ity and you can’t scale up any­more, you cry really hard and you scale out.

 

The interesting part about scaling out (via Riak, etc.) is that these systems are not sharding; they have distributed data management. Sharding is when you do it yourself, whereas the distributed data management techniques used by these systems are backed by research papers, tested, stressed, and they work.

 

Many times rela­tional con­straints are not needed on data. If this is the case, a tra­di­tional rela­tional data­base is unnecessary.

There are cool tech­nolo­gies out there to do this:

  • “files”
  • NoSQL
  • cook­ies

 

Clever use of cookies can help you scale out much more easily, as the data is stored on the customer’s computer, far away from your infrastructure, and you no longer have to find solutions to a problem you no longer have! There are many limitations on what you can do with cookies, but for the right use cases you should definitely consider them.
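One of those limitations is that the customer can edit anything stored on their machine. A common mitigation, which is my addition rather than something from the talk, is to sign the cookie value so tampering can be detected. The key and the field names below are hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"change-me"  # hypothetical server-side key, never sent to the client

def encode_cookie(data):
    """Serialize the data and append an HMAC so tampering can be detected."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest().encode()
    return (payload + b"." + sig).decode()

def decode_cookie(value):
    """Return the stored data, or None if the signature does not match."""
    payload, _, sig = value.encode().rpartition(b".")
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))

cookie = encode_cookie({"theme": "dark", "last_game": "bingo"})
print(decode_cookie(cookie))
```

Note that the value is signed but not encrypted, so the customer can still read it. It is only suitable for data you are happy for them to see.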

 

Don’t always use the same solu­tion you’ve used before, always evaluate.

 

Non-ACID data­bases can be eas­ier to scale.

 

Vertical scaling is achieved via two mechanisms:

  • doing only what is absolutely nec­es­sary in the database
  • run­ning a good data­base that can scale well vertically

If you really need to scale hor­i­zon­tally, make sure you under­stand the ques­tions you intend to ask (of your data).

Make sure that you par­ti­tion in a fash­ion that doesn’t require more than a sin­gle shard to answer OLTP-style ques­tions. If that’s not pos­si­ble, con­sider data dupli­ca­tion (store the same data mul­ti­ple times, once for each question).
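Here is a toy sketch of partitioning so that each question hits a single shard, using data duplication to cover two different questions. The messaging example, the shard count and the hashing scheme are all assumptions for illustration.

```python
import hashlib

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]  # in-memory stand-ins for databases

def shard_for(key):
    """Stable hash, so the same key always maps to the same shard."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def save_message(msg_id, sender, recipient, body):
    record = {"id": msg_id, "from": sender, "to": recipient, "body": body}
    # Store the record twice: once for "what did X send?" and
    # once for "what did X receive?".
    shards[shard_for(sender)].setdefault(("sent", sender), []).append(record)
    shards[shard_for(recipient)].setdefault(("inbox", recipient), []).append(record)

def inbox(user):
    """Answered by exactly one shard."""
    return shards[shard_for(user)].get(("inbox", user), [])

def sent(user):
    """Also answered by exactly one shard."""
    return shards[shard_for(user)].get(("sent", user), [])

save_message(1, "alice", "bob", "hi")
print(inbox("bob"), sent("alice"))
```

Writes cost twice as much, but neither question ever has to fan out across all the shards.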

 

There are some alternatives to traditional RDBMSs. Without an imposed relational model, federating/sharding is much easier to bake in.

By relax­ing con­sis­tency require­ments, one can increase avail­abil­ity by adopt­ing a par­a­digm of even­tual con­sis­tency.

  • Mon­goDB
  • Cas­san­dra
  • Volde­mort
  • Redis
  • Riak

BUT, NoSQL sys­tems aren’t a cure-all, a lot of data has rela­tion­ships that are important.

 

Ref­er­en­tial integrity is quite impor­tant in many sit­u­a­tions, and a lot of datasets do not need to scale past a sin­gle instance.

Also, not every com­po­nent of the archi­tec­ture needs to scale past the lim­its of ver­ti­cal scaling.

 

If you can seg­re­gate your com­po­nents, you can adhere to a right tool for the job paradigm.

Use SQL where it is the best tool for the job and use dis­trib­uted key-value stores and doc­u­ment data­bases in sit­u­a­tions where they shine.

 

“NoSQL does not mean no DBAs”

Even when you’re operating in an environment where there are no people with DBA job titles, there’s still a job to be done to control and constrain the data model. When you put some data into a database, that’s your data model and it needs to be controlled, if not by a DBA then by application developers. Otherwise all hell will break loose!

Regardless of whether you’re using schema or semi-schema data, you still have a job to keep the data backward and forward compatible with your software. DBAs have a lot of experience doing that! So it’s worth learning from their experience even if we’re not going to use their databases anymore ;-)

 

If you break the problems down into small pieces then you can start using different databases for different problems. Determine how big each problem is and how much it can grow, and use the appropriate database to handle it, so that you fit the solution to the problem.

 

Avoid:

  • “shiny is good”
  • “over engineering”

Embrace:

  • “K.I.S.S”
  • “good is good”

 

NoSQL real­i­ties

NoSQL sys­tems are built to han­dle sys­tem failures.

 

NoSQL sys­tem per­for­mance num­bers and sta­bil­ity reviews are never derived dur­ing fail­ure conditions.

 

NoSQL sys­tems tend to behave very badly dur­ing fail­ure sce­nar­ios, because their oper­a­tors assume unal­tered operations.

 

Think about the per­for­mance degra­da­tion of doing a filesys­tem backup of a tra­di­tional RDBMS dur­ing peak usage.

In NoSQL failure scenarios, similar taxes exist, but:

  • people tend to operate them under heavy load with no headroom
  • the headroom needed for node recovery and degraded operation is quite large

The performance metrics NEED to reflect failure recovery scenarios, because that’s when you’re going to see your peak load, as per Murphy’s law!
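A rough way to reason about headroom is to work out how busy each surviving node becomes when one fails. The function below is a simplified sketch that assumes load is spread evenly; the cluster size, utilization figures and recovery overhead are made-up numbers.

```python
def utilization_after_failure(nodes, utilization, failed=1, recovery_overhead=0.0):
    """Per-node utilization once `failed` nodes are gone and the rest absorb their load.

    Assumes an even spread of load; recovery_overhead is the extra fraction of
    capacity spent on re-replication and repair (a hypothetical figure).
    """
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    return utilization * nodes / survivors + recovery_overhead

# 5 nodes running hot at 85%: losing one pushes the rest past 100%.
print(utilization_after_failure(5, 0.85))
# 5 nodes at 60% with 15% recovery overhead: degraded, but still standing.
print(utilization_after_failure(5, 0.60, recovery_overhead=0.15))
```

In the first case the cluster was fine right up until the moment a node died, which is exactly when the load got worse.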

 

Rule #6 — Appro­pri­ate­ness is both com­pre­hen­sive and objective

 

When eval­u­at­ing the per­sis­tence solu­tion for your data, match the require­ments with the solution’s fea­tures and characteristics.

 

Net­work­ing

The net­work is part of the architecture.

So often for­got­ten by the data­base engi­neers and the appli­ca­tion coders and the front-end devel­op­ers and the designers.

 

The net­work doesn’t work with bytes, it doesn’t send bytes, every­thing revolves around packets.

 

Many applications today are so poorly designed that network issues never become scalability concerns. This section is for application architectures that do have high traffic rates.

 

How do you push 10/20 GigE? Buy­ing a really expen­sive load bal­ancer is one option, but most com­pa­nies don’t, because there are cheaper ways to do it — use rout­ing.

Routing supports extremely naive load balancing: you can run a routing protocol on your edge machines to route traffic to a set of static addresses (e.g. CDNs).

 

This adds fault-tolerance and dis­trib­utes net­work load.

 

Rule #7 — Solu­tions should be as close to the cus­tomer as possible

 

Ser­vice Decoupling

One of the most fun­da­men­tal tech­niques for build­ing scal­able systems.

 

Asyn­chrony

  • Break down the user trans­ac­tion into parts.
  • Iso­late those that could occur asynchronously
  • Queue the infor­ma­tion needed to com­plete the task
  • Process the queues “behind the scenes”
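The steps above can be sketched with nothing more than the standard library's `queue` and `threading` modules. The sign-up and welcome email scenario is a hypothetical example, and in production the queue would be a proper messaging system rather than an in-process one.

```python
import queue
import threading

tasks = queue.Queue()
sent_emails = []

def worker():
    """Processes queued work behind the scenes."""
    while True:
        task = tasks.get()
        if task is None:            # sentinel: shut down
            tasks.task_done()
            break
        sent_emails.append(f"welcome email to {task['email']}")
        tasks.task_done()

def handle_signup(email):
    """Synchronous part: respond right away and queue the rest."""
    tasks.put({"email": email})     # only the information needed to finish later
    return "202 Accepted"

thread = threading.Thread(target=worker, daemon=True)
thread.start()
print(handle_signup("a@example.com"))
tasks.join()                        # wait for the background work (demo only)
tasks.put(None)
thread.join()
print(sent_emails)
```

The user gets a response as soon as the task is queued; sending the email happens later and no longer affects the response time.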

 

If I don’t want to do some­thing now, I must tell some­one to do it later. This is “mes­sag­ing”.

There are a lot of solutions:

  • JMS (Java mes­sage service)
  • Spread (extended vir­tual syn­chrony mes­sag­ing bus)
  • AMQP (advanced mes­sage queue protocol)
  • ZeroMQ (“Fast” messaging)

 

There are some fundamental problems in messaging, such as “how do you know your message was received?”. Many people send lots of messages to a messaging service and just assume that they got there.

Again, you need to understand the failure cases, and understand the consequences of these failure cases. In a lot of cases, so long as you can limit the likelihood of that happening, you might be able to say that you don’t really care about the occasional dropped message.

 

You can adopt this mentality for the 99.99% of messages that aren’t really important, but you need to make sure that the 0.01% of messages that are important get guaranteed delivery that works reliably and is fully persistent.

Again, you iden­tify the com­po­nents where these oper­a­tional con­straints are really impor­tant and solve the prob­lems only in those places.
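To illustrate treating the two classes of messages differently, here is a toy sketch with a channel that silently drops messages. The channel, the 50% drop rate and the message names are all invented for the example; a real broker would provide its own acknowledgement mechanism.

```python
import random

class LossyChannel:
    """Stand-in for a transport that silently drops some messages."""
    def __init__(self, drop_rate, seed=0):
        self.rng = random.Random(seed)
        self.drop_rate = drop_rate
        self.delivered = []

    def send(self, msg):
        """Returns True (an acknowledgement) only if the message arrived."""
        if self.rng.random() < self.drop_rate:
            return False
        self.delivered.append(msg)
        return True

def send_best_effort(channel, msg):
    channel.send(msg)               # ignore the result; a drop is acceptable

def send_reliable(channel, msg, max_attempts=50):
    for attempt in range(1, max_attempts + 1):
        if channel.send(msg):       # retry until acknowledged
            return attempt
    raise RuntimeError("message not acknowledged")

channel = LossyChannel(drop_rate=0.5)
send_best_effort(channel, "page-view")
attempts = send_reliable(channel, "payment-42")
print(attempts, channel.delivered)
```

In a real system the acknowledgement itself can get lost, so retries may deliver the same message twice and the receiver needs to cope with duplicates.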

(Most) asynchronous (and, even more so, distributed) systems are:

  • Com­plex
  • Non-sequential
  • Self-inconsistent
  • Under-engineered
  • Under-instrumented
  • Unnec­es­sary
  • Scale very very well

Yes you need dis­trib­uted sys­tems, but all things in moderation.

 

Rule #8 — Com­plex­ity is the devil

Rule #9 — Deal with the devil only when necessary

 

Don’t be an idiot

Most acute scal­a­bil­ity dis­as­ters are due to idiots.

 

Scal­ing is hard, per­for­mance is easier.

Performance is very important. If you can reduce the number of servers you run, then you reduce the operational complexity and can get your head around the networking, because:

     Net­work­ing is HARD, it’s really really HARD to get tens of thou­sands of servers to talk to each other.

 

Extremely high-performance sys­tems tend to be eas­ier to scale because they don’t have to scale as much.

 

Rule #10 — Don’t be a **** idiot

 

Idiocy is really bad, and contagious.
