Takeaways from Theo Schlossnagle’s talk on Scalable Internet Architecture

This is my list of take­aways from a very good talk by Theo Schloss­na­gle on things you should be think­ing about when design­ing a sys­tem for large scale.

I hope you enjoy the talk as much as I did. It certainly leaves plenty of food for thought, and personally it is encouraging to see that many of the things we are doing/thinking at GameSys are echoed by experts in the field!

 

Disclaimer: this is by no means comprehensive coverage of everything in the talk, just the things that stood out for me. If there are topics missing from the notes below that you feel are worthwhile for others to see, please feel free to get in touch with me.

 

Design vs Implementation

When we think about designing a system and translating it to code, we tend to think of the code as the implementation. That is weird when you really think about it, because unlike any other implementation of a design in the real world, code has no wear and tear and no structural stress factors.

If you design a bridge and build a bridge, years lat­er the bridge does not work the same way due to wear and tear, because the bridge is a thing, it’s not the design but an imple­men­ta­tion.

 

Our design for an application is actually the code, and the implementation of our code is its execution on the CPU. Hence we don’t actually do the implementation of our design, we just let it happen.

 

Com­po­nen­ti­za­tion of a sys­tem allows us to be right once in a while and write pro­grams that don’t have bugs because they’re real­ly small pro­grams.

 

Architecture vs Implementation

Architecture is the complex and carefully designed structure of something; in computing, it’s the organization of a computer or a computer-based system.

  • Architecture is without specification of the vendor, make or model of components.
  • Imple­men­ta­tion is the adap­ta­tion of an archi­tec­ture to embrace avail­able tech­nolo­gies (e.g. spe­cif­ic lan­guages and frame­works or prod­ucts).
  • They’re intrinsically tied and there are lots of arguments about which is which.

 

Architecture

An archi­tec­ture is all encom­pass­ing.

  • space, pow­er, cool­ing
  • servers, switch­es, routers
  • load bal­ancers, fire­walls
  • data­bas­es, non-data­base stor­age
  • dynam­ic appli­ca­tions
  • the architecture you export to the user (JavaScript, etc.)

 

If you have 50 mil­lion users, then you have 50 mil­lion nodes and all of them are your prob­lem because you’re send­ing them pre­sen­ta­tion as well as func­tion (code to exe­cute).

 

Not all people do all things, which is why not all people think of all those things.

 

How­ev­er:

  • lack of aware­ness of the oth­er dis­ci­plines is bad
  • leads to iso­lat­ed deci­sions
  • which leads to unrea­son­able require­ments else­where
  • which leads to over-engineered products
  • stu­pid deci­sions
  • cat­a­stroph­ic fail­ures

 

Run­ning oper­a­tions is seri­ous stuff, it requires knowl­edge, tools, expe­ri­ence and dis­ci­pline.

 

“Good judgment comes from experience. Experience comes from bad judgment.”

- Proverb

 

“Judge people on the poise and integrity with which they remediate their failures.”

- Theo Schloss­na­gle

 

Rule #1 — Every­thing must always be in ver­sion con­trol (includ­ing con­fig­u­ra­tions)

Rule #2 — If it’s not mon­i­tored it’s not in pro­duc­tion (and doc­u­ment­ed)

 

Most system designers don’t understand maths. A typical box today has 8 cores at 2.8GHz, or roughly 22.4 billion instructions per second. Given that web apps are CPU bound, if you’re doing 25 requests/s on that 8-core box you’re doing things wrong.
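To make the arithmetic concrete, here is a small back-of-the-envelope sketch in Python. It uses only the figures quoted above and assumes, as a simplification, one instruction per clock cycle per core; real throughput will differ.

```python
# Back-of-the-envelope budget using the figures quoted in the notes.
# Simplifying assumption: one instruction per clock cycle per core.
CORES = 8
CLOCK_HZ = 2.8e9            # 2.8 GHz per core
REQUESTS_PER_SECOND = 25    # the "doing things wrong" rate from the notes

cycles_per_second = CORES * CLOCK_HZ
cycles_per_request = cycles_per_second / REQUESTS_PER_SECOND

print(f"{cycles_per_second / 1e9:.1f} billion cycles/s in total")
print(f"{cycles_per_request / 1e6:.0f} million cycles available per request")
```

At 25 requests/s each request has a budget of several hundred million cycles, which is far more than rendering a typical page should need.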

 

Here are some sim­ple steps to build a mod­el for your sys­tem:

  • Draw it out
  • Take mea­sure­ments and walk through the mod­el to ratio­nal­ize it
  • Map actions to con­se­quences:
    • a user sign = 4 async DB Inserts, 1 AMQP durable, per­sis­tent mes­sage, 1 async DB read
  • Sim­u­late traf­fic in dev envi­ron­ment and ratio­nal­ize your mod­el
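The steps above could be captured in something as small as the following Python sketch. Only the "user sign" mapping comes from the notes; the dictionary layout, the function name `expected_load` and the traffic rate of 50 actions per second are assumptions made up for illustration.

```python
# Hypothetical model: each user action maps to the backend operations it triggers.
# Only the "user sign" row comes from the talk notes.
CONSEQUENCES = {
    "user sign": {
        "async_db_insert": 4,
        "amqp_durable_persistent_message": 1,
        "async_db_read": 1,
    },
}

def expected_load(action_rates):
    """Turn actions per second into backend operations per second."""
    load = {}
    for action, rate in action_rates.items():
        for operation, count in CONSEQUENCES[action].items():
            load[operation] = load.get(operation, 0) + count * rate
    return load

# Assumed traffic: 50 of these actions per second.
predicted = expected_load({"user sign": 50})
print(predicted)
```

Comparing `predicted` against what you actually measure in the dev environment is where the variance, and the interesting questions, show up.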

 

There will always be empir­i­cal vari­ance from your mod­el, and being able to ratio­nal­ize the mod­el and explain the vari­ance adds a lot of val­ue because it leads to ask­ing impor­tant ques­tions.

 

Rule #3 — Always ratio­nal­ize your inputs and out­puts (ques­tion­ing the real­i­ties and ratio­nal­iz­ing them)

 

Service decoupling in complex systems gives:

  • Sim­pli­fied mod­el­ling and capac­i­ty plan­ning
  • Slight inef­fi­cien­cies
  • pro­motes low­er con­tention
  • Requires design of sys­tems with less coheren­cy require­ments
  • Each iso­lat­ed ser­vice is sim­pler and safer
  • SCALES!

 

Optimization

“We should forget about small efficiencies, say about 97% of the time:

pre­ma­ture opti­miza­tion is the root of all evil.

Yet we should not pass up our oppor­tu­ni­ties in that crit­i­cal 3%.

A good pro­gram­mer will not be lulled into com­pla­cen­cy by such rea­son­ing, he will be wise to look care­ful­ly at the crit­i­cal code; but only after that code has been iden­ti­fied.”

- Don­ald Knuth

“Knowing when optimization is premature defines the difference between the master engineer and the apprentice.”

- Theo Schloss­na­gle

 

Opti­miza­tion comes down to a sim­ple con­cept:

“don’t do work you don’t have to”

It can take the form of:

  • Com­pu­ta­tion­al reuse
  • Caching in a more gen­er­al sense
  • Avoid the prob­lem, and do no work at all

 

Opti­miza­tion in dynam­ic con­tent sim­ply means:

  • Don’t pay to gen­er­ate the same con­tent twice (use caching)
  • Only gen­er­ate con­tent when things change
  • Break the sys­tem into com­po­nents so that you can iso­late the costs of things that change rapid­ly from those that change infre­quent­ly
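As a rough illustration of "only generate content when things change", here is a minimal Python sketch. The `CachedPage` class and the idea of passing an explicit `version` number are assumptions for the example, not something from the talk.

```python
class CachedPage:
    """Regenerates content only when the version of the underlying data changes."""

    def __init__(self, render):
        self._render = render     # expensive function: data -> content
        self._version = None      # version of the data that was last rendered
        self._content = None
        self.render_count = 0     # kept for illustration only

    def get(self, data, version):
        if self._content is None or version != self._version:
            self._content = self._render(data)
            self._version = version
            self.render_count += 1
        return self._content

page = CachedPage(lambda data: f"<h1>{data['title']}</h1>")
page.get({"title": "Hello"}, version=1)
page.get({"title": "Hello"}, version=1)         # served from the cache
page.get({"title": "Hello again"}, version=2)   # data changed, rendered again
print(page.render_count)
```

Three requests result in two renders; with real traffic the ratio of requests to renders is usually far more lopsided.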

 

There is a sim­ple truth:

     “Your con­tent isn’t as dynam­ic as you think it is”

  • JavaScript, CSS and images are only referentially linked
  • They should all be con­sol­i­dat­ed and opti­mized
  • They should be pub­licly cacheable and expire 10 years from now
  • Use cache bust­ing when you need to inval­i­date the cache
  • Use the cook­ie for per user info!
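One common way to combine far-future expiry with cache busting is to put a hash of the content into the file name, so a changed file gets a new URL. The sketch below assumes that approach; the helper names `busted_name` and `cache_headers` are hypothetical.

```python
import hashlib

TEN_YEARS_SECONDS = 10 * 365 * 24 * 60 * 60

def busted_name(filename, content):
    """Embed a short content hash in the file name: app.js -> app.<hash>.js"""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, dot, extension = filename.rpartition(".")
    return f"{stem}.{digest}{dot}{extension}" if dot else f"{filename}.{digest}"

def cache_headers():
    """Headers for a static asset whose URL changes whenever its content does."""
    return {"Cache-Control": f"public, max-age={TEN_YEARS_SECONDS}"}

old = busted_name("app.js", b"console.log('v1');")
new = busted_name("app.js", b"console.log('v2');")
print(old, new, cache_headers())
```

Because the page references the new name after a deploy, browsers fetch the new file while the old one simply ages out of their caches.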

 

Asking hard questions of a database can be “expensive”; instead you have two options:

  • cache the results — best when you can’t afford to be accu­rate
  • mate­ri­al­ize a view on the changes — best when you need to be accu­rate
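The difference between the two options can be sketched in a few lines of Python. Both classes are illustrative stand-ins (a real system would use a cache server or a database-maintained view), and the 60 second TTL is an arbitrary assumption.

```python
import time

class CachedCount:
    """Caches an expensive answer for ttl seconds; it may be stale in between."""

    def __init__(self, compute, ttl):
        self._compute, self._ttl = compute, ttl
        self._value, self._expires = None, None

    def get(self):
        now = time.monotonic()
        if self._expires is None or now >= self._expires:
            self._value = self._compute()
            self._expires = now + self._ttl
        return self._value

class MaterializedCount:
    """Keeps the answer accurate by applying every change as it happens."""

    def __init__(self):
        self.value = 0

    def on_insert(self):
        self.value += 1

    def on_delete(self):
        self.value -= 1

rows = ["a", "b"]
cached = CachedCount(lambda: len(rows), ttl=60)
materialized = MaterializedCount()
for _ in rows:
    materialized.on_insert()

cached.get()               # computes the count and caches it
rows.append("c")           # a write happens
materialized.on_insert()   # the materialized answer is updated with the write

print(cached.get(), materialized.value)
```

The cached answer still reflects the state before the write, while the materialized one is current; the price is extra work on every write.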

 

Rule #4 — Nev­er solve a prob­lem that you can oth­er­wise avoid

Rule #5 — Do not repeat work

 

Databases

  • rule 1 : shard your data­base
  • rule 2 : shoot your­self

Hor­i­zon­tal­ly scal­ing your data­bas­es via sharding/federating requires that you make con­ces­sions that should make you cry.

 

Data­bas­es (oth­er than MySQL) scale ver­ti­cal­ly to a greater degree than most peo­ple admit. But at some point, when you’ve sat­u­rat­ed your ver­ti­cal scal­a­bil­i­ty and you can’t scale up any­more, you cry real­ly hard and you scale out.

 

The interesting part about scaling out (via Riak, etc.) is that they’re not sharding; instead they have distributed data management. Sharding is when you do it yourself, whereas the distributed data management techniques used by these systems are backed and proven by research papers, tested, stressed, and they work.

 

Many times rela­tion­al con­straints are not need­ed on data. If this is the case, a tra­di­tion­al rela­tion­al data­base is unnec­es­sary.

There are cool tech­nolo­gies out there to do this:

  • “files”
  • NoSQL
  • cook­ies

 

Clever use of cookies can help you scale out much more easily, as data is stored on the customer’s computer, far away from your infrastructure, and you no longer have to find solutions to deal with a problem you no longer have! There are many limitations on what you can do with cookies, but for the right use cases you should definitely consider them.
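Since cookie data lives on the customer’s machine it can be tampered with, so one common precaution is to sign it. The following Python sketch assumes that approach; `SECRET_KEY`, `encode_cookie` and `decode_cookie` are hypothetical names, and the data is signed but not encrypted, so the user can still read it.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; never sent to the client

def encode_cookie(data):
    """Serialize data and append an HMAC so tampering can be detected."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"

def decode_cookie(value):
    """Return the stored data, or None if the signature does not match."""
    payload, _, signature = value.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))

cookie = encode_cookie({"theme": "dark", "lang": "en"})
print(decode_cookie(cookie))
print(decode_cookie("x" + cookie))   # tampered value is rejected
```

Browsers typically cap a cookie at around 4 KB and send it with every request, so this only suits small pieces of per-user state.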

 

Don’t always use the same solu­tion you’ve used before, always eval­u­ate.

 

Non-ACID data­bas­es can be eas­i­er to scale.

 

Vertical scaling is achieved via two mechanisms:

  • doing only what is absolute­ly nec­es­sary in the data­base
  • run­ning a good data­base that can scale well ver­ti­cal­ly

If you real­ly need to scale hor­i­zon­tal­ly, make sure you under­stand the ques­tions you intend to ask (of your data).

Make sure that you par­ti­tion in a fash­ion that doesn’t require more than a sin­gle shard to answer OLTP-style ques­tions. If that’s not pos­si­ble, con­sid­er data dupli­ca­tion (store the same data mul­ti­ple times, once for each ques­tion).
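Here is a minimal sketch of that idea, assuming a hypothetical order store that must answer two questions: "what did this user order?" and "who ordered this product?". The shard count, the in-memory dictionaries and all function names are made up for illustration.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

def shard_for(key):
    """Stable hash so the same key always maps to the same shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

# In-memory stand-ins for shards; each holds one copy of the data per question.
shards = [{"orders_by_user": {}, "orders_by_product": {}} for _ in range(NUM_SHARDS)]

def save_order(order):
    # Copy 1: partitioned by user, answers "what did this user order?"
    by_user = shards[shard_for(order["user"])]["orders_by_user"]
    by_user.setdefault(order["user"], []).append(order)
    # Copy 2: partitioned by product, answers "who ordered this product?"
    by_product = shards[shard_for(order["product"])]["orders_by_product"]
    by_product.setdefault(order["product"], []).append(order)

def orders_for_user(user):
    return shards[shard_for(user)]["orders_by_user"].get(user, [])

def orders_for_product(product):
    return shards[shard_for(product)]["orders_by_product"].get(product, [])

save_order({"user": "alice", "product": "book"})
save_order({"user": "bob", "product": "book"})
print(len(orders_for_user("alice")), len(orders_for_product("book")))
```

Each question touches exactly one shard, at the cost of writing every order twice and keeping the two copies in step.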

 

There are some alternatives to traditional RDBMSs. Without an imposed relational model, federating/sharding is much easier to bake in.

By relax­ing con­sis­ten­cy require­ments, one can increase avail­abil­i­ty by adopt­ing a par­a­digm of even­tu­al con­sis­ten­cy.

  • Mon­goDB
  • Cas­san­dra
  • Volde­mort
  • Redis
  • Riak

BUT, NoSQL sys­tems aren’t a cure-all, a lot of data has rela­tion­ships that are impor­tant.

 

Ref­er­en­tial integri­ty is quite impor­tant in many sit­u­a­tions, and a lot of datasets do not need to scale past a sin­gle instance.

Also, not every com­po­nent of the archi­tec­ture needs to scale past the lim­its of ver­ti­cal scal­ing.

 

If you can segregate your components, you can adhere to a right-tool-for-the-job paradigm.

Use SQL where it is the best tool for the job and use dis­trib­uted key-val­ue stores and doc­u­ment data­bas­es in sit­u­a­tions where they shine.

 

“NoSQL does not need no DBAs”

Even when you’re operating in an environment where there are no people with DBA job titles, there’s still a job being done to control and constrain the data model. When you put some data into a database, that’s your data model and it needs to be controlled, if not by a DBA then by application developers, otherwise all hell will break loose!

Regardless of whether you’re using schema or semi-schema data, you still have a job to keep the data backward and forward compatible with your software. DBAs have a lot of experience doing that! So it’s worth learning from their experiences even if we’re not going to use their databases anymore ;-)

 

If you break the problems down into small pieces then you can start using different databases for different problems. Determine how big each problem is and how far it can grow, and use the appropriate database to handle it, so that you fit the solution to the problem.

 

Avoid:

  • “shiny is good”
  • “over engineering”

Embrace:

  • “K.I.S.S”
  • “good is good”

 

NoSQL realities

NoSQL sys­tems are built to han­dle sys­tem fail­ures.

 

NoSQL sys­tem per­for­mance num­bers and sta­bil­i­ty reviews are nev­er derived dur­ing fail­ure con­di­tions.

 

NoSQL sys­tems tend to behave very bad­ly dur­ing fail­ure sce­nar­ios, because their oper­a­tors assume unal­tered oper­a­tions.

 

Think about the per­for­mance degra­da­tion of doing a filesys­tem back­up of a tra­di­tion­al RDBMS dur­ing peak usage.

In NoSQL failure scenarios, similar taxes exist, but:

  • peo­ple tend to oper­ate them under heavy load with no head­room
  • the headroom needed for node recovery and degraded operation is quite large

The performance metrics NEED to reflect failure recovery scenarios, because that’s when you’re going to see your peak load as per Murphy’s law!
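A quick way to see why headroom matters is to work out what happens to the surviving nodes when one fails. The Python sketch below assumes a hypothetical 5-node cluster where load is spread evenly, and it ignores the additional cost of the recovery work itself, so the real picture is worse.

```python
def utilization_after_failures(nodes, utilization, failed):
    """Per-node utilization once the load of the failed nodes is spread evenly
    over the survivors. Ignores the extra cost of recovery itself."""
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    return utilization * nodes / survivors

# Hypothetical 5-node cluster at two different levels of normal utilization.
for normal in (0.60, 0.85):
    degraded = utilization_after_failures(nodes=5, utilization=normal, failed=1)
    status = "ok" if degraded < 1.0 else "overloaded"
    print(f"{normal:.0%} normal -> {degraded:.0%} with one node down ({status})")
```

A cluster that looks comfortable at 85% has no room to lose even a single node.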

 

Rule #6 — Appro­pri­ate­ness is both com­pre­hen­sive and objec­tive

 

When eval­u­at­ing the per­sis­tence solu­tion for your data, match the require­ments with the solution’s fea­tures and char­ac­ter­is­tics.

 

Networking

The net­work is part of the archi­tec­ture.

So often for­got­ten by the data­base engi­neers and the appli­ca­tion coders and the front-end devel­op­ers and the design­ers.

 

The net­work doesn’t work with bytes, it doesn’t send bytes, every­thing revolves around pack­ets.

 

Many applications today are so poorly designed that network issues never become scalability concerns. This section is for the application architectures that do have high traffic rates.

 

How do you push 10/20 GigE? Buy­ing a real­ly expen­sive load bal­ancer is one option, but most com­pa­nies don’t, because there are cheap­er ways to do it — use rout­ing.

Routing supports extremely naive load balancing: you can run a routing protocol on your edge machines to route traffic to a set of static addresses (e.g. CDNs).

 

This adds fault-tol­er­ance and dis­trib­utes net­work load.

 

Rule #7 — Solu­tions should be as close to the cus­tomer as pos­si­ble

 

Service Decoupling

One of the most fun­da­men­tal tech­niques for build­ing scal­able sys­tems.

 

Asyn­chrony

  • Break down the user trans­ac­tion into parts.
  • Iso­late those that could occur asyn­chro­nous­ly
  • Queue the infor­ma­tion need­ed to com­plete the task
  • Process the queues “behind the scenes”
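The steps above can be sketched with nothing more than Python’s standard library. This is an in-process illustration only, with a hypothetical sign-up flow and a list standing in for sending email; a production system would more likely use one of the messaging systems listed further down.

```python
import queue
import threading

tasks = queue.Queue()
sent_emails = []   # stands in for a slow side effect such as sending email

def worker():
    """Processes queued work behind the scenes."""
    while True:
        task = tasks.get()
        if task is None:           # sentinel value: shut the worker down
            tasks.task_done()
            break
        sent_emails.append(f"welcome email to {task['email']}")
        tasks.task_done()

def handle_signup(email):
    """Synchronous part of the transaction; returns without waiting for the email."""
    user = {"email": email}        # the part the user must wait for
    tasks.put({"email": email})    # queue the information the worker needs
    return user

thread = threading.Thread(target=worker, daemon=True)
thread.start()
handle_signup("a@example.com")
handle_signup("b@example.com")
tasks.join()       # demo only: wait until the queue has been drained
tasks.put(None)
thread.join()
print(sent_emails)
```

The user-facing call does only the work that must happen now; everything else is deferred to the worker.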

 

If I don’t want to do some­thing now, I must tell some­one to do it lat­er. This is “mes­sag­ing”.

There are a lot of solu­tions:

  • JMS (Java mes­sage ser­vice)
  • Spread (extend­ed vir­tu­al syn­chrony mes­sag­ing bus)
  • AMQP (advanced mes­sage queue pro­to­col)
  • ZeroMQ (“Fast” mes­sag­ing)

 

There are some fundamental problems in messaging, such as ‘how do you know your message was received?’. Many people send lots of messages to a messaging service and just assume that they got there.

Again, you need to understand the failure cases, and understand the consequences of these failure cases. In a lot of cases, so long as you can limit the likelihood of that happening, you might be able to say that you don’t really care about the occasional dropped message.

 

You can adopt this mentality for the 99.99% of messages that aren’t really important, but you need to make sure that the 0.01% of messages that are important get guaranteed delivery, work reliably and are fully persistent.

Again, you iden­ti­fy the com­po­nents where these oper­a­tional con­straints are real­ly impor­tant and solve the prob­lems only in those places.
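To illustrate treating the two classes of message differently, here is a small Python sketch. `FlakyChannel` is a made-up test double that drops a fixed number of sends, and the retry-until-acknowledged loop is one simple approach, not a description of how any particular messaging system works.

```python
class FlakyChannel:
    """Test double that drops the first `drops` sends, then delivers."""

    def __init__(self, drops):
        self.drops = drops
        self.delivered = []

    def send(self, message):
        """Returns True only if the receiver acknowledged the message."""
        if self.drops > 0:
            self.drops -= 1
            return False           # lost in transit, no acknowledgement
        self.delivered.append(message)
        return True

def send_best_effort(channel, message):
    """Fire and forget: fine for messages we can afford to lose."""
    channel.send(message)

def send_reliably(channel, message, max_attempts=5):
    """Retry until acknowledged; raise so the caller can react if it never is."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(message):
            return attempt
    raise RuntimeError(f"no acknowledgement after {max_attempts} attempts")

channel = FlakyChannel(drops=3)
send_best_effort(channel, "page view")          # silently lost
attempts = send_reliably(channel, "payment")    # retried until acknowledged
print(attempts, channel.delivered)
```

In a real system an acknowledgement can itself get lost, so retries may deliver a message more than once and the receiver needs to tolerate duplicates.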

(Most) asynchronous (and, even more so, distributed) systems are:

  • Com­plex
  • Non-sequen­tial
  • Self-incon­sis­tent
  • Under-engi­neered
  • Under-instru­ment­ed
  • Unnec­es­sary
  • Scale very very well

Yes you need dis­trib­uted sys­tems, but all things in mod­er­a­tion.

 

Rule #8 — Com­plex­i­ty is the dev­il

Rule #9 — Deal with the dev­il only when nec­es­sary

 

Don’t be an idiot

Most acute scal­a­bil­i­ty dis­as­ters are due to idiots.

 

Scal­ing is hard, per­for­mance is eas­i­er.

Performance is very important. If you can reduce the number of servers you run, then you reduce the operational complexity and will be able to get your head around networking because:

Net­work­ing is HARD, it’s real­ly real­ly HARD to get tens of thou­sands of servers to talk to each oth­er.

 

Extreme­ly high-per­for­mance sys­tems tend to be eas­i­er to scale because they don’t have to scale as much.

 

Rule #10 — Don’t be a **** idiot

 

Idio­cy is real­ly bad, and con­ta­gious.