Takeaways from Theo Schlossnagle’s talk on Scalable Internet Architecture

This is my list of takeaways from a very good talk by Theo Schlossnagle on things you should be thinking about when designing a system for large scale.

I hope you enjoy the talk as much as I did. It certainly leaves plenty of food for thought, and personally I find it encouraging to see that many of the things we are doing and thinking about at GameSys are echoed by experts in the field!

 

Disclaimer: this is by no means comprehensive coverage of everything in the talk, just the things that stood out for me. If there are topics missing from the notes below that you feel are worthwhile for others to see, please feel free to get in touch with me.

 

Design vs Implementation

When we think about designing a system and translating it into code, we tend to think of the code as the implementation. That is odd when you really think about it because, unlike implementations of a design in the real world, code suffers no wear and tear and no structural stress.

If you design a bridge and then build it, years later the bridge no longer works the same way because of wear and tear. The bridge is a physical thing: it is not the design but an implementation of it.

 

Our design for an application is actually the code, and the implementation of that code is its execution on the CPU. Hence we don’t actually implement our design; we just let it happen.

 

Componentization of a system allows us to be right once in a while and write programs that don’t have bugs because they’re really small programs.

 

Architecture vs Implementation

Architecture is the complex and carefully designed structure of something; in computing, it is the organization of a computer or a computer-based system.

  • Architecture is defined without specifying the vendor, make or model of components.
  • Implementation is the adaptation of an architecture to embrace available technologies (e.g. specific languages and frameworks or products).
  • They’re intrinsically tied and there are lots of arguments about which is which.

 

Architecture

An architecture is all encompassing.

  • space, power, cooling
  • servers, switches, routers
  • load balancers, firewalls
  • databases, non-database storage
  • dynamic applications
  • the architecture you export to the user (javascript, etc.)

 

If you have 50 million users, then you have 50 million nodes and all of them are your problem because you’re sending them presentation as well as function (code to execute).

 

Not all people do all things, which is why not all people think about all of those things.

 

However:

  • lack of awareness of the other disciplines is bad
  • leads to isolated decisions
  • which leads to unreasonable requirements elsewhere
  • which leads to over-engineered products
  • stupid decisions
  • catastrophic failures

 

Running operations is serious stuff: it requires knowledge, tools, experience and discipline.

 

“Good judgment comes from experience. Experience comes from bad judgment.”

– Proverb

 

“Judge people on the poise and integrity with which they remediate their failures.”

– Theo Schlossnagle

 

Rule #1 – Everything must always be in version control (including configurations)

Rule #2 – If it’s not monitored (and documented), it’s not in production

 

Most system designers don’t understand maths. A typical box today has 8 cores at 2.8GHz, which is roughly 22.4 billion instructions per second (at about one instruction per cycle). Given that web apps are CPU bound, if you’re doing 25 requests/s on that 8-core box you’re doing things wrong.
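To put a number on that, here is a back-of-envelope calculation in Python. It assumes about one instruction per clock cycle per core, which is a simplification, and the function name is made up for illustration.

```python
def instructions_per_request(cores: int, clock_hz: float, requests_per_second: float,
                             instructions_per_cycle: float = 1.0) -> float:
    """Rough instruction budget available to each request on one box."""
    total_per_second = cores * clock_hz * instructions_per_cycle
    return total_per_second / requests_per_second

# The numbers from the talk: 8 cores at 2.8GHz serving 25 requests per second.
budget = instructions_per_request(cores=8, clock_hz=2.8e9, requests_per_second=25)
print(f"{budget:,.0f} instructions per request")
```

That works out to 896 million instructions available per request, which is a lot of work for rendering one page.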

 

Here are some simple steps to build a model for your system:

  • Draw it out
  • Take measurements and walk through the model to rationalize it
  • Map actions to consequences:
    • a user sign = 4 async DB Inserts, 1 AMQP durable, persistent message, 1 async DB read
  • Simulate traffic in dev environment and rationalize your model
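As a rough sketch of the "map actions to consequences" step, the model can be as simple as a lookup table that turns user actions into backend operations. The action name, the operation names and the rate of 50 per second are all hypothetical; only the counts come from the example above.

```python
from collections import Counter

# Hypothetical action-to-consequence map; the counts mirror the example above.
ACTION_COSTS = {
    "user_sign": Counter(db_insert=4, amqp_durable_message=1, db_read=1),
}

def backend_load(action_rates: dict) -> Counter:
    """Turn per-second action rates into per-second backend operation rates."""
    load = Counter()
    for action, rate in action_rates.items():
        for operation, count in ACTION_COSTS[action].items():
            load[operation] += count * rate
    return load

# Assumed traffic: 50 of these actions per second.
predicted = backend_load({"user_sign": 50})
print(dict(predicted))
```

Comparing numbers like these against what you measure in the dev environment is where the variance, and the interesting questions, show up.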

 

There will always be empirical variance from your model, and being able to rationalize the model and explain the variance adds a lot of value because it leads to asking important questions.

 

Rule #3 – Always rationalize your inputs and outputs (questioning the realities and rationalizing them)

 

Service decoupling in complex systems gives:

  • Simplified modelling and capacity planning
  • Slight inefficiencies
  • Promotes lower contention
  • Requires design of systems with fewer coherency requirements
  • Each isolated service is simpler and safer
  • SCALES!

 

Optimization

“We should forget about small efficiencies, say about 97% of the time:

premature optimization is the root of all evil.

Yet we should not pass up our opportunities in that critical 3%.

A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.”

– Donald Knuth

“Knowing when optimization is premature defines the difference between the master engineer and the apprentice.”

– Theo Schlossnagle

 

Optimization comes down to a simple concept:

“don’t do work you don’t have to”

It can take the form of:

  • Computational reuse
  • Caching in a more general sense
  • Avoid the problem, and do no work at all
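Computational reuse is the easiest of these to show in code. The following is a minimal sketch using Python's standard `functools.lru_cache`; the `expensive` function and the `calls` counter are stand-ins invented for the example.

```python
from functools import lru_cache

calls = 0  # counts how often the expensive work actually runs

@lru_cache(maxsize=None)
def expensive(n: int) -> int:
    """Stand-in for any costly, repeatable computation."""
    global calls
    calls += 1
    return sum(i * i for i in range(n))

first = expensive(10_000)
second = expensive(10_000)  # served from the cache, no recomputation
print(calls)
```

The second call returns the stored result, so the work is done only once.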

 

Optimization in dynamic content simply means:

  • Don’t pay to generate the same content twice (use caching)
  • Only generate content when things change
  • Break the system into components so that you can isolate the costs of things that change rapidly from those that change infrequently
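One way to read "only generate content when things change" is a cache that is invalidated by the write path rather than by a timer. This is a sketch under that assumption; `PageCache`, the `articles` dict and the render function are hypothetical names, not anything from the talk.

```python
class PageCache:
    """Caches rendered pages and regenerates one only after it is invalidated."""

    def __init__(self, render):
        self._render = render   # function: key -> rendered content
        self._pages = {}
        self.render_count = 0   # how many times we paid to generate content

    def get(self, key):
        if key not in self._pages:
            self._pages[key] = self._render(key)
            self.render_count += 1
        return self._pages[key]

    def invalidate(self, key):
        """Call this from the code path that changes the underlying data."""
        self._pages.pop(key, None)

articles = {"home": "v1"}
cache = PageCache(lambda key: f"<h1>{articles[key]}</h1>")
cache.get("home")
cache.get("home")          # second read costs nothing
articles["home"] = "v2"    # the content changes...
cache.invalidate("home")   # ...so the writer drops the stale page
page = cache.get("home")
```

Two reads and one change cost two renders in total, regardless of how many more reads follow.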

 

There is a simple truth:

     “Your content isn’t as dynamic as you think it is”

  • Javascript, CSS and images are only referentially linked
  • They should all be consolidated and optimized
  • They should be publicly cacheable and expire 10 years from now
  • Use cache busting when you need to invalidate the cache
  • Use the cookie for per user info!
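A common way to combine a far-future expiry with cache busting is to put a content hash in the asset URL, so a changed file simply gets a new address. The sketch below assumes a query-string scheme and a `max-age` of roughly ten years; the function name and paths are made up, and some setups prefer putting the hash in the file name instead.

```python
import hashlib

def busted_url(path: str, content: bytes) -> str:
    """Append a short content hash so a changed file gets a new URL."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    return f"{path}?v={digest}"

# Roughly ten years, expressed in seconds, for a far-future cache lifetime.
TEN_YEARS = 10 * 365 * 24 * 60 * 60
CACHE_HEADERS = {"Cache-Control": f"public, max-age={TEN_YEARS}"}

old = busted_url("/static/app.js", b"console.log('v1');")
new = busted_url("/static/app.js", b"console.log('v2');")
print(old != new)
```

Browsers and proxies keep the old URL cached for as long as they like, and nobody asks for it again once the page references the new one.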

 

Asking hard questions of a database can be “expensive”. Instead, you have two options:

  • cache the results – best when you can’t afford to be accurate
  • materialize a view on the changes – best when you need to be accurate
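The second option can be sketched in a few lines: instead of re-asking the hard question, apply each change to a stored answer at write time. This is an in-memory illustration only, with hypothetical names; a real database would do the equivalent with triggers or a materialized view feature.

```python
class ScoreTotals:
    """Keeps per-player totals up to date as each score is written."""

    def __init__(self):
        self.scores = []   # the raw "table" of (player, points) rows
        self.totals = {}   # the materialized answer to "total points per player"

    def insert(self, player: str, points: int) -> None:
        self.scores.append((player, points))
        # Apply the change to the view at write time, so reads stay cheap and exact.
        self.totals[player] = self.totals.get(player, 0) + points

    def total_by_scan(self, player: str) -> int:
        """The 'hard question': a full scan, shown only for comparison."""
        return sum(p for name, p in self.scores if name == player)

db = ScoreTotals()
db.insert("alice", 10)
db.insert("bob", 5)
db.insert("alice", 7)
print(db.totals["alice"], db.total_by_scan("alice"))
```

Unlike a cached result, the stored total is never stale, at the price of a little extra work on every write.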

 

Rule #4 – Never solve a problem that you can otherwise avoid

Rule #5 – Do not repeat work

 

Databases

  • rule 1 : shard your database
  • rule 2 : shoot yourself

Horizontally scaling your databases via sharding/federating requires that you make concessions that should make you cry.

 

Databases (other than MySQL) scale vertically to a greater degree than most people admit. But at some point, when you’ve saturated your vertical scalability and you can’t scale up anymore, you cry really hard and you scale out.

 

The interesting part about scaling out (via Riak, etc.) is that these systems are not sharding; instead they have distributed data management. Sharding is when you do it yourself, whereas the distributed data management techniques used by these systems are backed and proven by research papers, tested and stressed, and they work.

 

Many times relational constraints are not needed on data. If this is the case, a traditional relational database is unnecessary.

There are cool technologies out there to do this:

  • “files”
  • NoSQL
  • cookies

 

Clever use of cookies can help you scale out much more easily, as data is stored on the customer’s computer, far away from your infrastructure, and you no longer have to find solutions to deal with a problem you no longer have! There are many limitations on what you can do with cookies, but for the right use cases you should definitely consider them.
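One of those limitations is that the client can edit whatever you store there, so a typical approach is to sign the value. The sketch below uses Python's standard `hmac` module; the key, the cookie layout and the function names are assumptions for illustration. Note that signing prevents tampering but does not hide the data from the user.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; keep real keys out of source

def encode_cookie(data: dict) -> str:
    """Serialize the data and append an HMAC so changes can be detected."""
    payload = base64.urlsafe_b64encode(json.dumps(data, sort_keys=True).encode())
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{signature}"

def decode_cookie(cookie: str):
    """Return the stored dict, or None if the cookie was tampered with."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))

cookie = encode_cookie({"theme": "dark", "lang": "en"})
print(decode_cookie(cookie))
```

The server keeps no per-user state for these preferences; it only needs the key to verify what comes back.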

 

Don’t always use the same solution you’ve used before; always evaluate.

 

Non-ACID databases can be easier to scale.

 

Vertical scaling is achieved via two mechanisms:

  • doing only what is absolutely necessary in the database
  • running a good database that can scale well vertically

If you really need to scale horizontally, make sure you understand the questions you intend to ask (of your data).

Make sure that you partition in a fashion that doesn’t require more than a single shard to answer OLTP-style questions. If that’s not possible, consider data duplication (store the same data multiple times, once for each question).
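As an illustration of partitioning by question, the sketch below stores each message twice, once under the sender and once under the recipient, so that either lookup touches exactly one shard. The shard count, the hashing scheme and all names are hypothetical, and the shards are plain dicts standing in for separate databases.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    """Stable hash so the same key always lands on the same shard."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

def record_message(sender: str, recipient: str, text: str) -> None:
    # Store the message twice, once per question we intend to ask,
    # so "sent by X" and "received by Y" each touch exactly one shard.
    shards[shard_for(sender)].setdefault(("sent", sender), []).append(text)
    shards[shard_for(recipient)].setdefault(("inbox", recipient), []).append(text)

def inbox(user: str) -> list:
    """Answer 'what did this user receive?' from a single shard."""
    return shards[shard_for(user)].get(("inbox", user), [])

record_message("alice", "bob", "hi")
record_message("carol", "bob", "hello")
print(inbox("bob"))
```

The cost is the concession mentioned earlier: two writes per message, and the two copies can disagree if one write fails.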

 

There are some alternatives to traditional RDBMS systems. Without an imposed relational model, federating/sharding is much easier to bake in.

By relaxing consistency requirements, one can increase availability by adopting a paradigm of eventual consistency.

  • MongoDB
  • Cassandra
  • Voldemort
  • Redis
  • Riak

BUT NoSQL systems aren’t a cure-all; a lot of data has relationships that are important.

 

Referential integrity is quite important in many situations, and a lot of datasets do not need to scale past a single instance.

Also, not every component of the architecture needs to scale past the limits of vertical scaling.

 

If you can segregate your components, you can adhere to a right-tool-for-the-job paradigm.

Use SQL where it is the best tool for the job and use distributed key-value stores and document databases in situations where they shine.

 

“NoSQL does not mean no DBAs”

Even when you’re operating in an environment where there are no people with DBA job titles, there’s still a job being done to control and constrain the data model. When you put some data into a database, that’s your data model and it needs to be controlled, if not by a DBA then by application developers. Otherwise all hell will break loose!

Regardless of whether you’re using schema or semi-schema data, you still have a job to keep the data backward and forward compatible with your software. DBAs have a lot of experience doing that! So it’s worth learning from their experience even if we’re not going to use their databases anymore ;-)
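In a schemaless store, that compatibility job often ends up in application code. Here is a minimal sketch of one approach, upgrading old records on read; the version field and the name-splitting change are invented purely for the example.

```python
CURRENT_VERSION = 2

def upgrade(record: dict) -> dict:
    """Bring an older stored record up to the shape the current code expects."""
    record = dict(record)  # do not mutate the caller's copy
    version = record.get("version", 1)
    if version < 2:
        # Hypothetical change: version 2 split "name" into first and last name.
        first, _, last = record.pop("name", "").partition(" ")
        record["first_name"], record["last_name"] = first, last
    record["version"] = CURRENT_VERSION
    return record

old_record = {"name": "Ada Lovelace"}
new_record = upgrade(old_record)
print(new_record)
```

Fields the function does not know about are passed through untouched, which is what lets older code coexist with records written by newer code.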

 

If you break the problems down into small pieces then you can start using different databases for different problems. Determine how big the problem is and how much it can grow, and use the appropriate database to handle it, so that you fit the solution to the problem.

 

Avoid:

  • “shiny is good”
  • “over engineering”

Embrace:

  • “K.I.S.S”
  • “good is good”

 

NoSQL realities

NoSQL systems are built to handle system failures.

 

NoSQL system performance numbers and stability reviews are never derived during failure conditions.

 

NoSQL systems tend to behave very badly during failure scenarios, because their operators assume unaltered operations.

 

Think about the performance degradation of doing a filesystem backup of a traditional RDBMS during peak usage.

In NoSQL failure scenarios, similar taxes exist, but:

  • people tend to operate them under heavy load with no headroom
  • the headroom needed for node recovery and degraded operation is quite large

Performance metrics NEED to reflect failure recovery scenarios, because that’s when you’re going to see your peak load, as per Murphy’s law!
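A quick calculation shows why headroom matters. The sketch below estimates per-node utilization after a failure, assuming load is spread evenly over the survivors; the 20% recovery overhead is an arbitrary illustrative figure, not a measured one.

```python
def utilization_after_failure(nodes: int, utilization: float, failed: int = 1,
                              recovery_overhead: float = 0.0) -> float:
    """Per-node utilization once `failed` nodes drop out and survivors absorb their load.

    recovery_overhead is an assumed extra fraction of load (for example 0.2 for 20%)
    spent on re-replication or rebuild while the cluster is degraded.
    """
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    total_load = nodes * utilization
    return total_load * (1 + recovery_overhead) / survivors

# Five nodes running at 80%, one fails, recovery adds an assumed 20% extra work.
degraded = utilization_after_failure(nodes=5, utilization=0.8, recovery_overhead=0.2)
print(f"{degraded:.0%}")
```

A cluster that looked comfortable at 80% ends up needing about 120% of each surviving node, which is exactly the situation benchmarks taken during normal operation never show.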

 

Rule #6 – Appropriateness is both comprehensive and objective

 

When evaluating the persistence solution for your data, match the requirements with the solution’s features and characteristics.

 

Networking

The network is part of the architecture.

It is so often forgotten by database engineers, application coders, front-end developers and designers.

 

The network doesn’t work with bytes and it doesn’t send bytes; everything revolves around packets.
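To see what thinking in packets means, here is a small sketch that converts a payload size into a packet count. It assumes a TCP maximum segment size of 1460 bytes, which is typical for Ethernet with a 1500-byte MTU but is only an assumption here, and it ignores handshakes and acknowledgements.

```python
import math

def packets_needed(payload_bytes: int, mss: int = 1460) -> int:
    """Number of TCP segments needed to carry a payload (at least one)."""
    return max(1, math.ceil(payload_bytes / mss))

small = packets_needed(1)        # a 1-byte response still costs a whole packet
page = packets_needed(100_000)   # a 100 kB response
print(small, page)
```

A 1-byte message and a 1460-byte message cost the network the same single packet, so many tiny messages are far more expensive than their byte count suggests.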

 

Many applications today are so poorly designed that network issues never become scalability concerns; this section is for the application architectures that do have high traffic rates.

 

How do you push 10/20 GigE? Buying a really expensive load balancer is one option, but most companies don’t, because there are cheaper ways to do it – use routing.

Routing supports extremely naive load balancing: you can run a routing protocol on your edge machines to route traffic to a set of static addresses (e.g. CDNs).

 

This adds fault-tolerance and distributes network load.
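The idea can be illustrated with a toy version of hash-based next-hop selection, similar in spirit to equal-cost multipath routing. This is only a sketch: real routers use their own hash functions in hardware, and the addresses and function name below are placeholders.

```python
import hashlib

def pick_next_hop(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                  next_hops: list) -> str:
    """Hash the flow so every packet of one connection goes to the same machine."""
    flow = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    index = int.from_bytes(hashlib.sha256(flow).digest()[:4], "big") % len(next_hops)
    return next_hops[index]

edge_machines = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical addresses
a = pick_next_hop("198.51.100.7", 40000, "203.0.113.10", 443, edge_machines)
b = pick_next_hop("198.51.100.7", 40000, "203.0.113.10", 443, edge_machines)

# If a machine stops announcing its route, traffic is spread over the rest.
survivors = [m for m in edge_machines if m != a]
c = pick_next_hop("198.51.100.7", 40000, "203.0.113.10", 443, survivors)
print(a == b, c in survivors)
```

There is no health checking or weighting here, which is what makes it naive, but a machine that withdraws its route simply drops out of the candidate list.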

 

Rule #7 – Solutions should be as close to the customer as possible

 

Service Decoupling

One of the most fundamental techniques for building scalable systems.

 

Asynchrony

  • Break down the user transaction into parts.
  • Isolate those that could occur asynchronously
  • Queue the information needed to complete the task
  • Process the queues “behind the scenes”
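The four steps above can be sketched with Python's standard `queue` and `threading` modules. The sign-up handler and the welcome mail are hypothetical examples of a transaction and its deferrable part, and an in-process queue stands in for a real message broker.

```python
import queue
import threading

tasks = queue.Queue()
sent_emails = []

def worker():
    """Processes the queue behind the scenes."""
    while True:
        task = tasks.get()
        if task is None:          # sentinel: stop the worker
            tasks.task_done()
            break
        sent_emails.append(f"welcome mail to {task}")  # the slow, deferrable part
        tasks.task_done()

def handle_signup(user: str) -> str:
    tasks.put(user)                          # queue only what is needed to finish later
    return f"account created for {user}"     # respond without waiting

thread = threading.Thread(target=worker, daemon=True)
thread.start()
response = handle_signup("alice")
tasks.put(None)
tasks.join()                      # only for the demo: wait for the queue to drain
print(response, sent_emails)
```

The user gets a response as soon as the task is queued; how quickly the mail goes out becomes a separate, independently scalable concern.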

 

If I don’t want to do something now, I must tell someone to do it later. This is “messaging”.

There are a lot of solutions:

  • JMS (Java message service)
  • Spread (extended virtual synchrony messaging bus)
  • AMQP (advanced message queue protocol)
  • ZeroMQ (“Fast” messaging)

 

There are some fundamental problems in messaging, such as ‘how do you know your message was received?’ Many people send lots of messages to a messaging service and just assume that they got there.

Again, you need to understand the failure cases, and understand the consequences of these failure cases. In a lot of cases, so long as you can limit the likelihood of that happening, you might be able to say that you don’t really care about the occasional dropped message.

 

You can adopt this mentality for the 99.99% of messages that aren’t really important, but you need to make sure that the 0.01% of messages that are important have guaranteed delivery, work reliably and are fully persistent.

Again, you identify the components where these operational constraints are really important and solve the problems only in those places.
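For the small share of messages that do matter, the usual answer to "how do you know it was received?" is an acknowledgement plus retries. Below is a minimal sketch of that pattern; `transport` is a hypothetical callable rather than any real messaging API, and `FlakyTransport` exists only to simulate dropped messages.

```python
def send_with_ack(message: str, transport, max_attempts: int = 5) -> int:
    """Resend until the receiver acknowledges; return the number of attempts used.

    `transport` is a hypothetical callable that returns True when an ack came back.
    """
    for attempt in range(1, max_attempts + 1):
        if transport(message):
            return attempt
    raise RuntimeError(f"no ack after {max_attempts} attempts: {message!r}")

class FlakyTransport:
    """Test double that drops the first `drops` sends, then delivers."""

    def __init__(self, drops: int):
        self.drops = drops
        self.delivered = []

    def __call__(self, message: str) -> bool:
        if self.drops > 0:
            self.drops -= 1
            return False
        self.delivered.append(message)
        return True

transport = FlakyTransport(drops=2)
attempts = send_with_ack("payment:42", transport)
print(attempts, transport.delivered)
```

Retrying gives at-least-once delivery: if an ack is lost rather than the message, the receiver may see duplicates, so the important handlers also need to be idempotent.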

(Most) asynchronous (and, even more so, distributed) systems are:

  • Complex
  • Non-sequential
  • Self-inconsistent
  • Under-engineered
  • Under-instrumented
  • Unnecessary
  • Scale very very well

Yes you need distributed systems, but all things in moderation.

 

Rule #8 – Complexity is the devil

Rule #9 – Deal with the devil only when necessary

 

Don’t be an idiot

Most acute scalability disasters are due to idiots.

 

Scaling is hard, performance is easier.

Performance is very important. If you can reduce the number of servers you run, you reduce the operational complexity and are better able to get your head around networking, because:

Networking is HARD. It’s really, really HARD to get tens of thousands of servers to talk to each other.

 

Extremely high-performance systems tend to be easier to scale because they don’t have to scale as much.

 

Rule #10 – Don’t be a **** idiot

 

Idiocy is really bad, and contagious.