The most common issues I have encountered in production are latency/performance related. They can be symptoms of a whole host of underlying causes, ranging from AWS network issues (which can also manifest themselves as latency/error-rate spikes in any of the AWS services) and overloaded servers to simple GC pauses.
Latency issues are inevitable – as much as you can improve the performance of your application, things will go wrong, eventually, and often they’re out of your control.
So you must design for them, and degrade the quality of your application gracefully to minimize the impact on your users’ experiences.
As backend developers, one of the fallacies that we often fall into is to allow our dev environments to be too lenient. Servers and databases are never under load during development, so we lure client developers into a false sense of comfort and set them up to fail when the application runs into a slow-responding server in production for the first time.
To program my fellow client developers to always be mindful of latency spikes, we decided to inject random latency delays on every request:
- check if we should inject a random delay;
- if yes, then work out how much latency to inject and sleep the thread;
- and finally invoke the original method to process the request
This is an implementation pattern that can be automated. I wrote a simple PostSharp attribute to do this, whilst piggybacking existing configuration mechanisms to control its behaviour at runtime.
Then I multicast the attribute to all our service endpoints and my work was done!
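The original aspect is written in C# with PostSharp, but the same pattern can be sketched as a Python decorator. The configuration knobs and endpoint name below are illustrative, not the original implementation:

```python
import functools
import random
import time

# Illustrative knobs; the original piggybacked existing runtime configuration.
INJECTION_PROBABILITY = 0.1  # inject a delay on ~10% of requests
MAX_DELAY_SECONDS = 2.0

def inject_latency(func):
    """Randomly delay a request before invoking the real handler."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # 1. check if we should inject a random delay
        if random.random() < INJECTION_PROBABILITY:
            # 2. work out how much latency to inject and sleep the thread
            time.sleep(random.uniform(0.0, MAX_DELAY_SECONDS))
        # 3. invoke the original method to process the request
        return func(*args, **kwargs)
    return wrapper

@inject_latency
def handle_request(payload):
    # stand-in for a real service endpoint
    return {"ok": True, "payload": payload}
```

PostSharp's multicasting applies the aspect across all endpoints in one go; in Python you would apply the decorator to each endpoint (or via a framework middleware) instead.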
We ran latency injection in our dev environment; it helped identify numerous bugs in the client application and proved to be a worthwhile exercise.
But we didn’t stop there.
Many services throttle requests to protect themselves from abuse. Occasionally, though, legitimate requests can also be throttled as a result of client bugs, an over-zealous retry strategy, or an incorrectly configured throttling threshold.
Once again, we decided to make these errors much more visible in the dev environment so that client developers expect them and handle them gracefully.
To do that, we:
- set the threshold very low in dev
- used a PostSharp attribute to randomly inject throttling errors on operations where it makes sense
The attribute that injects throttling errors is very simple, and looks something along the lines of:
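(The original is a C# PostSharp aspect; here is an analogous Python-decorator sketch, with an illustrative error type and probability.)

```python
import functools
import random

class ThrottledError(Exception):
    """Illustrative stand-in for the service's real throttling error."""

def inject_throttling(probability):
    """Randomly fail an operation with a throttling error before it runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < probability:
                raise ThrottledError(f"injected throttle on {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@inject_throttling(probability=1.0)  # always throttle, for demonstration
def save_progress(data):
    return "saved"
```

In dev you would dial the probability up so that clients see throttling errors regularly; in production the injection would be disabled entirely.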
The same approach can be taken to include any service specific errors that the client should be able to gracefully recover from – session expiration, state out-of-sync, etc.
Design for Failure
Simulating latency issues and other errors falls under the practice of Design for Failure, which Simon Wardley identifies as one of the characteristics of a next-generation tech company.
P.S. you should check out Simon's work on value chain mapping if you haven't already; it's inspiring.
Netflix’s use of Chaos Monkey and Chaos Gorilla is a shining example of Design for Failure at scale.
Chaos Monkey randomly terminates instances to simulate hardware failures and test your system’s ability to withstand such failures.
Chaos Gorilla takes this exercise to the next level and simulates outages of entire Amazon availability zones, to test their system's ability to automatically rebalance to other availability zones without user-visible impact or manual intervention.
Global redundancy, or not
Based on reactions to AWS outages on social media, it’s clear to see that many (ourselves included) do not take full advantage of the cloud for global redundancy.
You might scoff at that, but for many the decision not to have a globally redundant infrastructure is a conscious one, because the cost of such redundancy is not always justifiable.
It’s possible to raise your single-point-of-failure (SPOF) from individual resources/instances, to AZs, to regions, all the way to cloud providers.
But you’re incurring additional costs at each turn:
- your infrastructure becomes more complex and difficult to reason about;
- you might need more engineers to manage that complexity;
- you will need to invest in better tooling for monitoring and automation;
- you might need more engineers to build those tools;
- you incur more wastage in CPU/memory/bandwidth/etc. (it is called redundancy for a reason);
- you have higher network latency for cross-AZ/region communications;
Global redundancy at Uber
On the other hand, for many organizations the cost of downtime outweighs the cost of global redundancy.
For instance, for Uber’s customers the cost of switching to a competitor is low, which means availability is of paramount importance for Uber.
Uber devised a rather simple, elegant mechanism for their client applications to failover seamlessly in the event of a datacentre outage. See this post for more details.
Finally, as more and more companies adopt a microservices approach, a whole host of challenges will become evident (many of which have been discussed in Michael Nygard's Release It!).
One of these challenges is the propagation of latency through inter-service communications.
If each of your services has a 99th-percentile latency of 1s, then only 1% of calls will take longer than 1s when you depend on a single service. But if you depend on 100 services, then 63% of calls will take more than 1s!
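The 63% figure assumes the services respond independently: the chance that all 100 calls stay under their 99th-percentile latency is 0.99^100 ≈ 0.37, so roughly 63% of fan-out calls exceed 1s. A quick check:

```python
# Probability that at least one of n dependent calls exceeds the
# per-service 99th-percentile latency, assuming independence.
p_fast = 0.99      # a single service responds within 1s 99% of the time
n_services = 100

p_slow = 1 - p_fast ** n_services
print(f"{p_slow:.0%} of calls exceed 1s")  # roughly 63%
```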
In this regard, Google Fellow Jeff Dean's talk Achieving Rapid Response Times in Large Online Services presents an elegant solution to this problem.
I haven’t put this into practice myself, but I imagine this can be easily implemented using Rx’s amb operator.
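Rx's amb takes the first source to produce a value and discards the rest. I haven't seen the author's implementation, but the same hedged-request idea Dean describes — fire the primary, and if it hasn't answered within a short delay, fire a backup and take whichever responds first — can be sketched with Python's asyncio (the replica functions and delays are illustrative):

```python
import asyncio

async def replica(name, delay):
    """Illustrative stand-in for a request to one replica."""
    await asyncio.sleep(delay)
    return name

async def hedged_request(primary, backup, hedge_delay):
    """Fire the primary; if it hasn't answered within hedge_delay,
    fire a backup and return whichever finishes first (amb-like)."""
    primary_task = asyncio.create_task(primary())
    try:
        # shield() so the timeout doesn't cancel the still-racing primary
        return await asyncio.wait_for(asyncio.shield(primary_task), hedge_delay)
    except asyncio.TimeoutError:
        backup_task = asyncio.create_task(backup())
        done, pending = await asyncio.wait(
            {primary_task, backup_task}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()  # discard the loser, as amb would
        return done.pop().result()
```

With a fast primary the backup is never fired, so the extra load from hedging stays small — the key property Dean's approach relies on.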
Links

- The Netflix Simian Army
- Simon Wardley – Introduction to value chain mapping
- Jeff Dean – Achieving Rapid Response Times in Large Online Services
- QCon London 15 – takeaways from “Scaling Uber’s realtime market platform”
- Hacking the brains of other people with API design
- Design Pattern Automation