CraftConf 15 – Takeaways from “Jepsen IV: Hope Springs Eternal”


This talk by Kyle Kingsbury (aka @aphyr on Twitter) was my favourite at CraftConf, and gave us an update on the state of consistency in MongoDB, Elasticsearch and Aerospike.


Kyle opened by pointing out that we so often build applications on top of databases, queues, streams, etc., and that these systems we depend on are really quite flammable (hence the tyre analogy).

Anybody who’s ever used any database knows, everything is on fire all the time! But our goal is to pretend, and ensure that everything still works… we need to isolate the system from failures.


– Kyle Kingsbury



This led nicely into the types of failure that the rest of the talk focused on: split brain, broken foreign keys, etc. The purpose of his Jepsen project is to analyse how a system behaves under these failures.


A system has boundaries, and these boundaries should be protected by a set of invariants – e.g. if you put something into a queue then you should be able to read it out afterwards.
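
To make that queue invariant a bit more concrete, here is a minimal Python sketch of how you might check it from the client's side. The function name and the three lists it takes are my own illustration, not something from Kyle's talk or from Jepsen itself.

```python
from collections import Counter


def check_queue_invariant(attempted, acknowledged, dequeued):
    """Compare what clients put into a queue with what they read back.

    attempted:    every message a client tried to enqueue
    acknowledged: the subset the queue confirmed as enqueued
    dequeued:     every message clients later read out (may contain repeats)
    """
    seen = Counter(dequeued)
    return {
        # Acknowledged but never read back: the invariant is broken.
        "lost": set(acknowledged) - set(seen),
        # Read back but never sent by any client.
        "unexpected": set(seen) - set(attempted),
        # Read back more than once.
        "duplicated": {msg for msg, n in seen.items() if n > 1},
    }


report = check_queue_invariant(
    attempted=["a", "b", "c", "d"],
    acknowledged=["a", "b", "c"],
    dequeued=["a", "c", "c", "d", "z"],
)
assert report == {"lost": {"b"}, "unexpected": {"z"}, "duplicated": {"c"}}
```

Note that "d" was attempted but never acknowledged, so seeing it come out of the queue is not a violation. The client simply never found out whether that write succeeded.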

The rest of the talk splits into two halves.

The 1st half built up a model for talking about consistency, and the 2nd half looked at a number of specific databases (Elasticsearch, MongoDB and Aerospike) to see how they stacked up against the consistency guarantees they claim to have.


Rather than trying to explain the consistency models here and doing a bad job of it, I suggest you read Kyle’s post on the different models shown in his diagram.

It’s a 15-20 minute read, after which you might also want to read these two posts:


Instead I’ll just list a few key points I noted during the session:

  • CAP theorem tells us that a linearizable system cannot be totally available
  • for the consistency models in red, you can’t have total availability (the A in CAP) during a partition
  • for total availability, look to the area
  • weaker consistency models are more available in case of failure
  • weaker consistency models are also less intuitive
  • weaker consistency models are faster because they require less coordination
  • weak is not the same as unsafe – safety depends on what you’re trying to do, e.g. eventual consistency is ok for counters, but for claiming unique usernames you need linearizability
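
To illustrate that last point, here is a small Python sketch. The grow-only counter below is one well-known way to build an eventually consistent counter; it is my own choice of example rather than something from the talk, and all the names in it are made up for illustration.

```python
class GCounter:
    """Grow-only counter: each replica only ever bumps its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Taking the per-replica maximum makes merging safe to repeat
        # and independent of the order in which replicas sync.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


# Two replicas accept increments independently during a partition...
a, b = GCounter("a"), GCounter("b")
a.increment()
a.increment()
b.increment()

# ...and converge on the same total once they can talk again.
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 3

# Username claims accepted independently on each side cannot be merged
# this way: both replicas said "yes" to a different owner of "yan".
claims_a = {"yan": "alice"}
claims_b = {"yan": "bob"}
conflicts = {
    name for name in claims_a.keys() & claims_b.keys()
    if claims_a[name] != claims_b[name]
}
assert conflicts == {"yan"}
```

The counter never needs the replicas to agree on an order of operations, so accepting writes on both sides of a partition is harmless. A unique username needs exactly one winner, which is why it calls for a stronger model.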

Kyle’s Jepsen client uses a black-box testing approach to test database systems (i.e. it only looks at results from a client’s perspective) whilst inducing network partitions to see how the database behaves during a partition.

The clients generate random operations and apply them to the system. Since the clients all run on the same JVM, you can use linearizable data structures to record a history of results as received by the clients, and then use that history to detect consistency violations.
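
To give a feel for what "use that history to detect consistency violations" means, here is a toy Python sketch that checks whether a recorded history of reads and writes against a single register is linearizable. It is a brute-force illustration under my own assumptions (every operation completed, timestamps are simple integers, and the `Op` type is made up), and it is not how Jepsen's real checker works.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Op:
    kind: str              # "write" or "read"
    value: Optional[int]   # value written, or value the read returned
    invoke: int            # when the client sent the request
    complete: int          # when the client received the response


def is_linearizable(history: Tuple[Op, ...], state: Optional[int] = None) -> bool:
    """Search for an order of ops that respects real time and register semantics."""
    if not history:
        return True
    for i, op in enumerate(history):
        rest = history[:i] + history[i + 1:]
        # op can only go next if no other remaining op finished before it started.
        if any(other.complete < op.invoke for other in rest):
            continue
        # A read must return whatever the register holds at that point.
        if op.kind == "read" and op.value != state:
            continue
        next_state = op.value if op.kind == "write" else state
        if is_linearizable(rest, next_state):
            return True
    return False


# A read that starts after a write finished must see that write.
ok = (Op("write", 1, 0, 2), Op("read", 1, 3, 4))
stale = (Op("write", 1, 0, 2), Op("read", None, 3, 4))
# A read that overlaps the write may legitimately see the old value.
overlapping = (Op("write", 1, 0, 5), Op("read", None, 1, 2))

assert is_linearizable(ok)
assert not is_linearizable(stale)
assert is_linearizable(overlapping)
```

The search tries every ordering in the worst case, so it is only usable on tiny histories, but it shows the idea: the checker never looks inside the database, only at what the clients observed.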

This is similar to the generative testing approach used by QuickCheck. Scott Wlaschin has two excellent posts to help you get started with FsCheck, an F# port of QuickCheck.


“MongoDB is not a bug, it’s a database”

– Kyle Kingsbury

and thus began a very entertaining second half of the talk as Kyle shared results from his tests against MongoDB, Elasticsearch and Aerospike.



None of the databases were able to meet the consistency level they claim to offer, but at least Elasticsearch is honest about it and doesn’t promise you the moon.


Again, seeing as Kyle has recently written about these results in detail, I won’t repeat them here. The talk doesn’t go into quite as much depth, so if you have time I recommend reading his posts:


Whilst it was fun watching Kyle shoot holes through these database vendors’ consistency claims (and some of the fun he poked at them was really quite funny, and well deserved on the vendors’ part), if there’s one thing you should take away from Kyle’s talk, and from his work with Jepsen in general, it’s this: don’t drink the kool-aid.

Database vendors have a history of over-selling and, at times, outright false marketing. As developers, we have the means to verify their claims, so next time you hear a claim that’s too good to be true, verify it. Don’t drink the kool-aid.


