Beware of dilution of DynamoDB throughput due to excessive scaling

Update (Jan 26, 2020): since this article was written, the problem of throughput dilution has been resolved by DynamoDB’s adaptive capacity feature, which is applied in real time.

TL;DR – The no. of partitions in a DynamoDB table goes up in response to increased load or storage size, but it never comes back down, ever.

DynamoDB is pretty great, but as I have seen this particular problem at 3 different companies – Gamesys, JUST EAT, and now Space Ape Games – I think it’s a behaviour that more folks should be aware of.

Credit to AWS, they have regularly talked about the formula for working out the no. of partitions at DynamoDB Deep Dive sessions.

However, they often forget to mention that DynamoDB will not decrease the no. of partitions when you reduce your throughput units. It’s a crucial detail that is badly under-represented in a lengthy Best Practice guide.
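To make the formula concrete, here is a minimal sketch in Python. It assumes the rule of thumb that was commonly cited for provisioned tables at the time: roughly 3,000 read units, 1,000 write units and 10 GB of storage per partition, with the partition count being the larger of the throughput-based and size-based figures. Those constants are not stated in this article, and the function name `estimate_partitions` is made up for illustration, so treat the result as an approximation rather than what DynamoDB does internally.

```python
import math

# Assumed per-partition limits from the commonly cited rule of thumb.
# They are not taken from this article and may not match current behaviour.
MAX_RCU_PER_PARTITION = 3000
MAX_WCU_PER_PARTITION = 1000
MAX_GB_PER_PARTITION = 10


def estimate_partitions(rcu: int, wcu: int, size_gb: float) -> int:
    """Rough partition estimate for a provisioned table (hypothetical helper)."""
    # ceil(rcu / 3000 + wcu / 1000), done with integers to avoid float rounding
    weighted = rcu * MAX_WCU_PER_PARTITION + wcu * MAX_RCU_PER_PARTITION
    denominator = MAX_RCU_PER_PARTITION * MAX_WCU_PER_PARTITION
    by_throughput = -(-weighted // denominator)

    # ceil(size / 10 GB)
    by_size = math.ceil(size_gb / MAX_GB_PER_PARTITION)

    # A table always has at least one partition
    return max(1, by_throughput, by_size)


if __name__ == "__main__":
    print(estimate_partitions(rcu=300, wcu=200, size_gb=1))
    print(estimate_partitions(rcu=300, wcu=3000, size_gb=1))
    print(estimate_partitions(rcu=300, wcu=200, size_gb=85))
```

Note that the estimate only tells you how many partitions a given setting needs. It says nothing about the count going back down, which is the detail this article is about.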

Consider the following scenario:

  • you dial up the throughput for a table because there’s a sudden spike in traffic or you need the extra throughput to run an expensive scan
  • the extra throughput causes DynamoDB to increase the no. of partitions
  • you dial down the throughput to previous levels, but now you notice that some requests are throttled even when you have not exceeded the provisioned throughput on the table

This happens because there are fewer read and write throughput units per partition than before due to the increased no. of partitions. It translates to a higher likelihood of exceeding read/write throughput on a per-partition basis (even if you’re still under the throughput limits on the table overall).
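The scenario above can be sketched as a small simulation. It rests on three assumptions that are simplifications rather than confirmed internals: the throughput-based partition estimate of roughly 3,000 read units and 1,000 write units per partition, provisioned throughput being split evenly across partitions, and the partition count only ever ratcheting upwards. The table starts with a single partition, storage size is ignored, and the names `partitions_for_throughput` and `simulate` are hypothetical.

```python
def partitions_for_throughput(rcu: int, wcu: int) -> int:
    """Assumed rule of thumb: ceil(rcu / 3000 + wcu / 1000), at least 1."""
    return max(1, -(-(rcu + 3 * wcu) // 3000))


def simulate(changes):
    """Apply (rcu, wcu) settings in order; partitions never decrease."""
    partitions = 1  # assumption: a small table starts with one partition
    timeline = []
    for rcu, wcu in changes:
        # The count goes up when a setting needs more partitions,
        # but stays where it is when throughput is dialled back down.
        partitions = max(partitions, partitions_for_throughput(rcu, wcu))
        timeline.append({
            "wcu": wcu,
            "partitions": partitions,
            # assumption: throughput is divided evenly between partitions
            "wcu_per_partition": wcu / partitions,
        })
    return timeline


# Normal load, a temporary spike, then back to the original setting
timeline = simulate([(300, 200), (300, 3000), (300, 200)])

if __name__ == "__main__":
    for step in timeline:
        print(step)
```

With these made-up numbers the table ends up at the same 200 write units it started with, but spread over 4 partitions instead of 1, so each partition gets 50 write units rather than 200. A hot key that was comfortably within limits before the spike can now be throttled.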

When this dilution of throughput happens you can:

  1. migrate to a new table
  2. specify a higher table-level throughput to boost the throughput units per partition to previous levels

Given the difficulty of table migrations most folks would opt for option 2, which is how JUST EAT ended up with a table with 3000+ write throughput units despite consuming closer to 200 write units/s.
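A back-of-the-envelope sketch shows why option 2 gets expensive. It assumes throughput is split evenly across partitions, and the numbers are illustrative only (they are not JUST EAT's actual figures). The helper name `table_wcu_to_restore` is hypothetical.

```python
def table_wcu_to_restore(partitions_now: int, wcu_per_partition_before: float) -> float:
    """Table-level write units needed so each partition gets what it had before.

    Assumes provisioned throughput is divided evenly between partitions.
    """
    return partitions_now * wcu_per_partition_before


# Illustrative numbers: the table used to have 1 partition with 200 write
# units, and a temporary spike left it with 4 partitions.
consumed_wcu = 200
required_wcu = table_wcu_to_restore(partitions_now=4, wcu_per_partition_before=200)
overprovision_factor = required_wcu / consumed_wcu

if __name__ == "__main__":
    print(f"provision {required_wcu} write units to serve ~{consumed_wcu}/s")
    print(f"paying for {overprovision_factor:.0f}x the consumed throughput")
```

The more partitions the spike created, the larger the multiplier you keep paying for afterwards.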

In conclusion, you should think very carefully before scaling up a DynamoDB table drastically in response to temporary needs, as it can have long-lasting cost implications.

Hi, I’m Yan. I’m an AWS Serverless Hero and I help companies go faster for less by adopting serverless technologies successfully.

4 thoughts on “Beware of dilution of DynamoDB throughput due to excessive scaling”

  1. Yan,

    Thanks for some great insights from your lambda adventures at Yubl, and after. Do you consult on designing lambda infrastructures? My startup (Tradle) is migrating infrastructure from containerized to lambda. Is there a way we can get some time with you to pick your brain and evaluate some tentative designs?

    Thanks,
    -Mark

  2. theburningmonk

    Hi Mark,

    I don’t do consulting, but happy to help you out if you have specific questions. DM me on twitter and we can sort something out.

    Cheers,

  3. Would the Aurora based solutions be affected (maybe indirectly), since all managed DB services are based on the same low level data management infrastructure?

  4. I wouldn’t think so, because Aurora doesn’t manage/throttle throughput using these throughput units, so it wouldn’t be affected by this problem of diluting the throughput units when you excessively scale.

    Although they might share some common distributed file system under the hood (e.g. many of Google’s services are built on top of their proprietary distributed file system), that’d be a much lower level of abstraction than throughput management, which is something that’ll be managed at the application (as in, the database system) level.
