How to perform a database migration for a live service with no downtime

Performing a database migration while continuing to serve user requests can be challenging. It’s a question that many students have asked during the Production-Ready Serverless [1] workshop.

So here’s my tried-and-tested approach to migrating a live service to a new database without downtime. I’m going to use DynamoDB as an example, but it should work with most other databases.

Can you keep it simple?

Before we dive into it, I want to remind you to keep things simple whenever you can. If the database migration can be completed within a reasonable timeframe, then consider doing it over a small maintenance window.

This is often not possible for large applications with a global user base. Or maybe you’re working in a microservices environment where downtime for a single service can impact many others.

However, it might be a good option for smaller applications or applications with a regional user base.

Ok, with that said, let’s go.

Step 1: redirect writes to the new database

First, make sure all inserts and updates go to the new database.

Step 2: use the old database as fallback

Use the old database as a fallback for read operations. If the requested data is not in the new database, fetch it from the old database and save it into the new database.

This is similar to a read-through cache.

Implementing these two steps will deal with the active data that users are interacting with.
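Here is a minimal sketch of Steps 1 & 2. To keep it runnable, both databases are plain Python dicts, and the MigratingStore class and its method names are made up for illustration. In a real system they would wrap your actual database clients.

```python
from typing import Optional


class MigratingStore:
    """Hypothetical data-access layer used while the migration is in progress."""

    def __init__(self, old_db: dict, new_db: dict):
        # Assumption: plain dicts stand in for the old and new databases.
        self.old_db = old_db
        self.new_db = new_db

    def save(self, key: str, value: dict) -> None:
        # Step 1: every insert and update goes to the new database only.
        self.new_db[key] = value

    def get(self, key: str) -> Optional[dict]:
        # Step 2: try the new database first.
        if key in self.new_db:
            return self.new_db[key]
        # On a miss, fall back to the old database.
        value = self.old_db.get(key)
        if value is not None:
            # Copy the item across so the next read is served by the new
            # database. setdefault only writes if the key is still absent,
            # so it never replaces a value written in the meantime.
            self.new_db.setdefault(key, value)
        return value
```

Against a real database, the copy-on-read write should probably also be a conditional write, so it can't clobber a newer version saved by a concurrent request.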

Step 3: run a script to migrate inactive data

Run a background script to migrate all data to the new database.

You should start the background script AFTER the application has been updated to perform Steps 1 & 2 above. Once the application has been updated, it will write the active data into the new database.

We need to make sure the script doesn’t overwrite newer versions of the data we’re migrating.

Assuming the new database is a DynamoDB table, we need to use conditional puts. Use the attribute_not_exists function in a condition expression on the item’s primary key, so the put succeeds only if the item doesn’t already exist in the DynamoDB table.
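As a rough sketch, this is what the background script could look like with boto3. The table argument is assumed to be a boto3 DynamoDB Table resource. The partition key name pk and the function names put_if_absent and backfill are placeholders I picked for this example.

```python
from typing import Iterable


def put_if_absent(table, item: dict, key_name: str = "pk") -> bool:
    """Write the item only if no item with the same key exists.

    Returns True if the item was written, False if it was already there.
    """
    try:
        table.put_item(
            Item=item,
            # The put is rejected if an item with this key already exists.
            ConditionExpression="attribute_not_exists(#k)",
            ExpressionAttributeNames={"#k": key_name},
        )
        return True
    except Exception as err:
        # boto3 raises botocore.exceptions.ClientError here. It carries a
        # `response` dict with the error code, which is all we inspect.
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False  # the application already wrote a newer version
        raise


def backfill(old_items: Iterable[dict], table, key_name: str = "pk") -> int:
    """Copy items from the old database, skipping keys that already exist."""
    copied = 0
    for item in old_items:
        if put_if_absent(table, item, key_name):
            copied += 1
    return copied
```

How you produce old_items depends on the old database. For example, it could be a generator that pages through a table scan.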

Dealing with deletes

But what about deletes?

This sequence of events will be problematic:

1. The background script reads data from the old database.
2. The application receives a request to delete the data. The data doesn’t exist in the new database.
3. The application deletes the data from the old database.
4. The background script writes the data into the new database.

Oops, we just added a piece of deleted data back into the system!

Thank you, race condition…

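To handle this scenario, the application can write a tombstone record [2] in the new database whenever it handles a delete. Because the key now exists in the new database, the background script’s conditional put fails. This stops the background script from writing the deleted data back into the system.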

However, it might require a behaviour change in the application to handle these tombstone records in read operations. Luckily, it doesn’t have to be forever.
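To make this concrete, here is a small in-memory sketch of deletes and reads with tombstones. The _deleted marker and the function names are hypothetical, and plain dicts stand in for both databases. backfill_one plays the role of the background script’s conditional write.

```python
from typing import Optional

# Assumption: a tombstone is an item carrying this made-up marker attribute.
TOMBSTONE_FLAG = "_deleted"


def delete(old_db: dict, new_db: dict, key: str) -> None:
    # Write the tombstone first, so the key exists in the new database
    # before the data disappears from the old one.
    new_db[key] = {TOMBSTONE_FLAG: True}
    old_db.pop(key, None)


def get(old_db: dict, new_db: dict, key: str) -> Optional[dict]:
    item = new_db.get(key)
    if item is not None:
        # A tombstone means "deleted": hide it and don't fall back.
        return None if item.get(TOMBSTONE_FLAG) else item
    return old_db.get(key)


def backfill_one(new_db: dict, key: str, value: dict) -> bool:
    # Stand-in for the conditional put: write only if the key is absent.
    if key in new_db:
        return False
    new_db[key] = value
    return True
```

Replaying the problematic sequence: if the script reads an item, the application then calls delete, and the script finally calls backfill_one with its stale copy, the write is rejected because the tombstone already occupies the key.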

Tombstones are only necessary during the migration process. Once the background script has finished, you can clean things up in two steps:

1. Run another script against the new database to delete all tombstones.
2. Update the application to remove the code that handles tombstones (in read operations).
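The tombstone cleanup script could look roughly like this with boto3. It assumes table is a boto3 DynamoDB Table resource with a simple primary key, and that tombstones carry a boolean marker attribute. The names pk, _deleted and purge_tombstones are placeholders for this example.

```python
def purge_tombstones(table, key_name: str = "pk", flag: str = "_deleted") -> int:
    """Delete every tombstone item from the table and return the count."""
    removed = 0
    scan_kwargs = {
        # Only return items whose marker attribute is true.
        "FilterExpression": "#f = :t",
        "ExpressionAttributeNames": {"#f": flag},
        "ExpressionAttributeValues": {":t": True},
    }
    while True:
        page = table.scan(**scan_kwargs)
        for item in page.get("Items", []):
            table.delete_item(Key={key_name: item[key_name]})
            removed += 1
        # A scan is paginated: keep going until there is no continuation key.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return removed
        scan_kwargs["ExclusiveStartKey"] = last_key
```

A scan still reads the whole table even with a filter, so for a large table you may want to run this during a quiet period.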

Wrap up

This is my simple, 3-step process to perform database migration for a live service without needing downtime. As mentioned at the start of this post, it should apply to most database systems. For this process to work, your new database needs to support some form of conditional write operation.

If you want to learn more about building production-ready serverless applications, then why not check out my next workshop?
