Help! How do I set DeletionPolicy to Retain for production only?

Yan Cui

It’s a good practice to use CloudFormation’s DeletionPolicy to protect stateful resources such as DynamoDB tables or RDS databases from accidental deletion, for example when someone deletes a CloudFormation stack by mistake!

As I discussed previously [1], this is a much better way to guard against these accidental data losses than separating stateful and stateless resources into different stacks. After all, how we think about infrastructure vs. code needs to evolve in the serverless era [2]. That separation is no longer so clear-cut.

The problem with DeletionPolicy

But using DeletionPolicy on all your stateful resources causes friction with another common practice in serverless development – the use of ephemeral (or temporary) environments [3].

If you set DeletionPolicy to Retain on all your stateful resources, then they will all linger when you delete the temporary environment.

In fact, you only need to use DeletionPolicy in production. That’s the only environment where it’s necessary.

So, how do we do that?

The CDK solution

CDK gives you the full power of a general-purpose programming language (even though it’s often not needed [4]). So it’s trivial to do this in a CDK application:

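  1. Pass the environment name in at synth time, for example as a context variable. (A CloudFormation parameter is only resolved at deploy time, so its value can’t drive an if block in your CDK code.)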
  2. Set the removalPolicy inside an if block like this:
if (environment === 'prod') {
  // removalPolicy is not an assignable field on the construct; use applyRemovalPolicy()
  myTable.applyRemovalPolicy(cdk.RemovalPolicy.RETAIN)
}

When you have branch logic like this in your CDK code, you should add unit tests for it.

And instead of using magic strings like “prod”, you should also capture constants as enums.
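The two suggestions above can be sketched together. This is a minimal, dependency-free example rather than a definitive implementation: the Environment enum, the parseEnvironment and removalPolicyFor helpers, and the local RemovalPolicy enum (a stand-in for cdk.RemovalPolicy so the sketch runs without aws-cdk-lib) are all hypothetical names.

```typescript
// Hypothetical enum of the environments this app deploys to.
enum Environment {
  Dev = 'dev',
  Test = 'test',
  Prod = 'prod',
}

// Local stand-in for cdk.RemovalPolicy so the sketch has no dependencies.
// In a real CDK app, import RemovalPolicy from 'aws-cdk-lib' instead.
enum RemovalPolicy {
  Destroy = 'destroy',
  Retain = 'retain',
}

// Turns a raw string (e.g. a context variable) into an enum member,
// failing fast on typos such as "production".
function parseEnvironment(raw: string): Environment {
  switch (raw) {
    case 'dev':
      return Environment.Dev;
    case 'test':
      return Environment.Test;
    case 'prod':
      return Environment.Prod;
    default:
      throw new Error('Unknown environment: ' + raw);
  }
}

// Keeps the branch logic in one small, pure function that is easy to unit test.
function removalPolicyFor(environment: Environment): RemovalPolicy {
  return environment === Environment.Prod
    ? RemovalPolicy.Retain
    : RemovalPolicy.Destroy;
}
```

A unit test then only needs to assert that removalPolicyFor returns Retain for Environment.Prod and Destroy for every other environment.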

Additionally, you can use CDK Aspects to select multiple resources and update their policy in one go. For more details, check out this article [5] by Wojciech Matuszewski.

Serverless / SAM / CloudFormation solution

For the Serverless Framework, SAM or CloudFormation, you can do this:

  1. Add an Environment CloudFormation parameter. (not needed for the Serverless Framework)
  2. Add an IsProd condition.
  3. Use the !If intrinsic function so the DeletionPolicy attribute is set to Retain when the Environment is “prod”, else set it to Delete.

Like this:

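# By default CloudFormation only accepts a string literal for DeletionPolicy.
# The AWS::LanguageExtensions transform allows intrinsic functions such as !If here.
Transform: AWS::LanguageExtensions

Parameters: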
  Environment:
    Type: String
    Default: dev
    Description: Environment name, e.g. dev, test, prod

Conditions:
  IsProd: !Equals [ !Ref Environment, "prod" ]

Resources:
  MyTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: !If [ "IsProd", "Retain", "Delete" ]
    Properties:
      ...

For the Serverless Framework, you can skip step 1 because it has a built-in sls:stage parameter. So the IsProd condition becomes:

Conditions:
  IsProd: !Equals [ "${sls:stage}", "prod" ]

DynamoDB DeletionProtectionEnabled attribute

I have used DynamoDB in the examples above. But it’s worth pointing out that DeletionPolicy is actually not the most effective protection against data loss for DynamoDB.

For starters, it doesn’t guard against someone deleting a DynamoDB table in the console or programmatically through the AWS CLI or SDK.

Fortunately, DynamoDB also has the DeletionProtectionEnabled [6] attribute. When enabled, it protects the table from accidental deletion by any user or process.
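To make the difference concrete: DeletionPolicy is a resource-level attribute that CloudFormation acts on, whereas DeletionProtectionEnabled is a property of the table itself. Here is a minimal sketch, written as a plain TypeScript object that mirrors the CloudFormation JSON. The tableResource helper, the pk key name and the billing mode are illustrative assumptions, not something the examples above prescribe.

```typescript
// Minimal shape of the CloudFormation resource produced by this sketch.
interface TableResource {
  Type: 'AWS::DynamoDB::Table';
  DeletionPolicy: 'Retain' | 'Delete';
  Properties: {
    DeletionProtectionEnabled: boolean;
    BillingMode: 'PAY_PER_REQUEST';
    AttributeDefinitions: { AttributeName: string; AttributeType: 'S' | 'N' | 'B' }[];
    KeySchema: { AttributeName: string; KeyType: 'HASH' | 'RANGE' }[];
  };
}

// Hypothetical helper: builds the table definition for a given environment name.
function tableResource(environment: string): TableResource {
  const isProd = environment === 'prod';
  return {
    Type: 'AWS::DynamoDB::Table',
    // Resource attribute: tells CloudFormation what to do when the resource
    // is removed from the stack or the stack is deleted.
    DeletionPolicy: isProd ? 'Retain' : 'Delete',
    Properties: {
      // Table property: while true, DynamoDB rejects requests to delete the
      // table, whether they come from the console, the CLI, an SDK or CloudFormation.
      DeletionProtectionEnabled: isProd,
      BillingMode: 'PAY_PER_REQUEST',
      // Assumed single string partition key named "pk".
      AttributeDefinitions: [{ AttributeName: 'pk', AttributeType: 'S' }],
      KeySchema: [{ AttributeName: 'pk', KeyType: 'HASH' }],
    },
  };
}
```

Serialising the result of tableResource('prod') with JSON.stringify gives a resource definition that could sit under Resources in a JSON template.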

Additionally, DynamoDB offers other protections against data loss, such as point-in-time recovery and the ability to export data to S3.

Having said that, DeletionPolicy is still a very useful tool for other services such as S3, EventBridge, SQS, SNS, etc.

Use RetainExceptOnCreate instead of Retain

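Another thing to consider is that the RetainExceptOnCreate deletion policy (announced in July 2023) has superseded the Retain deletion policy. It behaves like Retain, except that a resource is deleted if the operation that created it is rolled back, so a failed deployment doesn’t leave orphaned resources behind. These days, it should be used as a drop-in replacement for Retain.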

Build it into your platform

Instead of every team having to remember to do this with every project and every stateful resource, it might be better to make this a feature of your cloud platform.

Lars Jacobsson gave us an example of what that might look like, using a CloudFormation macro.

If you need a refresher on CloudFormation macros and what they can do, then please check out this article [7] by Alex DeBrie.
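To give a rough idea of the shape such a macro could take, here is a minimal sketch of the transformation logic. It is not Lars’s implementation. It assumes a template-level macro (so the fragment is the whole template), an Environment template parameter, and a hypothetical STATEFUL_TYPES list that your platform team would own. The event and response fields used (requestId, fragment, templateParameterValues, status) follow the CloudFormation macro interface.

```typescript
interface Resource {
  Type: string;
  DeletionPolicy?: string;
  [key: string]: unknown;
}

interface Template {
  Resources?: { [logicalId: string]: Resource };
  [key: string]: unknown;
}

// Subset of the event CloudFormation sends to a macro's Lambda function.
interface MacroEvent {
  requestId: string;
  fragment: Template;
  templateParameterValues: { [name: string]: unknown };
}

interface MacroResponse {
  requestId: string;
  status: 'success' | 'failure';
  fragment: Template;
}

// Assumption: which resource types count as "stateful" is a platform decision.
const STATEFUL_TYPES: string[] = [
  'AWS::DynamoDB::Table',
  'AWS::S3::Bucket',
  'AWS::RDS::DBInstance',
  'AWS::SQS::Queue',
];

// Sets DeletionPolicy on every stateful resource that doesn't already have one:
// Retain when the Environment parameter is "prod", Delete otherwise.
function applyDeletionPolicies(event: MacroEvent): MacroResponse {
  const isProd = event.templateParameterValues['Environment'] === 'prod';
  const resources = event.fragment.Resources || {};

  for (const logicalId of Object.keys(resources)) {
    const resource = resources[logicalId];
    const isStateful = STATEFUL_TYPES.indexOf(resource.Type) !== -1;
    // Respect an explicit DeletionPolicy chosen by the team.
    if (isStateful && resource.DeletionPolicy === undefined) {
      resource.DeletionPolicy = isProd ? 'Retain' : 'Delete';
    }
  }

  // CloudFormation expects the same requestId back and a "success" status.
  return { requestId: event.requestId, status: 'success', fragment: event.fragment };
}
```

The Lambda handler behind the macro would be a thin async wrapper around applyDeletionPolicies. Teams then opt in by listing the macro in their template’s Transform section instead of repeating the condition on every resource.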

Wrap up

Finally, I want to give a shout-out to Hala Al Aali for asking me about this in the Testing Serverless Architectures [8] course’s forum.

It’s a common problem that many students have come across. I hope this article has helped you. If you want to learn more about building serverless applications for the real world, check out my upcoming workshops [9].

Links

[1] This is why you should keep stateful and stateless resources together

[2] Are we getting infrastructure all wrong in the Serverless era?

[3] Serverless Ephemeral (Temporary) Environments Explained

[4] Are You Ready for This? Top 5 Earth-Shattering Pros and Cons of AWS CDK

[5] Enforcing compliance with AWS CDK Aspects

[6] DynamoDB’s DeletionProtectionEnabled attribute

[7] How and Why to use CloudFormation Macros

[8] Testing Serverless Architectures course

[9] Production-Ready Serverless workshop
