Loose coupling and high cohesion are two of the most essential software engineering principles. Unrelated things should stay apart, while related elements should be kept together.
These principles apply at all levels of our application — from the system-level architecture all the way down to individual modules or functions.
With these principles in mind, let’s talk about the contentious topic of whether you should keep stateful and stateless resources in the same CloudFormation stack.
I’m very much in the monolith stack camp. I prefer to keep stateful (databases, queues, etc.) and stateless (Lambda functions, API Gateway, etc.) resources together.
Arguments for monolith stack
Assuming the CloudFormation stack encapsulates an entire service, which includes both stateful and stateless resources, then it makes sense to define all the resources in a single CloudFormation stack. This makes managing and deploying the service easier in a number of ways:
- Resource references are easy. You can use `!Ref` or `!GetAtt` against any resource defined in the stack. For example, when you need to pass the name of a DynamoDB table to a Lambda function as an environment variable.
- You can update both the stateful and stateless resources in a single commit and deployment. For example, when you need to add a new DynamoDB table and add a new Lambda function to use it.
- CI/CD set-up is simpler. One service, one stack, one repo, one pipeline.
- It’s easy to create ephemeral environments. For example, when you need to work on a new feature, you can create a temporary environment with a single deployment. Using the Serverless Framework, this is as simple as running `npx sls deploy --stage <stage-name>`. When you’re done with the feature, simply delete the temporary environment.
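To make the first two points concrete, here is a minimal sketch of a single template that holds both kinds of resources. The resource names (`OrdersTable`, `OrdersFunctionRole`, `OrdersFunction`), the key schema and the inline handler are hypothetical and only there for illustration. The point is that `!Ref` and `!GetAtt` resolve directly because everything lives in one stack.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  # Stateful resource (hypothetical table with a single string key).
  OrdersTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH

  # Execution role for the function, scoped to the table above.
  OrdersFunctionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: orders-table-access
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - dynamodb:GetItem
                  - dynamodb:PutItem
                # !GetAtt returns the table's ARN.
                Resource: !GetAtt OrdersTable.Arn

  # Stateless resource that uses the table.
  OrdersFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: nodejs20.x
      Handler: index.handler
      Role: !GetAtt OrdersFunctionRole.Arn
      Code:
        ZipFile: |
          exports.handler = async () => ({ table: process.env.ORDERS_TABLE });
      Environment:
        Variables:
          # !Ref on a DynamoDB table returns the table name.
          ORDERS_TABLE: !Ref OrdersTable
```

Because the table has no explicit `TableName`, CloudFormation generates a unique name for each stack, so several copies of this stack (one per temporary environment) can coexist in the same account and region.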
Arguments for separate stacks
The counter-arguments to this approach usually revolve around these three points:
1. It’s less risky if you separate the stateful resources into their own stack. If someone accidentally deletes the stateless stack, at least you won’t lose data.
2. Stateful resources change less often than stateless resources. So deployments will be faster if you only need to deploy the stateless resources most of the time.
3. CloudFormation has a hard limit of 500 resources per stack. Moving the stateful resources out allows you to fit more stateless resources into the stack.
All these counter-arguments sound reasonable, but how much do they matter in practice and do they justify the extra complexity of having separate stacks?
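To give a sense of that extra complexity, here is a rough sketch of one common way to wire two stacks together, using a stack export and `!ImportValue`. This is only one option and not necessarily how any given team does it. The export name `orders-table-name`, the resource names and the `FunctionRoleArn` parameter are hypothetical.

```yaml
# Stateful stack (hypothetical): owns the table and exports its name.
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  OrdersTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH
Outputs:
  OrdersTableName:
    Value: !Ref OrdersTable
    Export:
      Name: orders-table-name   # must be unique within the account and region
```

```yaml
# Stateless stack (hypothetical): imports the table name by export name.
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  FunctionRoleArn:              # assumed to be created elsewhere
    Type: String
Resources:
  OrdersFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: nodejs20.x
      Handler: index.handler
      Role: !Ref FunctionRoleArn
      Code:
        ZipFile: |
          exports.handler = async () => ({ table: process.env.ORDERS_TABLE });
      Environment:
        Variables:
          ORDERS_TABLE: !ImportValue orders-table-name
```

With this set-up the stateful stack has to be deployed first, and CloudFormation won’t let you delete it or change the exported value while another stack still imports it. Adding a new table and a function that uses it now takes two coordinated deployments instead of one.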
Moving the stateful resources into their own stack doesn’t eliminate the risk of accidental deletion. It just moves the target. Someone fat-fingers the delete button on the stateful stack and it’s game over.
The right way to protect against accidental deletion is to enable Termination Protection on the stack and/or set the `DeletionPolicy` to `Retain` on the stateful resources.
There are other ways resources can be deleted accidentally. For example, when you change the name of a DynamoDB table, CloudFormation replaces the table during deployment. The official documentation lists `TableName` as a property whose update requires replacement.
So you should also consider setting the `UpdateReplacePolicy` to `Retain` to protect against data loss from accidental changes. This particular risk is present regardless of which stack the stateful resources reside in.
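A rough sketch of what this looks like in a template: `DeletionPolicy` covers the case where the resource or its stack is deleted, and `UpdateReplacePolicy` covers the case where an update forces CloudFormation to replace the resource. Both are resource attributes that sit next to `Type`, not under `Properties`. The table name and key schema here are hypothetical.

```yaml
Resources:
  OrdersTable:
    Type: AWS::DynamoDB::Table
    # Keep the table (and its data) if the resource or the stack is deleted.
    DeletionPolicy: Retain
    # Keep the old table if an update requires replacement, e.g. a renamed table.
    UpdateReplacePolicy: Retain
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH
```

A retained table stays in your account after CloudFormation stops managing it, so you have to clean it up yourself once you’re sure the data is no longer needed. Termination protection is a setting on the stack itself rather than something declared on a resource, so it doesn’t appear in this fragment.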
By default, CloudFormation skips resources that haven’t changed. So if the stateful resources haven’t been updated, they have a negligible impact on the time it takes to deploy the stack.
In most cases, how long it takes to update an existing stack is a function of the number of stateless resources, such as Lambda functions, IAM roles and API Gateway resources.
To illustrate this, I collected data from three CloudFormation stacks.
Stack 1: 5 Lambda functions.
Stack 2: 5 Lambda functions, and 5 DynamoDB tables.
Stack 3: 20 Lambda functions.
Putting aside the initial deployment time, this is how long it took to update these stacks on average (no changes to the DynamoDB tables):
Stack 1: 46.4 sec
Stack 2: 46.4 sec
Stack 3: 55 sec
There was no difference in the average deployment time between Stack 1 and Stack 2. This is despite Stack 2 having 5 additional DynamoDB tables.
Stack 3 has four times as many Lambda functions, and on average it takes about 9 seconds longer (8.6 seconds) to update each time.
CloudFormation resource limits
This is a valid argument but, in practice, it doesn’t really matter unless you have lots of stateful resources in your stack.
Because you typically have far fewer stateful resources than stateless resources, moving them out won’t buy you much space in most situations.
And even then, nested stacks are a better way to work around the 500-resource limit.
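As an illustration of the nested stack approach, here is a minimal sketch of a parent template that keeps the table and pushes a group of functions into a child stack. The `TemplateURL` value, the `ApiFunctionsStack` name and the `OrdersTableName` parameter are hypothetical, and the child template is assumed to declare a matching `OrdersTableName` parameter.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  OrdersTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: Retain
    UpdateReplacePolicy: Retain
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: id
          AttributeType: S
      KeySchema:
        - AttributeName: id
          KeyType: HASH

  # The child stack counts as one resource in this template,
  # and has its own resource limit.
  ApiFunctionsStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      # Hypothetical location of the child template in S3.
      TemplateURL: https://example-bucket.s3.amazonaws.com/api-functions.yml
      Parameters:
        # Passed down so the child stack's functions can reference the table.
        OrdersTableName: !Ref OrdersTable
```

Everything is still deployed and deleted as one unit through the parent stack, so the single-commit, single-pipeline workflow described earlier stays intact.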
The stateful and stateless resources need to work together for our system to function. To achieve high cohesion, we should keep them together. Splitting them into separate stacks violates one of the most important principles in software engineering and complicates things unnecessarily.
However, this is not a hard and fast rule. There will always be edge cases for which it makes some sense to split them. But in most cases, keeping them together is the best choice.