Event-Driven Architectures: How to limit the scope of end-to-end tests

Yan Cui

I help clients go faster for less using serverless technologies.

During this week’s live Q&A session, a student from the Production-Ready Serverless [1] boot camp asked a really good question (to paraphrase):

“When end-to-end testing an Event-Driven Architecture, how do you limit the scope of the tests so you don’t trigger downstream event consumers?”

This is a common challenge in event-driven architectures, especially when you have a shared event bus.

The Problem

When you exercise your system through end-to-end tests, it can generate events that are consumed by downstream systems. These events can create a lot of noise for those systems, especially if the test events are ones they can’t process.

For example, our test events might contain only the fields we need to exercise our code, not the full set. Or they might reference external entities that do not exist (which our system doesn’t need to verify).

These often trigger errors and alerts for the downstream systems and make us a bad neighbour!

I have long championed using ephemeral environments [2] to allow developers to work on different features in isolated environments.

It’s an excellent fit for working with serverless technologies and their usage-based pricing. There’s negligible cost overhead for having many ephemeral environments when you are not paying for uptime.

However, ephemeral environments do not directly address the problem at hand. Events generated by end-to-end tests against the ephemeral environments will still cause the undesired side effects downstream.

The Solution

One way to address this problem is to conditionally create a copy of the shared resource (e.g. an event bus) as part of the service stack.

When you create an ephemeral environment, you will make a copy of the event bus (local to the system under test) and use it instead of the shared event bus.
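The decision each environment makes can be sketched in a few lines of Python. This is a minimal sketch, assuming the convention used in the implementation examples in this post (stage names that start with "dev" denote ephemeral environments). The `plan_event_bus` helper and the `{service}-{stage}` naming scheme for the local bus are hypothetical. In practice, your IaC tool makes this choice and passes the resulting bus name to your functions, so the publishing code itself does not need to change.

```python
from dataclasses import dataclass

# Assumed convention: stages whose names start with "dev" are ephemeral.
TEMP_ENV_PREFIX = "dev"


@dataclass(frozen=True)
class EventBusPlan:
    create_local_bus: bool  # whether this stack creates its own copy of the bus
    bus_name: str           # the bus that functions should publish to


def is_temp_env(stage: str) -> bool:
    """True for ephemeral stages, mirroring an IsTempEnv-style condition."""
    return stage.startswith(TEMP_ENV_PREFIX)


def plan_event_bus(service: str, stage: str, shared_bus_name: str) -> EventBusPlan:
    """Decide whether to create a local copy of the bus or use the shared one."""
    if is_temp_env(stage):
        # Ephemeral environment: a bus local to the system under test,
        # named after the service and stage (hypothetical naming scheme).
        return EventBusPlan(create_local_bus=True, bus_name=f"{service}-{stage}")
    # Long-lived environments keep publishing to the shared bus.
    return EventBusPlan(create_local_bus=False, bus_name=shared_bus_name)


if __name__ == "__main__":
    print(plan_event_bus("orders", "dev-feature-x", "shared-bus"))
    # EventBusPlan(create_local_bus=True, bus_name='orders-dev-feature-x')
    print(plan_event_bus("orders", "prod", "shared-bus"))
    # EventBusPlan(create_local_bus=False, bus_name='shared-bus')
```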

Thus, you can achieve the desired separation between environments and avoid waking up your downstream neighbours!

Related resources, such as IAM roles and resource policies, must also be created conditionally.

I’ve used this approach a lot, and it’s relatively easy to implement.
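To make the point about related resources concrete: the IAM permissions for publishing must target whichever bus the environment actually uses. The Python sketch below builds an IAM policy document that allows `events:PutEvents` only on that bus. The helper names, region, account ID, and bus name are placeholders. In a real stack, your IaC tool would generate this policy, for instance by referencing the conditionally created bus.

```python
import json


def event_bus_arn(region: str, account_id: str, bus_name: str) -> str:
    """Build the ARN of an EventBridge event bus."""
    return f"arn:aws:events:{region}:{account_id}:event-bus/{bus_name}"


def publish_policy(region: str, account_id: str, bus_name: str) -> dict:
    """IAM policy that lets a function publish to one specific event bus.

    bus_name should be the local bus in ephemeral environments and the
    shared bus otherwise, so the permission follows the same condition
    as the bus itself.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "events:PutEvents",
                "Resource": event_bus_arn(region, account_id, bus_name),
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder region, account ID and bus name, for illustration only.
    policy = publish_policy("eu-west-1", "123456789012", "orders-dev-feature-x")
    print(json.dumps(policy, indent=2))
```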

Importantly, it allows teams to develop, deploy, and test their services independently and reduces cross-team dependency, a key indicator of high performance (as noted in Accelerate: The Science of Lean Software and DevOps [3] by Nicole Forsgren, Jez Humble, and Gene Kim).

The Implementation

You can implement this solution with any Infrastructure-as-Code tool.

With CloudFormation or tools that are built upon CloudFormation (e.g. SAM, Serverless Framework), you can use CloudFormation Conditions [4] and attach them to the resources that should only exist in ephemeral environments.

I have also created a plugin for the Serverless Framework [5] that adds functions CloudFormation doesn’t support natively, such as Fn::StartsWith, so you can express conditions like this:

resources:
  Conditions:
    IsTempEnv:
      Fn::StartsWith:
        - ${sls:stage}
        - dev

With CDK, it’s a simple if-else statement.

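With Terraform, you can use the count meta-argument [6]. For example, to create a local event bus only in ephemeral environments (the bus name and the shared_event_bus_name variable are illustrative):

locals {
  # Stages whose names start with "dev" are ephemeral environments
  is_temp_env = startswith(var.stage_name, "dev")

  # Publish to the local bus in ephemeral environments, the shared bus otherwise
  event_bus_name = local.is_temp_env ? one(aws_cloudwatch_event_bus.local[*].name) : var.shared_event_bus_name
}

# Local copy of the event bus, created only in ephemeral environments
resource "aws_cloudwatch_event_bus" "local" {
  count = local.is_temp_env ? 1 : 0
  name  = "${var.stage_name}-event-bus"
}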

Other Approaches

Another approach is for everyone to agree that:

  • Events generated by tests should include an “is_test” attribute.
  • Event consumers should filter out events where “is_test” is true.
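To illustrate the coordination this convention requires, here is a minimal Python sketch of both sides. It assumes the flag lives inside the event detail, since that is where custom data goes in an EventBridge event. The entry shape follows EventBridge’s PutEvents request (EventBusName, Source, DetailType, Detail), but the source, detail type, and helper names are hypothetical, and nothing here calls AWS.

```python
import json
from typing import Any


def build_entry(bus_name: str, detail: dict[str, Any], is_test: bool) -> dict[str, str]:
    """Publisher side: build a PutEvents entry, tagging test events."""
    return {
        "EventBusName": bus_name,
        "Source": "orders-service",    # hypothetical source
        "DetailType": "order_placed",  # hypothetical detail type
        # Every publisher must remember to add the flag...
        "Detail": json.dumps({**detail, "is_test": is_test}),
    }


def handler(event: dict[str, Any]) -> str:
    """Consumer side: a Lambda-style handler receiving an EventBridge event."""
    # ...and every consumer must remember to skip flagged events.
    if event.get("detail", {}).get("is_test") is True:
        return "skipped"
    # Real processing would happen here.
    return "processed"


if __name__ == "__main__":
    entry = build_entry("shared-bus", {"order_id": "123"}, is_test=True)
    # EventBridge delivers the Detail string to targets as a parsed "detail" object.
    delivered = {
        "source": entry["Source"],
        "detail-type": entry["DetailType"],
        "detail": json.loads(entry["Detail"]),
    }
    print(handler(delivered))  # skipped
```

Nothing in this setup stops a consumer from forgetting the check, and that consumer will process test events as if they were real.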

I don’t recommend this approach because it requires coordination from all participants (both event publishers and subscribers).

A standardised abstraction layer is key for this approach to work.

However, implementing consistent behaviour across the board can be challenging, especially if you need to support multiple programming languages and IaC tools.

It only takes one non-conforming participant to break the whole chain.

This approach adds complexity to both event publishers and consumers, whereas the conditional-resources approach described above only affects event publishers, and event consumers are none the wiser.

However, if most consumers in your system are also publishers, then there may not be much difference in implementation overhead.

Summary

So, to summarise, a common challenge in event-driven architectures is the potential for unintentionally triggering downstream consumers when running end-to-end tests.

Ephemeral environments can help, but not when you must publish events to a shared event bus.

In such cases, you can conditionally create a local copy of the event bus as part of the ephemeral environment so your test events will not trigger downstream event consumers.

If you want to learn more about real-world applications using serverless technologies, then check out the Production-Ready Serverless [1] boot camp, and I will teach you everything I know!

Links

[1] Production-Ready Serverless boot camp

[2] (video) Serverless Ephemeral (Temporary) Environments Explained

[3] (book) Accelerate: The Science of Lean Software and DevOps

[4] CloudFormation Conditions

[5] serverless-plugin-extrinsic-functions

[6] Terraform’s count meta-argument

Whenever you’re ready, here are 3 ways I can help you:

  1. Production-Ready Serverless: Join 20+ AWS Heroes & Community Builders and 1000+ other students in levelling up your serverless game. This is your one-stop shop for quickly levelling up your serverless skills.
  2. I help clients launch product ideas, improve their development processes and upskill their teams. If you’d like to work together, then let’s get in touch.
  3. Join my community on Discord, ask questions, and join the discussion on all things AWS and Serverless.