Testing Step Functions: how to skip time when testing Timeout and Wait states

Yan Cui

When I previously wrote about testing Step Functions, I gave you a general strategy that consists of:

  • Component tests that target the Lambda functions (specifically, the custom code you wrote in those functions).
  • End-to-end tests that execute the state machine in the cloud.
  • Local tests using Step Functions Local where you can use mocks to help you test those hard-to-reach execution paths.

However, there’s one common problem that Step Functions Local won’t help you with: dealing with time. For example, you might need to test an execution path that sits behind a long Wait state, or an error path that is only reached after a long timeout expires.

Because Step Functions Local doesn’t support skipping forward in time, I find the best solution is to rewrite the state machine definition in the test setup.

Let’s say you have a state machine for processing food orders. Its “Notify restaurant” state has a timeout of 300 seconds, and we want to test the error path that runs when that timeout expires.

I would write a test case like this:

const given = require('../../steps/given')
const when = require('../../steps/when')
const then = require('../../steps/then')
const chance = require('chance').Chance()
const retry = require('async-retry')

describe("Test case: restaurant doesn't respond to the order in time", () => {
  const orderId = chance.guid()

  describe('Given a local instance of the state machine', () => {
    let stateMachineArn

    beforeAll(async () => {
      stateMachineArn = await given.a_local_statemachine_instance(
        process.env.StateMachineArn,
        chance.guid(),
        // rewrite the definition so the timeout fires after 1 second instead of 300
        (definitionJson) => {
          const definition = JSON.parse(definitionJson)
          definition.States['Notify restaurant'].TimeoutSeconds = 1
          return JSON.stringify(definition)
        }
      )
    })

    describe('When we start a local execution', () => {
      let executionArn
  
      beforeAll(async () => {
        executionArn = await when.we_start_local_execution(
          stateMachineArn,
          { orderId }
        )
      })

      it('Should add the order to the database', async () => {
        await then.an_order_exists_in_dynamodb(orderId)
      })
  
      it('Should send a SNS notification to the restaurant topic', async () => {
        const restaurantNotification = await then.a_restaurant_notification_is_received(orderId)
        expect(restaurantNotification.TaskToken).toBeTruthy()
      })

      it('Should update the order status to "NO_RESPONSE"', async () => {
        await retry(async () => {
          const order = await then.an_order_exists_in_dynamodb(orderId)
          expect(order.status).toEqual("NO_RESPONSE")
        }, {
          retries: 3,
          maxTimeout: 1000
        })
      })
    })
  })
})

The given.a_local_statemachine_instance helper function creates a state machine in Step Functions Local. Importantly, it also lets me rewrite the state machine definition before it is created, which is how the test changes the TimeoutSeconds setting to 1.

(definitionJson) => {
  const definition = JSON.parse(definitionJson)
  definition.States['Notify restaurant'].TimeoutSeconds = 1
  return JSON.stringify(definition)
}

This way, we only have to wait one second (instead of 300!) before we can verify that the order’s status has been changed to NO_RESPONSE.

it('Should update the order status to "NO_RESPONSE"', async () => {
  await retry(async () => {
    const order = await then.an_order_exists_in_dynamodb(orderId)
    expect(order.status).toEqual("NO_RESPONSE")
  }, {
    retries: 3,
    maxTimeout: 1000
  })
})
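
The implementation of given.a_local_statemachine_instance isn’t shown here. As a rough sketch only, a helper of this shape could read the deployed definition, pass it through the rewrite callback and create the result in Step Functions Local. The function name createLocalStateMachine and the injected deployed and local clients are assumptions made for illustration; they stand in for thin async wrappers around the DescribeStateMachine and CreateStateMachine API calls.

```javascript
// Hypothetical sketch of a helper like given.a_local_statemachine_instance.
// `deployed` and `local` are assumed to be thin async wrappers around the
// DescribeStateMachine and CreateStateMachine API calls. They are injected
// so the sketch does not depend on a particular AWS SDK version.
async function createLocalStateMachine(
  { deployed, local },
  stateMachineArn,
  name,
  rewrite = (definitionJson) => definitionJson
) {
  // read the real definition from the deployed state machine
  const { definition, roleArn } = await deployed.describeStateMachine({ stateMachineArn })

  // let the test adjust the definition (e.g. shorten a timeout)
  const rewritten = rewrite(definition)

  // create the rewritten copy in Step Functions Local and return its ARN
  const created = await local.createStateMachine({ name, definition: rewritten, roleArn })
  return created.stateMachineArn
}
```

Passing the clients in as arguments also makes the helper itself easy to exercise with fakes, without a running Step Functions Local instance.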

As you can see, this approach is simple, and it lets you skip time for both Timeout clauses and Wait states: for a Wait state, you would shorten its wait duration in the test setup in the same way.
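
The example in this post only rewrites a timeout. For a Wait state, a rewrite callback might look like the sketch below. The shortenWait name is made up for illustration, and the sketch assumes the Wait state is a top-level state (not nested inside a Parallel or Map state). A Wait state specifies its duration with one of Seconds, Timestamp, SecondsPath or TimestampPath, so the sketch clears the alternatives before setting Seconds.

```javascript
// Hypothetical helper: returns a rewrite callback that shortens one
// top-level Wait state to a fixed number of seconds.
function shortenWait(stateName, seconds = 1) {
  return (definitionJson) => {
    const definition = JSON.parse(definitionJson)
    const state = definition.States[stateName]

    // fail loudly if the state name is misspelled or is not a Wait state
    if (!state || state.Type !== 'Wait') {
      throw new Error(`"${stateName}" is not a Wait state in this definition`)
    }

    // a Wait state uses only one of these fields, so clear the others
    // before replacing the duration with a fixed number of seconds
    delete state.Timestamp
    delete state.SecondsPath
    delete state.TimestampPath
    state.Seconds = seconds

    return JSON.stringify(definition)
  }
}
```

In the test setup above, the result of shortenWait('Some wait state') would be passed in place of the inline callback, where 'Some wait state' is a placeholder for the name of a Wait state in your own definition.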

If you want to learn more about testing serverless architectures and see the full example in action, then check out my latest course “Testing Serverless Architectures”. It gives you practical advice on how to test different types of serverless architectures, including API Gateway, AppSync, Step Functions and event-driven architectures, and how to deal with the specific challenges that come with them.