The anti-polling pattern for Step Functions

Yan Cui

I help clients go faster for less using serverless technologies.

Step Functions is often used to poll long-running processes, e.g. when starting a new data migration task with AWS Database Migration Service (DMS).

There’s usually a Wait -> Poll -> Choice loop that runs until the task is complete (or failed), like the one below.

Polling is inefficient and can add unnecessary cost, because Standard Workflows are charged by the number of state transitions.
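To put a rough number on that, here is a back-of-the-envelope sketch. It assumes every loop iteration costs three state transitions (Wait, Poll, Choice) and a price of $0.000025 per transition. The price is an assumption, so check the current Step Functions pricing for your region. The function names are made up for illustration.

```typescript
// ASSUMPTION: price per state transition for Standard Workflows.
// Verify against the current pricing page for your region.
const PRICE_PER_TRANSITION_USD = 0.000025;

// One pass through the loop visits three states: Wait, Poll, Choice.
const TRANSITIONS_PER_ITERATION = 3;

// Number of state transitions spent polling a task of the given duration.
function pollingTransitions(
  taskDurationSeconds: number,
  pollIntervalSeconds: number,
): number {
  const iterations = Math.ceil(taskDurationSeconds / pollIntervalSeconds);
  return iterations * TRANSITIONS_PER_ITERATION;
}

// Approximate cost of those transitions in USD.
function pollingCostUsd(
  taskDurationSeconds: number,
  pollIntervalSeconds: number,
): number {
  return (
    pollingTransitions(taskDurationSeconds, pollIntervalSeconds) *
    PRICE_PER_TRANSITION_USD
  );
}

// A 2-hour migration polled every 30 seconds:
// 240 iterations * 3 transitions = 720 transitions, roughly $0.018.
const twoHourTransitions = pollingTransitions(2 * 60 * 60, 30);
const twoHourCostUsd = pollingCostUsd(2 * 60 * 60, 30);
console.log(twoHourTransitions, twoHourCostUsd.toFixed(3));
```

The per-execution cost is small, but it grows linearly with the task duration and the number of executions. With a callback, the number of transitions stays the same no matter how long the task runs.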

There is an event-driven alternative to this approach.

Here’s the high-level approach:

  1. To start the data migration, the state machine calls a Lambda function with a task token (required for callback). This pauses the state machine execution.
  2. The Lambda function calls the Database Migration service to start the data migration.
  3. The function saves the data migration ARN (hash key) and the callback token in DynamoDB, along with other relevant information (created date, etc.).
  4. The Database Migration service publishes “StateChange” events to the default EventBridge event bus (see docs here). A second Lambda function subscribes to these events and reacts when a replication task finishes or fails.
  5. When triggered, the function extracts the data migration ARN from the event payload and retrieves the Step Functions task token from DynamoDB.
  6. It can use the task token to send a success or failure signal back to the state machine execution. From here, the state machine can proceed with the rest of its steps.
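The steps above can be sketched as two small handlers. This is a minimal sketch, not a full implementation: `TokenStore` and `WorkflowCallbacks` are hypothetical interfaces standing in for DynamoDB and for the Step Functions `SendTaskSuccess` / `SendTaskFailure` API calls, and `TaskStateChange` is a simplified, assumed event shape rather than the real DMS event payload.

```typescript
// HYPOTHETICAL port over DynamoDB: task ARN (hash key) -> task token.
interface TokenStore {
  save(taskArn: string, taskToken: string): Promise<void>;
  get(taskArn: string): Promise<string | undefined>;
}

// HYPOTHETICAL port over the Step Functions callback API
// (SendTaskSuccess takes a token and a JSON output string,
// SendTaskFailure takes a token, an error name and a cause).
interface WorkflowCallbacks {
  sendTaskSuccess(taskToken: string, output: string): Promise<void>;
  sendTaskFailure(taskToken: string, error: string, cause: string): Promise<void>;
}

// ASSUMED, simplified view of a DMS "StateChange" event.
interface TaskStateChange {
  taskArn: string;
  status: "running" | "stopped" | "failed";
}

// Steps 1-3: invoked by the state machine with a task token.
async function startMigration(
  input: { taskToken: string },
  startReplicationTask: () => Promise<string>, // resolves to the task ARN
  store: TokenStore,
): Promise<string> {
  const taskArn = await startReplicationTask();
  await store.save(taskArn, input.taskToken);
  return taskArn;
}

// Steps 4-6: invoked by EventBridge when the task changes state.
async function onStateChange(
  event: TaskStateChange,
  store: TokenStore,
  callbacks: WorkflowCallbacks,
): Promise<"success" | "failure" | "ignored"> {
  // Only terminal states should resume the state machine.
  if (event.status === "running") return "ignored";

  // The task may not have been started by our state machine.
  const taskToken = await store.get(event.taskArn);
  if (taskToken === undefined) return "ignored";

  if (event.status === "stopped") {
    const output = JSON.stringify({ taskArn: event.taskArn });
    await callbacks.sendTaskSuccess(taskToken, output);
    return "success";
  }

  await callbacks.sendTaskFailure(
    taskToken,
    "MigrationFailed",
    `Replication task ${event.taskArn} failed`,
  );
  return "failure";
}
```

Keeping the AWS calls behind small interfaces is a design choice for this sketch. It keeps the control flow readable and lets you test it with in-memory fakes.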

This approach is more efficient, but it has more moving parts. Even so, each part is simple to implement.

But what if you’re calling a 3rd party API that does not support events?

You can adapt this approach to work with any service that accepts a callback URL. When the 3rd party service makes the callback, your API handler looks up the task token and makes the callback to Step Functions. Everything else stays the same as before.
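Here is a rough sketch of what that API handler could look like. Everything in it is an assumption for illustration: the `/callbacks/{jobId}` URL layout, the `{ succeeded: boolean }` body the 3rd party sends, and the `CallbackDeps` interface that stands in for the DynamoDB lookup and the Step Functions `SendTaskSuccess` / `SendTaskFailure` calls.

```typescript
// HYPOTHETICAL: the URL you hand to the 3rd party when starting the job.
// It carries your own job ID, which is the lookup key for the task token.
function buildCallbackUrl(baseUrl: string, jobId: string): string {
  return `${baseUrl}/callbacks/${encodeURIComponent(jobId)}`;
}

// ASSUMED, simplified shape of the incoming HTTP request.
interface CallbackRequest {
  pathParameters: { jobId: string };
  body: string; // JSON such as {"succeeded": true}
}

// HYPOTHETICAL dependencies wrapping the token lookup and Step Functions.
interface CallbackDeps {
  getTaskToken(jobId: string): Promise<string | undefined>;
  sendTaskSuccess(taskToken: string, output: string): Promise<void>;
  sendTaskFailure(taskToken: string, error: string, cause: string): Promise<void>;
}

async function handleCallback(
  req: CallbackRequest,
  deps: CallbackDeps,
): Promise<{ statusCode: number }> {
  const taskToken = await deps.getTaskToken(req.pathParameters.jobId);
  if (taskToken === undefined) {
    // Unknown job: nothing to resume.
    return { statusCode: 404 };
  }

  const payload = JSON.parse(req.body) as { succeeded: boolean };
  if (payload.succeeded) {
    // Pass the 3rd party's payload through as the task output.
    await deps.sendTaskSuccess(taskToken, req.body);
  } else {
    await deps.sendTaskFailure(taskToken, "ThirdPartyJobFailed", req.body);
  }
  return { statusCode: 200 };
}
```

A real handler would also need to authenticate the caller and validate the body before trusting it.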

What’s more, both the polling and event-driven approaches can be implemented with the new Lambda Durable Functions too!

The waitForCondition operation is perfect for implementing the polling loop in just a few lines, like this:

const job = await startDataMigrationJob();

const result = await context.waitForCondition(
  async (state, ctx) => {
    // state has the same shape as initialState: { job, status }
    const status = await checkJobStatus(state.job.arn);
    return { ...state, status };
  },
  {
    initialState: { job, status: null },
    waitStrategy: (state) => 
      state.status === 'Stopped' || state.status === 'Failed' 
        ? { shouldContinue: false }
        : { shouldContinue: true, delay: { seconds: 30 } }
  }
);

Similarly, the waitForCallback operation makes implementing the event-driven approach trivial. Instead of a task token, we have to store a callback ID. As before, the callback can be triggered by an event or by a 3rd party service via a callback URL.

const job = await startDataMigrationJob();

const result = await context.waitForCallback(
  "wait for data migration to finish",
  async (callbackId, ctx) => {
    // save the callback ID against job ARN
    await saveJobDetails(job.arn, callbackId)

    // when the StateChange event is fired
    // another function will fetch the callback ID
    // and send a success/failure signal to the
    // durable execution
  }
);

Polling is the default because it’s easy, not because it’s good.

If you can get an event (or a callback), you can stop spinning your state machine and start treating “waiting” as a first-class step. Less noise, fewer transitions, lower cost.

Thank you to Patrick for bringing this up in our last Q&A session. If you want to level up your AWS game, check out my Production-Ready Serverless workshop. The next cohort starts on April 13th, and the early bird tickets (30% off) are available until March 16th.

Whenever you’re ready, here are 3 ways I can help you:

  1. Production-Ready Serverless: Join 20+ AWS Heroes & Community Builders and 1000+ other students in levelling up your serverless game. This is your one-stop shop for quickly levelling up your serverless skills.
  2. I help clients launch product ideas, improve their development processes and upskill their teams. If you’d like to work together, then let’s get in touch.
  3. Join my community on Discord, ask questions, and join the discussion on all things AWS and Serverless.