Step Functions: apply try-catch to a block of states

In my last post we talked about how to implement semaphores with Step Functions. Another common scenario is handling errors from a block of states, the way we’re used to doing with a try-catch block:

try {
  step1()
  step2()
  step3()
} catch (States.Timeout) {
  ...
} catch (States.ALL) {
  ...
}

With Step Functions, you can use Retry and Catch clauses to handle errors from Task states. There are a number of predefined system errors, such as States.Timeout, States.TaskFailed and the catch-all States.ALL, and you can also handle custom errors thrown by your Lambda functions.
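As a rough sketch, here is how a single Task state could express the try-catch above: it retries on timeouts first, then falls through to an ordered list of catchers. The custom error name ValidationError, the target states HandleValidationError, HandleTimeout and NotifyError, and the retry settings are all illustrative assumptions; Resource is left elided as in the rest of this post.

```json
"Step1": {
  "Type": "Task",
  "Resource": "...",
  "Retry": [
    {
      "ErrorEquals": [ "States.Timeout" ],
      "IntervalSeconds": 1,
      "MaxAttempts": 2,
      "BackoffRate": 2.0
    }
  ],
  "Catch": [
    {
      "ErrorEquals": [ "ValidationError" ],
      "Next": "HandleValidationError"
    },
    {
      "ErrorEquals": [ "States.Timeout" ],
      "Next": "HandleTimeout"
    },
    {
      "ErrorEquals": [ "States.ALL" ],
      "Next": "NotifyError"
    }
  ],
  "Next": "Step2"
}
```

Catchers are evaluated in order and States.ALL must appear on its own in the last one, which mirrors the order of the catch blocks in the pseudocode. Catch is only consulted once the matching retries are exhausted.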

To handle errors from a block of states this way, you have to add the same Catch clause to each of the Task states:

"Catch": [
  {
    "ErrorEquals": [ "States.ALL" ],
    "Next": "NotifyError"
  }
]

However, this approach requires you to add the same boilerplate to every Task state. As your error handling strategy or the state machine itself becomes more complex, this quickly becomes a maintenance headache.
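To make the duplication concrete, this is roughly what the per-state approach looks like for three Task states, using the Step1 to Step3, NotifyError and NotifySuccess names from the example that follows. Every new state, and every change to the error handling, has to be repeated in each Catch.

```json
"Step1": {
  "Type": "Task",
  "Resource": "...",
  "Catch": [ { "ErrorEquals": [ "States.ALL" ], "Next": "NotifyError" } ],
  "Next": "Step2"
},
"Step2": {
  "Type": "Task",
  "Resource": "...",
  "Catch": [ { "ErrorEquals": [ "States.ALL" ], "Next": "NotifyError" } ],
  "Next": "Step3"
},
"Step3": {
  "Type": "Task",
  "Resource": "...",
  "Catch": [ { "ErrorEquals": [ "States.ALL" ], "Next": "NotifyError" } ],
  "Next": "NotifySuccess"
}
```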

Fortunately, both Retry and Catch can be used on Parallel states too!

Even if you’re not looking to perform tasks in parallel, you can still use a Parallel state to simplify your error handling.

In this case, if I wrap Step1, Step2 and Step3 into a single branch inside a Parallel state, then I can catch unhandled errors from any of the steps with one Catch clause.

{
  "StartAt": "Try",
  "States": {
    "Try": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "Step1",
          "States": {
            "Step1": {
              "Type": "Task",
              "Resource": "...",
              "Next": "Step2"              
            },
            "Step2": {
              "Type": "Task",
              "Resource": "...",
              "Next": "Step3"
            },
            "Step3": {
              "Type": "Task",
              "Resource": "...",
              "End": true              
            }
          }
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [ "States.ALL" ],
          "Next": "NotifyError"
        }
      ],
      "Next": "NotifySuccess"
    },
    ...
  }
}

One final caveat with this approach is that a Parallel state wraps the output from its branches into an array. So if subsequent states, such as the NotifySuccess state in the example above, want to use the output from Step3, they’ll have to take that into account.
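For example, suppose Step3 returned the hypothetical payload { "orderId": "123" }. Since there is only one branch, and the Parallel state's default ResultPath is "$", the output of the Try state (and therefore the input to NotifySuccess) would be a one-element array wrapping that payload:

```json
[
  { "orderId": "123" }
]
```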

To avoid that, you can add a Pass state that unwraps the array, and point the Parallel state’s Next at it rather than at NotifySuccess:

"UnwrapOutput": {
  "Type": "Pass",
  "InputPath": "$[0]", 
  "Next": "NotifySuccess"
}
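Wired together, the relevant states could look something like this. It is a sketch: the branch is elided with "..." as before, and NotifyError and NotifySuccess are assumed to be defined elsewhere in the state machine.

```json
"Try": {
  "Type": "Parallel",
  "Branches": [ ... ],
  "Catch": [
    {
      "ErrorEquals": [ "States.ALL" ],
      "Next": "NotifyError"
    }
  ],
  "Next": "UnwrapOutput"
},
"UnwrapOutput": {
  "Type": "Pass",
  "InputPath": "$[0]",
  "Next": "NotifySuccess"
}
```

Errors caught by the Catch clause bypass UnwrapOutput, so with the default ResultPath, NotifyError receives an object with Error and Cause fields rather than an array.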

This technique is useful when you want to apply the same error handling to a block of states without having to resort to boilerplate.

You can add a Retry clause to the Parallel state to retry the entire block (i.e. starting again from Step1, even if it was Step3 that errored). You can also add Retry and Catch clauses to individual states to mix things up.
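A minimal sketch combining both ideas: the Parallel state retries the whole branch on any error, while Step2 also retries on its own for a hypothetical custom error, ThrottledError. The error name and all retry settings are illustrative assumptions.

```json
"Try": {
  "Type": "Parallel",
  "Branches": [
    {
      "StartAt": "Step1",
      "States": {
        "Step1": {
          "Type": "Task",
          "Resource": "...",
          "Next": "Step2"
        },
        "Step2": {
          "Type": "Task",
          "Resource": "...",
          "Retry": [
            {
              "ErrorEquals": [ "ThrottledError" ],
              "IntervalSeconds": 1,
              "MaxAttempts": 5,
              "BackoffRate": 2.0
            }
          ],
          "Next": "Step3"
        },
        "Step3": {
          "Type": "Task",
          "Resource": "...",
          "End": true
        }
      }
    }
  ],
  "Retry": [
    {
      "ErrorEquals": [ "States.ALL" ],
      "IntervalSeconds": 2,
      "MaxAttempts": 2,
      "BackoffRate": 2.0
    }
  ],
  "Catch": [
    {
      "ErrorEquals": [ "States.ALL" ],
      "Next": "NotifyError"
    }
  ],
  "Next": "NotifySuccess"
}
```

If Step2 still fails after its own retries are exhausted, the error propagates out of the branch and the Parallel state's Retry starts the branch again from Step1; only when those retries are also exhausted does the Catch route the execution to NotifyError. Bear in mind that nested retries multiply: since MaxAttempts counts retries after the first attempt, Step2 could run up to (1 + 5) × (1 + 2) = 18 times with these settings.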

So that’s it, a nice and short post to share with you a simple technique that I have found useful with Step Functions.

I have been spending a fair bit of time with Step Functions and enjoying the service. Let me know in the comments if you have use cases that you find difficult to implement with Step Functions; I would love to hear what others are doing with it.


Yan Cui

I’m an AWS Serverless Hero and the author of Production-Ready Serverless. I have run production workloads at scale in AWS for nearly 10 years and I have been an architect or principal engineer in a variety of industries, ranging from banking, e-commerce and sports streaming to mobile gaming. I currently work as an independent consultant focused on AWS and serverless.

You can contact me via Email, Twitter and LinkedIn.

Hire me.

