Step Functions: apply try-catch to a block of states

In my last post we looked at how to implement semaphores with Step Functions. Another common requirement is to handle errors from a block of states, the way we’re used to doing with a try-catch block.

try {
  step1()
  step2()
  step3()
} catch (States.Timeout) {
  ...
} catch (States.ALL) {
  ...
}

With Step Functions, you can use Retry and Catch clauses to handle errors from Task states. There are a number of predefined system errors, and you can also handle custom errors that are thrown by your Lambda functions.

To get try-catch behavior across several states, you could add the same Catch clause to each of the Task states.

"Catch": [
  {
    "ErrorEquals": [ "States.ALL" ],
    "Next": "NotifyError"
  }
]
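To mirror the two catch clauses from the pseudocode at the top, a Catch array can hold more than one catcher. The sketch below assumes a hypothetical HandleTimeout state next to NotifyError. Catchers are evaluated in order, and States.ALL has to appear on its own in ErrorEquals and in the last catcher. The ResultPath is optional: here it is assumed you want to keep the original input and attach the error details (the Error and Cause fields) under $.error.

```json
"Catch": [
  {
    "ErrorEquals": [ "States.Timeout" ],
    "ResultPath": "$.error",
    "Next": "HandleTimeout"
  },
  {
    "ErrorEquals": [ "States.ALL" ],
    "ResultPath": "$.error",
    "Next": "NotifyError"
  }
]
```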

However, this approach requires you to add the same boilerplate to every Task state. As your error handling strategy, or the state machine itself, becomes more complex, this turns into a maintenance headache.

Fortunately, both Retry and Catch can be used on Parallel states too!

Even if you’re not looking to perform tasks in parallel, you can still use a Parallel state to simplify your error handling.

In this case, if I wrap Step1, Step2 and Step3 into a single branch inside a Parallel state, then I can catch unhandled errors from any of the steps with one Catch clause.

{
  "StartAt": "Try",
  "States": {
    "Try": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "Step1",
          "States": {
            "Step1": {
              "Type": "Task",
              "Resource": "...",
              "Next": "Step2"              
            },
            "Step2": {
              "Type": "Task",
              "Resource": "...",
              "Next": "Step3"
            },
            "Step3": {
              "Type": "Task",
              "Resource": "...",
              "End": true              
            }
          }
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [ "States.ALL" ],
          "Next": "NotifyError"
        }
      ],
      "Next": "NotifySuccess"
    },
    ...
  }
}

One caveat with this approach is that a Parallel state wraps the output from its branches into an array, with one element per branch. So if a subsequent state (such as the NotifySuccess state in the example above) wants to use the output from Step3, it has to read it from the first element of that array.

Alternatively, you can add a Pass state after the Parallel state to unwrap the array, like this:

"UnwrapOutput": {
  "Type": "Pass",
  "InputPath": "$[0]", 
  "Next": "NotifySuccess"
}
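For the Pass state to take effect, the Parallel state has to transition to it rather than straight to NotifySuccess. A minimal sketch of the wiring, with Branches and Catch elided because they are unchanged from the earlier example:

```json
"Try": {
  "Type": "Parallel",
  "Branches": [ ... ],
  "Catch": [ ... ],
  "Next": "UnwrapOutput"
},
"UnwrapOutput": {
  "Type": "Pass",
  "InputPath": "$[0]",
  "Next": "NotifySuccess"
}
```

As a made-up illustration: if Step3 returned { "orderId": 42 }, the Try state would output [ { "orderId": 42 } ], and UnwrapOutput would pass { "orderId": 42 } on to NotifySuccess.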

This technique is useful when you want to apply the same error handling to a block of states without having to resort to boilerplate.

You can add a Retry clause to the Parallel state to retry the entire block (i.e. from Step1, even if it was Step3 that failed). You can also add Retry and Catch clauses to individual states to mix things up.
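As a rough sketch, a Parallel state with both clauses might look like the following. The interval, attempt count and backoff values are illustrative assumptions rather than recommendations, and you could narrow ErrorEquals to specific errors instead of States.ALL. Retriers are applied first; the Catch clause only kicks in once the retries are exhausted.

```json
"Try": {
  "Type": "Parallel",
  "Branches": [ ... ],
  "Retry": [
    {
      "ErrorEquals": [ "States.ALL" ],
      "IntervalSeconds": 2,
      "MaxAttempts": 3,
      "BackoffRate": 2.0
    }
  ],
  "Catch": [
    {
      "ErrorEquals": [ "States.ALL" ],
      "Next": "NotifyError"
    }
  ],
  "Next": "NotifySuccess"
}
```

Since a retry reruns the branch from Step1, this works best when the steps are safe to run more than once.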

So that’s it, a nice and short post to share with you a simple technique that I have found useful with Step Functions.

I have been spending a fair bit of time with Step Functions and enjoying the service. Let me know in the comments if you have use cases that you find difficult to implement with Step Functions. I would love to hear what others are doing with it.
