Step Functions

How to handle execution timeouts in AWS Step Functions

Step Functions lets you set a timeout on both Task states and the whole execution. If TimeoutSeconds is not configured, a Standard Workflow execution can by default run for up to a year. To a user, such an execution would appear “stuck”, which is why AWS best practices recommend using timeouts to avoid such scenarios. But once you have configured a timeout for the execution, you also need to consider what happens when that timeout is hit.
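As a rough illustration of the two timeout levels mentioned above, here is a minimal sketch that builds an Amazon States Language (ASL) definition as a Python dictionary. The state name `ProcessOrder`, the function ARN, and the timeout values are all made up for the example; only the `TimeoutSeconds` field itself comes from the text.

```python
import json


def build_state_machine(execution_timeout_s: int, task_timeout_s: int) -> dict:
    """Return an ASL definition with an execution-level and a Task-level timeout."""
    return {
        "StartAt": "ProcessOrder",
        # Top-level TimeoutSeconds applies to the whole execution.
        "TimeoutSeconds": execution_timeout_s,
        "States": {
            "ProcessOrder": {  # hypothetical state name
                "Type": "Task",
                # Hypothetical function ARN, for illustration only.
                "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
                # TimeoutSeconds on the state applies to this Task only.
                "TimeoutSeconds": task_timeout_s,
                "End": True,
            }
        },
    }


definition = build_state_machine(execution_timeout_s=300, task_timeout_s=60)
print(json.dumps(definition, indent=2))
```

The printed JSON could be used as the definition of a state machine; the values to pick depend on how long your workflow is realistically expected to run.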

In this post, let’s explore 3 ways you can handle an execution timeout and use a Lambda function to perform automated remediation (e.g. applying rollbacks).
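The excerpt does not say which three approaches the post covers, so the following is only one plausible shape for the remediation function: a Lambda handler that reacts to a "Step Functions Execution Status Change" event from EventBridge with status `TIMED_OUT`. The `rollback` helper is a hypothetical placeholder, and the sample event is trimmed to the fields the handler reads.

```python
def rollback(execution_arn: str) -> str:
    # Hypothetical placeholder: real remediation logic (e.g. undoing
    # partial writes) would go here.
    return f"rolled back {execution_arn}"


def handler(event, context=None):
    """Run remediation only for timed-out Step Functions executions."""
    detail = event.get("detail", {})
    if event.get("source") != "aws.states" or detail.get("status") != "TIMED_OUT":
        # Ignore anything that is not an execution timeout.
        return {"handled": False}
    return {"handled": True, "result": rollback(detail["executionArn"])}


# Trimmed sample event; the ARN is made up for illustration.
sample_event = {
    "source": "aws.states",
    "detail-type": "Step Functions Execution Status Change",
    "detail": {
        "executionArn": "arn:aws:states:us-east-1:123456789012:execution:demo:run-1",
        "status": "TIMED_OUT",
    },
}

result = handler(sample_event)
print(result)
```

Filtering on the status inside the handler is a defensive choice; in practice the EventBridge rule pattern would usually restrict the events as well.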

When to use Step Functions vs. doing it all in a Lambda function

I’m a big fan of Step Functions, but it’s yet another AWS service you must learn and pay for.

It also introduces additional complexities. My application is harder to test; my business logic is split between configuration (ASL) and code; and I have new decision points, such as whether to use Express Workflows or Standard Workflows.

So it’s fair to ask, “Why should we even bother with Step Functions?” Why not just do everything in code, inside a Lambda function?

Let’s break down the pros and cons and look at the trade-offs of each.

Step Functions: combine Standard and Express workflows for fun & profit

Step Functions’ state machines come in two flavours. By understanding their strengths and limitations, you can harness the combined power of both to optimize your processes for efficiency and cost.

Standard Workflows

Optimal for: Business-critical operations like payment processing.

Strengths: Suitable for low-throughput scenarios. High maximum duration ensures enough time for retries using exponential backoff. …


Testing Step Functions: how to skip time when testing Timeout and Wait states

When I previously wrote about testing Step Functions, I gave you a general strategy that consists of: component tests that target the Lambda functions (specifically, the custom code you wrote in those functions); end-to-end tests that execute the state machine in the cloud; and local tests using Step Functions Local where you can use mocks to …


Choreography vs Orchestration in the land of serverless

Choreography and Orchestration are two modes of interaction in a microservices architecture. In orchestration, there is a controller (the ‘orchestrator’) that controls the interaction between services. It dictates the control flow of the business logic and is responsible for making sure that everything happens on cue. This follows the request-response paradigm. In choreography, every service …


How to do blue-green deployment for Step Functions

A client asked me the other day: “What happens to the running executions when I update a state machine?” Sadly, the answer is likely that existing executions would break if you have changed the input/output of the Lambda functions they call. The solution is to use specific versions or aliases of the functions instead. But …
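To make the "specific versions or aliases" idea concrete, here is a small sketch that builds a Task state whose Resource is a qualified Lambda ARN (the function ARN with a version number or alias name appended). The function name, account ID, and the `live` alias are hypothetical, and this only illustrates the pinning idea from the excerpt, not the full approach the post goes on to describe.

```python
def qualified_function_arn(function_arn: str, qualifier: str) -> str:
    """Append a version number or alias name to an unqualified function ARN."""
    return f"{function_arn}:{qualifier}"


def task_state(function_arn: str, qualifier: str) -> dict:
    """Build a Task state that invokes a pinned version or alias of a function."""
    return {
        "Type": "Task",
        "Resource": qualified_function_arn(function_arn, qualifier),
        "End": True,
    }


# Hypothetical function ARN and alias, for illustration only.
base_arn = "arn:aws:lambda:us-east-1:123456789012:function:process-order"
state = task_state(base_arn, "live")
print(state["Resource"])
```

Because the state refers to a qualifier rather than the unqualified function, deploying new code to the function does not by itself change what the state machine invokes.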

