AWS Lambda: how to detect and stop accidental infinite recursions

You can become a serverless blackbelt. Enrol to my 4-week online workshop Production-Ready Serverless and gain hands-on experience building something from scratch using serverless technologies. At the end of the workshop, you should have a broader view of the challenges you will face as your serverless architecture matures and expands. You should also have a firm grasp on when serverless is a good fit for your system as well as common pitfalls you need to avoid. Sign up now and get 15% discount with the code yanprs15!

Lambda makes it easy to build event-driven architectures. Each function only needs to look after its own responsibilities. We chain them together through APIs, queues, streams, and other event sources. It’s a good way to build complex systems out of simple components. We need only managed services, and we only pay for them when we use them.

But, sometimes, we make mistakes. A few blog posts had made the rounds on social media a while back. A few people had accidentally got their functions in an infinite recursion. A sizable AWS bill soon followed.

In one incident, S3 triggers a Lambda function which puts a modified file back into the same bucket. Which triggered the same function, which puts the file back into the same bucket. And the cycle continued on and on.

Only if there’s a way to track the number of invocations on a call chain so we can put on a break when it gets too long!

dazn-lambda-powertools

We (at DAZN) recently open-sourced our Lambda powertools project. The goal of the project is to empower our engineers to build production-ready serverless applications. And observability is a big part of that.

The project gives you the tools to automatically extract and forward correlation IDs. It also supports log sampling for the entire call chain. So you’ll always have a sample of debug log messages even when you’re logging at a higher level. In production, you will typically log at WARN or ERROR level to save cost. More details are available here.

Tracking the length of a call chain

The same mechanisms used to pass correlation IDs along can help track the length of the call chain. All we need is to pass a counter along with the other correlation IDs. Each time we extract the correlation IDs, we will increment the counter by 1.

Stopping infinite loops

Once we’re able to track the length of call chains, we can stop them when they become too long. We added a middleware to stop invocations when the call chain length reaches a threshold.

Since most call chains are short, the default threshold value is set to 10. But you can override it to suit your needs. You can apply the middleware like any other middy middlewares. And you can also use it in conjunction with other middy middlewares too.

Show me, don’t tell me!

To see this in action, I put together a very simple demo project. The source code is available here.

This demo project consists of a single function, which calls itself every time. This function is missing a termination condition. It will loop indefinitely if left unchecked. Thankfully, the stop-indefinite-loop middleware will stop the call chain on the third invocation.

After I triggered the function with sls invoke -f hello I was able to verify the expected behaviour in the logs. On the third recursion, the invocation was stopped dead in its tracks. It even stopped the subsequent retries (Lambda retries failed async invocations twice)!

You can see the invocations were stopped in the metrics as well.

Limitation

The middleware does not work for SQS and Kinesis functions. These event sources send events to Lambda in batches and require special handling. The call chain lengths are still passed along and incremented as you’d expect. But the middleware that stops long call chains does not work here. Instead, you’d need to apply that logic when you process an individual record.

Contributing

We hope these tools would help you in your serverless journey as well. And we welcome your contributions, please feel free to suggest features or report bugs. Please also check out our contribution guide if you want to lend us a hand!

Liked this article? Support me on Patreon and get direct help from me via a private Slack channel or 1-2-1 mentoring.
Subscribe to my newsletter


Hi, I’m Yan. I’m an AWS Serverless Hero and I help companies go faster for less by adopting serverless technologies successfully.

Are you struggling with serverless or need guidance on best practices? Do you want someone to review your architecture and help you avoid costly mistakes down the line? Whatever the case, I’m here to help.

Hire me.


Skill up your serverless game with this hands-on workshop.

My 4-week Production-Ready Serverless online workshop is back!

This course takes you through building a production-ready serverless web application from testing, deployment, security, all the way through to observability. The motivation for this course is to give you hands-on experience building something with serverless technologies while giving you a broader view of the challenges you will face as the architecture matures and expands.

We will start at the basics and give you a firm introduction to Lambda and all the relevant concepts and service features (including the latest announcements in 2020). And then gradually ramping up and cover a wide array of topics such as API security, testing strategies, CI/CD, secret management, and operational best practices for monitoring and troubleshooting.

If you enrol now you can also get 15% OFF with the promo code “yanprs15”.

Enrol now and SAVE 15%.


Check out my new podcast Real-World Serverless where I talk with engineers who are building amazing things with serverless technologies and discuss the real-world use cases and challenges they face. If you’re interested in what people are actually doing with serverless and what it’s really like to be working with serverless day-to-day, then this is the podcast for you.


Check out my new course, Learn you some Lambda best practice for great good! In this course, you will learn best practices for working with AWS Lambda in terms of performance, cost, security, scalability, resilience and observability. We will also cover latest features from re:Invent 2019 such as Provisioned Concurrency and Lambda Destinations. Enrol now and start learning!


Check out my video course, Complete Guide to AWS Step Functions. In this course, we’ll cover everything you need to know to use AWS Step Functions service effectively. There is something for everyone from beginners to more advanced users looking for design patterns and best practices. Enrol now and start learning!