AWS Lambda – constant timeout when using bluebird Promise


Hello! Sorry for the lack of posts recently. It’s been a pretty hectic time here at Yubl, with plenty of exciting work happening and even more on the way. Hopefully I will be able to share with you some of the cool things we have done and the valuable lessons we have learnt from working with AWS Lambda and Serverless in production.

Today’s post is one such lesson, a slightly baffling and painful one at that.

 

The Symptoms

We noticed that the Lambda function behind one of our APIs in Amazon API Gateway was timing out consistently (the function is configured with a 6s timeout, which is what you see in the diagram below).

lambda-bluebird-latency-spike

Looking in the logs, it appears that one instance of our function was constantly timing out. (Based on the frequency of the timeouts, I could deduce that AWS Lambda had 3 instances of my function running at the same time.)

What’s even more baffling is that, after the first timeout, the subsequent Lambda invocations never even enter the body of my handler function!

Considering that this is a Node.js function (running on the Node.js 4.3 runtime), this symptom is similar to what you’d expect if a synchronous operation were blocking the event loop so that nothing else gets a chance to run. (Oh, how I miss the Erlang VM’s pre-emptive scheduling at this point!)
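
To illustrate the kind of blocking meant here, below is a minimal sketch (not the function from this post). A timer that is due after 10ms cannot fire while a synchronous loop holds the only thread. The names busyWait and measureTimerDelay are hypothetical, and the durations are arbitrary.

```javascript
'use strict';

// Hypothetical helper: spin synchronously for roughly `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // hog the only thread; no callbacks can run in the meantime
  }
}

// Schedule a timer for `timerMs`, then block synchronously for `blockMs`.
// Resolves with how long the timer actually took to fire.
function measureTimerDelay(timerMs, blockMs) {
  return new Promise(resolve => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), timerMs);
    busyWait(blockMs); // the timer callback cannot run until this returns
  });
}

// Example: the 10ms timer only fires once the 200ms of blocking is over.
measureTimerDelay(10, 200).then(elapsed => {
  console.log('timer fired after ' + elapsed + 'ms');
});
```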

So, in summary, here are the symptoms that we observed:

  1. the function times out the first time
  2. all subsequent invocations time out without executing the handler function
  3. the function continues to time out until Lambda recycles the underlying resource that runs it

which, as you can imagine, is pretty scary – one strike, and you’re out.

Oh, and I managed to reproduce the symptoms with Lambda functions with other event source types too, so it’s not specific to API Gateway endpoints.

 

Bluebird – the likely Culprit

After investigating the issue some more, I was able to isolate the problem to the use of bluebird Promises.

I was able to replicate the issue with the simple example below, where the function itself is configured to time out after 1s.

lambda-bluebird-timeout-example

As you can see from the log messages below, as I repeatedly hit the API Gateway endpoint, the invocations continue to time out without printing the hello~~~ message.

lambda-bluebird-timeout-example-log

At this point, your options are:

a) wait it out, or

b) do a dummy update with no code change

 

On the other hand, a hand-rolled delay function using vanilla Promise works as expected with regard to timeouts.

lambda-bluebird-timeout-example-2

lambda-bluebird-timeout-example-log-2

 

Workarounds

The obvious workaround is not to use bluebird, or any library that uses bluebird under the hood – e.g. promised-mongo.

Which sucks, because:

  1. bluebird is actually quite useful, and we use both bluebird and co quite heavily in our codebase
  2. we’d have to check every dependency to make sure it’s not using bluebird under the hood
  3. we can’t use other useful libraries that use bluebird internally

However, I did find that if you specify an explicit timeout using bluebird’s own timeout function (the .timeout method on a promise), then it’s able to recover correctly. Presumably, using bluebird’s own timeout function gives it a clean timeout, whereas being forcibly timed out by the Lambda runtime screws with the internal state of its Promises.

The following example works as expected:

lambda-bluebird-timeout-example-3

lambda-bluebird-timeout-example-log-3
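
The screenshot is not reproduced here. With bluebird, the pattern is to chain .timeout(ms) onto the promise, using a value below the function's own timeout, so that the promise rejects with a TimeoutError once the deadline passes. The sketch below shows the same shape with a hand-written withTimeout helper over native Promises so that it runs without dependencies. It is a stand-in, not bluebird's implementation: the TimeoutError class, the helper names and the 100ms/500ms values are all assumptions, and the sketch does not demonstrate whether a Lambda container recovers the way it did with bluebird's own timeout.

```javascript
'use strict';

// Hypothetical error type, playing the role of bluebird's TimeoutError.
class TimeoutError extends Error {
  constructor(message) {
    super(message);
    this.name = 'TimeoutError';
  }
}

// Reject with a TimeoutError if `promise` has not settled within `ms`.
function withTimeout(promise, ms) {
  let timer;
  const deadline = new Promise((resolve, reject) => {
    timer = setTimeout(
      () => reject(new TimeoutError(`operation timed out after ${ms}ms`)),
      ms
    );
  });
  const clear = () => clearTimeout(timer);
  // Whichever settles first wins; the deadline timer is cleared either way.
  return Promise.race([promise, deadline]).then(
    value => { clear(); return value; },
    err => { clear(); throw err; }
  );
}

function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// The slow operation (500ms) loses the race against the 100ms deadline.
withTimeout(delay(500), 100).catch(err => {
  console.log(err.name + ': ' + err.message);
});
```

Note that the helper only stops waiting; the underlying 500ms timer still runs to completion in the background.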

But it wouldn’t be a workaround if it didn’t have its own caveats.

It means you now have one more error that you need to handle in a graceful way (e.g. mapping the response in API Gateway to a 5XX HTTP status code), otherwise you’ll end up sending this kind of unhelpful response back to your callers.

lambda-bluebird-timeout-example-log-4

 

So there, a painful lesson we learnt whilst running Node.js Lambda functions in production. Hopefully you have found this post in time before running into the issue yourself!



Hi, I’m Yan. I’m an AWS Serverless Hero and I help companies go faster for less by adopting serverless technologies successfully.
