AWS Lambda — constant timeout when using bluebird Promise

Hello! Sorry for the lack of posts recently. It’s been a pretty hectic time here at Yubl, with plenty of exciting work happening and even more on the way. Hopefully I will be able to share with you some of the cool things we have done and the valuable lessons we have learnt from working with AWS Lambda and Serverless in production.

Today’s post is one such lesson, a slightly baffling and painful one at that.

 

The Symptoms

We noticed that the Lambda function behind one of our APIs in Amazon API Gateway was timing out consistently (the function is configured with a 6s timeout, which is what you see in the diagram below).

[Image: lambda-bluebird-latency-spike]

Looking in the logs, it appears that one instance of our function was constantly timing out (based on the frequency of the timeouts, I could deduce that AWS Lambda had 3 instances of my function running at the same time).

What’s even more baffling is that, after the first timeout, the subsequent Lambda invocations never even enter the body of my handler function!

Considering that this is a Node.js function (running on the Node.js 4.3 runtime), this symptom is similar to what one would expect if a synchronous operation were blocking the event queue so that nothing else gets a chance to run. (Oh, how I miss the Erlang VM’s pre-emptive scheduling at this point!)
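
To make that concrete, here is a minimal sketch (not code from our function) of how a synchronous busy loop holds up everything else on the Node.js event loop. The `busyWait` and `measureTimerDelay` helpers are made up for illustration: a timer requested for 0ms only gets to fire once the blocking loop has finished.

```javascript
// Hypothetical helper: spin synchronously for `ms` milliseconds.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Nothing else can run on this thread until the loop exits.
  }
}

// Schedule a 0ms timer, then block the thread, and report how long the
// timer callback actually had to wait before it ran.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    busyWait(blockMs);
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`0ms timer fired after ${delay}ms`);
});
```

The reported delay should be at least as long as the blocking loop, which is the same kind of starvation the symptoms above hint at.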

So, in summary, here are the symptoms that we observed:

  1. the function times out the first time
  2. all subsequent invocations time out without executing the handler function
  3. the function continues to time out until Lambda recycles the underlying resource that runs it

which, as you can imagine, is pretty scary: one strike, and you’re out.

Oh, and I managed to reproduce the symptoms with Lambda functions that use other event source types too, so it’s not specific to API Gateway endpoints.

 

Bluebird — the likely Culprit

After investigating the issue some more, I was able to isolate the problem to the use of bluebird Promises.

I was able to replicate the issue with the simple example below, where the function itself is configured to time out after 1s.

[Image: lambda-bluebird-timeout-example]

As you can see from the log messages below, as I repeatedly hit the API Gateway endpoint, the invocations continue to time out without printing the hello~~~ message.

[Image: lambda-bluebird-timeout-example-log]

At this point, your options are:

a) wait it out, or

b) do a dummy update with no code change

 

On the other hand, a hand-rolled delay function using vanilla Promise works as expected with regard to timeouts.

[Image: lambda-bluebird-timeout-example-2]

[Image: lambda-bluebird-timeout-example-log-2]
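
For reference, a hand-rolled delay along these lines could look like the sketch below. It is an illustration rather than the exact code from the example; `makeHandler`, the 2000ms figure and the response shape are assumptions, chosen so that the delay outlasts a 1s function timeout.

```javascript
// Hand-rolled delay built on the native Promise (no bluebird involved).
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical handler factory: logs on entry, waits, then responds.
function makeHandler(delayMs) {
  return (event, context, callback) => {
    console.log('hello~~~');
    delay(delayMs).then(
      () => callback(null, { message: 'done' }),
      (err) => callback(err)
    );
  };
}

// 2000ms is an assumed value, longer than a 1s function timeout, so the
// Lambda runtime (not our code) is what ends the invocation.
exports.handler = makeHandler(2000);
```

The point of interest is the `console.log` at the top of the handler: with the vanilla Promise version, that message keeps appearing on every invocation even after earlier ones have timed out.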

 

Workarounds

The obvious workaround is not to use bluebird, or any library that uses bluebird under the hood, e.g. promised-mongo.

Which sucks, because:

  1. bluebird is actually quite useful, and we use both bluebird and co quite heavily in our codebase
  2. we would have to check every dependency to make sure it’s not using bluebird under the hood
  3. we can’t use other useful libraries that use bluebird internally

However, I did find that if you specify an explicit timeout using bluebird’s Promise.timeout function, then it’s able to recover correctly. Presumably, using bluebird’s own timeout function gives it a clean timeout, whereas being forcibly timed out by the Lambda runtime screws with the internal state of its Promises.

The following example works as expected:

[Image: lambda-bluebird-timeout-example-3]

[Image: lambda-bluebird-timeout-example-log-3]
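
The general shape of this workaround is to fail on your own terms before the Lambda runtime pulls the plug. With bluebird that is a call to `.timeout(ms)` on the promise; the sketch below shows the same idea with a native `Promise.race` instead, so that it runs without any dependencies. It is not bluebird’s implementation, and I have only seen the bluebird variant recover, so treat this purely as an illustration. `withTimeout`, `HandlerTimeoutError`, `handlerWith` and the safety margin are all hypothetical; `context.getRemainingTimeInMillis()` is the Lambda context method that reports how much time the invocation has left.

```javascript
// Custom error so callers can tell our own timeout apart from other failures.
class HandlerTimeoutError extends Error {
  constructor(ms) {
    super(`operation timed out after ${ms}ms`);
    this.name = 'HandlerTimeoutError';
  }
}

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new HandlerTimeoutError(ms)), ms);
  });
  // Clear the timer either way so it does not keep the event loop alive.
  return Promise.race([promise, timeout]).then(
    (value) => { clearTimeout(timer); return value; },
    (err) => { clearTimeout(timer); throw err; }
  );
}

// Hypothetical wrapper: give the work a budget slightly shorter than the
// time the Lambda runtime says is left for this invocation.
function handlerWith(doWork, marginMs) {
  return (event, context, callback) => {
    const budget = context.getRemainingTimeInMillis() - marginMs;
    withTimeout(doWork(event), budget).then(
      (result) => callback(null, result),
      (err) => callback(err)
    );
  };
}
```

Deriving the budget from the remaining time, rather than hard-coding it, means the explicit timeout keeps firing first even if the function’s configured timeout is changed later.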

But it wouldn’t be a workaround if it didn’t have its own caveats.

It means you now have one more error that you need to handle gracefully (e.g. by mapping the response in API Gateway to a 5XX HTTP status code); otherwise, you’ll end up sending this kind of unhelpful response back to your callers.

[Image: lambda-bluebird-timeout-example-log-4]
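
One way to handle that extra error is to translate it into a message that API Gateway can map to a status code. The sketch below assumes a made-up convention of prefixing the error message with the desired status (e.g. `[504]`), which a Lambda error regex on the integration response could then match. It also assumes that the timeout error can be recognised by `name === 'TimeoutError'`; `toGatewayError` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical convention: prefix the message with the HTTP status we want,
// so an API Gateway Lambda error regex such as "\[504\].*" can match it.
function toGatewayError(err) {
  // Assumption: timeout errors are identified by their `name` property.
  const isTimeout = Boolean(err) && err.name === 'TimeoutError';
  const status = isTimeout ? 504 : 500;
  const detail = isTimeout ? 'upstream operation timed out' : 'internal error';
  return new Error(`[${status}] ${detail}`);
}

// Example: what the handler would pass to its callback on a timeout.
const timeoutErr = new Error('operation timed out');
timeoutErr.name = 'TimeoutError';
console.log(toGatewayError(timeoutErr).message);
```

In a handler, this would sit in the rejection branch, e.g. `callback(toGatewayError(err))`, so callers get a deliberate 5XX response instead of a raw error payload.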

 

So there, a painful lesson we learnt whilst running Node.js Lambda functions in production. Hopefully you have found this post in time, before running into the issue yourself!
