I’m afraid you’re thinking about AWS Lambda cold starts all wrong

When I discuss AWS Lambda cold starts with folks in the context of API Gateway, I often get responses along the lines of:

Meh, it’s only the first request, right? So what if one request is slow, the next million requests would be fast.

Unfortunately, that is an oversimplification of what happens.

A cold start happens once for each concurrent execution of your function.

API Gateway reuses concurrent executions of your function that are already running where possible, and based on my observations, it might even queue up requests for a short time in the hope that one of the concurrent executions finishes and can be reused.
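To make that claim concrete, here is a toy Python model of the behaviour — purely illustrative, not how Lambda is actually implemented — that counts cold starts for a batch of requests arriving at a given concurrency level:

```python
# Toy model: each "container" serves one request at a time and is reused
# across sequential requests. A request that finds no idle container
# forces a new one to spin up, i.e. a cold start.
def count_cold_starts(total_requests: int, concurrency: int) -> int:
    idle_containers = 0
    cold_starts = 0
    # process requests in waves of `concurrency` simultaneous requests
    for wave_start in range(0, total_requests, concurrency):
        wave = min(concurrency, total_requests - wave_start)
        if wave > idle_containers:
            cold_starts += wave - idle_containers  # new containers spin up
            idle_containers = wave                 # and stay warm afterwards
        # all containers become idle again before the next wave arrives
    return cold_starts

print(count_cold_starts(100, 1))   # sequential: only the first request is cold
print(count_cold_starts(100, 10))  # bursty: the first 10 requests are all cold
```

This is exactly the pattern the two simulations below demonstrate against a real API Gateway endpoint.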

If all the user requests to an API happen one after another, then sure, you will only experience one cold start in the process. You can simulate this with Charles proxy by capturing a request to an API Gateway endpoint and repeating it with a concurrency setting of 1.

As you can see in the timeline below, only the first request experienced a cold start and was therefore noticeably slower than the rest.

1 out of 100, that’s bearable; hell, it won’t even show up in my 99th percentile latency metric.

What if the user requests came in droves instead? After all, user behaviours are unpredictable and unlikely to follow the nice sequential pattern we saw above. So let’s simulate what happens when we receive 100 requests with a concurrency of 10.

All of a sudden, things don’t look quite as rosy: the first 10 requests were all cold starts! This could spell trouble if your traffic pattern is highly bursty around specific times of the day or specific events, e.g.

  • food ordering services (e.g. JustEat, Deliveroo) have bursts of traffic around meal times
  • e-commerce sites have highly concentrated bursts of traffic around popular shopping days of the year, such as Cyber Monday and Black Friday
  • betting services have bursts of traffic around sporting events
  • social networks have bursts of traffic around, well, just about any notable event happening around the world

For all of these services, a sudden burst of traffic means API Gateway has to add more concurrent executions of your Lambda function, and that equates to a burst of cold starts. That’s bad news for you: these are the most crucial periods for your business, and precisely when you want your service to be on its best behaviour.

If the spikes are predictable, as is the case for food ordering services, you can mitigate the effect of cold starts by pre-warming your APIs. That is, if you know there will be a burst of traffic at noon, you can schedule a cron job (i.e. a CloudWatch schedule + Lambda) for 11:58am that hits the relevant APIs with a blast of concurrent requests, enough to cause API Gateway to spawn a sufficient number of concurrent executions of your function(s).
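A minimal sketch of such a pre-warming blast in Python — the endpoint URL and the `X-Warmup` header name are hypothetical, and the request function is injectable so the sketch stays self-contained:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

def prewarm(fire, concurrency: int = 10) -> list:
    """Fire `concurrency` simultaneous requests to force API Gateway
    to spin up that many concurrent executions ahead of the burst."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda _: fire(), range(concurrency)))

def fire_http():
    # hypothetical endpoint; mark the request as a warm-up ping
    req = Request("https://example.com/api/orders",
                  headers={"X-Warmup": "true"})
    return urlopen(req).status

# inside the scheduled Lambda you might call: prewarm(fire_http, concurrency=50)
# for a self-contained demo, inject a stub instead of a real HTTP call:
print(prewarm(lambda: "warmed", concurrency=5))
```

Note that the blast has to be genuinely concurrent — 50 sequential requests would be served by a single warm execution and spawn nothing.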

You can mark these requests with a special HTTP header or payload, so that the handling function can distinguish them from normal user requests and short-circuit without executing the normal code path.
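As a sketch, assuming the warm-up requests carry a `warmup` flag in the event payload (the field name is arbitrary — pick whatever convention suits you), the handler can bail out before doing any real work:

```python
def handler(event, context):
    # short-circuit warm-up pings so they never hit the normal code path
    if isinstance(event, dict) and event.get("warmup"):
        return {"statusCode": 200, "body": "warmed"}
    # ... normal request handling below ...
    return {"statusCode": 200, "body": "hello, " + event.get("name", "world")}

print(handler({"warmup": True}, None))   # warm-up ping, returns immediately
print(handler({"name": "Yan"}, None))    # normal request, full code path
```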

It’s great that this mitigates the impact of cold starts during predictable bursts of traffic, but doesn’t it betray the ethos of serverless compute, namely that you shouldn’t have to worry about scaling?

Sure, but making users happy trumps everything else. Users are not happy if they have to wait for your function to cold start before they can order their food, and since the cost of switching to a competitor is low, they might not even come back the next day.

Alternatively, you can reduce the impact of cold starts by reducing the length of cold starts:

  • author your Lambda functions in a language that doesn’t incur a high cold start time, e.g. Node.js, Python, or Go
  • choose a higher memory setting for functions on the critical path of handling user requests (i.e. anything that the user has to wait for a response from, including intermediate APIs)
  • optimize your function’s dependencies and package size
  • stay as far away from VPCs as you possibly can! VPC access requires Lambda to create ENIs (elastic network interfaces) to the target VPC, and that easily adds 10 seconds (yes, you’re reading that right) to your cold start


Finally, what about those APIs that are seldom used? It’s quite likely that every time someone hits such an API they will incur a cold start, so to your users that API is always slow, which makes them even less likely to use it in the future; it’s a vicious cycle.

For these APIs, you can have a cron job that runs every 5–10 minutes and pings the API (with a special ping request), so that by the time a real user hits the API it will hopefully be warm and the user won’t be the one who has to endure the cold start.
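If you happen to deploy with the Serverless Framework (an assumption — any tool that can create a CloudWatch schedule works), wiring up such a keep-warm ping is a few lines of configuration:

```yaml
functions:
  keep-warm:
    handler: ping.handler        # hypothetical function that sends the ping request
    events:
      - schedule: rate(5 minutes)   # CloudWatch schedule, fires every 5 minutes
```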

This method of pinging API endpoints to keep them warm is less effective for “busy” functions with lots of concurrent executions, because your ping message would only reach one of the concurrent executions. And if the level of concurrent user requests drops, some of the concurrent executions will be garbage collected after a period of idleness, which is what you want (don’t pay for resources you don’t need).

Anyhow, this post is not intended to be your one-stop guide to AWS Lambda cold starts, but merely to illustrate that it’s a more nuanced discussion than “just the first request”.

Cold starts are a characteristic of the platform that we just have to accept, and we love the AWS Lambda platform because it delivers on so many fronts. But it’s important not to let our own preferences blind us to what matters: keeping our users happy and building a product that they want to keep using.

To do that, you need to know the platform you’re building on top of, and with the cost of experimentation so low, there’s no good reason not to experiment with AWS Lambda yourself, learn more about how it behaves, and find out how to make the most of it.

Like what you’re reading? Check out my video course Production-Ready Serverless and learn the essentials of how to run a serverless application in production.

We will cover topics including:

  • authentication & authorization with API Gateway & Cognito
  • testing & running functions locally
  • CI/CD
  • log aggregation
  • monitoring best practices
  • distributed tracing with X-Ray
  • tracking correlation IDs
  • performance & cost optimization
  • error handling
  • config management
  • canary deployment
  • VPC
  • security
  • leading practices for Lambda, Kinesis, and API Gateway

You can also get 40% off the face price with the code ytcui. Hurry, though: this discount is only available while we’re in Manning’s Early Access Program (MEAP).