I’m afraid you’re thinking about AWS Lambda cold starts all wrong

When I discuss AWS Lambda cold starts with folks in the context of API Gateway, I often get responses along the lines of:

Meh, it’s only the first request, right? So what if one request is slow, the next million requests would be fast.

Unfortunately, that is an oversimplification of what happens.

A cold start happens once for each concurrent execution of your function.

API Gateway reuses concurrent executions of your function that are already running where possible, and based on my observations, it might even queue up requests for a short time in the hope that one of the concurrent executions will finish and can be reused.

If all the user requests to an API happen one after another, then sure, you will only experience one cold start in the process. You can simulate this with Charles proxy by capturing a request to an API Gateway endpoint and repeating it with a concurrency setting of 1.

As you can see in the timeline below, only the first request experienced a cold start and was therefore noticeably slower compared to the rest.

1 out of 100? That’s bearable. Hell, it won’t even show up in my 99th percentile latency metric.

What if the user requests came in droves instead? After all, user behaviour is unpredictable and unlikely to follow the nice sequential pattern we saw above. So let’s simulate what happens when we receive 100 requests with a concurrency of 10.

All of a sudden, things don’t look quite as rosy: the first 10 requests were all cold starts! This could spell trouble if your traffic pattern is highly bursty around specific times of the day or specific events, e.g.

  • food ordering services (e.g. JustEat, Deliveroo) have bursts of traffic around meal times
  • e-commerce sites have highly concentrated bursts of traffic around popular shopping days of the year, such as Cyber Monday and Black Friday
  • betting services have bursts of traffic around sporting events
  • social networks have bursts of traffic around, well, just about any notable event happening around the world

For all of these services, a sudden burst of traffic means API Gateway has to add more concurrent executions of your Lambda function, and that equates to a burst of cold starts. That’s bad news for you: these are the most crucial periods for your business, and precisely when you want your service to be on its best behaviour.

If the spikes are predictable, as is the case for food ordering services, you can mitigate the effect of cold starts by pre-warming your APIs. That is, if you know there will be a burst of traffic at noon, you can schedule a cron job (i.e. a CloudWatch schedule + Lambda) for 11:58am that hits the relevant APIs with a blast of concurrent requests, enough to cause API Gateway to spawn a sufficient number of concurrent executions of your function(s).
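The blast itself has to be genuinely concurrent, otherwise API Gateway will happily serve all the requests from one warm execution. A minimal sketch of such a `prewarm` helper (a hypothetical name; in production the `invoke` callable would be an HTTP request to your API Gateway endpoint carrying the warm-up marker, here a stub stands in so the sketch is runnable):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def prewarm(invoke, concurrency=10):
    """Fire `concurrency` overlapping invocations so API Gateway is
    forced to spin up that many concurrent executions of the function."""
    barrier = threading.Barrier(concurrency)

    def one_call():
        barrier.wait()  # release all the calls at the same moment
        return invoke()

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(one_call) for _ in range(concurrency)]
        return [f.result() for f in futures]

# Stub target standing in for an HTTP request to the real endpoint.
calls = []
results = prewarm(lambda: calls.append(1) or "warmed", concurrency=10)
```

The barrier matters: without it, a fast endpoint could answer the first request before the second is sent, and you would warm fewer executions than you intended.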

You can mark these requests with a special HTTP header or payload so that the handling function can distinguish them from normal user requests and short-circuit without attempting to execute the normal code path.
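A sketch of that short-circuit in a Lambda handler (the `source: warmup` marker is an arbitrary convention you would agree between the cron job and the handler, not anything AWS-defined):

```python
import json

WARMUP_SOURCE = "warmup"  # hypothetical marker agreed with the warm-up cron job

def handler(event, context):
    # Short-circuit warm-up pings before touching the normal code path,
    # so warming stays cheap and has no side effects.
    if event.get("source") == WARMUP_SOURCE:
        return {"statusCode": 200, "body": json.dumps({"warmed": True})}

    # ... normal request handling goes here ...
    return {"statusCode": 200, "body": json.dumps({"message": "hello"})}
```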

It’s great that you can mitigate the impact of cold starts during these predictable bursts of traffic, but doesn’t it betray the ethos of serverless compute, that you shouldn’t have to worry about scaling?

Sure, but making users happy trumps everything else, and users are not happy if they have to wait for your function to cold start before they can order their food. The cost of switching to a competitor is low, so they might not even come back the next day.

Alternatively, you could consider reducing the impact of cold starts by reducing the length of cold starts:

  • author your Lambda functions in a language that doesn’t incur a high cold start time, e.g. Node.js, Python, or Go
  • choose a higher memory setting for functions on the critical path of handling user requests (i.e. anything the user has to wait on for a response, including intermediate APIs)
  • optimize your function’s dependencies and package size
  • stay as far away from VPCs as you possibly can! VPC access requires Lambda to create ENIs (elastic network interfaces) in the target VPC, and that easily adds 10s (yes, you’re reading that right, 10 seconds) to your cold start
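If you deploy with the Serverless Framework (an assumption; the function name and values below are purely illustrative), the memory and packaging tweaks might look something like:

```yaml
functions:
  getOrders:              # hypothetical user-facing function
    handler: handler.getOrders
    memorySize: 1024      # more memory also means more CPU, which shortens cold starts
    events:
      - http:
          path: orders
          method: get

package:
  patterns:               # ship only what the function actually needs
    - '!node_modules/aws-sdk/**'
    - '!tests/**'
```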


Finally, what about those APIs that are seldom used? It’s actually quite likely that every time someone hits one of those APIs they will incur a cold start. To your users, that API is always slow, so they become even less likely to use it in the future, which is a vicious cycle.

For these APIs, you can have a cron job that runs every 5–10 minutes and pings the API (with a special ping request), so that by the time a real user hits the API it will hopefully be warm, and the user won’t be the one who has to endure the cold start.
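Again assuming the Serverless Framework (names and the rate are illustrative), a keep-warm schedule can be attached to the same function, with a marker payload the handler can short-circuit on:

```yaml
functions:
  rarelyUsedApi:          # hypothetical seldom-used endpoint
    handler: handler.rarelyUsed
    events:
      - http:
          path: rarely-used
          method: get
      - schedule:
          rate: rate(5 minutes)   # keep one execution warm
          input:
            source: warmup        # marker the handler can short-circuit on
```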

This method of pinging API endpoints to keep them warm is less effective for “busy” functions with lots of concurrent executions: your ping message will only reach one of the concurrent executions. And if the level of concurrent user requests drops, some of the concurrent executions will be garbage collected after a period of idleness, which is what you want (don’t pay for resources you don’t need).

Anyhow, this post is not intended to be your one-stop guide to AWS Lambda cold starts, but merely to illustrate that it’s a more nuanced discussion than “just the first request”.

Cold starts are a characteristic of the platform that we just have to accept, and we love the AWS Lambda platform and want to keep using it because it delivers on so many fronts. But it’s important not to let our own preferences blind us to what’s important, which is to keep our users happy and build a product that they will want to keep using.

To do that, you need to know the platform you’re building on top of, and with the cost of experimentation so low, there’s no good reason not to experiment with AWS Lambda yourself and learn more about how it behaves and how you can make the most of it.
