You need to sample debug logs in production

It’s com­mon prac­tice to set log lev­el to WARNING for pro­duc­tion due to traf­fic vol­ume. This is because we have to con­sid­er var­i­ous cost fac­tors:

  • cost of log­ging : Cloud­Watch Logs charges $0.50 per GB ingest­ed. In my expe­ri­ence, this is often much high­er than the Lamb­da invo­ca­tion costs
  • cost of stor­age : Cloud­Watch Logs charges $0.03 per GB per month, and its default reten­tion pol­i­cy is Nev­er Expire! A com­mon prac­tice is to ship your logs to anoth­er log aggre­ga­tion ser­vice and to set the reten­tion pol­i­cy to X days. See this post for more details.
  • cost of pro­cess­ing : if you’re pro­cess­ing the logs with Lamb­da, then you also have to fac­tor in the cost of Lamb­da invo­ca­tions.

But, doing so leaves us with­out ANY debug logs in pro­duc­tion. When a prob­lem hap­pens in pro­duc­tion, you won’t have the debug logs to help iden­ti­fy the root cause.

Instead you have to waste pre­cious time to deploy a new ver­sion of your code to enable debug log­ging. Not to men­tion that you shouldn’t for­get to dis­able debug log­ging when you deploy the fix.

With microser­vices, you often have to do this for more than one ser­vice to get all the debug mes­sages you need.

All these, increas­es the mean time to recov­ery (MTTR) dur­ing an inci­dent. That’s not what we want.

It doesn’t have to be like that.

There is a hap­py mid­dle ground between hav­ing no debug logs and hav­ing all the debug logs. Instead, we should sam­ple debug logs from a small per­cent­age of invo­ca­tions.


I demoed how to do this in the Log­ging chap­ter of my video course Pro­duc­tion-Ready Server­less. You need two basic things:

  • a log­ger that lets you to change the log­ging lev­el dynam­i­cal­ly, e.g. via envi­ron­ment vari­ables.
  • a mid­dle­ware engine such as mid­dy

With Lamb­da, I don’t need most of the fea­tures from a ful­ly-fledged log­ger such as pino. Instead, I pre­fer to use a sim­ple log­ger mod­ule like this one. It’s writ­ten in a hand­ful of lines and gives me the basics:

  • struc­tured log­ging with JSON
  • abil­i­ty to log at dif­fer­ent lev­els
  • abil­i­ty to con­trol the log lev­el dynam­i­cal­ly via envi­ron­ment vari­ables

Using mid­dy, I can cre­ate a mid­dle­ware to dynam­i­cal­ly update the log lev­el to DEBUG. It does this for a con­fig­urable per­cent­age of invo­ca­tions. At the end of the invo­ca­tion the mid­dle­ware would restore the pre­vi­ous log lev­el.

You might notice that we also have some spe­cial han­dling for when the invo­ca­tion errs.

This is to ensure we cap­ture the error with as much con­text as pos­si­ble, includ­ing:

Sample debug logs on entire call chains

Hav­ing debug logs for a small per­cent­age of invo­ca­tion is great. But when you’re deal­ing with microser­vices you need to make sure that your debug logs cov­er an entire call chain.

That is the only way to put togeth­er a com­plete pic­ture of every­thing that hap­pened on that call chain. Oth­er­wise, you will end up with frag­ments of debug logs from many call chains but nev­er the com­plete pic­ture of one.

You can do this by for­ward­ing the deci­sion to turn on debug log­ging as a cor­re­la­tion ID. The next func­tion in the chain would respect this deci­sion, and pass it on. See this post for more detail.

So that’s it, anoth­er pro tip on how to build observ­abil­i­ty into your server­less appli­ca­tion. If you want to learn more about how to go all in with server­less, check out my 10-step guide here.

Until next time!

Like what you’re read­ing? Check out my video course Pro­duc­tion-Ready Server­less and learn the essen­tials of how to run a server­less appli­ca­tion in pro­duc­tion.

We will cov­er top­ics includ­ing:

  • authen­ti­ca­tion & autho­riza­tion with API Gate­way & Cog­ni­to
  • test­ing & run­ning func­tions local­ly
  • CI/CD
  • log aggre­ga­tion
  • mon­i­tor­ing best prac­tices
  • dis­trib­uted trac­ing with X-Ray
  • track­ing cor­re­la­tion IDs
  • per­for­mance & cost opti­miza­tion
  • error han­dling
  • con­fig man­age­ment
  • canary deploy­ment
  • VPC
  • secu­ri­ty
  • lead­ing prac­tices for Lamb­da, Kine­sis, and API Gate­way

You can also get 40% off the face price with the code ytcui. Hur­ry though, this dis­count is only avail­able while we’re in Manning’s Ear­ly Access Pro­gram (MEAP).