You need to sample debug logs in production

You can become a serverless blackbelt. Enrol in my 4-week online workshop, Production-Ready Serverless, and gain hands-on experience building something from scratch using serverless technologies. At the end of the workshop, you should have a broader view of the challenges you will face as your serverless architecture matures and expands. You should also have a firm grasp of when serverless is a good fit for your system, as well as the common pitfalls you need to avoid. Sign up now and get a 15% discount with the code yanprs15!

It’s common practice to set the log level to WARNING in production because of the volume of traffic. At that volume, we have to consider several cost factors:

  • cost of logging: CloudWatch Logs charges $0.50 per GB ingested. In my experience, this is often much higher than the Lambda invocation costs.
  • cost of storage: CloudWatch Logs charges $0.03 per GB per month, and its default retention policy is Never Expire! A common practice is to ship your logs to another log aggregation service and to set the retention policy to X days. See this post for more details.
  • cost of processing: if you’re processing the logs with Lambda, then you also have to factor in the cost of Lambda invocations.
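
To see how quickly these costs add up, here is a rough estimate using the prices above. The workload numbers (10 million invocations a month, 10,000 bytes of logs per invocation, 12 months of retention) and the helper name `estimateMonthlyLogCost` are illustrative assumptions, not measurements.

```javascript
// CloudWatch Logs prices quoted above. They vary by region and change over
// time, so treat them as placeholders.
const INGESTION_PER_GB = 0.5; // USD per GB ingested
const STORAGE_PER_GB_MONTH = 0.03; // USD per GB stored per month

// Hypothetical helper: estimate the monthly CloudWatch Logs bill for one
// function. With a retention policy of retainedMonths, that many months of
// logs sit in storage at once once the log group reaches steady state.
// Ignores the free tier and any compression applied to stored logs.
function estimateMonthlyLogCost({ invocations, bytesPerInvocation, retainedMonths }) {
  const gbPerMonth = (invocations * bytesPerInvocation) / 1e9; // decimal GB
  const ingestion = gbPerMonth * INGESTION_PER_GB;
  const storage = gbPerMonth * retainedMonths * STORAGE_PER_GB_MONTH;
  return { gbPerMonth, ingestion, storage, total: ingestion + storage };
}

const estimate = estimateMonthlyLogCost({
  invocations: 10_000_000,
  bytesPerInvocation: 10_000,
  retainedMonths: 12,
});
console.log(estimate);
```

Under those assumptions, the function writes 100 GB of logs a month: $50 of ingestion, plus $36 a month of storage once 12 months of logs have accumulated. By comparison, the Lambda request charge for the same 10 million invocations is $2 at $0.20 per million requests, before duration charges.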

But doing so leaves us without ANY debug logs in production. When a problem happens in production, you won’t have the debug logs to help identify the root cause.

Instead, you have to waste precious time deploying a new version of your code just to enable debug logging, and then remember to disable it again when you deploy the fix.

With microservices, you often have to do this for more than one service to get all the debug messages you need.

All of this increases the mean time to recovery (MTTR) during an incident, which is not what we want.

It doesn’t have to be like that.

There is a happy middle ground between having no debug logs and having all the debug logs: sample debug logs from a small percentage of invocations.

How?

I demoed how to do this in the Logging chapter of my video course Production-Ready Serverless. You need two basic things:

  • a logger that lets you change the logging level dynamically, e.g. via environment variables.
  • a middleware engine such as middy.

With Lambda, I don’t need most of the features of a fully-fledged logger such as pino. Instead, I prefer a simple logger module that is written in a handful of lines and gives me the basics.
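
A minimal sketch of such a logger follows. It assumes the level lives in an environment variable named `LOG_LEVEL` (a hypothetical name) that is re-read on every call, so changing `process.env.LOG_LEVEL` at runtime takes effect immediately. The `enableDebug` helper is also illustrative: it switches to DEBUG and returns a function that undoes the change.

```javascript
const LEVELS = { DEBUG: 0, INFO: 1, WARN: 2, ERROR: 3 };

// Read the level on every call so it can be changed while the process runs.
// Falls back to WARN if LOG_LEVEL is unset or unrecognised.
function currentLevel() {
  const name = (process.env.LOG_LEVEL || "WARN").toUpperCase();
  return LEVELS[name] ?? LEVELS.WARN;
}

function log(levelName, message, params = {}) {
  if (LEVELS[levelName] < currentLevel()) {
    return; // below the current threshold: drop the message
  }
  // One JSON object per line keeps logs easy to query and to ship elsewhere.
  console.log(JSON.stringify({ ...params, level: levelName, message }));
}

const logger = {
  debug: (msg, params) => log("DEBUG", msg, params),
  info: (msg, params) => log("INFO", msg, params),
  warn: (msg, params) => log("WARN", msg, params),
  error: (msg, params) => log("ERROR", msg, params),

  // Switch to DEBUG and return a function that restores the previous level.
  enableDebug() {
    const previous = process.env.LOG_LEVEL;
    process.env.LOG_LEVEL = "DEBUG";
    return () => {
      // Assigning undefined to process.env would store the string "undefined".
      if (previous === undefined) {
        delete process.env.LOG_LEVEL;
      } else {
        process.env.LOG_LEVEL = previous;
      }
    };
  },
};

module.exports = logger;
```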

Using middy, I can create a middleware that dynamically updates the log level to DEBUG for a configurable percentage of invocations. At the end of the invocation, the middleware restores the previous log level.
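
One way to write such a middleware is sketched below. It uses middy's `before`, `after` and `onError` hooks, and assumes the logger re-reads a hypothetical `LOG_LEVEL` environment variable on every call. The name `sampleDebugLogs` and its `sampleRate`/`random` options are illustrative; `random` is only there so the sampling can be tested deterministically.

```javascript
// Turn on DEBUG logging for a random sample of invocations and restore the
// previous level when the invocation finishes, whether it succeeds or fails.
function sampleDebugLogs({ sampleRate = 0.01, random = Math.random } = {}) {
  let previous; // level in force before this invocation
  let changed = false; // whether this invocation was sampled

  const restore = () => {
    if (!changed) return;
    if (previous === undefined) {
      delete process.env.LOG_LEVEL;
    } else {
      process.env.LOG_LEVEL = previous;
    }
    changed = false;
  };

  return {
    before: async () => {
      if (random() < sampleRate) {
        previous = process.env.LOG_LEVEL;
        process.env.LOG_LEVEL = "DEBUG";
        changed = true;
      }
    },
    after: async () => restore(),
    // Restore on failure too, without setting a response, so the error
    // still fails the invocation.
    onError: async () => restore(),
  };
}
```

You would attach it with something like `middy(handler).use(sampleDebugLogs({ sampleRate: 0.01 }))`. Because a standard Lambda execution environment handles one invocation at a time, keeping the previous level in the middleware's closure is safe. Restoring in `onError` as well as `after` matters: otherwise a failed, sampled invocation would leave DEBUG switched on for every later invocation in that container.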

The middleware also has some special handling for when the invocation errs.

This is to ensure we capture the error with as much context as possible.
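
A sketch of an `onError` hook that captures such context is below. The choice of fields (the Lambda `context.awsRequestId` and `context.functionName`, the triggering event, and the error's message and stack) is an assumption about what is useful, and `logErrorsWithContext` and its `write` option are hypothetical names. The hook does not set a response, so it does not turn the failure into a success.

```javascript
// middy-style onError hook: log a failed invocation as one structured record
// with enough context to find and reproduce it.
function logErrorsWithContext({ write = (line) => console.error(line) } = {}) {
  return {
    onError: async (request) => {
      const { event, context, error } = request;
      write(
        JSON.stringify({
          level: "ERROR",
          message: "invocation failed",
          awsRequestId: context?.awsRequestId,
          functionName: context?.functionName,
          // The full event makes the failure reproducible, but it may contain
          // sensitive data that should be redacted first.
          event,
          errorMessage: error?.message,
          stack: error?.stack,
        })
      );
      // No response is set here, so the invocation still fails.
    },
  };
}
```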

Sample debug logs on entire call chains

Having debug logs for a small percentage of invocations is great. But when you’re dealing with microservices, you need to make sure that your debug logs cover an entire call chain.

That is the only way to put together a complete picture of everything that happened on that call chain. Otherwise, you will end up with fragments of debug logs from many call chains but never the complete picture of one.

You can do this by forwarding the decision to turn on debug logging as a correlation ID. The next function in the chain respects this decision and passes it on. See this post for more details.
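
The propagation step could look like the sketch below. `Debug-Log-Enabled` is a hypothetical name for the flag; in practice it would travel with your other correlation IDs, for example as an HTTP header or a message attribute. Only the first function in a chain samples; every later function reuses the upstream decision.

```javascript
// Hypothetical name for the flag carried alongside other correlation IDs.
const DEBUG_FLAG = "Debug-Log-Enabled";

// Look the flag up case-insensitively, since HTTP header names may arrive
// lower-cased depending on the event source.
function readDebugFlag(incoming) {
  const key = Object.keys(incoming).find(
    (k) => k.toLowerCase() === DEBUG_FLAG.toLowerCase()
  );
  return key === undefined ? undefined : incoming[key];
}

// Respect an upstream decision if there is one; otherwise this is the start
// of the chain, so sample locally.
function decideDebug(incoming = {}, { sampleRate = 0.01, random = Math.random } = {}) {
  const flag = readDebugFlag(incoming);
  if (flag === "true") return true;
  if (flag === "false") return false;
  return random() < sampleRate;
}

// Attach the decision to the next outgoing request or message.
function withDebugFlag(outgoing, debugEnabled) {
  return { ...outgoing, [DEBUG_FLAG]: String(debugEnabled) };
}
```

Forwarding an explicit "false" matters as much as forwarding "true": if downstream functions re-sampled on their own, you would be back to fragments of debug logs from many call chains.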

So that’s it, another pro tip on how to build observability into your serverless application. If you want to learn more about how to go all in with serverless, check out my 10-step guide here.

Until next time!

Liked this article? Support me on Patreon and get direct help from me via a private Slack channel or 1-2-1 mentoring.


Hi, I’m Yan. I’m an AWS Serverless Hero and I help companies go faster for less by adopting serverless technologies successfully.

Are you struggling with serverless or need guidance on best practices? Do you want someone to review your architecture and help you avoid costly mistakes down the line? Whatever the case, I’m here to help.

Hire me.



