This is a lesson that I wished I learnt when I first started using AWS Lambda in anger, it would have made my life simpler right from the start.
But, we did get there before long, and it allowed us to track and include correlation IDs in our log messages (which are then pushed to an ELK stack)which would also include other useful information such as:
- age of the function execution
- whether invocation was a cold start
- log level
Looking back, I fell into the common trap of forgetting the practices that had served us well in a different paradigm of developing software, at least until I figured out how to adopt these old practices for the new serverless paradigm.
In this case, we know structured logs are important and why we need them, and also how to do it well. But like many others, we started with
console.log because it was simple and it worked (to a limited degree).
But, this approach has a really low ceiling:
- you can’t add contextual information with the log message, at least not in a way that’s consistent (you may change the message you log, but the contextual information should always be there, e.g. user-id, request-id, etc.) and easy to extract for an automated process
- as a result, it’s also hard to filter the log messages by specific attributes – e.g. “show me all the log messages related to this request ID”
- it’s hard to control what level to log at (i.e. debug, info, warning, …) based on configuration – e.g. log at debug level for non-production environments, but log at info/warning level for production
Which is why, if you’re just starting your Serverless journey, then learn from my mistakes and write your logs as structured JSON from the start. Also, you should use whatever log client that you were using before – log4j, nlog, loggly, log4net, whatever it is – and configure the client to format log messages as JSON and attach as much contextual information as you can.
As I mentioned in the post on how to capture and forward collection IDs, it’s also a good idea to enable debug logging on the entire call chain for a small % of requests in production. It helps you catch pervasive bugs in your logic (that are easy to catch, but ONLY if you have the right info from the logs) that would otherwise require you to redeploy all the functions on the entire call chain to turn on debug logging…
So there, a simple and effective thing to do to massively upgrade your serverless architecture. Check out my mini-series on logging for AWS Lambda which covers log aggregation, tracking correlation IDs, and some tips and tricks such as why and how to send custom metrics asynchronously.
I specialise in rapidly transitioning teams to serverless and building production-ready services on AWS.
Are you struggling with serverless or need guidance on best practices? Do you want someone to review your architecture and help you avoid costly mistakes down the line? Whatever the case, I’m here to help.
Check out my new course, Complete Guide to AWS Step Functions. In this course, we’ll cover everything you need to know to use AWS Step Functions service effectively. Including basic concepts, HTTP and event triggers, activities, callbacks, nested workflows, design patterns and best practices.
Come learn about operational BEST PRACTICES for AWS Lambda: CI/CD, testing & debugging functions locally, logging, monitoring, distributed tracing, canary deployments, config management, authentication & authorization, VPC, security, error handling, and more.
You can also get 40% off the face price with the code ytcui.
Here is a complete list of all my posts on serverless and AWS Lambda. In the meantime, here are a few of my most popular blog posts.
- Lambda optimization tip – enable HTTP keep-alive
- You are thinking about serverless costs all wrong
- Many faced threats to Serverless security
- We can do better than percentile latencies
- I’m afraid you’re thinking about AWS Lambda cold starts all wrong
- Yubl’s road to Serverless
- AWS Lambda – should you have few monolithic functions or many single-purposed functions?
- AWS Lambda – compare coldstart time with different languages, memory and code sizes
- Guys, we’re doing pagination wrong