How to log timed out Lambda invocations

Yan Cui

I help clients go faster for less using serverless technologies.

Imagine the situation. It’s 3 AM, and PagerDuty alerts you that the number of 5XX errors has spiked. Something’s wrong.

Strange, there are no error logs…

Fortunately, we have enabled X-Ray tracing and I’m able to quickly find one of the traces for these 5XX errors.

Sadly, the traces don’t offer any explanation for the error.

After fumbling around aimlessly for a while for clues, a moment of genius struck. I just so happens to notice the function’s timeout value is set to 1 second. The Duration from the trace was 1.3 seconds.

Aha, the Lambda function probably timed out! Now that I have a suspected cause, I’m able to get the confirmation by looking for "Task timed out" messages in the logs. And sure enough, I’m able to find them.

And now I can start the real investigation into what happened and find out what caused the function to time out.

This would have been much easier if only I had an error log message that informs me about these time outs!

If all this sounded awful, then you’re in luck. Today, we’re gonna see how we can log an error message when a Lambda function times out.

Pre-emptively log timeout messages

Hold on to your hats, because the solution is mind-bogglingly simple! All you have to do is to log the error message just before the invocation times out ;-)

The context object for a Node.js function has a very useful method called getRemainingTimeInMillis. It returns the number of milliseconds left in the current invocation. So, we can schedule a callback to be actioned JUST before the function times out and preemptively logs the timeout event.

Sure, you might get the odd false positives in the logs. But that is still a lot better than not having any error logs when the function does time out!

This is a cross-cutting concern that should be encapsulated into a middleware. If you’re using Node.js then you should consider using the middy middleware engine. Here, I can write a very simple middleware to log timeout errors:

const Promise = require('bluebird')
const Log = require('@perform/lambda-powertools-logger')

module.exports = () => {
  let isTimedOut = undefined
  let promise = undefined

  const resetPromise = () => {
    if (promise) {
      promise.cancel()
      promise = undefined
    }
  }
  
  return {
    before: (handler, next) => {
      const timeLeft = handler.context.getRemainingTimeInMillis()
      handler.context.callbackWaitsForEmptyEventLoop = false
      isTimedOut = undefined

      promise = Promise.delay(timeLeft - 10).then(() => {
        if (isTimedOut !== false) {
          const awsRequestId = handler.context.awsRequestId
          const invocationEvent = JSON.stringify(handler.event)
          Log.error('invocation timed out', { awsRequestId, invocationEvent })
        }
      })
  
      next()
    },
    after: (handler, next) => {
      isTimedOut = false
      resetPromise()
      next()
    },
    onError: (handler, next) => {
      isTimedOut = false
      resetPromise()
      next(handler.error)
    }
  }
}

Couple of things to note about this code:

  • ln21: we schedule a callback to be executed 10ms before the function times out.
  • ln33/ln38: if the function completes before the scheduled callback runs then we will cancel the callback using bluebird’s Promise.cancel.
  • ln18: this is important. If you don’t set callbackWaitsForEmptyEventLoop to false then the invocation won’t complete until the callback has executed. That would make every invocation last longer!
  • ln24: capture the invocation event when the function times out. This is useful for replaying the invocation locally to debug the problem.
  • ln2: we are using the logger from dazn-lambda-powertools so it’ll automatically populate any captured correlation IDs in the logs. For more details, check the README in the repo.

If you want to see this middleware in action then take a look at this demo project. Once deployed, I can curl the HTTP endpoint and see an error message for timeouts.

So that’s it! The approach is really simple.

To make it really easy for you to apply this to your projects, we have added this middleware to dazn-lambda-powertools so you will be able to use it via NPM.


 

Whenever you’re ready, here are 4 ways I can help you:

  1. If you want a one-stop shop to help you quickly level up your serverless skills, you should check out my Production-Ready Serverless workshop. Over 20 AWS Heroes & Community Builders have passed through this workshop, plus 1000+ students from the likes of AWS, LEGO, Booking, HBO and Siemens.
  2. If you want to learn how to test serverless applications without all the pain and hassle, you should check out my latest course, Testing Serverless Architectures.
  3. If you’re a manager or founder and want to help your team move faster and build better software, then check out my consulting services.
  4. If you just want to hang out, talk serverless, or ask for help, then you should join my FREE Community.