Check out my new course Learn you some Lambda best practice for great good! and learn the best practices for performance, cost, security, resilience, observability and scalability.
I recently watched an excellent talk by Matt Lavin on optimization tips for Lambda and saw a slide on making DynamoDB use HTTP keep-alive. It reminded me of a conversation I had with Sebastian Cohnen, so I set out to test the effect this simple optimization has.
What is it all about?
As it turns out, Node.js’s default HTTP agent doesn’t use keep-alive, so every request incurs the cost of setting up a new TCP connection. This is clearly inefficient, as you need to perform a three-way handshake to establish a TCP connection. For short-lived operations (such as DynamoDB operations, which typically complete in single-digit milliseconds) the latency overhead of establishing the TCP connection might be greater than the operation itself.
With the Node.js AWS SDK, you can override the HTTP agent used by ALL clients with just a few lines of code. You can also override the settings for individual clients.
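As a minimal sketch of what that override could look like with v2 of the AWS SDK: the helper names `createKeepAliveAgent` and `enableKeepAlive` are my own for illustration and are not part of the SDK, whereas `AWS.config.update` and the `httpOptions.agent` setting are. The SDK object is passed in as an argument so the snippet doesn't depend on how you load it.

```javascript
const https = require('https');

// Build an HTTPS agent that keeps sockets open so they can be reused
// by later requests instead of opening a new TCP connection each time.
function createKeepAliveAgent() {
  return new https.Agent({ keepAlive: true });
}

// Apply the agent to ALL clients created from the given AWS SDK (v2) object.
// `AWS` is assumed to be the object returned by require('aws-sdk').
function enableKeepAlive(AWS, agent = createKeepAliveAgent()) {
  AWS.config.update({ httpOptions: { agent } });
  return agent;
}

module.exports = { createKeepAliveAgent, enableKeepAlive };

// Usage, at module scope (outside the handler) so the agent survives
// across invocations of the same container:
//
//   const AWS = require('aws-sdk');
//   enableKeepAlive(AWS);
//
// Or for an individual client only:
//
//   const agent = createKeepAliveAgent();
//   const dynamodb = new AWS.DynamoDB({ httpOptions: { agent } });
```

Creating the agent outside the handler matters here: the open connections live in the agent, so it has to outlive a single invocation for them to be reused.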
UPDATE 22/09/2019: big thanks to Joe Bowbeer for mentioning this in the comments. Since AWS SDK v2.463.0 you no longer need this code change. Instead, set the environment variable AWS_NODEJS_CONNECTION_REUSE_ENABLED to 1 to make the SDK reuse connections by default.
To test the effect of enabling HTTP keep-alive, I set up a simple Lambda function behind API Gateway. Essentially this function puts an item into a DynamoDB table, and that’s it.
For this experiment, I wanted to see how well HTTP keep-alive fared across multiple invocations and how much of a difference this simple change makes.
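A test function along these lines might look like the sketch below. This is not the exact code used in the experiment. `makeHandler`, the item shape and the table name are hypothetical, and the client is assumed to expose the SDK v2 `put(params).promise()` interface (as `AWS.DynamoDB.DocumentClient` does).

```javascript
// Build a Lambda handler that writes one item and reports how long
// the DynamoDB call took. The client is injected so that the same
// handler can be run with and without a keep-alive agent.
function makeHandler(docClient, tableName) {
  return async function handler(event) {
    const item = {
      // Illustrative key only; use whatever key schema your table has.
      id: `${Date.now()}-${Math.random().toString(36).slice(2)}`,
      payload: event.body || null,
    };

    const start = process.hrtime.bigint();
    await docClient.put({ TableName: tableName, Item: item }).promise();
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

    console.log(`DynamoDB put took ${elapsedMs.toFixed(1)}ms`);
    return { statusCode: 200, body: JSON.stringify({ elapsedMs }) };
  };
}

module.exports = { makeHandler };

// Usage (table name is a placeholder):
//
//   const AWS = require('aws-sdk');
//   const docClient = new AWS.DynamoDB.DocumentClient();
//   module.exports.handler = makeHandler(docClient, 'my-test-table');
```

Timing only the `put` call, rather than the whole invocation, keeps API Gateway and Lambda overhead out of the comparison.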
Without HTTP keep-alive, the DynamoDB operation averaged around 33ms.
With HTTP keep-alive, that average drops to around 10ms.
As we suspected, the overhead (33ms - 10ms = 23ms) was greater than the cost of the operation itself. The experiment also shows that the connection is reused across multiple invocations just fine. With a very simple change, we were able to improve execution time by ~23ms, or to put it more impressively, reduce the operation’s latency by roughly 70%. That’s a good return on investment in my book, but the difference is still not noticeable to the human eye.
But what if we scale this to 10 sequential DynamoDB operations in a single function?
With HTTP keep-alive, the function’s execution time averages around 60ms.
Without HTTP keep-alive, the average execution time rises to 180ms.
When I curl the endpoint, the difference of 120ms is definitely noticeable. A difference of this size can start to impact user experience and, as Amazon found 10 years ago, adding 100ms of latency can reduce sales by as much as 1%.
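For reference, the arithmetic behind both experiments can be spelled out in a few lines. The helper name `keepAliveSavings` is just for illustration; the inputs are the averages reported above.

```javascript
// Compare average latency without and with keep-alive.
function keepAliveSavings(withoutMs, withMs) {
  const savedMs = withoutMs - withMs;
  const reductionPct = Math.round((savedMs / withoutMs) * 100);
  return { savedMs, reductionPct };
}

// Single DynamoDB operation: 33ms without keep-alive, 10ms with.
console.log(keepAliveSavings(33, 10)); // { savedMs: 23, reductionPct: 70 }

// 10 sequential operations: 180ms without keep-alive, 60ms with.
console.log(keepAliveSavings(180, 60)); // { savedMs: 120, reductionPct: 67 }
```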