Are Lambda-to-Lambda calls really so bad?


If you utter the words “I call a Lambda function from another Lambda function” you might receive a bunch of raised eyebrows. It’s generally frowned upon for some good reasons, but as with most things, there are nuances to this discussion.

In most cases, a Lambda function is an implementation detail and shouldn’t be exposed as the system’s API. Instead, it should be fronted with something, such as API Gateway for HTTP APIs or an SNS topic for event processing systems. This allows you to make implementation changes later without impacting the external-facing contract of your system.

Maybe you started off with a single Lambda function, but as the system grows you decide to split the logic into multiple functions. Or maybe the throughput of your system has reached such a level that it makes more economical sense to move the code into ECS/EC2 instead. These changes shouldn’t affect how your consumers interact with your system.

Update 16/07/2020

Following some lively debate on Twitter, I sat down with Michael Hart and Jeremy Daly to discuss this topic of Lambda-to-Lambda invocations in more detail. I think it’s worth putting more nuance and context into this “why you shouldn’t” section.

Firstly, my advice here is not hard and fast. There are lots of exceptions and reasons to break from the mould if you understand the tradeoffs you’re making. But the most important thing to consider is the organizational environment you operate in.

Earlier, I said that you shouldn’t call Lambda functions across service boundaries. A more accurate description of my thinking would be “across ownership lines”. That is, calling Lambda functions that are owned by other teams. The two are usually equivalent in the companies where I have worked: the system is broken up into microservices, and teams own one or more microservices.

An important factor to consider is “how expensive is the communication channel between the two teams”. The bigger the company, the higher these costs tend to be. I’ve seen projects delayed for months or even years because of these cross-team dependencies. And the higher these costs are, the more you need to put a stable interface between services. Having a more stable interface affords you more flexibility to make changes without breaking your contract with the other teams.

I think a Lambda function is not a stable interface because the caller needs to know its name, its region and its AWS account. This stops me from refactoring my functions without forcing the callers to change, which creates a dependency on another team whenever I want to refactor my service.

For example:

  • renaming a function
  • splitting a fat Lambda (e.g. one that handles all CRUD actions against one entity type) into single-purpose functions
  • going multi-region and routing callers to the closest region
  • moving the service to another account (e.g. if you’re migrating from a shared AWS account for all teams to a model where you have separate accounts per team)
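
To make the coupling concrete, here is a minimal sketch that splits a function ARN into the parts a direct caller has to know. The ARNs, account IDs and function names are hypothetical and only there for illustration.

```python
from typing import NamedTuple


class LambdaTarget(NamedTuple):
    region: str
    account_id: str
    function_name: str


def parse_function_arn(arn: str) -> LambdaTarget:
    """Split a Lambda function ARN into the parts a direct caller depends on."""
    # Format: arn:aws:lambda:<region>:<account-id>:function:<name>[:<qualifier>]
    parts = arn.split(":")
    if len(parts) < 7 or parts[2] != "lambda" or parts[5] != "function":
        raise ValueError(f"not a Lambda function ARN: {arn}")
    return LambdaTarget(region=parts[3], account_id=parts[4], function_name=parts[6])


# Hypothetical ARNs before and after a refactor (function split + account migration).
before = parse_function_arn("arn:aws:lambda:eu-west-1:111111111111:function:orders-crud")
after = parse_function_arn("arn:aws:lambda:eu-west-1:222222222222:function:orders-get")

# Every field that differs is a change the calling team has to make on their side.
changed = [f for f in LambdaTarget._fields if getattr(before, f) != getattr(after, f)]
print(changed)  # ['account_id', 'function_name']
```

A caller that only knows a URL or a topic is shielded from all three fields.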

In many of these cases, if there’s an API Gateway in front of my service, then, by and large, I can refactor my service without impacting my callers. I’m only talking about refactoring here, i.e. changes that don’t affect the contract with my callers.

If I’m making breaking contract changes then it’s unavoidable that my callers have to make changes too. I avoid these breaking changes like the plague, especially when the cost of cross-team coordination is high.

It’s also worth noting that while API Gateway can be a more stable interface for synchronous calls that cross the ownership line, the same logic can be applied to SNS/SQS/EventBridge for asynchronous calls between Lambda functions.

What if you’re operating in a small team (or on your own) and the cost of coordinating changes across teams is negligible? Then you might not need the flexibility a more stable interface can give you. In which case it’s not a problem to call a Lambda function directly across service boundaries. And what you lose in flexibility you gain in performance and cost efficiency as you cut out a layer from the execution path.

There are also other exceptions to keep in mind. For example, if you are running GraphQL in a Lambda function, then you do need to make synchronous calls to other resolver Lambda functions. Without this, you’d have to put the entire system into a very fat Lambda function! That’d likely cause far more problems for you than those synchronous Lambda calls.

I hope this clarification gives you a better understanding of when my advice above should be applied. Context is important.

p.s. Big thanks to Michael Hart and Jeremy Daly for taking the time to discuss in detail with me! The flow chart at the end of the post has also been updated to reflect the extra nuances that have been discussed here.

But what if the caller and callee functions are both inside the same service? In which case, the whole “breaking the abstraction layer” thing is not an issue. Are Lambda-to-Lambda calls OK then?

It depends. There’s still an issue of efficiency to consider.

If you’re invoking another Lambda function synchronously (i.e. when InvocationType is RequestResponse) then you’re paying for extra latency and cost:

  • There is latency overhead for calling the 2nd function, especially when a cold start is involved. This impacts the end-to-end latency.
  • The 2nd Lambda invocation carries an extra cost for the invocation request.
  • Since Lambda durations are billed in 100ms blocks, you will pay for the rounded-up time of both the caller and callee functions.
  • You will pay for the idle wait time while the caller function waits for a response from the callee.
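
As a rough worked example of the duration costs above, the sketch below rounds durations up to 100ms blocks and compares two chained functions against one combined function. All timings are hypothetical, and it assumes both functions use the same memory size so that billed milliseconds are directly comparable.

```python
import math

BLOCK_MS = 100  # billing granularity described above


def billed_ms(duration_ms: float) -> int:
    """Round a duration up to the next billing block."""
    return math.ceil(duration_ms / BLOCK_MS) * BLOCK_MS


# Hypothetical timings, in milliseconds.
caller_own_work = 30
callee_work = 120
invoke_overhead = 20  # network hop and service overhead, assuming no cold start

# Two functions: the caller is billed for its own work plus the time it sits waiting.
caller_billed = billed_ms(caller_own_work + invoke_overhead + callee_work)  # 170 -> 200
callee_billed = billed_ms(callee_work)                                      # 120 -> 200
split_total = caller_billed + callee_billed

# One combined function: the same work, with no hop and no idle wait.
combined_total = billed_ms(caller_own_work + callee_work)                   # 150 -> 200

print(split_total, combined_total)  # 400 200
```

With these made-up numbers the chained version bills twice the duration, on top of the second invocation request.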

While the extra cost might be negligible in most cases, the extra latency is usually undesirable, especially for user-facing APIs.

If you care about either, then you should combine the two functions into one. You can still achieve separation of concerns and modularity at the code level. You don’t have to split them into multiple Lambda functions.
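
A minimal sketch of what that can look like, with hypothetical function names and a made-up order payload: the logic that used to live in two Lambda functions becomes two plain functions behind one handler.

```python
# All names and the event shape below are hypothetical.

def validate_order(order: dict) -> dict:
    """Formerly the caller function's logic."""
    if "id" not in order:
        raise ValueError("order is missing an id")
    return order


def price_order(order: dict) -> dict:
    """Formerly the callee function's logic, now a plain in-process call."""
    total = sum(item["price"] * item["quantity"] for item in order.get("items", []))
    return {**order, "total": total}


def handler(event, context):
    """Single Lambda entry point that composes both steps."""
    return price_order(validate_order(event))
```

The two steps can still live in separate modules and be tested separately. Only the deployment unit is merged.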

But if the goal is to offload some work by invoking the other function asynchronously (i.e. when InvocationType is Event), so the calling function can return earlier, then I don’t see anything wrong with that. That is, assuming both functions are part of the same service.
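
Here is a minimal sketch of such an offload. It assumes the caller passes in a boto3 Lambda client (e.g. `boto3.client("lambda")`); the helper name `offload` and the function name in the usage note are hypothetical.

```python
import json


def offload(lambda_client, function_name: str, payload: dict) -> bool:
    """Queue an asynchronous invocation and return without waiting for the result."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # async: Lambda queues the event and returns straight away
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # For "Event" invocations Lambda responds with HTTP 202 once the event is accepted.
    return response["StatusCode"] == 202
```

For example, `offload(boto3.client("lambda"), "send-welcome-email", {"userId": "42"})` returns as soon as the event is accepted, so the caller never pays for the callee's run time.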

Don’t forget to configure a DLQ or an on-failure destination for the callee function. As a rule of thumb, you should ALWAYS have a DLQ or on-failure destination for any async functions. (thanks to Sara Gerion for bringing this up)
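
In practice you would usually set this in your deployment framework, but as a sketch, an on-failure destination can also be set through the Lambda API. This assumes a boto3 Lambda client is passed in; the helper name and the ARN in the usage note are hypothetical.

```python
def configure_on_failure(lambda_client, function_name: str, destination_arn: str) -> dict:
    """Send events that fail all async retries to the given destination."""
    return lambda_client.put_function_event_invoke_config(
        FunctionName=function_name,
        MaximumRetryAttempts=2,  # retries for async invocations before giving up
        DestinationConfig={"OnFailure": {"Destination": destination_arn}},
    )
```

For example, `configure_on_failure(boto3.client("lambda"), "send-welcome-email", "arn:aws:sqs:eu-west-1:111111111111:welcome-email-failures")`. The function's execution role also needs permission to write to the destination.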

Also, you may still want to put something between these two functions to regulate concurrency. For example, you can use SQS or Kinesis to allow these tasks to be processed in batches and to smooth out traffic spikes. This is especially important when dealing with downstream systems that aren’t as scalable as Lambda. In that case, you rely on Lambda’s scaling behaviour with those event sources to regulate the concurrency of the callee function.
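
A minimal sketch of the SQS variant, assuming a boto3 SQS client is passed in and the callee is subscribed to the queue. The task shape and the `process_task` helper are hypothetical.

```python
import json


def enqueue_task(sqs_client, queue_url: str, task: dict) -> None:
    """Caller side: put the task on a queue instead of invoking the callee directly."""
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=json.dumps(task))


def process_task(task: dict) -> str:
    """Hypothetical unit of work, e.g. a write to a slower downstream system."""
    return f"processed {task['id']}"


def handler(event, context):
    """Callee side: Lambda delivers queued messages in batches under event['Records']."""
    return [process_task(json.loads(record["body"])) for record in event["Records"]]
```

The batch size and how far the callee scales out are then settings on the queue subscription rather than something the caller controls.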

So, to answer the question posed by the title of this post: it depends! And to help you decide, here’s a decision tree for when you should consider calling a Lambda function directly from another.
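
The reasoning in this post can also be roughly sketched as a function. This is only a simplified reading of the text above, with hypothetical parameter names, and it ignores exceptions such as the GraphQL case.

```python
def recommend(
    crosses_ownership: bool,
    coordination_is_costly: bool,
    synchronous: bool,
    latency_or_cost_sensitive: bool = False,
) -> str:
    """Return a rough recommendation for a proposed Lambda-to-Lambda call."""
    if crosses_ownership:
        if coordination_is_costly:
            if synchronous:
                return "put a stable interface in front, e.g. API Gateway"
            return "put a stable interface in front, e.g. SNS, SQS or EventBridge"
        return "a direct call is acceptable"

    # Caller and callee are owned by the same team.
    if synchronous:
        if latency_or_cost_sensitive:
            return "combine the two functions into one"
        return "a direct call is acceptable"
    return "a direct async call is fine; configure a DLQ or on-failure destination"


print(recommend(crosses_ownership=False, coordination_is_costly=False, synchronous=False))
```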

Hi, I’m Yan. I’m an AWS Serverless Hero and I help companies go faster for less by adopting serverless technologies successfully.