how to do fan-out and fan-in with AWS Lambda

You can become a serverless blackbelt. Enrol to my 4-week online workshop Production-Ready Serverless and gain hands-on experience building something from scratch using serverless technologies. At the end of the workshop, you should have a broader view of the challenges you will face as your serverless architecture matures and expands. You should also have a firm grasp on when serverless is a good fit for your system as well as common pitfalls you need to avoid. Sign up now and get 15% discount with the code yanprs15!

In the last post, we look at how you can implement pub-sub with AWS Lambda. We compared several event sources you can use, SNS, Kinesis streams and DynamoDB streams, and the tradeoffs available to you.

Let’s look at another messaging pattern today, push-pull, which is often referred to as fan-out/fan-in.

It’s really two separate patterns working in tandem.

Fan-out is often used on its own, where messages are delivered to a pool of workers in a round-robin fashion and each message is delivered to only one worker.

This is useful in at least two different ways:

  1. having a pool of workers to carry out the actual work allows for parallel processing and lead to increased throughput
  2. if each message represents an expensive task that can be broken down into smaller subtasks that can be carried out in parallel

In the second case where the original task (say, a batch job) is partitioned into many subtasks, you’ll need fan-in to collect result from individual workers and aggregate them together.

fan-out with SNS

As discussed above, SNS’s invocation per message policy is a good fit here as we’re optimizing for throughput and parallelism during the fan-out stage.

Here, a ventilator function would partition the expensive task into subtasks, and publish a message to the SNS topic for each subtask.

This is essentially the approach we took when we implemented the timeline feature at Yubl (the last startup I worked at) which works the same as Twitter’s timeline – when you publish a new post it is distributed to your followers’ timeline; and when you follow another user, their posts would show up in your timeline shortly after.

Yubl had a timeline feature which works the same way as Twitter’s timeline. When you publish a new post, the post will be distributed to the timeline of your followers.
A real-world example of fan-out whereby a user’s new post is distributed to his followers. Since the user can have tens of thousands of followers the task is broken down into many subtasks – each subtask involves distributing the new post to 1k followers and can be performed in parallel.

fan-out with SQS

Before the advent of AWS Lambda, this type of workload is often carried out with SQS. Unfortunately SQS is not one of the supported event sources for Lambda, which puts it in a massive disadvantage here.

That said, SQS itself is still a good choice for distributing tasks and if your subtasks take longer than 5 minutes to complete (the max execution time for Lambda) you might still have to find a way to make the SQS + Lambda setup work.

Let me explain what I mean.

First, it’s possible for a Lambda function to go beyond the 5 min execution time limit by writing it as a recursive function. However, the original invocation (triggered by SNS) has to signal whether or not the SNS message was successfully processed, but that information is only available at the end of the recursion!

With SQS, you have a message handle that can be passed along during recursion. The recursed invocation can then use the handle to:

  • extend the visibility timeout for the message so another SQS poller does not receive it whilst we’re still processing the message
  • delete the message if we’re able to successfully process it

A while back, I prototyped an architecture for processing SQS messages using recursive Lambda functions. The architecture allows for elastically scaling up and down the no. of pollers based on the size of the backlog (or whatever CloudWatch metric you choose to scale on).

You can read all about it here.

I don’t believe it lowers the bar of entry for the SQS + Lambda setup enough for regular use, not to mention the additional cost of running a Lambda function 24/7 for polling SQS.

The good news is that, AWS announced that SQS event source is coming to Lambda! So hopefully in the future you won’t need workarounds like the one I created to use Lambda with SQS.

What about Kinesis or DynamoDB Streams?

Personally I don’t feel these are great options, because the degree of parallelism is constrained by the no. of shards. Whilst you can increase the no. of shards, it’s a really expensive way to get extra parallelism, especially given the way resharding works in Kinesis Streams – after splitting an existing shard, the old shard is still around for at least 24 hours (based on your retention policy) and you’ll continue to pay for it.

Therefore, dynamically adjusting the no. of shards to scale up and down the degree of parallelism you’re after can incur lots unnecessary cost.

With DynamoDB Streams, you don’t even have the option to reshard the stream – it’s a managed stream that reshards as it sees fit.

fan-in: collecting results from workers

When the ventilator function partition the original task into many subtasks, it can also include two identifiers with each subtask?—?one for the top level job, and one for the subtask. When the subtasks are completed, you can use the identifiers to record their results against.

For example, you might use a DynamoDB table to store these results. But bare in mind that DynamoDB has a max item size of 400KB including attribute names.

Alternatively, you may also consider storing the results in S3, which has a max object size of a whopping 5TB! For example, you can store the results as the following:


Note that in both cases we’re prone to experience hot partitions – large no. of writes against the same DynamoDB hash key or S3 prefix.

To mitigate this negative effect, be sure to use a GUID for the job ID.

Depending on the volume of write operations you need to perform against S3, you might need to tweak the approach. For example:

  • partition the bucket with top level folders and place results in to the correct folder based on hash value of the job ID
  • store the results in easily hashable but unstructured way in S3, but also record references to them in DynamoDB table

fan-in: tracking overall progress

When the ventilator function runs and partitions the expensive task into lots small subtasks, it should also record the total no. of subtasks. This way, it allows each invocation of the worker function to atomically decrement the count, until it reaches 0.

The invocation that sees the count reach 0 is then responsible for signalling that all the subtasks are complete. It can do this in many ways, perhaps by publishing a message to another SNS topic so the worker function is decoupled from whatever post steps that need to happen to aggregate the individual results.

(wait, so are we back to the pub-sub pattern again?) maybe ;-)

At this point, the sink function (or reducer, as it’s called in the context of a map-reduce job) would be invoked. Seeing as you’re likely to have a large no. of results to collect, it might be a good idea to also write the sink function as a recursive function too.

Anyway, these are just a few of the ways I can think of to implement the push-poll pattern with AWS Lambda. Let me know in the comments if I have missed any obvious alternatives.

Liked this article? Support me on Patreon and get direct help from me via a private Slack channel or 1-2-1 mentoring.
Subscribe to my newsletter

Hi, I’m Yan. I’m an AWS Serverless Hero and I help companies go faster for less by adopting serverless technologies successfully.

Are you struggling with serverless or need guidance on best practices? Do you want someone to review your architecture and help you avoid costly mistakes down the line? Whatever the case, I’m here to help.

Hire me.

Skill up your serverless game with this hands-on workshop.

My 4-week Production-Ready Serverless online workshop is back!

This course takes you through building a production-ready serverless web application from testing, deployment, security, all the way through to observability. The motivation for this course is to give you hands-on experience building something with serverless technologies while giving you a broader view of the challenges you will face as the architecture matures and expands.

We will start at the basics and give you a firm introduction to Lambda and all the relevant concepts and service features (including the latest announcements in 2020). And then gradually ramping up and cover a wide array of topics such as API security, testing strategies, CI/CD, secret management, and operational best practices for monitoring and troubleshooting.

If you enrol now you can also get 15% OFF with the promo code “yanprs15”.

Enrol now and SAVE 15%.

Check out my new podcast Real-World Serverless where I talk with engineers who are building amazing things with serverless technologies and discuss the real-world use cases and challenges they face. If you’re interested in what people are actually doing with serverless and what it’s really like to be working with serverless day-to-day, then this is the podcast for you.

Check out my new course, Learn you some Lambda best practice for great good! In this course, you will learn best practices for working with AWS Lambda in terms of performance, cost, security, scalability, resilience and observability. We will also cover latest features from re:Invent 2019 such as Provisioned Concurrency and Lambda Destinations. Enrol now and start learning!

Check out my video course, Complete Guide to AWS Step Functions. In this course, we’ll cover everything you need to know to use AWS Step Functions service effectively. There is something for everyone from beginners to more advanced users looking for design patterns and best practices. Enrol now and start learning!