Auto-scaling Kinesis streams with AWS Lambda

Following on from the last post where we discussed 3 useful tips for working effectively with Lambda and Kinesis, let’s look at how you can use Lambda to help you auto scale Kinesis streams.

Auto-scaling for DynamoDB and Kinesis are two of the most frequently requested features for AWS. As I write this post, I’m sure the folks at AWS are working hard to make them happen. Until then, here’s how you can roll a cost-effective solution yourself.

From a high level, we want to:

  • scale up Kinesis streams quickly to meet increases in load
  • scale down under-utilised Kinesis streams to save cost

Scaling Up

Reaction time is important for scaling up, and from personal experience I find polling CloudWatch metrics to be a poor solution because:

  • CloudWatch metrics are usually over a minute behind
  • depending on polling frequency, your reaction time is even further behind
  • high polling frequency has a small cost impact

sidebar: I briefly experimented with Kinesis scaling utility from AWS Labs before deciding to implement our own solution. I found that it doesn’t scale up fast enough because it uses this polling approach, and I had experienced similar issues around reaction time with dynamic-dynamodb too.

Instead, prefer a push-based approach using CloudWatch Alarms.

Whilst CloudWatch Alarms is not available as a trigger to Lambda functions, you can use SNS as a proxy:

  1. add a SNS topic as notification target for CloudWatch Alarm
  2. add the SNS topic as trigger to a Lambda function to scale up the stream that has tripped the alarm
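As a rough sketch of step 2, a Lambda function subscribed to the SNS topic could pull the stream name out of the alarm notification like this. It assumes the usual shape of a CloudWatch Alarm notification (a JSON string in Sns.Message with AlarmName, NewStateValue and Trigger.Dimensions); the scale_up callback is a hypothetical placeholder for whatever scaling logic you choose.

```python
import json


def alarms_from_sns_event(event):
    """Extract the alarm name, state and Kinesis dimensions from each SNS record."""
    results = []
    for record in event.get("Records", []):
        # CloudWatch delivers the alarm details as a JSON string in Sns.Message
        message = json.loads(record["Sns"]["Message"])
        trigger = message.get("Trigger", {})
        dimensions = {d["name"]: d["value"] for d in trigger.get("Dimensions", [])}
        results.append({
            "alarm_name": message.get("AlarmName"),
            "state": message.get("NewStateValue"),
            "metric": trigger.get("MetricName"),
            "stream_name": dimensions.get("StreamName"),
            "shard_id": dimensions.get("ShardId"),  # only set for shard-level alarms
        })
    return results


def handler(event, context, scale_up=print):
    """Lambda entry point; scale_up is a hypothetical callback, not an AWS API."""
    for alarm in alarms_from_sns_event(event):
        if alarm["state"] == "ALARM" and alarm["stream_name"]:
            scale_up(alarm["stream_name"])
```

Keeping the parsing separate from the scaling action makes it easy to test the handler locally with a hand-written event.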

WHAT metrics?

You can use a number of metrics for triggering the scaling action, here are a few to consider.

WriteProvisionedThroughputExceeded (stream)

The simplest way is to scale up as soon as you’re throttled. With a stream-level metric you only need to set up the alarm once per stream and wouldn’t need to adjust the threshold value after each scaling action.

However, since you’re reusing the same CloudWatch Alarm you must remember to set its status to OK after scaling up.
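Resetting the alarm could look something like the sketch below. It assumes the cloudwatch argument is a boto3 CloudWatch client (for example boto3.client("cloudwatch")); the function name and the reason text are made up for illustration.

```python
def reset_alarm(cloudwatch, alarm_name):
    """Put the alarm back into the OK state so it can trip again on the next throttle."""
    cloudwatch.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="OK",
        StateReason="Reset after scaling up the Kinesis stream",
    )
```

The state set this way is temporary: CloudWatch re-evaluates the alarm on the next period, so if the stream is still being throttled the alarm should trip again.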

IncomingBytes and/or IncomingRecords (stream)

You can scale up preemptively (before you’re actually throttled by the service) by calculating the provisioned throughput and then setting the alarm threshold to be, say, 80% of the provisioned throughput. After all, this is exactly what we’d do for scaling EC2 clusters and the same principle applies here – why wait till you’re impacted by load when you can scale up just ahead of time?

However, we need to manage some additional complexities EC2 auto scaling service usually takes care of for us:

  • if we alarm on both IncomingBytes and IncomingRecords then it’s possible to overscale (impacts cost) if both trigger around the same time; this can be mitigated but it’s down to us to ensure only one scaling action can occur at once and that there’s a cooldown after each scaling activity
  • after each scaling activity, we need to recalculate the provisioned throughput and update the alarm threshold(s)
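To make the threshold calculation concrete, here is a minimal sketch. It assumes the per-shard write limits of 1 MiB/s and 1,000 records/s, and that the alarm uses the Sum statistic over a 60 second period; the function name and parameters are illustrative only.

```python
BYTES_PER_SHARD_PER_SECOND = 1024 * 1024  # assumed write limit: 1 MiB/s per shard
RECORDS_PER_SHARD_PER_SECOND = 1000       # assumed write limit: 1,000 records/s per shard


def alarm_thresholds(open_shards, period_seconds=60, utilisation_pct=80):
    """Return Sum-based alarm thresholds for one alarm period."""
    capacity_bytes = open_shards * BYTES_PER_SHARD_PER_SECOND * period_seconds
    capacity_records = open_shards * RECORDS_PER_SHARD_PER_SECOND * period_seconds
    return {
        "IncomingBytes": capacity_bytes * utilisation_pct // 100,
        "IncomingRecords": capacity_records * utilisation_pct // 100,
    }
```

For a stream with 4 open shards this gives 201,326,592 bytes and 192,000 records per minute. After every scaling action the same function can be re-run with the new shard count to get the updated thresholds.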

WriteProvisionedThroughputExceeded (shard)

IncomingBytes and/or IncomingRecords (shard)

With shard level metrics you get the benefit of knowing the shard ID (in the SNS message) so you can be more precise when scaling up by splitting specific shard(s). The downside is that you have to add or remove CloudWatch Alarms after each scaling action.

HOW to scale up

To actually scale up a Kinesis stream, you’ll need to increase the no. of active shards by splitting one or more of the existing shards. One thing to keep in mind is that once a shard is split into 2, it’s no longer ACTIVE but it will still be accessible for up to 7 days (depending on your retention policy setting) and you’ll still pay for it the whole time!

Broadly speaking, you have two options available to you:

  1. use UpdateShardCount and let Kinesis figure out how to do it
  2. choose one or more shards and split them yourself using SplitShard

Option 1 is far simpler but comes with some heavy baggage:

  • because it only supports UNIFORM_SCALING (at the time of writing) it means this action can result in many temporary shards being created unless you double up each time (remember, you’ll pay for all those temporary shards for up to 7 days)
  • doubling up can be really expensive at scale (and possibly unnecessary depending on load pattern)
  • plus all the other limitations

As for Option 2, if you’re using shard level metrics then you can split only the shards that have triggered the alarm(s). Otherwise, a simple strategy would be to sort the shards by their hash range and split the biggest shards first.
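The "split the biggest shards first" strategy might be sketched as follows. It assumes shard descriptions in the shape returned by the Kinesis ListShards/DescribeStream APIs (ShardId, HashKeyRange, SequenceNumberRange), and it only plans the splits; each plan entry carries the ShardToSplit and NewStartingHashKey values that SplitShard expects. The function names are hypothetical.

```python
def is_open(shard):
    """Closed shards have an EndingSequenceNumber; open ones do not."""
    return "EndingSequenceNumber" not in shard.get("SequenceNumberRange", {})


def hash_range(shard):
    key_range = shard["HashKeyRange"]
    return int(key_range["StartingHashKey"]), int(key_range["EndingHashKey"])


def plan_splits(shards, count):
    """Pick the `count` open shards with the widest hash range and split each in half."""
    candidates = sorted(
        (s for s in shards if is_open(s)),
        key=lambda s: hash_range(s)[1] - hash_range(s)[0],
        reverse=True,
    )
    plans = []
    for shard in candidates[:count]:
        start, end = hash_range(shard)
        if start == end:
            continue  # a single-key range cannot be split
        # the new child starts at the midpoint of the parent's hash range
        midpoint = start + (end - start + 1) // 2
        plans.append({"ShardToSplit": shard["ShardId"], "NewStartingHashKey": str(midpoint)})
    return plans
```

Each plan entry can then be passed, together with the stream name, to a SplitShard call. Remember that a stream only accepts one split or merge at a time, so the calls need to wait for the stream to become ACTIVE in between.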

Scaling Down

To scale down a Kinesis stream you merge two adjacent shards. Just as splitting a shard leaves behind an inactive shard that you’ll still pay for, merging shards will leave behind two inactive shards!
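Finding which shards are adjacent is the fiddly part. The sketch below assumes the same shard shape as the Kinesis ListShards/DescribeStream APIs return and treats two open shards as adjacent when one hash range ends exactly where the next begins; the function name is made up for illustration.

```python
def adjacent_pairs(shards):
    """Return non-overlapping pairs of open shards whose hash ranges are contiguous."""
    open_shards = [
        s for s in shards
        if "EndingSequenceNumber" not in s.get("SequenceNumberRange", {})
    ]
    ordered = sorted(open_shards, key=lambda s: int(s["HashKeyRange"]["StartingHashKey"]))
    pairs = []
    i = 0
    while i + 1 < len(ordered):
        left, right = ordered[i], ordered[i + 1]
        left_end = int(left["HashKeyRange"]["EndingHashKey"])
        right_start = int(right["HashKeyRange"]["StartingHashKey"])
        if left_end + 1 == right_start:
            pairs.append({
                "ShardToMerge": left["ShardId"],
                "AdjacentShardToMerge": right["ShardId"],
            })
            i += 2  # each shard can take part in only one merge per pass
        else:
            i += 1
    return pairs
```

The keys in each pair match the parameter names of MergeShards, so a pair plus the stream name is all a merge call needs.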

Since scaling down is primarily a cost saving exercise, I strongly recommend that you don’t scale down too often as you could easily end up increasing your cost instead if you have to scale up soon after scaling down (hence leaving behind lots of inactive shards).

Since we want to scale down infrequently, it makes more sense to do so with a cron job (i.e. CloudWatch Event + Lambda) than to use CloudWatch Alarms. As an example, after some trial and error we settled on scaling down once every 36 hours, which is 1.5x our retention policy of 24 hours.

WHICH stream

When the cron job runs, our Lambda function would iterate through all the Kinesis streams and for each stream:

  • calculate its provisioned throughput in terms of both bytes/s and records/s
  • get 5 min metrics (IncomingBytes and IncomingRecords) over the last 24 hours
  • if all the data points over the last 24 hours are below 50% of the provisioned throughput then scale down the stream
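The decision in the last bullet could be expressed roughly as below. It assumes the inputs are the Sum values of the 5 minute IncomingBytes and IncomingRecords datapoints (already fetched from CloudWatch), and the per-shard write limits of 1 MiB/s and 1,000 records/s; names and defaults are illustrative.

```python
def should_scale_down(open_shards, bytes_sums, record_sums,
                      period_seconds=300, threshold_pct=50):
    """True only if every datapoint is below the threshold share of provisioned throughput."""
    if not bytes_sums or not record_sums:
        return False  # no data (e.g. metrics missing): do nothing rather than guess

    # provisioned throughput per second, using the assumed per-shard write limits
    max_bytes_per_second = open_shards * 1024 * 1024
    max_records_per_second = open_shards * 1000

    bytes_limit = max_bytes_per_second * threshold_pct / 100
    records_limit = max_records_per_second * threshold_pct / 100

    # CloudWatch gives a Sum per period, so divide by the period to get a per-second rate
    bytes_ok = all(total / period_seconds < bytes_limit for total in bytes_sums)
    records_ok = all(total / period_seconds < records_limit for total in record_sums)
    return bytes_ok and records_ok
```

With 5 minute datapoints over 24 hours each list holds up to 288 values, and a single value at or above the limit is enough to veto the scale down.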

The reason we went with 5 min metrics is because that’s the granularity the Kinesis dashboard uses and allows me to validate my calculations (you don’t get bytes/s and records/s values from CloudWatch directly, but will need to calculate them yourself).

Also, we require all datapoints over the last 24 hours to be below the 50% threshold to be absolutely sure that utilization level is consistently below the threshold rather than a temporary blip (which could be a result of an outage for example).

HOW to scale down

We have the same trade-offs between using UpdateShardCount and doing-it-yourself with MergeShards as scaling up.

Wrapping Up

To set up the initial CloudWatch Alarms for a stream, we have a repo which hosts the configurations for all of our Kinesis streams, as well as a script for creating any missing streams and associated CloudWatch Alarms (using CloudFormation templates).

Additionally, as you can see from the screenshot above, the configuration file also specifies the min and max no. of shards for each Kinesis stream. When the create-streams script creates a new stream, it’ll be created with the specified desiredShards no. of shards.


Hope you enjoyed this post, please let me know in the comments below if you are doing something similar to auto-scale your Kinesis streams and if you have any experience you’d like to share.



AWS Lambda —3 pro tips for working with Kinesis streams

At Yubl, we arrived at a non-trivial serverless architecture where Lambda and Kinesis became a prominent feature of this architecture.

Whilst our experience using Lambda with Kinesis was great in general, there were a couple of lessons that we had to learn along the way. Here are 3 useful tips to help you avoid some of the pitfalls we fell into and accelerate your own adoption of Lambda and Kinesis.

Consider partial failures

From the Lambda documentation:

AWS Lambda polls your stream and invokes your Lambda function. Therefore, if a Lambda function fails, AWS Lambda attempts to process the erring batch of records until the time the data expires…

Because of the way Lambda functions are retried, if you allow your function to err on partial failures then the default behavior is to retry the entire batch until success or the data expires from the stream.

To decide if this default behavior is right for you, you have to answer certain questions:

  • can events be processed more than once?
  • what if those partial failures are persistent? (perhaps due to a bug in the business logic that is not handling certain edge cases gracefully)
  • is it more important to process every event till success than keeping the overall system real-time?

In the case of Yubl (which was a social networking app with a timeline feature similar to Twitter) we found that for most of our use cases it’s more important to keep the system flowing than to halt processing for any failed events, even if for a minute.

For instance, when you create a new post, we would distribute it to all of your followers by processing the yubl-posted event. The 2 basic choices we’re presented with are:

  1. allow errors to bubble up and fail the invocation—we give every event every opportunity to be processed; but if some events fail persistently then no one will receive new posts in their feed and the system appears unavailable
  2. catch and swallow partial failures—failed events are discarded, some users will miss some posts but the system appears to be running normally to users (even affected users might not realize that they had missed some posts)

(of course, it doesn’t have to be a binary choice, there’s plenty of room to add smarter handling for partial failures, which we will discuss shortly)

We encapsulated these 2 choices as part of our tooling so that we get the benefit of reusability and the developers can make an explicit choice (and the code makes that choice obvious to anyone reading it later on) for every Kinesis processor they create.
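To illustrate what making that choice explicit could look like, here is a small sketch. It is not the tooling described above, just one way to express the two strategies; process_batch, process and the on_failure values are all hypothetical names.

```python
def process_batch(records, process, on_failure="fail"):
    """Process a batch with an explicit partial-failure strategy.

    on_failure="fail":    re-raise the first error so the whole batch is retried
    on_failure="swallow": record the error, carry on, and return the failed records
    """
    if on_failure not in ("fail", "swallow"):
        raise ValueError("on_failure must be 'fail' or 'swallow'")

    failed = []
    for record in records:
        try:
            process(record)
        except Exception as error:
            if on_failure == "fail":
                raise  # bubble up: Lambda will retry the entire batch
            failed.append((record, error))  # swallow: keep the stream flowing
    return failed
```

Because the strategy is a required decision at the call site, anyone reading a processor can see at a glance whether it halts on errors or drops failed events.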

You would probably apply different choices depending on the problem you’re solving, the important thing is to always consider how partial failures would affect your system as a whole.

Use dead letter queues (DLQ)

AWS announced support for Dead Letter Queues (DLQ) at the end of 2016, however, at the time of writing this support only extends to asynchronous invocations (SNS, S3, IOT, etc.) but not poll-based invocations such as Kinesis and DynamoDB streams.

That said, there’s nothing stopping you from applying the DLQ concept yourself.

First, let’s roll back the clock to a time when we didn’t have Lambda. Back then, we’d use long running applications to poll Kinesis streams ourselves. Heck, I even wrote my own producer and consumer libraries because when AWS rolled out Kinesis they totally ignored anyone not running on the JVM!

Lambda has taken over a lot of the responsibilities—polling, tracking where you are in the stream, error handling, etc.—but as we have discussed above it doesn’t remove you from the need to think for yourself. Nor does it change what good looks like for a system that processes Kinesis events, which for me must have at least these 3 qualities:

  • it should be real-time (most domains consider real-time as “within a few seconds”)
  • it should retry failed events, but retries should not violate the realtime constraint on the system
  • it should be possible to retrieve events that could not be processed so someone can investigate root cause or provide manual intervention

Back then, my long running application would:

  1. poll Kinesis for events
  2. process the events by passing them to a delegate function (your code)
  3. failed events are retried 2 additional times
  4. after the 2 retries are exhausted, they are saved into a SQS queue
  5. record the last sequence number of the batch so that we don’t lose the current progress if the host VM dies or the application crashes
  6. another long running application (perhaps on another VM) would poll the SQS queue for events that couldn’t be processed in real time
  7. process the failed events by passing them to the same delegate function as above (your code)
  8. after the max no. of retrievals the events are passed off to a DLQ
  9. this triggers CloudWatch alarms and someone can manually retrieve the event from the DLQ to investigate

A Lambda function that processes Kinesis events should also:

  • retry failed events X times depending on processing time
  • send failed events to a DLQ after exhausting X retries

Since SNS already comes with DLQ support, you can simplify your setup by sending the failed events to a SNS topic instead—Lambda would then process it a further 3 times before passing it off to the designated DLQ.
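A minimal sketch of the retry-then-DLQ idea is shown below. The send_to_dlq callback is a hypothetical stand-in for whatever you use to park failed events (publishing to an SNS topic, sending to an SQS queue, and so on), and max_retries plays the role of X.

```python
def process_with_retries(records, process, send_to_dlq, max_retries=2):
    """Try each record up to 1 + max_retries times, then hand it to send_to_dlq."""
    dead_lettered = 0
    for record in records:
        last_error = None
        for _attempt in range(1 + max_retries):
            try:
                process(record)
                break  # success: move on to the next record
            except Exception as error:
                last_error = error
        else:
            # the loop finished without a break, so every attempt failed
            send_to_dlq(record, last_error)
            dead_lettered += 1
    return dead_lettered
```

Retries happen inline here, so the value of max_retries needs to be weighed against the processing time of each record and the function’s timeout.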

Avoid “hot” streams

We found that when a Kinesis stream has 5 or more Lambda function subscribers we would start to see lots of ReadProvisionedThroughputExceeded errors in CloudWatch. Fortunately these errors are silent to us as they happen to (and are handled by) the Lambda service polling the stream.

However, we occasionally see spikes in the GetRecords.IteratorAge metric, which tells us that a Lambda function will sometimes lag behind. This did not happen frequently enough to present a problem but the spikes were unpredictable and did not correlate to spikes in traffic or number of incoming Kinesis events.

Increasing the no. of shards in the stream made matters worse and the no. of ReadProvisionedThroughputExceeded increased proportionally.

According to the Kinesis documentation

Each shard can support up to 5 transactions per second for reads, up to a maximum total data reads of 2 MB per second.

and Lambda documentation

If your stream has 100 active shards, there will be 100 Lambda functions running concurrently. Then, each Lambda function processes events on a shard in the order that they arrive.

One would assume that each of the aforementioned Lambda functions would be polling its shard independently. Since the problem is having too many Lambda functions poll the same shard, it makes sense that adding new shards will only escalate the problem further.
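A back-of-the-envelope model shows why more shards means more errors in total. This is a toy calculation, not a description of how the Lambda service actually polls: it assumes every subscriber polls every shard a fixed number of times per second (polls_per_second is an assumption), and uses the limit of 5 reads per second per shard quoted above.

```python
READS_PER_SHARD_PER_SECOND = 5  # per-shard read limit from the Kinesis documentation


def throttled_reads_per_second(shards, subscribers, polls_per_second=1):
    """Estimate how many read attempts per second exceed the per-shard limit."""
    attempted_per_shard = subscribers * polls_per_second
    excess_per_shard = max(0, attempted_per_shard - READS_PER_SHARD_PER_SECOND)
    # every shard is polled by every subscriber, so the excess grows with the shard count
    return shards * excess_per_shard
```

In this model the per-shard excess depends only on the number of subscribers, so doubling the shards doubles the number of throttled reads without relieving any single shard.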

“All problems in computer science can be solved by another level of indirection.”

—David Wheeler

After speaking to the AWS support team about this, the only advice we received (and one that we had already considered) was to apply the fan out pattern by adding another layer of Lambda function who would distribute the Kinesis events to others.

Whilst this is simple to implement, it has some downsides:

  • it vastly complicates the logic for handling partial failures (see above)
  • all functions now process events at the rate of the slowest function, potentially damaging the realtime-ness of the system
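For reference, the core of a fan-out function can be as small as the sketch below. The subscribers mapping is a hypothetical stand-in: each value is a callable that delivers the batch to one downstream function (in practice this might wrap a Lambda invoke call).

```python
def fan_out(records, subscribers):
    """Deliver the same batch to every subscriber and collect per-subscriber failures."""
    failures = {}
    for name, deliver in subscribers.items():
        try:
            deliver(records)
        except Exception as error:
            # one subscriber failing must not stop delivery to the others,
            # but now someone has to decide what to do with this failure
            failures[name] = error
    return failures
```

Both downsides are visible even at this size: the returned failures need their own retry or DLQ handling per subscriber, and the batch is not finished until the slowest deliver call returns.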

We also considered and discounted several other alternatives, including:

  • have one stream per subscriber—this has a significant cost implication, and more importantly it means publishers would need to publish the same event to multiple Kinesis streams in a “transaction” with no easy way to rollback (since you can’t unpublish an event in Kinesis) on partial failures
  • roll multiple subscriber logic into one—this corrodes our service boundary as different subsystems are bundled together to artificially reduce the no. of subscribers

In the end, we didn’t find a truly satisfying solution and decided to reconsider if Kinesis was the right choice for our Lambda functions on a case by case basis.

  • for subsystems that do not have to be realtime, use S3 as source instead: all our Kinesis events are persisted to S3 via Kinesis Firehose, and the resulting S3 files can then be processed by these subsystems, e.g. Lambda functions that stream events to Google BigQuery for BI
  • for work that is task-based (i.e. order is not important), use SNS/SQS as source instead: SNS is natively supported by Lambda, and we implemented a proof-of-concept architecture for processing SQS events with recursive Lambda functions, with elastic scaling; now that SNS has DLQ support (it was not available at the time) it would definitely be the preferred option, provided that its degree of parallelism would not flood and overwhelm downstream systems such as databases
  • for everything else, continue to use Kinesis and apply the fan out pattern as an absolute last resort

Wrapping up…

So there you have it, 3 pro tips from a group of developers who have had the pleasure of working extensively with Lambda and Kinesis.

I hope you find this post useful, if you have any interesting observations or learning from your own experience working with Lambda and Kinesis, please share them in the comments section below.


Yubl’s road to serverless — Part 1, Overview

Yubl’s road to serverless — Part 2, Testing and CI/CD

Yubl’s road to serverless — Part 3, Ops

AWS Lambda — use recursive functions to process SQS messages, Part 1

AWS Lambda — use recursive functions to process SQS messages, Part 2

Slides and recording of my Lambda talk at LeetSpeak 2016

New releases – DynamoDB.SQL and Darkseid

Hi, just a quick update on two of my libraries aimed at making AWS easier to work with from .Net.



DynamoDB.SQL is a SQL-like external DSL for querying & scanning data in Amazon DynamoDB. Version 3.0.0 has been released, which moves away from the monolithic .Net AWSSDK (v2.x.x), and onto the DynamoDB specific package.

You can continue to use v2.x packages for DynamoDB.SQL, I’ll apply any bug fixes to both v2.x and v3.x packages. However, any new features in the future – such as support for the mid-level Table abstractions in the AWSSDK – will be added to v3.x only.

Over the coming weeks and months, I’ll continue the effort to migrate my fleet of AWS-related tools and libraries to the service-specific packages.



Darkseid is a producer library for Amazon Kinesis, it works hand-in-hand with ReactoKinesix which provides the consumer side of the story.

Courtesy of Dustin’s PR, version 0.3.0 has been released which adds synchronous methods for pushing events to Kinesis (so that services that aren’t ready to go async all the way can still integrate with Kinesis using Darkseid).


That’s it, folks, hope you all had a nice weekend. If you haven’t seen Deadpool yet, you should, it is amazing 

Introducing log4net.Kinesis, a log4net appender for Amazon Kinesis

Just under three weeks ago, Amazon announced the public availability of their new Kinesis service, a service which is designed to allow real-time processing of streaming big data.

As an experiment I have put together a simple, actor-based custom appender for log4net which allows you to publish your log messages into a configured Kinesis stream. You can then have another cluster of machines fetch the data from the stream and do whatever processing or aggregation you like.

You can download and install the appender from Nuget here or checkout the source code here.

The implementation is done in F# in 100 lines of code, and as you can see is very simple, easy to reason about, fully asynchronous and thread-safe.


Once you have pushed your log messages into the stream, you’ll need to use the AWSSDK to fetch the data and process them. For Java, there’s a client application which takes care of most of the heavy lifting – e.g. tracking your progress, handling failovers and load balancing. Unfortunately, at the time of writing, there’s no equivalent of such client application in the current version of the .Net AWSSDK.

So to help make it easier for us .Net folks to build real-time data processing applications on top of Amazon Kinesis, I have started a Rx-based .Net client library called ReactoKinesiX (I really wanted to get RX into the name!), more details to follow.


I think the introduction of Kinesis is very exciting and opens up many possibilities, and at the current pricing model it also represents a very cost effective alternative to some of the other competing and more polished services out there.