Auto-scaling Kinesis streams with AWS Lambda

Following on from the last post, where we discussed 3 useful tips for working effectively with Lambda and Kinesis, let's look at how you can use Lambda to help you auto-scale Kinesis streams.

Auto-scaling for DynamoDB and Kinesis are two of the most frequently requested features for AWS; as I write this post, I'm sure the folks at AWS are working hard to make them happen. Until then, here's how you can roll a cost-effective solution yourself.

From a high level, we want to:

  • scale up Kinesis streams quickly to meet increases in load
  • scale down under-utilised Kinesis streams to save cost

Scaling Up

Reaction time is important for scaling up, and from personal experience I find polling CloudWatch metrics to be a poor solution because:

  • CloudWatch metrics are usually over a minute behind
  • depending on polling frequency, your reaction time is even further behind
  • a high polling frequency has a small cost impact

sidebar: I briefly experimented with the Kinesis scaling utility from AWS Labs before deciding to implement our own solution. I found that it doesn't scale up fast enough because it uses this polling approach, and I had experienced similar issues around reaction time with dynamic-dynamodb too.

Instead, prefer a push-based approach using CloudWatch Alarms.

While CloudWatch Alarms are not available as a trigger for Lambda functions, you can use SNS as a proxy:

  1. add an SNS topic as the notification target for the CloudWatch Alarm
  2. add the SNS topic as a trigger to a Lambda function that scales up the stream that tripped the alarm
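Here's a minimal sketch of what that Lambda function's entry point might look like. The CloudWatch Alarm payload arrives as a JSON string inside the SNS message, and the stream name can be read from the alarm's dimensions (the commented-out `scale_up` call is a hypothetical placeholder for your actual scaling logic):

```python
import json

def extract_stream_name(sns_event):
    """Pull the Kinesis stream name out of the CloudWatch Alarm
    notification that SNS delivers to the Lambda function."""
    # SNS wraps the CloudWatch Alarm payload as a JSON string
    message = json.loads(sns_event["Records"][0]["Sns"]["Message"])
    dimensions = message["Trigger"]["Dimensions"]
    # for a stream-level metric, one of the dimensions is the stream name
    return next(d["value"] for d in dimensions if d["name"] == "StreamName")

def handler(event, context):
    stream_name = extract_stream_name(event)
    # scale_up(stream_name)  # hypothetical: SplitShard / UpdateShardCount logic
    return stream_name
```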

WHAT metrics?

You can use a number of metrics to trigger the scaling action; here are a few to consider.

WriteProvisionedThroughputExceeded (stream)

The simplest approach is to scale up as soon as you're throttled. With a stream-level metric you only need to set up the alarm once per stream, and you wouldn't need to adjust the threshold value after each scaling action.

However, since you're reusing the same CloudWatch Alarm, you must remember to set its status back to OK after scaling up.
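A sketch of that reset, using CloudWatch's `SetAlarmState` API via boto3 (the client is injectable here purely to keep the function easy to test):

```python
def reset_alarm(alarm_name, cloudwatch=None):
    """Set a CloudWatch Alarm back to OK after scaling up, so the same
    alarm can fire again the next time the stream is throttled."""
    if cloudwatch is None:
        import boto3  # only needed when no client is injected
        cloudwatch = boto3.client("cloudwatch")
    cloudwatch.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="OK",
        StateReason="scaled up the stream, resetting the alarm",
    )
```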

IncomingBytes and/or IncomingRecords (stream)

You can scale up preemptively (before you're actually throttled by the service) by calculating the provisioned throughput and then setting the alarm threshold to, say, 80% of the provisioned throughput. After all, this is exactly what we'd do for scaling EC2 clusters, and the same principle applies here: why wait until you're impacted by load when you can scale up just ahead of time?

However, we need to manage some additional complexities that the EC2 auto-scaling service usually takes care of for us:

  • if we alarm on both IncomingBytes and IncomingRecords, then it's possible to over-scale (which impacts cost) if both trigger around the same time; this can be mitigated, but it's down to us to ensure that only one scaling action can occur at a time and that there's a cooldown period after each scaling activity
  • after each scaling activity, we need to recalculate the provisioned throughput and update the alarm threshold(s)
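The per-shard write limits (1 MB/s and 1,000 records/s) make the threshold recalculation simple arithmetic. Here's a sketch, assuming an alarm that sums the metric over a 5-minute period; the function name and the 80% ratio are illustrative:

```python
# per-shard write limits for a Kinesis stream
SHARD_BYTES_PER_SEC = 1024 * 1024  # 1 MB/s
SHARD_RECORDS_PER_SEC = 1000       # 1,000 records/s

def alarm_thresholds(shard_count, period_seconds=300, ratio=0.8):
    """Recalculate the IncomingBytes/IncomingRecords alarm thresholds
    as a fraction (e.g. 80%) of the stream's provisioned throughput
    summed over one alarm evaluation period."""
    return {
        "IncomingBytes": shard_count * SHARD_BYTES_PER_SEC * period_seconds * ratio,
        "IncomingRecords": shard_count * SHARD_RECORDS_PER_SEC * period_seconds * ratio,
    }
```

After each scaling action, you'd feed these values back into `PutMetricAlarm` to update the existing alarms.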

WriteProvisionedThroughputExceeded (shard)

IncomingBytes and/or IncomingRecords (shard)

With shard-level metrics you get the benefit of knowing the shard ID (in the SNS message), so you can be more precise when scaling up by splitting specific shard(s). The downside is that you have to add or remove CloudWatch Alarms after each scaling action.
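Note that shard-level metrics are opt-in (and billed separately), so they need to be enabled on the stream first. A sketch using Kinesis's `EnableEnhancedMonitoring` API via boto3 (again, the injectable client is just for testability):

```python
def enable_shard_metrics(stream_name, kinesis=None):
    """Turn on the shard-level metrics we want to alarm on.
    Enhanced (shard-level) monitoring is opt-in and incurs extra cost."""
    if kinesis is None:
        import boto3  # only needed when no client is injected
        kinesis = boto3.client("kinesis")
    kinesis.enable_enhanced_monitoring(
        StreamName=stream_name,
        ShardLevelMetrics=[
            "IncomingBytes",
            "IncomingRecords",
            "WriteProvisionedThroughputExceeded",
        ],
    )
```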

HOW to scale up

To actually scale up a Kinesis stream, you'll need to increase the number of active shards by splitting one or more of the existing shards. One thing to keep in mind is that once a shard is split in two, it's no longer ACTIVE, but it will still be accessible for up to 7 days (depending on your retention policy setting) and you'll still pay for it the whole time!

Broadly speaking, you have two options available to you:

  1. use UpdateShardCount and let Kinesis figure out how to do it
  2. choose one or more shards and split them yourself using SplitShard

Option 1 is far simpler but comes with some heavy baggage:

  • because it only supports UNIFORM_SCALING (at the time of writing), this action can result in many temporary shards being created unless you double up each time (remember, you'll pay for all those temporary shards for up to 7 days)
  • doubling up can be really expensive at scale (and possibly unnecessary, depending on your load pattern)
  • plus all the other limitations

As for Option 2, if you're using shard-level metrics then you can split only the shards that have triggered the alarm(s). Otherwise, a simple strategy would be to sort the shards by their hash range and split the biggest shards first.
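The "split the biggest shard" strategy can be sketched as follows, assuming the shard descriptions returned by `ListShards` (an open shard is one whose sequence number range has no end); the helper names are mine:

```python
def biggest_open_shard(shards):
    """Pick the open shard covering the widest hash key range.
    Closed (already split/merged) shards have an EndingSequenceNumber."""
    open_shards = [s for s in shards
                   if "EndingSequenceNumber" not in s["SequenceNumberRange"]]
    return max(open_shards,
               key=lambda s: int(s["HashKeyRange"]["EndingHashKey"])
                           - int(s["HashKeyRange"]["StartingHashKey"]))

def split_point(shard):
    """Split at the midpoint of the shard's hash key range, so each
    child shard takes half of the parent's key space."""
    lo = int(shard["HashKeyRange"]["StartingHashKey"])
    hi = int(shard["HashKeyRange"]["EndingHashKey"])
    return str((lo + hi) // 2)
```

You'd then call `SplitShard` with `ShardToSplit=shard["ShardId"]` and `NewStartingHashKey=split_point(shard)`.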

Scaling Down

To scale down a Kinesis stream, you merge two adjacent shards. Just as splitting a shard leaves behind an inactive shard that you'll still pay for, merging shards leaves behind two inactive shards!

Since scaling down is primarily a cost-saving exercise, I strongly recommend that you don't scale down too often, as you could easily end up increasing your cost instead if you have to scale up soon after scaling down (hence leaving behind lots of inactive shards).

Since we want to scale down infrequently, it makes more sense to do so with a cron job (i.e. CloudWatch Event + Lambda) than to use CloudWatch Alarms. As an example, after some trial and error we settled on scaling down once every 36 hours, which is 1.5x our retention policy of 24 hours.

WHICH stream

When the cron job runs, our Lambda function iterates through all the Kinesis streams, and for each stream:

  • calculate its provisioned throughput in terms of both bytes/s and records/s
  • get 5-minute metrics (IncomingBytes and IncomingRecords) over the last 24 hours
  • if all the datapoints over the last 24 hours are below 50% of the provisioned throughput, then scale down the stream
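The 50% check itself is simple arithmetic once the datapoints are fetched: with 5-minute sums from CloudWatch, the stream's provisioned capacity over one period is the per-shard limit × shard count × 300 seconds. A sketch (the function name and parameters are illustrative):

```python
def can_scale_down(datapoints_bytes, datapoints_records,
                   shard_count, period_seconds=300, ratio=0.5):
    """Return True if every 5-minute datapoint over the window sat
    below 50% of the stream's provisioned throughput.
    Datapoints are Sum statistics over period_seconds, so capacity
    per period is the per-shard/s limit * shards * period."""
    max_bytes = shard_count * 1024 * 1024 * period_seconds * ratio
    max_records = shard_count * 1000 * period_seconds * ratio
    return (all(dp < max_bytes for dp in datapoints_bytes)
            and all(dp < max_records for dp in datapoints_records))
```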

The reason we went with 5-minute metrics is that this is the granularity the Kinesis dashboard uses, which allows me to validate my calculations (you don't get bytes/s and records/s values from CloudWatch directly; you need to calculate them yourself).

Also, we require all datapoints over the last 24 hours to be below the 50% threshold to be absolutely sure that the utilization level is consistently below the threshold, rather than a temporary blip (which could be the result of an outage, for example).

HOW to scale down

We have the same trade-offs between using UpdateShardCount and doing it yourself with MergeShards as when scaling up.
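If you do it yourself, the main extra step is finding a mergeable pair: `MergeShards` requires two open shards whose hash key ranges are contiguous. A sketch (the helper name is mine):

```python
def adjacent_pair(shards):
    """Find two open shards with contiguous hash key ranges that can
    be merged with MergeShards; returns None if no such pair exists."""
    open_shards = sorted(
        (s for s in shards
         if "EndingSequenceNumber" not in s["SequenceNumberRange"]),
        key=lambda s: int(s["HashKeyRange"]["StartingHashKey"]))
    for a, b in zip(open_shards, open_shards[1:]):
        # adjacent means b's range starts right after a's range ends
        if (int(a["HashKeyRange"]["EndingHashKey"]) + 1
                == int(b["HashKeyRange"]["StartingHashKey"])):
            return a["ShardId"], b["ShardId"]
    return None
```

You'd then call `MergeShards` with `ShardToMerge` and `AdjacentShardToMerge` set to the returned pair.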

Wrapping Up

To set up the initial CloudWatch Alarms for a stream, we have a repo that hosts the configurations for all of our Kinesis streams, as well as a script for creating any missing streams and associated CloudWatch Alarms (using CloudFormation templates).

Additionally, the configuration file also specifies the min and max number of shards for each Kinesis stream. When the create-streams script creates a new stream, it'll be created with the specified desiredShards number of shards.


Hope you enjoyed this post. Please let me know in the comments below if you're doing something similar to auto-scale your Kinesis streams, and if you have any experience you'd like to share.



Like what you're reading? Check out my video course Production-Ready Serverless and learn the essentials of how to run a serverless application in production.

We will cover topics including:

  • authentication & authorization with API Gateway & Cognito
  • testing & running functions locally
  • CI/CD
  • log aggregation
  • monitoring best practices
  • distributed tracing with X-Ray
  • tracking correlation IDs
  • performance & cost optimization
  • error handling
  • config management
  • canary deployment
  • VPC
  • security
  • leading practices for Lambda, Kinesis, and API Gateway

You can also get 40% off the face price with the code ytcui. Hurry though, this discount is only available while we're in Manning's Early Access Program (MEAP).