AWS Lambda – compare coldstart time with different languages, memory and code sizes

A while back we looked at the performance difference between the language runtimes AWS Lambda supports natively.

AWS Lambda – comparing platform performances

We intentionally omitted coldstart time from that experiment as we were interested in performance differences when a function is “warm”.

However, coldstart is still an important performance consideration, so let’s take a closer look with some experiments designed to measure only coldstart times.

Methodology

From my personal experience running Lambda functions in production, coldstarts happen when a function is idle for ~5 mins. Additionally, functions will be recycled 4 hours after it starts – which was also backed up by analysis by the folks at IO Pipe.

However, the 5 mins rule seems to have changed. After a few tests, I was not able to see coldstart even after a function had been idle for more than 30 mins.

I needed a more reliable way to trigger coldstart.

After a few failed attempts, I settled on a surefire way to cause coldstart : by deploying a new version of my functions before invoking them.

I have a total of 45 functions for both experiments. Using a simple script (see below) I’m able to:

  1. deploy all 45 functions using the Serverless framework
  2. after each round of deployments, invoke the functions programmatically

the deploy + invoke loop takes around 3 mins. I ran the experiment for over 24 hours to collect a meaningful amount of data points. Thankfully the Serverless framework made it easy to create variants of the same function with different memory sizes and to deploy them quickly.

Hypothesis

Here were my hypothesis before the experiments, based on the knowledge that the amount of CPU resource you get is proportional to the amount of memory you allocate to a AWS Lambda function.

  1. C# and Java have higher coldstart time
  2. memory size affects coldstart time linearly
  3. code size affects coldstart time linearly

Let’s see if the experiments support these hypothesis.

Experiment 1 : coldstart time by runtime & memory

For this experiment, I created 20 functions with 5 variants (different memory sizes) for each language runtime – C#, Java, Python and Nodejs.

After running the experiment for a little over 24 hours, I collected a bunch of metric data (which you can download yourself here).

Here is how they look.

Observation #1 : C# and Java have much higher coldstart time

The most obvious trend is that statically typed languages (C# and Java) have over 100 times higher coldstart time. This clearly supports our hypothesis, although to a much greater extent than I anticipated.

Observation #2 : Python has ridiculously low codstart time

I’m pleasantly surprised by how little coldstart the Python runtime experiences. OK, there were some outlier data points that heavily influenced some of the 99 percentile and standard deviations, but you can’t argue with a 0.41ms coldstart time at the 95 percentile of a 128MB function.

Observation #3 : memory size improves coldstart time linearly

The more memory you allocate to your function, the smaller the coldstart time and the less standard deviation in coldstart time too. This is most obvious with the C# and Java runtimes as the baseline (128MB) coldstart time for both are very significant.

Again, the data from this experiment clearly supports our hypothesis.

Experiment 2: coldstart time by code size & memory

For this second experiment, I decided to fix the runtime to Nodejs and create variants with different deployment package size and memory.

Here are the results.

Observation #1 : memory size improves coldstart time linearly

As with the first experiment, the memory size improves the coldstart time (and standard deviation) in a roughly linear fashion.

Observation #2 : code size improves coldstart time

Interestingly the size of the deployment package does not increase the coldstart time (bigger package = more time to download & unzip, or so one might assume). Instead it seems to have a positive effect and decreases the overall coldstart time.

I would love to see someone else repeat the experiment with another language runtime to see if the behaviour is consistent.

Conclusions

The things I learnt from these experiments are:

  • functions are no longer recycled after ~5 mins of idleness, which makes coldstarts far less punishing than before
  • memory size improves coldstart time linearly
  • C# and Java runtimes experience ~100 times the coldstart time of Python and suffer from much higher standard deviation too
  • as a result of the above you should consider running your C#/Java Lambda functions with a higher memory allocation than you would Nodejs/Python functions
  • bigger deployment package size does not increase coldstart time

ps. the source code used for these experiments can be found here, including the scripts used to calculate the stats and generate the plot.ly box charts.

Beware of dilution of DynamoDB throughput due to excessive scaling

TL;DR – The no. of partitions in a DynamoDB table goes up in response to increased load or storage size, but it never come back down, ever.

DynamoDB is pretty great, but as I have seen this particular problem at 3 different companies – Gamesys, JUST EAT, and now Space Ape Games – I think it’s a behaviour that more folks should be aware of.

Credit to AWS, they have regularly talked about the formula for working out the no. of partitions at DynamoDB Deep Dive sessions.

However, they often forget to mention that the DynamoDB will not decrease the no. of partitions when you reduce your throughput units. It’s a crucial detail that is badly under-represented in a lengthy Best Practice guide.

Consider the following scenario:

  • you dial up the throughput for a table because there’s a sudden spike in traffic or you need the extra throughput to run an expensive scan
  • the extra throughputs cause DynamoDB to increase the no. of partitions
  • you dial down the throughput to previous levels, but now you notice that some requests are throttled even when you have not exceeded the provisioned throughput on the table

This happens because there are less read and write throughput units per partition than before due to the increased no. of partitions. It translates to higher likelihood of exceeding read/write throughput on a per-partition basis (even if you’re still under the throughput limits on the table overall).

When this dilution of throughput happens you can:

  1. migrate to a new table
  2. specify higher table-level throughput to boost the through units per partition to previous levels

Given the difficulty of table migrations most folks would opt for option 2, which is how JUST EAT ended up with a table with 3000+ write throughput unit despite consuming closer to 200 write units/s.

In conclusion, you should think very carefully before scaling up a DynamoDB table drastically in response to temporary needs, it can have long lasting cost implications.

Serverless 1.X – enable API Gateway caching on request parameters

Having previously blogged about the untrodden path to enable caching on API Gateway request parameters in the Serverless framework 0.5.X, it’s a little disappointing that it’s still not officially fixed in the 1.X versions…

The Problem

The problem is two-fold:

  1. there’s currently no way to specify caching should be enabled for path & query string parameters
  2. the CloudFormation template Serverless 1.X generates for API Gateway is missing a few optional fields, these missing fields stop you from manually enable caching in the API Gateway management console too

After you deploy your Lambda function with associated API, if you go to the management console and enable caching on path or request parameters you will get an error saying “Invalid cache key parameter specified”.

The Workaround

A friend pointed me to a neat trick to modify the CloudFormation template that Serverless 1.X auto-generates for you.

After the project is deployed, you can go to CloudFormation and view the template that Serverless has generated. These templates are pretty big (and poorly formatted), so I find it easier to open them up in the Designer view and use that view to navigate to the endpoint I’m looking for.

Once you find the resource template for the endpoint, write down its name. Now go back to the serverless.yml file in your project, and add the resource name to the resources section at the bottom. You only need to include fields that you want to update or add to the template.

The CloudFormation syntax for an API Gateway method looks like this:

We also need to fill in some blanks for the Integration section:

For more details on the CloudFormation syntax, see here and here.

After some trial-and-error, the minimum set of fields I had to add are:

Redeploy with Serverless and the path parameter is enabled for caching:

Wrap Up

I hope you have found this post useful, though I’m surprised by the lack of information out there during my research and the lack of official support from the Serverless framework.

You know of a better way to do this, please let me know in the comments.

Links

Yubl’s road to Serverless architecture – Part 4 – building a scalable push notification system

The Road So Far

part 1 : overview

part 2 : testing and continuous delivery strategies

part 3 : ops

 

Just before Yubl’s untimely demise we did an interesting piece of work to redesign the system for sending targeted push notifications to our users to improve retention.

The old system relied on MixPanel for both selecting users as well as sending out the push notifications. Whilst MixPanel was great for getting us basic analytics quickly, we soon found our use cases outgrew MixPanel. The most pressing limitation was that we were not able to query users based on their social graph to create target push notifications – eg. notify an influencer’s followers when he/she publishes a new post or runs a new social media campaign.

Since all of our analytics events are streamed to Google BigQuery (using a combination of Kinesis Firehose, S3 and Lambda) we have all the data we need to support the complex use cases the product team has.

What we needed, was a push notification system that can integrate with BigQuery results and is capable of sending millions of push notifications in a batch.

Design Goals

From a high level, we need to support 2 types of notifications.

Ad-hoc notifications are driven by the marketing team, working closely with influencers and the BI team to match users with influencers or contents that they might be interested in. Example notifications include:

  • users who follow Accessorize and other fashion brands might be interested to know when another notable fashion brand joins the platform
  • users who follow an influencer might be interested to know when the influencer publishes a new post or is running a social media campaign (usually with give-away prizes, etc.)
  • users who have shared/liked music related contents might be interested to know that Tinie Tempah has joined the platform

Scheduled notifications are driven by the product team, these notifications are designed to nudge users to finish the sign up process or to come back to the platform after they have lapsed. Example notifications include:

  • day-1 unfinished sign up : notify users who didn’t finish the sign up process to come back to complete the process
  • day-2 engagement : notify users to come back and follow more people or invite friends on day 2
  • day-21 inactive : notify users who have not logged into the app for 21 days to come back and check out what’s new

A/B testing

For the scheduled notifications, we want to test out different messages/layouts to optimise their effectiveness over time. To do that, we wanted to support A/B testing as part of the new system (which MixPanel already supports).

We should be able to create multiple variants (each with a percentage), along with a control group who will not receive any push notifications.

Oversight vs Frictionless

For the ad-hoc notifications, we don’t want to get in the way of the marketing team doing their job, so the process for creating ad-hoc push notifications should be as frictionless as possible. However, we also don’t want the marketing team to operate completely without oversight and run the risk of long term damage by spamming users with unwanted push notifications (which might cause users to disable notifications or even rage quit the app).

The compromise we reached was an automated approval process whereby:

  1. the marketing team will work with BI on a query to identify users (eg. followers of Tinie Tempah)
  2. fill in a request form, which informs designated approvers via email
  3. approvers can send themselves a test push notification to see how it will be formatted on both Android and iOS
  4. approvers can approve or reject the request
  5. once approved, the request will be executed

Implementation

We decided to use S3 as the source for a send-batch-notifications function because it allows us to pass large list of users (remember, the goal is to support sending push notifications to millions of users in a batch) without having to worry about pagination or limits on payload size.

The function will work with any JSON file in the right format, and that JSON file can be generated in many ways:

  • by the cron jobs that generate scheduled notifications
  • by the approval system after an ad-hoc push notification is approved
  • by the approval system to send a test push notification to the approvers (to visually inspect how the message will appear on both Android and iOS devices)
  • by members of the engineering team when manual interventions are required

We also considered moving to SNS but decided against it in the end because it doesn’t provide useful enough an abstraction to justify the effort to migrate (involves client work) and the additional cost for sending push notifications. Instead, we used node-gcm and apn to communicate with GCM and APN directly.

Recursive Functions FTW

Lambda has a hard limit of 5 mins execution time (it might be softened in the near future), and that might not be enough time to send millions of push notifications.

Our approach to long-running tasks like this is to run the Lambda function as a recursive function.

A naive recursive function would process the payload in fixed size batches and recurse at the end of each batch whilst passing along a token/position to allow the next invocation to continue from where it left off. In this particular case, we have additional considerations because the total number of work items can be very large:

  • minimising the no. of recursions required, which equates to no. of Invoke requests to Lambda and carries a cost implication at scale
  • caching the content of the JSON file to improve performance (by avoiding loading and parsing a large JSON file more than once) and reduce S3 cost

To minimise the no. of recursions, our function would:

  1. process the list of users in small batches of 500
  2. at the end of each batch, call context.getRemainingTimeInMillis() to check how much time is left in this invocation
  3. if there is more than 1 min left in the invocation then process another batch; otherwise recurse

When caching the content of the JSON file from S3, we also need to compare the ETAG to ensure that the content of the file hasn’t changed.

With this set up the system was able to easily handle JSON files with more than 1 million users during our load test (sorry Apple and Google for sending all those fake device tokens :-P).

Auto-scaling Kinesis streams with AWS Lambda

Following on from the last post where we discussed 3 useful tips for working effectively with Lambda and Kinesis, let’s look at how you can use Lambda to help you auto scale Kinesis streams.

Auto-scaling for DynamoDB and Kinesis are two of the most frequently requested features for AWS, as I write this post I’m sure the folks at AWS are working hard to make them happen. Until then, here’s how you can roll a cost effective solution yourself.

From a high level, we want to:

  • scale up Kinesis streams quickly to meet increases in load
  • scale down under-utilised Kinesis streams to save cost

Scaling Up

Reaction time is important for scaling up, and from personal experience I find polling CloudWatch metrics to be a poor solution because:

  • CloudWatch metrics are usually over a minute behind
  • depending on polling frequency, your reaction time is even further behind
  • high polling frequency has a small cost impact

sidebar: I briefly experimented with Kinesis scaling utility from AWS Labs before deciding to implement our own solution. I found that it doesn’t scale up fast enough because it uses this polling approach, and I had experienced similar issues around reaction time with dynamic-dynamodb too.


Instead, prefer a push-based approach using CloudWatch Alarms.

Whilst CloudWatch Alarms is not available as trigger to Lambda functions, you can use SNS as a proxy:

  1. add a SNS topic as notification target for CloudWatch Alarm
  2. add the SNS topic as trigger to a Lambda function to scale up the stream that has tripped the alarm

WHAT metrics?

You can use a number of metrics for triggering the scaling action, here are a few to consider.

WriteProvisionedThroughputExceeded (stream)

The simplest way is to scale up as soon as you’re throttled. With a stream-level metric you only need to set up the alarm once per stream and wouldn’t need to adjust the threshold value after each scaling action.

However, since you’re reusing the same CloudWatch Alarm you must remember to set its status to OK after scaling up.

IncomingBytes and/or IncomingRecords (stream)

You can scale up preemtively (before you’re actually throttled by the service) by calculating the provisioned throughput and then setting the alarm threshold to be, say 80% of the provisioned throughput. After all, this is exactly what we’d do for scaling EC2 clusters and the same principle applies here – why wait till you’re impacted by load when you can scale up just ahead of time?

However, we need to manage some additional complexities EC2 auto scaling service usually takes care of for us:

  • if we alarm on both IncomingBytes and IncomingRecords then it’s possible to overscale (impacts cost) if both triggers around the same time; this can be mitigated but it’s down to us to ensure only one scaling action can occur at once and that there’s a cooldown after each scaling activity
  • after each scaling activity, we need to recalculate the provisioned throughput and update the alarm threshold(s)

WriteProvisionedThroughputExceeded (shard)

IncomingBytes and/or IncomingRecords (shard)

With shard level metrics you get the benefit of knowing the shard ID (in the SNS message) so you can be more precise when scaling up by splitting specific shard(s). The downside is that you have to add or remove CloudWatch Alarms after each scaling action.

HOW to scale up

To actually scale up a Kinesis stream, you’ll need to increase the no. of active shards by splitting one of more of the existing shards. One thing to keep in mind is that once a shard is split into 2, it’s no longer ACTIVE but it will still be accessible for up to 7 days (depending on your retention policy setting) and you’ll still pay for it the whole time!

Broadly speaking, you have two options available to you:

  1. use UpdateShardCount and let Kinesis figure out how to do it
  2. choose one or more shards and split them yourself using SplitShard

Option 1 is far simpler but comes with some heavy baggage:

  • because it only supports UNIFORM_SCALING (at the time of writing) it means this action can result in many temporary shards being created unless you double up each time (remember, you’ll pay for all those temporary shards for up to 7 days)
  • doubling up can be really expensive at scale (and possibly unnecessary depending on load pattern)
  • plus all the other limitations

As for Option 2, if you’re using shard level metrics then you can split only the shards that have triggered the alarm(s). Otherwise, a simple strategy would be to sort the shards by their hash range and split the biggest shards first.

Scaling Down

To scale down a Kinesis stream you merge two adjacent shards. Just as splitting a shard leaves behind an inactive shard that you’ll still pay for, merging shards will leave behind two inactive shards!

Since scaling down is primarily a cost saving exercise, I strongly recommend that you don’t scale down too often as you could easily end up increasing your cost instead if you have to scale up soon after scaling down (hence leaving behind lots inactive shards).

Since we want to scale down infrequently, it makes more sense to do so with a cron job (ie. CloudWatch Event + Lmabda) than to use CloudWatch Alarms. As an example, after some trial and error we settled on scaling down once every 36 hours, which is 1.5x our retention policy of 24 hours.

WHICH stream

When the cron job runs, our Lambda function would iterate through all the Kinesis streams and for each stream:

  • calculate its provisioned throughput in terms of both bytes/s and records/s
  • get 5 min metrics (IncomingBytes and IncomingRecords) over the last 24 hours
  • if all the data points over the last 24 hours are below 50% of the provisioned throughput then scale down the stream

The reason we went with 5 min metrics is because that’s the granularity the Kinesis dashboard uses and allows me to validate my calculations (you don’t get bytes/s and records/s values from CloudWatch directly, but will need to calculate them yourself).

Also, we require all datapoints over the last 24 hours to be below the 50% threshold to be absolutely sure that utilization level is consistently below the threshold rather than a temporary blip (which could be a result of an outage for example).

HOW to scale down

We have the same trade-offs between using UpdateShardCount and doing-it-yourself with MergeShards as scaling up.

Wrapping Up

To set up the initial CloudWatch Alarms for a stream, we have a repo which hosts the configurations for all of our Kinsis streams, as well as a script for creating any missing streams and associated CloudWatch Alarms (using CloudFormation templates).

Additionally, as you can see from the screenshot above, the configuration file also specifies the min and max no. of shards for each Kinesis stream. When the create-streams script creates a new stream, it’ll be created with the specified desiredShards no. of shards.

 

Hope you enjoyed this post, please let me know in the comments below if you are doing something similar to auto-scale your Kinesis streams and if you have any experience you’d like to share.

 

Links