AWS Lambda – monolithic functions won’t help you with cold starts

After my post on monolithic functions vs single-purposed functions, a few people asked me about the effect monolithic functions have on cold starts, so I thought I’d share my thoughts here.

The question goes something like this:

Monolithic functions are invoked more frequently, so they are less likely to be in a cold state, while single-purposed functions that are not used frequently may always be in a cold state, don't you think?

That seems like a fair assumption, but the actual behaviour of cold starts is more nuanced and can produce drastically different results depending on the traffic pattern. Check out my other post that goes into this behaviour in more detail.

The effect of consolidation into monolithic functions (on the no. of cold starts experienced) quickly diminishes with load

To simplify things, let’s consider “the number of cold starts you’ll have experienced as you ramp up to X req/s”. Assuming that:

  • the ramp up was gradual so there was no massive spikes (which could trigger a lot more cold starts)
  • each request’s duration is short, say, 100ms

At a small scale, say, 1 req/s per endpoint, and a total of 10 endpoints (which is 1 monolithic function vs 10 single-purposed functions), we'll have a total of 10 req/s. Given the 100ms execution time, that's just within what a single concurrent execution is able to handle.

To reach 1 req/s per endpoint, you will have experienced:

  • monolithic: 1 cold start
  • single-purposed: 10 cold starts

As the load goes up to 100 req/s per endpoint, that equates to a total of 1000 req/s. To handle this load you'll need at least 100 concurrent executions of the monolithic function (100ms per req, so the throughput per concurrent execution is 10 req/s, hence concurrent executions = 1000 / 10 = 100). To reach this level of concurrency, you will have experienced:

  • monolithic: 100 cold starts

At this point, 100 req/s per endpoint = 10 concurrent executions for each of the single-purposed functions. To reach that level of concurrency, you will also have experienced:

  • single-purposed: 10 concurrent execs * 10 functions = 100 cold starts

So, monolithic functions don’t help you with the no. of cold starts you’ll experience even at a moderate amount of load.
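
To make that arithmetic concrete, here's a back-of-the-envelope sketch (a simplified model that assumes each concurrent execution incurs exactly one cold start during a gradual ramp-up, and ignores container recycling):

const requiredConcurrency = (reqPerSec, durationMs) =>
  Math.ceil(reqPerSec * (durationMs / 1000));

const coldStarts = ({ endpoints, reqPerSecPerEndpoint, durationMs }) => {
  const totalReqPerSec = endpoints * reqPerSecPerEndpoint;
  return {
    // one monolithic function absorbs the combined load
    monolithic: requiredConcurrency(totalReqPerSec, durationMs),
    // each single-purposed function only absorbs its own endpoint's load
    singlePurposed: endpoints * requiredConcurrency(reqPerSecPerEndpoint, durationMs)
  };
};

console.log(coldStarts({ endpoints: 10, reqPerSecPerEndpoint: 1, durationMs: 100 }));
// => { monolithic: 1, singlePurposed: 10 }
console.log(coldStarts({ endpoints: 10, reqPerSecPerEndpoint: 100, durationMs: 100 }));
// => { monolithic: 100, singlePurposed: 100 }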

Also, when the load is low, you can mitigate cold starts with simple measures such as pre-warming your functions (as discussed in the other post). You can even use the serverless-plugin-warmup plugin to do that for you, and it even comes with the option to do a pre-warm run after a deployment.

However, this practice stops being effective once you have even a moderate amount of concurrency, at which point monolithic functions incur just as many cold starts as single-purposed functions.

Consolidating into monolithic functions can increase initialization time, which increases the duration of cold start

By packing more “actions” into one function, we also increase the no. of modules that need to be initialized during that function's cold start, and are therefore likely to experience longer cold starts as a result. Basically, anything outside of the exported handler function is initialized during the Bootstrap runtime phase of the cold start (see below).

From Ajay Nair's talk at re:Invent 2017 – https://www.youtube.com/watch?v=oQFORsso2go

Imagine that in the monolithic version of the fictional user-api I used to illustrate the point in this post, our handler module would need to require all the dependencies used by all the endpoints:

const depA = require('lodash');
const depB = require('facebook-node-sdk');
const depC = require('aws-sdk');
...

Whereas in the single-purposed version of the user-api, only the get-user-by-facebook-id endpoint’s handler function would need to incur the extra overhead of initializing the facebook-node-sdk dependency during cold start.

You also have to factor in any other modules in the same project, and their dependencies, and any code that will be run during those modules’ initialization, and so on.

The wrong place to optimize cold starts

So, contrary to intuition, monolithic functions don't offer any benefit for cold starts beyond what basic pre-warming can already achieve, and can quite likely extend the duration of cold starts.

Cold starts affect you wildly differently depending on the language, memory setting, and how much initialization you're doing in your code. I'd argue that, if cold starts are a concern for you, then you're far better off switching to another language (e.g. Go, Node.js or Python) and investing effort into optimizing your code so it suffers shorter cold starts.

Also, keep in mind that this is something that AWS and other providers are actively working on, and I suspect the situation will be vastly improved by the platform in the future.

All in all, I think changing the deployment units (one big function vs many small functions) is not the right way to address cold starts.

I’m afraid you’re thinking about AWS Lambda cold starts all wrong

When I discuss AWS Lambda cold starts with folks in the context of API Gateway, I often get responses along the line of:

Meh, it’s only the first request right? So what if one request is slow, the next million requests would be fast.

Unfortunately that is an oversimplification of what happens.

Cold start happens once for each concurrent execution of your function.

API Gateway reuses concurrent executions of your function that are already running where possible, and based on my observations, might even queue up requests for a short time in the hope that one of the concurrent executions will finish and can be reused.

If all the user requests to an API happen one after another, then sure, you will only experience one cold start in the process. You can simulate what happens with Charles proxy by capturing a request to an API Gateway endpoint and repeating it with a concurrency setting of 1.

As you can see in the timeline below, only the first request experienced a cold start and was therefore noticeably slower compared to the rest.

1 out of 100, that's bearable, hell, it won't even show up in my 99th percentile latency metric.

What if the user requests came in droves instead? After all, user behaviours are unpredictable and unlikely to follow the nice sequential pattern we see above. So let’s simulate what happens when we receive 100 requests with a concurrency of 10.

All of a sudden, things don’t look quite as rosy – the first 10 requests were all cold starts! This could spell trouble if your traffic pattern is highly bursty around specific times of the day or specific events, e.g.

  • food ordering services (e.g. JustEat, Deliveroo) have bursts of traffic around meal times
  • e-commerce sites have highly concentrated bursts of traffic around popular shopping days of the year – Cyber Monday, Black Friday, etc.
  • betting services have bursts of traffic around sporting events
  • social networks have bursts of traffic around, well, just about any notable events happening around the world

For all of these services, the sudden bursts of traffic mean API Gateway has to add more concurrent executions of your Lambda function, and that equates to a burst of cold starts, which is bad news for you. These are the most crucial periods for your business, and precisely when you want your service to be on its best behaviour.

If the spikes are predictable, as is the case for food ordering services, you can mitigate the effect of cold starts by pre-warming your APIs – i.e. if you know there will be a burst of traffic at noon, then you can schedule a cron job (aka, CloudWatch schedule + Lambda) for 11:58am that will hit the relevant APIs with a blast of concurrent requests, enough to cause API Gateway to spawn a sufficient no. of concurrent executions of your function(s).

You can mark these requests with a special HTTP header or payload, so that the handling function can distinguish them from normal user requests and short-circuit without attempting to execute the normal code path.
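
As a rough sketch (the header name, concurrency level and environment variable below are made up for illustration), the warmer and the corresponding short-circuit might look something like this:

const https = require('https');

const WARMUP_HEADER = 'X-Warm-Up'; // hypothetical marker header
const CONCURRENCY = 50;            // no. of concurrent executions we want warm

const ping = (url) => new Promise((resolve, reject) => {
  https.get(url, { headers: { [WARMUP_HEADER]: 'true' } }, (res) => {
    res.resume(); // drain the response, we only care that it completed
    res.on('end', resolve);
  }).on('error', reject);
});

// scheduled (CloudWatch schedule + Lambda) warmer that blasts the API with
// concurrent requests shortly before the expected spike
module.exports.warmup = async () => {
  const url = process.env.TARGET_API_URL;
  await Promise.all(Array.from({ length: CONCURRENCY }, () => ping(url)));
};

// in the API handler itself, short-circuit warm-up requests so the normal
// code path (and its downstream calls) is skipped
module.exports.handler = async (event) => {
  if (event.headers && event.headers[WARMUP_HEADER]) {
    return { statusCode: 200, body: '"warmed"' };
  }
  // ... normal code path
  return { statusCode: 200, body: '"hello"' };
};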

It's great that you can mitigate the impact of cold starts during these predictable bursts of traffic, but doesn't it betray the ethos of serverless compute, that you shouldn't have to worry about scaling?

Sure, but making users happy trumps everything else, and users are not happy if they have to wait for your function to cold start before they can order their food, and the cost of switching to a competitor is low, so they might not even come back the next day.

Alternatively, you could consider reducing the impact of cold starts by reducing the length of cold starts:

  • author your Lambda functions in a language that doesn't incur a high cold start time – e.g. Node.js, Python, or Go
  • choose a higher memory setting for functions on the critical path of handling user requests (i.e. anything that the user would have to wait for a response from, including intermediate APIs)
  • optimize your function's dependencies and package size
  • stay as far away from VPCs as you possibly can! VPC access requires Lambda to create ENIs (elastic network interfaces) to the target VPC, and that easily adds 10s (yeah, you're reading that right) to your cold start

Finally, what about those APIs that are seldom used? It’s actually quite likely that every time someone hits that API they will incur a cold start, so to your users, that API is always slow and they become even less likely to use it in the future, which is a vicious cycle.

For these APIs, you can have a cron job that runs every 5–10 mins and pings the API (with a special ping request), so that by the time the API is used by a real user it’ll hopefully be warm and the user would not be the one to have to endure the cold start time.

This method of pinging API endpoints to keep them warm is less effective for “busy” functions with lots of concurrent executions – your ping message would only reach one of the concurrent executions. And if the level of concurrent user requests drops, then some of the concurrent executions would be garbage collected after a period of idleness anyway, which is what you want (don't pay for resources you don't need).

Anyhow, this post is not intended to be your one-stop guide to AWS Lambda cold starts, but merely to illustrate that it’s a more nuanced discussion than “just the first request”.

Cold starts are a characteristic of the platform that we just have to accept, and we love the AWS Lambda platform and want to use it because it delivers on so many fronts. But it's important not to let our own preferences blind us to what's important, which is to keep our users happy and build a product that they will want to keep on using.

To do that, you do need to know the platform you’re building on top of, and with the cost of experimentation so low, there’s no good reason not to experiment with AWS Lambda yourself and try to learn more about how it behaves and how you can make the most of it.

AWS Lambda – use the invocation context to better handle slow HTTP responses

With API Gateway and Lambda, you’re forced to use relatively short timeouts on the server-side:

  • API Gateway has a 29s max timeout on all integration points
  • Serverless framework uses a default of 6s for AWS Lambda functions

However, as you have limited influence over a Lambda function’s cold start time and have no control over the amount of latency overhead API Gateway introduces, the actual client-facing latency you’d experience from a calling function is far less predictable.

 
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/api-gateway-metrics-dimensions.html

To prevent slow HTTP responses from causing the calling function to timeout (and therefore impact the user experience we offer) we should make sure we stop waiting for a response before the calling function times out.

“the goal of the timeout strategy is to give HTTP requests the best chance to succeed, provided that doing so does not cause the calling function itself to err”

– me

Most of the time, I see folks use fixed timeout values (either hard-coded or specified via config), which are often tricky to decide:

  • too short, and you won't give the request the best chance to succeed, e.g. there's 5s left in the invocation but we had set the timeout to 3s
  • too long, and you run the risk of letting the request time out the calling function, e.g. there's 5s left in the invocation but we had set the timeout to 6s

This challenge of choosing the right timeout value is further complicated by the fact that we often perform more than one HTTP request during a function invocation – e.g. read from DynamoDB, talk to some internal API, then save changes to DynamoDB equals a total of 3 HTTP requests in one invocation.

Let’s look at two common approaches for picking timeout values and scenarios where they fall short of meeting our goal.

With a fixed timeout that is too short, requests are not given the best chance to succeed.

With a fixed timeout that is too long, requests are allowed too much time to execute and can cause the calling function to time out.

Instead, we should set the request timeout based on the amount of invocation time left, whilst taking into account the time required to perform any recovery steps – e.g. return a meaningful error with application specific error code in the response body, or return a fallback result instead.

You can easily find out how much time is left in the current invocation through the context object your function is invoked with.

https://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-context.html

For example, if a function’s timeout is 6s, but by the time you make the HTTP request you’re already 1s into the invocation (perhaps you had to do some expensive computation first), and if we reserve 500ms for recovery, then that leaves us with 4.5s to wait for HTTP response.
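
Here's a minimal sketch of what that looks like in a Node.js handler; the http client module and its timeout option are assumptions for the sake of illustration:

// reserve some of the remaining invocation time for recovery steps
const RECOVERY_BUFFER_MS = 500;
const http = require('./http'); // hypothetical HTTP client that accepts a timeout in ms

module.exports.handler = async (event, context) => {
  // e.g. 6s function timeout, 1s already spent => 5000ms remaining,
  // which leaves 4500ms for the HTTP request
  const timeoutMs = context.getRemainingTimeInMillis() - RECOVERY_BUFFER_MS;

  try {
    const result = await http({
      method: 'GET',
      uri: process.env.INTERNAL_API_URL,
      timeout: timeoutMs
    });
    return { statusCode: 200, body: JSON.stringify(result) };
  } catch (err) {
    // recovery path: log, record a custom metric, return an application error
    console.error('service X timed out', { timeoutMs, requestId: context.awsRequestId });
    return {
      statusCode: 502,
      body: JSON.stringify({
        errorCode: 10021,
        requestId: context.awsRequestId,
        message: 'service X timed out'
      })
    };
  }
};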

With this approach, we get the best of both worlds:

  • allow requests the best chance to succeed based on the actual amount of invocation time we have left; and
  • prevent slow responses from timing out the function, which allows us a window of opportunity to perform recovery actions

But what are you going to do AFTER you time out these requests? Aren’t you still going to have to respond with a HTTP error since you couldn’t finish whatever operations you needed to perform?

At a minimum, the recovery actions should include:

  • log the timeout incident with as much context as possible (e.g. how much time the request had), including all the relevant correlation IDs
  • track custom metrics for serviceX.timedout so it can be monitored and the team can be alerted if the situation escalates
  • return an application error code in the response body (see example below), along with the request ID so the user-facing client app can display a user friendly message like “Oops, looks like this feature is currently unavailable, please try again later. If this is urgent, please contact us at xxx@domain.com and quote the request ID f19a7dca. Thank you for your cooperation :-)”
{
  "errorCode": 10021,
  "requestId": "f19a7dca",
  "message": "service X timed out"
}

In some cases, you can also recover even more gracefully using fallbacks.

Netflix's Hystrix library, for instance, supports several flavours of fallbacks via the Command pattern it employs so heavily. In fact, if you haven't read its wiki page already, then I strongly recommend that you go and give it a thorough read – there is a ton of useful information and ideas there.

At the very least, every command lets you specify a fallback action.

You can also chain fallbacks together by chaining commands via their respective getFallback methods.

For example,

  1. execute a DynamoDB read inside CommandA
  2. in the getFallback method, execute CommandB which would return a previously cached response if available
  3. if there is no cached response then CommandB would fail, and trigger its own getFallback method
  4. execute CommandC, which returns a stubbed response
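
Here's a rough sketch of the same fallback chain in plain Node.js, not using Hystrix itself (the table name, key and default values are made up):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

const cache = new Map(); // in-memory cache of previously seen responses

// "CommandA": read the user's preferences from DynamoDB
const readFromDynamo = async (userId) => {
  const res = await docClient.get({
    TableName: process.env.USER_PREFS_TABLE,
    Key: { userId }
  }).promise();
  if (!res.Item) { throw new Error('not found'); }
  cache.set(userId, res.Item); // remember the last good response
  return res.Item;
};

// "CommandB": fall back to a previously cached response, if we have one
const readFromCache = (userId) => {
  if (!cache.has(userId)) { throw new Error('no cached response'); }
  return cache.get(userId);
};

// "CommandC": last resort, return a stubbed response
const stubbedResponse = () => ({ theme: 'default', notifications: true });

const getUserPreferences = async (userId) => {
  try {
    return await readFromDynamo(userId);   // CommandA
  } catch (errA) {
    try {
      return readFromCache(userId);        // CommandA's fallback -> CommandB
    } catch (errB) {
      return stubbedResponse();            // CommandB's fallback -> CommandC
    }
  }
};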

Anyway, check out Hystrix if you haven’t already, most of the patterns that are baked into Hystrix can be easily adopted in our serverless applications to help make them more resilient to failures – something that I’m actively exploring with a separate series on applying principles of chaos engineering to Serverless.

Applying principles of chaos engineering to AWS Lambda with latency injection

This is part 2 of a multipart series that explores ideas on how we could apply the principles of chaos engineering to serverless architectures built around Lambda functions.


The most common issues I have encountered in production are latency/performance related. They can be symptomatic of a host of underlying causes, ranging from AWS network issues (which can also manifest themselves as latency/error-rate spikes in any of the AWS services) and overloaded servers, to simple GC pauses.

Latency spikes are inevitable – as much as you can improve the performance of your application, things will go wrong, eventually, and often they’re out of your control.

So you must design for them, and degrade the quality of your application gracefully to minimize the impact on your users.

In the case of API Gateway and Lambda, there are additional considerations:

  • API Gateway has a hard limit of 29s timeout for integration points, so even if your Lambda function can run for up to 5 mins, API Gateway will time out way before that

If you use moderate timeout settings for your API functions (and you should!) then you need to consider the effects of cold starts when calling an intermediate service.

Where to inject latency

Suppose our client application communicates directly with 2 public-facing APIs, which in turn depend on an internal API.

In this setup, I can think of 3 places where we can inject latency and each would validate a different hypothesis.

Inject latency at HTTP clients

The first, and easiest place to inject latency is in the HTTP client library we use to communicate with the internal API.

This will test that our function has an appropriate timeout on this HTTP communication and can degrade gracefully when the request times out.

We can inject latency to the HTTP client libraries for our internal APIs, hence validating that the caller function has configured appropriate timeout and error handling for timeouts.

Furthermore, this practice should also be applied to other 3rd party services we depend on, such as DynamoDB. We will discuss how we can inject latency to these 3rd party libraries later in the post.

We can also inject latency to 3rd party client libraries for other managed services we depend on.

This is a reasonably safe place to inject latency as the immediate blast radius is limited to this function.

However, you can (and arguably, should) consider applying this type of latency injection to intermediate services as well. Doing so carries extra risk, as it has a broader blast radius in failure cases – i.e. if the function under test does not degrade gracefully, then it can cause unintended problems for outer services. In this case, the blast radius for these failure cases is the same as if you're injecting latency into the intermediate functions directly.

Inject latency to intermediate functions

You can also inject latency directly to the functions themselves (we’ll look at how later on). This has the same effect as injecting latency to the HTTP client to each of its dependents, except it’ll affect all its dependents at once.

We can inject latency into a function's invocation. If that function is behind an internal API that is used by multiple public-facing APIs, then it can cause all its dependents to experience timeouts.

This might seem risky (it can be), but is an effective way to validate that every service that depends on this API endpoint is expecting, and handling timeouts gracefully.

It makes most sense when applied to intermediate APIs that are part of a bounded context (or, a microservice), maintained by the same team of developers. That way, you avoid unleashing chaos upon unsuspecting developers who might not be ready to deal with the chaos.

That said, I think there is a good counter-argument for doing just that.

We often fall into the pitfall of using the performance characteristics of dev environments as a predictor for the production environment. Whilst we seldom experience load-related latency problems in the dev environments – because we don't have enough load in those environments to begin with – production is quite another story. Which means we're not programmed to think about these failure modes during development.

So, a good way to hack the brains of your fellow developers and to programme them to expect timeouts is to expose them to these failure modes regularly in the dev environments, by injecting latency to our internal APIs in those environments.

We can figuratively hold up a sign and tell other developers to expect latency spikes and timeouts by literally exposing them to these scenarios in dev environments, regularly, so they know to expect it.

In fact, if we make our dev environments exhibit the most hostile and turbulent conditions that our systems should be expected to handle, then we know for sure that any system that makes its way to production is ready to face what awaits it in the wild.

Inject latency to public-facing functions

So far, we have focused on validating the handling of latency spikes and timeouts in our APIs. The same validation is needed for our client application.

We can apply all the same arguments mentioned above here. By injecting latency to our public-facing API functions (in both production as well as dev environments), we can:

  • validate the client application handles latency spikes and timeouts gracefully, and offers the best UX as possible in these situations
  • train our client developers to expect latency spikes and timeouts

When I was working on a MMORPG at Gamesys years ago, we uncovered a host of frailties in the game when we injected latency spikes and faults to our APIs. The game would crash during startup if any of the first handful of requests fails. In some cases, if the response time was longer than a few seconds then the game would also get into a weird state because of race conditions.

Turns out I was setting my colleagues up for failure in production because the dev environment was so forgiving and gave them a false sense of comfort.

With that, let’s talk about how we can apply the practice of latency injection.

But wait, can’t you inject latency in the client-side HTTP clients too?

Absolutely! And you should! However, for the purpose of this post we are going to look at how and where we can inject latency to our Lambda functions only, hence why I have willfully ignored this part of the equation.

How to inject latency

There are 2 aspects to actually injecting latencies:

  1. adding delays to operations
  2. configuring how often and how much delay to add

If you read my previous posts on capturing and forwarding correlation IDs and managing configurations with SSM Parameter Store, then you have already seen the basic building blocks we need to do both.

How to inject latency to HTTP client

Since you are unlikely to write an HTTP client from scratch, I consider the problem of injecting latency into the HTTP client and into 3rd party clients (such as the AWS SDK) to be one and the same.

A couple of solutions jump to mind:

  • in static languages, you can consider using a static weaver such as AspectJ or PostSharp; this is the approach I took previously
  • in static languages, you can consider using dynamic proxies, which many IoC frameworks offer (another form of AOP)
  • you can create a wrapper for the client, either manually or with a factory function (bluebirdjs’s promisifyAll function is a good example)

Since I’m going to use Node.js as example, I’m going to focus on wrappers.

For the HTTP client, given the relatively small number of methods you will need, it’s feasible to craft the wrapper by hand, especially if you have a particular API design in mind.

Using the HTTP client I created for the correlation ID post as base, I modified it to accept a configuration object to control the latency injection behaviour.

{
  "isEnabled": true,
  "probability": 0.5,
  "minDelay": 100,
  "maxDelay": 5000
}

You can find this modified HTTP client here; below is a simplified version of this client (which uses superagent under the hood).
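
Here's a rough sketch of what such a client might look like, assuming the latencyInjectionConfig shape shown above (the real client in the repo also forwards correlation IDs and supports more options):

const superagent = require('superagent');

// decide whether to inject latency for this request, and if so, how much
const injectLatency = async (config) => {
  if (!config || !config.isEnabled) { return; }
  if (Math.random() > config.probability) { return; }

  const delay = config.minDelay + Math.random() * (config.maxDelay - config.minDelay);
  await new Promise(resolve => setTimeout(resolve, delay));
};

// http({ method, uri, headers, body, latencyInjectionConfig })
const http = async ({ method = 'GET', uri, headers = {}, body, latencyInjectionConfig }) => {
  await injectLatency(latencyInjectionConfig);

  let request = superagent(method, uri).set(headers);
  if (body) { request = request.send(body); }

  const resp = await request;
  return resp.body;
};

module.exports = http;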

To configure the function and the latency injection behaviour, we can use the configClient I first created in the SSM Parameter Store post.

First, let’s create the configs in the SSM Parameter Store.

You can create and optionally encrypt parameter values in the SSM Parameter Store.

The configs contain the URL for the internal API, as well as a chaosConfig object. For now, we just have a httpClientLatencyInjectionConfig property, which is used to control the HTTP client's latency injection behaviour.

{ 
  "internalApi": "https://xx.amazonaws.com/dev/internal", 
  "chaosConfig": {
    "httpClientLatencyInjectionConfig": {
      "isEnabled": true,
      "probability": 0.5,
      "minDelay": 100,
      "maxDelay": 5000
    }
  } 
}

Using the aforementioned configClient, we can fetch the JSON config from SSM Parameter Store at runtime.

const configKey = "public-api-a.config";
const configObj = configClient.loadConfigs([ configKey ]);

// the loaded config is yieldable (a promise), so we yield to get the JSON string
let config = JSON.parse(yield configObj[configKey]);
let internalApiUrl = config.internalApi;
let chaosConfig = config.chaosConfig || {};
let injectionConfig = chaosConfig.httpClientLatencyInjectionConfig;

let reply = yield http({ 
  method : 'GET', 
  uri : internalApiUrl, 
  latencyInjectionConfig: injectionConfig 
});

The above configuration gives us a 50% chance of injecting a latency between 100ms and 5s when we make the HTTP request to internal-api.

This is reflected in the following X-Ray traces.

How to inject latency to the AWS SDK

With the AWS SDK, it’s not feasible to craft the wrapper by hand. Instead, we could do with a factory function like bluebird’s promisifyAll.

We can apply the same approach here, and I made a crude attempt at doing just that. I must add that, whilst I consider myself a competent Node.js programmer, I’m sure there’s a better way to implement this factory function.

My factory function will only work with promisified objects (told you it’s crude..), and replaces their xxxAsync functions with a wrapper that takes in one more argument of the shape:

{
  "isEnabled": true,
  "probability": 0.5,
  "minDelay": 100,
  "maxDelay": 3000
}

Again, it’s clumsy, but we can take the DocumentClient from the AWS SDK, promisify it with bluebird, then wrap the promisified object with our own wrapper factory. Then, we can call its async functions with an optional argument to control the latency injection behaviour.
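
For illustration, here's a rough sketch of what such a factory function might look like (the actual, equally crude version lives in the linked repo):

const Promise = require('bluebird');

// wraps a bluebird-promisified client so that its xxxAsync methods accept an
// optional extra latencyInjectionConfig argument
const wrapWithLatencyInjection = (client) => {
  const injectLatency = async (config) => {
    if (!config || !config.isEnabled || Math.random() > config.probability) { return; }
    const delay = config.minDelay + Math.random() * (config.maxDelay - config.minDelay);
    await new Promise(resolve => setTimeout(resolve, delay));
  };

  const wrapped = Object.create(client);

  // walk the (prototype) properties to find the promisified xxxAsync methods
  for (const key in client) {
    if (key.endsWith('Async') && typeof client[key] === 'function') {
      const original = client[key];
      wrapped[key] = async (params, latencyInjectionConfig) => {
        await injectLatency(latencyInjectionConfig);
        return original.call(client, params);
      };
    }
  }

  return wrapped;
};

// usage (inside a co-wrapped handler): promisify the DocumentClient, wrap it,
// then pass the config as an optional extra argument to the async methods
// const AWS = require('aws-sdk');
// const docClient = wrapWithLatencyInjection(
//   Promise.promisifyAll(new AWS.DynamoDB.DocumentClient()));
// const res = yield docClient.getAsync(params, latencyInjectionConfig);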

You can see this in action in the handler function for public-api-b.

For some reason, the wrapped function is not able to record subsegments in X-Ray. I suspect it’s some nuance about Javascript or the X-Ray SDK that I do not fully understand.

Nonetheless, judging from the logs, I can confirm that the wrapped function does indeed inject latency to the getAsync call to DynamoDB.

If you know of a way to improve the factory function, or to get the X-Ray tracing work with the wrapped function, please let me know in the comments.

How to inject latency to function invocations

The apiHandler factory function I created in the correlation ID post is a good place to apply common implementation patterns that we want from our API functions, including:

  • log the event source as debug
  • log the response and/or error from the invocation (which, surprisingly, Lambda doesn’t capture by default)
  • initialize global context (eg. for tracking correlation IDs)
  • handle serialization for the response object
  • etc..
// this is how you use the apiHandler factory function to create a
// handler function for API Gateway event source
module.exports.handler = apiHandler(
  co.wrap(function* (event, context) {
    ... // do bunch of stuff
    // instead of invoking the callback directly, you return the
    // response you want to send, and the wrapped handler function
    // would handle the serialization and invoking callback for you
    // also, it takes care of other things for you, like logging
    // the event source, and logging unhandled exceptions, etc.
   return { message : "everything is awesome" };
  })
);

In this case, it’s also a good place for us to inject latency to the API function.

However, to do that, we need access to the configuration for the function. Time to lift the responsibility for fetching configurations into the apiHandler factory then!

The full apiHandler factory function can be found here; below is a simplified version that illustrates the point.

Now, we can write our API function like the following.

Now that the apiHandler has access to the config for the function, it can access the chaosConfig object too.

Let’s extend the definition for the chaosConfig object to add a functionLatencyInjectionConfig property.

"chaosConfig": {
  "functionLatencyInjectionConfig": {
    "isEnabled": true,
    "probability": 0.5,
    "minDelay": 100,
    "maxDelay": 5000
  },
  "httpClientLatencyInjectionConfig": {
    "isEnabled": true,
    "probability": 0.5,
    "minDelay": 100,
    "maxDelay": 5000
  }
}

With this additional configuration, we can modify the apiHandler factory function to use it to inject latency to a function’s invocation much like what we did in the HTTP client.
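
Here's a rough sketch of how the factory might do this; the configClient usage mirrors the earlier snippet, the CONFIG_KEY environment variable is made up, and the response handling is heavily simplified:

const configClient = require('./config-client'); // hypothetical: the configClient from the SSM post

// decide whether to delay the invocation, based on functionLatencyInjectionConfig
const injectLatency = async (config) => {
  if (!config || !config.isEnabled || Math.random() > config.probability) { return; }
  const delay = config.minDelay + Math.random() * (config.maxDelay - config.minDelay);
  await new Promise(resolve => setTimeout(resolve, delay));
};

const apiHandler = (handler) => async (event, context) => {
  try {
    // the factory now owns fetching the function's config (cached by configClient)
    const configKey = process.env.CONFIG_KEY; // e.g. "public-api-a.config"
    const configs = configClient.loadConfigs([ configKey ]);
    const config = JSON.parse(await configs[configKey]);
    const chaosConfig = config.chaosConfig || {};

    // inject latency into the invocation itself, before running the real handler
    await injectLatency(chaosConfig.functionLatencyInjectionConfig);

    // the wrapped handler receives the config so it doesn't have to fetch it again
    const result = await handler(event, context, config);
    return { statusCode: 200, body: JSON.stringify(result) };
  } catch (err) {
    console.error(err);
    return { statusCode: 500, body: JSON.stringify({ message: err.message }) };
  }
};

module.exports = apiHandler;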

Just like that, we can now inject latency to function invocations via configuration. This will work for any API function that is created using the apiHandler factory.

With this change and both kinds of latency injections enabled, I can observe all the expected scenarios through X-Ray:

  • no latency was injected

  • latency was injected to the function invocation only

  • latency was injected to the HTTP client only

  • latency was injected to both HTTP client and the function invocation, but the invocation did not timeout as a result

  • latency was injected to both HTTP client and the function invocation, and the invocation times out as a result

I can get further confirmation of the expected behaviour through logs, and the metadata recorded in the X-Ray traces.

Recap, and future work

In this post we discussed:

  • why you should consider applying the practice of latency injection to APIs created with API Gateway and Lambda
  • additional considerations specific to API Gateway and Lambda
  • where you can inject latencies, and why you should consider injecting latency at each of these places
  • how you can inject latency in HTTP clients, AWS SDK, as well as the function invocation

The approach we have discussed here is driven by configuration, and the configuration is refreshed every 3 mins by default.

We can go much further with this.

Fine grained configuration

The configurations can be more fine grained, and allow you to control latency injection to specific resources.

For example, instead of a blanket httpClientLatencyInjectionConfig for all HTTP requests (including those requests to AWS services), the configuration can be specific to an API, or a DynamoDB table.
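
Hypothetically, such a fine-grained configuration could take a shape like this, keyed by the downstream resource (the resource names here are made up):

"chaosConfig": {
  "httpClientLatencyInjectionConfig": {
    "internal-api": {
      "isEnabled": true,
      "probability": 0.5,
      "minDelay": 100,
      "maxDelay": 5000
    },
    "user-preferences-table": {
      "isEnabled": false
    }
  }
}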

Automation

The configurations can be changed by an automated process to:

  • run routine validations daily
  • stop all latency injections during off hours, and holidays
  • forcefully stop all latency injections, eg. during an actual outage
  • orchestrate complex scenarios that are difficult to manage by hand, eg. enable latency injection at several places at once

Again, we can look to Netflix for inspiration for such an automated platform.

Usually, you would want to enable one latency injection in a bounded context at a time. This helps contain the blast radius of unintended damages, and make sure your experiments are actually controlled. Also, when latency is injected at several places, it is harder to understand the causality we observe as there are multiple variables to consider.

Unless, of course, you’re validating against specific hypothesis such as:

The system can tolerate outage to both the primary store (DynamoDB) as well as the backup store (S3) for user preferences, and would return a hardcoded default value in that case.

Better communication

Another good thing to do is to inform the caller of the fact that latency has been added to the invocation by design.

This might take the form of an HTTP header in the response that tells the caller how much latency was injected in total. If you're using an automated process to generate these experiments, then you should also include the id/tag/name for the specific instance of the experiment as an HTTP header as well.

What’s next?

As I mentioned in the previous post, you need to apply common sense when deciding when and where you apply chaos engineering practices.

Don't attempt an exercise that you know is beyond your abilities.

Before you even consider applying latency injection to your APIs in production, you need to think about how you can deal with these latency spikes given the inherent constraints of API Gateway and Lambda.

Unfortunately, we have run out of time to talk about this in this post, but come back in 2 weeks and we will talk about several strategies you can employ in part 3.

The code for the demo in this post is available on github here. Feel free to play around with it and let me know if you have any suggestions for improvement!

Using Protocol Buffers with API Gateway and AWS Lambda

AWS announced binary support for API Gateway in late 2016, which opened up the door for you to use more efficient binary formats such as Google’s Protocol Buffers and Apache Thrift.

Why?

Compared to JSON – which is the bread and butter for APIs built with API Gateway and Lambda – these binary formats can produce significantly smaller payloads.

At scale, they can make a big difference to your bandwidth cost.

In restricted environments such as low-end devices or in countries with poor mobile connections, sending smaller payloads can also improve your user experience by improving the end-to-end network latency, and possibly processing time on the device too.

Comparison of serializer performance between Proto Buffers and JSON in .Net

How

Follow these 3 simple steps (assuming you’re using Serverless framework):

  1. install the awesome serverless-apigw-binary plugin
  2. add application/x-protobuf to binary media types (see screenshot below)
  3. add function that returns Protocol Buffers as base64 encoded response

The serverless-apigw-binary plugin has made it really easy to add binary support to API Gateway

To encode & decode Protocol Buffers payloads in Node.js, you can use the protobufjs package from npm.

It lets you work with your existing .proto files, or you can use JSON descriptors. Give the docs a read to see how you can get started.

In the demo project (link at the bottom of the post) you’ll find a Lambda function that always returns a response in Protocol Buffers.

Couple of things to note from this function:

  • we set the Content-Type header to application/x-protobuf
  • body is base64 encoded representation of the Protocol Buffers payload
  • isBase64Encoded is set to true

you need to do all 3 of these things to make API Gateway return the response as binary data.

Consider them the magic incantation for making API Gateway return binary data. Note that the caller also has to set the Accept header to application/x-protobuf.
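
Here's a minimal sketch of such a handler using protobufjs; the players.proto file and the PlayerList message type are assumed names for this example:

const protobuf = require('protobufjs');

module.exports.handler = async () => {
  // load the .proto definition and look up the message type
  const root = await protobuf.load('players.proto');
  const PlayerList = root.lookupType('PlayerList');

  const payload = {
    players: [
      { id: 'eb66db14992e06b36282d607cf0134ce4fe45f50', name: 'Calvin Ortiz', scores: [57, 12, 100] }
    ]
  };

  // encode to a binary buffer, then base64 encode it for API Gateway
  const buffer = PlayerList.encode(PlayerList.create(payload)).finish();

  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/x-protobuf' }, // 1. content type
    body: Buffer.from(buffer).toString('base64'),          // 2. base64 encoded body
    isBase64Encoded: true                                   // 3. tell API Gateway the body is binary
  };
};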

In the same project, there’s also a JSON endpoint that returns the same payload as comparison.

The response from this JSON endpoint looks like this:

{"players":[{"id":"eb66db14992e06b36282d607cf0134ce4fe45f50","name":"Calvin Ortiz","scores":[57,12,100,56,47,78,20,37,32,48]},{"id":"7b9b38e535453d120e706ff57fef41f6fee991cb","name":"Marcus Cummings","scores":[40,57,24,15,45,54,25,67,59,23]},{"id":"db34a2a5f4d16e77a6d3d6154a8b8bb6760b3b99","name":"Harry James","scores":[61,85,14,70,8,80,14,22,76,87]},{"id":"e21018c4f43eef10771e0fa71bc54156b00a64dd","name":"Gregory Bishop","scores":[51,31,27,47,72,75,61,28,100,41]},{"id":"b3ee29ee49b640ce15be1737d0dca60e48108ee1","name":"Ann Evans","scores":[69,17,48,99,85,8,75,55,78,46]},{"id":"9c1e6d4d46bb0c0d2c92bab11e5dbd5f4ab0c619","name":"Juan Perez","scores":[71,34,60,84,21,98,60,8,91,92]},{"id":"d8de89222633c61393931457c1e72558eba48639","name":"Loretta Harvey","scores":[15,40,73,92,42,65,58,30,26,84]},{"id":"141dad672ec559431f808964391d128d2c3274bf","name":"Ian Powell","scores":[17,21,14,84,64,14,22,22,34,92]},{"id":"8a97e85e2e5385c45fc31f24bfe781c26f78c0b7","name":"Steve Gibson","scores":[33,97,6,1,20,1,78,3,77,19]},{"id":"6b3ca6924e17cd5fd9d91b36d49b36a5d542c9ea","name":"Harold Ferguson","scores":[31,32,4,10,37,85,46,86,39,17]}]}

As you can see, it’s just a bunch of randomly generated names and GUIDs, and integers. The same response in Protocol Buffers is nearly 40% smaller.

Problem with the protobufjs package

Before we move on, there is one important detail about using the protobufjs package in a Lambda function – you need to npm install the package on a Linux system.

This is because it has a dependency that is distributed as native binaries, so if you installed the package on OSX, then the binaries that are packaged and deployed to Lambda will not run in the Lambda execution environment.

I had similar problems with other Google libraries in the past. I find the best way to deal with this is to take a leaf out of aws-serverless-go-shim’s approach and deploy your code inside a Docker container.

This way, you would locally install a compatible version of the native binaries for your OS so you can continue to run and debug your function with sls invoke local (see this post for details).

But, during deployment, a script would run npm install --force in a Docker container running a compatible Linux distribution. This would then install a version of the native binaries that can be executed in the Lambda execution environment. The script would then use sls deploy to deploy the function.

The deployment script can be something simple like this:

In the demo project, I also have a docker-compose.yml file:

The Serverless framework requires my AWS credentials, hence why I've attached the $HOME/.aws directory to the container for the AWS SDK to find at runtime.

To deploy, run docker-compose up.

Use HTTP content negotiation

Whilst binary formats are more efficient when it comes to payload size, they do have one major problem: they’re really hard to debug.

Imagine the scenario – you have observed a bug, but you’re not sure if the problem is in the client app or the server. But hey, let’s just observe the HTTP conversation with a HTTP proxy such as Charles or Fiddler.

This workflow works great for JSON but breaks down when it comes to binary formats such as Protocol Buffers as the payloads are not human readable.

As we have discussed in this post, the human readability of JSON comes with the cost of heavier bandwidth usage. For most network communications, be it service-to-service, or service-to-client, unless a human is actively “reading” the payloads it’s not worth paying the cost. But when a human is trying to read it, that human readability is very valuable.

Fortunately, HTTP’s content negotiation mechanism means we can have the best of both worlds.

In the demo project, there is a contentNegotiated function which returns either JSON or Protocol Buffers payloads based on the Accept header.
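
A minimal sketch of what that handler might look like (encodeAsProtobuf is a stand-in helper that reuses the protobufjs encoding from the earlier sketch):

const protobuf = require('protobufjs');

// stand-in helper: encode the payload with protobufjs, as in the earlier sketch
const encodeAsProtobuf = async (payload) => {
  const root = await protobuf.load('players.proto');
  const PlayerList = root.lookupType('PlayerList');
  return Buffer.from(PlayerList.encode(PlayerList.create(payload)).finish());
};

module.exports.handler = async (event) => {
  const headers = event.headers || {};
  const accept = headers['Accept'] || headers['accept'] || '';
  const payload = { players: [ /* ... as before ... */ ] };

  if (accept === 'application/x-protobuf') {
    const buffer = await encodeAsProtobuf(payload);
    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/x-protobuf' },
      body: buffer.toString('base64'),
      isBase64Encoded: true
    };
  }

  // default to the human-readable JSON representation
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload)
  };
};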

By default, you should use Protocol Buffers for all your network communications to minimise bandwidth use.

But, you should build in a mechanism for toggling the communication to JSON when you need to observe the communications. This might mean:

  • for debug builds of your mobile app, allow super users (devs, QA, etc.) the ability to turn on debug mode, which would switch the networking layer to send Accept header as application/json
  • for services, include a configuration option to turn on debug mode (see this post on configuring functions with SSM parameters and cache client for hot-swapping) to make service-to-service calls use JSON too, so you can capture and analyze the request and responses more easily

As usual, you can try out the demo code yourself, the repo is available here.