“Guys, we’re doing pagination wrong…”

These are words I have had to mutter quite a few times in my career, out of dissatisfaction with how pagination had been implemented on several projects.

Still, that dissatisfaction is nothing compared to how I feel when I occasionally have to ask “why is this API not paginated..?”

So, taking a break from my usual Serverless ramblings, let’s talk about pagination :-)

Unidirectional and Bidirectional Pagination

Generally speaking, I see two common types of pagination:

  • simple, unidirectional paging through a static set of results that is too long or inefficient to return in one go – e.g. a list of twitter followers, or a list of Google search results
  • bidirectional paging through a feed or stream of some sort, where new results can be added after you received the first page of results – e.g. your twitter timeline, or notifications

Avoid leaky abstraction

A common mistake I see is that the paginated API requires the caller to provide the “key” it uses to sort through the results, which creates a leaky abstraction. The caller must then understand the underlying mechanism the service uses to page its results – e.g. by timestamp, or alphabetical order.

DynamoDB’s Query API is a good example of this. To page through the query results, the caller must specify the ExclusiveStartKey in subsequent requests. However, the service does return the LastEvaluatedKey in the response too.

So, in practice, you can almost treat the LastEvaluatedKey as a token, or cursor, which you simply pass on in the next request. Except it’s not just a token – it’s the actual key of an item in the DynamoDB table, and the attribute names already give the implementation details away anyway.
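For illustration, paging through a Query with the Node.js SDK looks roughly like this – the table name, key names and page size below are made up:

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

// fetch one page of results; exclusiveStartKey is undefined for the first page
const getPage = async (userId, exclusiveStartKey) => {
  const resp = await dynamodb.query({
    TableName: 'timeline',
    KeyConditionExpression: 'userId = :userId',
    ExpressionAttributeValues: { ':userId': userId },
    Limit: 10,
    ExclusiveStartKey: exclusiveStartKey
  }).promise();

  // LastEvaluatedKey is only returned when there are more results to fetch
  return { items: resp.Items, lastEvaluatedKey: resp.LastEvaluatedKey };
};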

On its own, this is not a big deal. However, it often has the unfortunate knock-on effect of encouraging application developers to build their application-level pagination on top of this implementation detail. Except this time around, they’re not returning the LastEvaluatedKey in the response, and the client is now responsible for tracking that piece of information.

Congratulations, the underlying mechanic your database uses to support pagination has now leaked all the way to your front end!

Make paging intent explicit and consistent

Another common trend I see is that you have to send the same request parameters to the paginated API over and over, for example:

  • max no. of results per page
  • the direction of pagination (if bidirectional)
  • the original query (in DynamoDB’s case, this includes a no. of attributes such as FilterExpression, KeyConditionExpression, ProjectionExpression and IndexName)

I won’t call this one a mistake as it is sometimes by design, but more often than not it strikes me as a consequence of a lack of design instead.

In all the paginated APIs I have encountered, the intended behaviour is always to fetch the next set of results for a query, not to start a different query midway through. That just wouldn’t make sense, and you probably can’t even call that pagination, more like navigation! I mean, when was the last time you started a DynamoDB query and then had to change any of the request parameters midway through paginating through the results?

That said, there are legitimate reasons for changing the direction of pagination from a previously received page. More on this when we discuss bidirectional paging further down the article.

Unidirectional paging with cursor

For unidirectional pagination, my preferred approach is to use a simple cursor. The important detail here is to make the cursor meaningless.

As far as the client is concerned, it’s just a blob the server returns in the response when there are more results to fetch. The client shouldn’t be able to derive any implementation details from it, and the only thing it can afford to do with this cursor is to send it along in the next request.

fig. 1 – flow of request & responses for a series of paginated requests

But how does the API know where to start fetching the next page from?

A simple way to do this is:

  1. create a JSON object to capture the data needed to fetch the next page – e.g. if you’re using DynamoDB, then this can be the request object for the next page (including the ExclusiveStartKey)
  2. base64 encode the JSON string
  3. return the base64 blob as cursor

When we receive the request to fetch the next page, we can apply the reverse process to get back the request object we created earlier.
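A minimal sketch of both directions, assuming the cursor wraps the DynamoDB request object for the next page:

// turn the request object for the next page into an opaque cursor...
const createCursor = (nextPageRequest) =>
  Buffer.from(JSON.stringify(nextPageRequest)).toString('base64');

// ...and turn the cursor from an incoming request back into a request object
const parseCursor = (cursor) =>
  JSON.parse(Buffer.from(cursor, 'base64').toString('utf8'));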

Isn’t that gonna leak even more information – e.g. that you’re using DynamoDB, the table name, the schema, etc. – if someone just base64 decodes your blob?

Absolutely, which is why you might also choose to encrypt the JSON first. You also don’t have to use the DynamoDB query request as the base.

Notice in both fig. 1 and fig. 2 the client only sends the cursor in the subsequent requests?

This is by design.

As I mentioned earlier, the client has already told us the query in the first request, the pagination mechanism should only provide a way for fetching subsequent pages of the results.

Which, to me, means it should not afford any other behaviour (there is that word again – read here to see how the idea of affordance applies to API design) and therefore should not require any other information besides the cursor from the previous response.

This in turn, means we need to capture the original query, or intent, in the cursor so we can construct the corresponding DynamoDB request. Or, we could just capture the actual DynamoDB request in the cursor which seems like a simple, practical solution here.

fig. 2 – interaction between client, API and DynamoDB

Bidirectional paging with cursor(s)

With bidirectional paging, you need to be able to page forward in time (when new tweets are added to your timeline) as well as backward (to fetch older tweets). So a simple string cursor would no longer suffice; instead we need 2 cursors, one for each direction. For example…

  "before": "ThlNjc5MjUwNDMzMA...",
  "after": "ADfaU5ODFmMWRiYQ..." 

Additionally, when paging forward, even when there are no more results right now, we still have to return a cursor, as new results can be added to the feed later. So we should also include a pair of boolean flags in the response:

  "before": "ThlNjc5MjUwNDMzMA...",
  "hasBefore": true,
  "after": "ADfaU5ODFmMWRiYQ...",
  "hasAfter": true

When the client pages forward in time and receives hasAfter as false, it knows there are no more results available right now. It can therefore stop actively fetching the next page of results, and instead poll for new results periodically.
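One way you might construct that cursor object – assuming the page of tweets is sorted newest first by a timestamp attribute, and reusing the base64 encoding from earlier – is sketched below:

// build the response for one page of tweets, with a cursor for each direction
const buildPageResponse = (query, tweets, hasBefore, hasAfter) => {
  const encode = (obj) => Buffer.from(JSON.stringify(obj)).toString('base64');
  const newest = tweets[0];
  const oldest = tweets[tweets.length - 1];

  return {
    tweets,
    cursor: {
      before: encode({ query, direction: 'before', from: oldest.timestamp }),
      hasBefore,
      after: encode({ query, direction: 'after', from: newest.timestamp }),
      hasAfter
    }
  };
};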

Let’s run through a simple example: imagine you’re fetching the tweets in your timeline, where the API returns the latest tweets first.

fig. 3 – paginate backward in time to fetch older data

  1. the client makes a first request
  2. API responds with a cursor object, hasAfter is false because the API has responded with the latest results, but hasBefore is true as there are older results available
  3. the client makes a second request and passes only the before cursor in the request, making its intention clear and unambiguous
  4. API responds with another cursor object where this time both hasBefore and hasAfter are true, given that we’re right in the middle of this stream of results
  5. the client makes a third and last request, again passing only the before cursor received from the previous response
  6. API responds with a cursor object where hasBefore is false because we have now received the oldest result available

Ok, now let’s run through another example, this time we’ll page forward in time instead.

fig. 4 – paginate forward in time to fetch newer data

  1. the client makes a first request
  2. API responds with a cursor object, hasAfter is false because the API has responded with the latest results, but hasBefore is true as there are older results available
  3. some time has passed, and more results have become available
  4. the client makes a second request, and passes only the after cursor in the request, making its intention clear and unambiguous
  5. API responds with only the newer results that the client has not received already, and cursor.hasAfter is false as these are the latest results available at this moment in time; should the client page backward (in time) from this response then it’ll receive the same results as the first response from the API

Now, let’s circle back to what I mentioned earlier regarding the occasional need to change direction midway through pagination.

The reason we need pagination is that it’s often impractical, inefficient and in some cases impossible to return all available results for a query – e.g. at the time of writing Katy Perry has 108M Twitter followers, and trying to retrieve all of them in one request-response cycle would crash both the server and the client app.

Besides limiting how much data can be returned in one request-response cycle, we also need to place an upper bound on how much data the client app caches, to protect the user experience and prevent the client app from crashing.

That means, at some point, as the user keeps scrolling through older tweets, the client needs to start dropping data it has already fetched, or risk running out of memory. Which means, when the user scrolls back up to see the latest tweets, the client needs to re-fetch pages that have been dropped, hence reversing the original direction of the pagination.

Fortunately, the scheme outlined above is flexible enough and allows you to do just that. Every page of results has an associated cursor that allows you to fetch the next page in either direction. So, in the case where you need to re-fetch a dropped page, it’s as simple as making a paginated request with the after cursor of the latest page you have cached.

Dealing with “gaps”

Staying with the Twitter example: if you open up the Twitter mobile app after some time, you’ll see the tweets that have already been cached, but the app also recognizes that so much time has passed that it’s not feasible to paginate from the cached data all the way to the latest tweets.

Instead, the client would fetch the latest tweets with a non-paginated request. As you scroll down, the client can automatically fetch older pages as per fig. 3 and gradually fill in the gap until it joins up with the cached data.

The behaviour of the Twitter mobile app has changed over time, and another tactic I have seen is to place a visual (clickable) marker for the missing tweets in the timeline. This makes it an explicit action by the user to start paging through older tweets to fill in the gap.

So there you have it, a simple and effective way to implement both unidirectional and bidirectional paginated APIs, hope you have found it useful!

AWS Lambda – monolithic functions won’t help you with cold starts

After my post on monolithic functions vs single-purposed functions, a few people asked me about the effect monolithic functions have on cold starts, so I thought I’d share my thoughts here.

The question goes something like this:

Monolithic functions are invoked more frequently so they are less likely to be in a cold state, while single-purposed functions that are not used frequently may always be in a cold state, don’t you think?

That seems like a fair assumption, but the actual behaviour of cold starts is a more nuanced discussion and can have drastically different results depending on traffic pattern. Check out my other post that goes into this behaviour in more detail.

The effect of consolidation into monolithic functions (on the no. of cold starts experienced) quickly diminishes with load

To simplify things, let’s consider “the number of cold starts you’ll have experienced as you ramp up to X req/s”. Assuming that:

  • the ramp up was gradual so there was no massive spikes (which could trigger a lot more cold starts)
  • each request’s duration is short, say, 100ms

At a small scale, say, 1 req/s per endpoint, and a total of 10 endpoints (which is 1 monolithic function vs 10 single-purposed functions) we’ll have a total of 10 req/s. Given the 100ms execution time, that’s just within what one concurrent execution of a function is able to handle.

To reach 1 req/s per endpoint, you will have experienced:

  • monolithic: 1 cold start
  • single-purposed: 10 cold starts

As the load goes up, to 100 req/s per endpoint, which equates to a total of 1000 req/s. To handle this load you’ll need at least 100 concurrent executions of the monolithic function (100ms per req, so the throughput per concurrent execution is 10 req/s, hence concurrent executions = 1000 / 10 = 100). To reach this level of concurrency, you will have experienced:

  • monolithic: 100 cold starts

At this point, 100 req/s per endpoint = 10 concurrent executions for each of the single-purposed functions. To reach that level of concurrency, you will also have experienced:

  • single-purposed: 10 concurrent execs * 10 functions = 100 cold starts
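Here’s the same arithmetic as a back-of-the-envelope sketch:

// concurrent executions needed ≈ req/s * duration per request (in seconds),
// and ramping up to N concurrent executions costs roughly N cold starts
const concurrentExecs = (reqPerSec, durationSec) => reqPerSec * durationSec;

concurrentExecs(1000, 0.1);      // monolithic: ~100 concurrent executions => ~100 cold starts
concurrentExecs(100, 0.1) * 10;  // single-purposed: ~10 per function * 10 functions => ~100 cold starts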

So, monolithic functions don’t help you with the no. of cold starts you’ll experience even at a moderate amount of load.

Also, when the load is low, there are simple things you can do to mitigate cold starts by pre-warming your functions (as discussed in the other post). You can even use the serverless-plugin-warmup to do that for you, and it even comes with the option to do a pre-warmup run after a deployment.

However, this practice stops being effective when you have even a moderate amount of concurrency. At which point, monolithic functions would incur just as many cold starts as single-purposed functions.

Consolidating into monolithic functions can increase initialization time, which increases the duration of cold start

By packing more “actions” into one function, we also increase the no. of modules that need to be initialized during the cold start of that function, and are therefore highly likely to experience longer cold starts as a result. (Basically, anything outside of the exported handler function is initialized during the Bootstrap runtime phase (see below) of the cold start.)

from Ajay Nair’s talk at re:invent 2017 – https://www.youtube.com/watch?v=oQFORsso2go

Imagine that in the monolithic version of the fictional user-api I used to illustrate the point in this post, our handler module would need to require all the dependencies used by all the endpoints:

const depA = require('lodash');
const depB = require('facebook-node-sdk');
const depC = require('aws-sdk');

Whereas in the single-purposed version of the user-api, only the get-user-by-facebook-id endpoint’s handler function would need to incur the extra overhead of initializing the facebook-node-sdk dependency during cold start.
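To illustrate, that single-purposed handler might look something along these lines – the module names match the example above, everything else is hypothetical:

// get-user-by-facebook-id.js – the only function that pays the price of
// initializing the Facebook SDK during its cold start
const AWS = require('aws-sdk');
const Facebook = require('facebook-node-sdk');

module.exports.handler = (event, context, cb) => {
  // ... look up the user by their facebook ID
};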

You also have to factor in any other modules in the same project, and their dependencies, and any code that will be run during those modules’ initialization, and so on.

Wrong place to optimize cold start

So, contrary to one’s intuition, monolithic functions don’t offer any benefit towards cold starts outside what basic prewarming can achieve already, and can quite likely extend the duration of cold starts.

Cold starts affect you wildly differently depending on language, memory and how much initialization you’re doing in your code. So I’ll argue that, if cold starts are a concern for you, then you’re far better off switching to another language (e.g. Go, Node.js or Python) and investing effort into optimizing your code so it suffers shorter cold starts.

Also, keep in mind that this is something that AWS and other providers are actively working on and I suspect the situation will be vastly improved in the future by the platform.

All in all, I think changing the deployment units (one big function vs many small functions) is not the right way to address cold starts.

Serverless observability brings new challenges to current practices

This is the first in a mini 3-part series that accompanies my “the present and future of Serverless observability” talk at ServerlessConf Paris and QCon London this year.

part 1 : new challenges to observability

part 2 : the present of Serverless observability

part 3 : the future of Serverless observability

2017 was no doubt the year the concept of observability became mainstream, so much so that we now have an entire Observability track at a big industry event such as QCon.

This is no doubt thanks to the excellent writing and talks by some really smart people like Cindy Sridharan and Charity Majors.

As Cindy mentioned in her post though, the first murmurs of observability came from a post by Twitter way back in 2013, where they discussed many of the challenges they faced debugging their complex, distributed system.

A few years later, Netflix started writing about the related idea of intuition engineering, around how we design tools that can give us a holistic understanding of our complex system – that is, how can we design our tools so that they present us with the most relevant information about our system, at the right time, and minimize the amount of time and cognitive energy we need to invest to build a correct mental model of our system.

Challenges with Serverless observability

With Serverless technologies like AWS Lambda, we face a number of new challenges to the practices and tools that we have slowly mastered as we learnt how to gain observability for services running inside virtual machines as well as containers.

As a start, we lose access to the underlying infrastructure that runs our code. The execution environment is locked down, and we have nowhere to install agents & daemons for collecting, batching and publishing data to our observability system.

These agents & daemons used to go about their job quietly in the background, far away from the critical paths of your code. For example, if you’re collecting metrics & logs for your REST API, you would collect and publish these observability data outside the request handling code where a human user is waiting for a response on the other side of the network.

But with Lambda, everything you do has to be done inside your function’s invocation, which means you lose the ability to perform background processing – except for what the platform does for you, such as:

  • collecting logs from stdout and sending them to CloudWatch Logs
  • collecting tracing data and sending them to X-Ray

Another aspect that has drastically changed, is how concurrency of our system is controlled.

Whereas before, we would write our REST API with a web framework and run it as an application inside an EC2 server or a container, and this application would handle many concurrent requests. In fact, one of the things we compare different web frameworks on is their ability to handle a large no. of concurrent requests.

Now, we don’t need web frameworks to create a scalable REST API anymore; API Gateway and Lambda take care of all the hard work for us. Concurrency is now managed by the platform, and that’s great news!

However, this also means that any attempt to batch observability data becomes less effective (more on this later), and for the same volume of incoming traffic you’ll exert a much higher volume of traffic on your observability system. This in turn can have non-trivial performance and cost implications at scale.

You might argue that “well, in that case, I’ll just use a bigger batch size for these observability data and publish them less frequently so I don’t overwhelm the observability system”.

Except, it’s not that simple. Enter the lifecycle of an AWS Lambda function.

One of the benefits of Lambda is that you don’t pay for it if you don’t use it. To achieve that, the Lambda service would garbage collect containers (or, concurrent executions of your function) that have not received a request for some time. I did some experiments to see how long that idle time is, which you can read about in this post.

And if you have observability data that has not been published yet, then you’ll lose that data when the container is GC’d.

Even if the container is continuously receiving requests, maybe with the help of something like the warmup plugin for the Serverless framework, the Lambda service would still GC the container after it has been active for a few hours and replace it with a fresh container.

Again, this is a good thing, as it eliminates common problems with long running code, such as memory fragmentation and so on. But it also means, you can still lose unpublished observability data when it happens.

Also, as I explained in a previous post on cold starts, those attempts to keep containers warm stop being effective when you have even a moderate amount of load against your system.

So, you’re back to sending observability data eagerly. Maybe this time, you’ll build an observability system that can handle this extra load, maybe you’ll build it using Lambda!

But wait, remember, you don’t have background processing time anymore…

So if you’re sending observability data eagerly as part of your function invocation, then that means you’re hurting the user-facing latency and we know that latency affects business revenue directly (well, at least in any reasonably competitive market where there’s another provider the customer can easily switch to).

Talk about being caught between a rock and a hard place…


Finally, one of the trends that I see in the Serverless space – and one that I have experienced myself when I migrated a social network’s architecture to AWS Lambda – is how powerful, and how simple it is to build an event-driven architecture. And Randy Shoup seems to think so too.

And in this event-driven, serverless world, function invocations are often chained through some asynchronous event source such as Kinesis Streams, SNS, S3, IoT, DynamoDB Streams, and so on.

In fact, of all the supported event sources for AWS Lambda, only a few are classified as synchronous, so by design, the cards are stacked towards asynchrony here.

And guess what, tracing asynchronous invocations is hard.

I wrote a post on how you might do it yourself, in the interest of collecting and forwarding correlation IDs for distributed tracing. But even with the approach I outlined, it won’t be easy (or in some cases, possible) to trace through every type of event source.

X-Ray doesn’t help you here either, although it sounds like they’re at least looking at support for Kinesis. At the time of writing, X-Ray also doesn’t trace over API Gateway but that too, is on their list.

Until next time…

So, I hope I have painted a clear picture of what tool vendors are up against in this space, so you really gotta respect the work people like IOPipe, Dashbird and Thundra have done.

That said, there are also many things you have to consider yourself.

For example, given the lack of background processing, when you’re building a user facing API where latency is important, you might want to avoid using an observability tool that doesn’t give you the option to send observability data asynchronously (for example, by leveraging CloudWatch Logs), or you need to use a rather stringent sample rate.
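For instance, one way to get that asynchrony is to write metrics as structured log messages to stdout and let CloudWatch Logs carry them out of the invocation for you – a sketch, with a made-up log format:

// the log line ends up in CloudWatch Logs, where a separate, low-priority
// function subscribed to the log group can parse and forward the metric
const recordMetric = (name, value, unit = 'Count') =>
  console.log(JSON.stringify({ type: 'metric', name, value, unit, timestamp: Date.now() }));

recordMetric('user-api.get-user.latency', 42, 'Milliseconds');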

At the same time, when you’re processing events asynchronously, you don’t have to worry about invocation time quite as much. But you might care about the cost of writing so much data to CloudWatch Logs and the subsequent Lambda invocations to process them. Or maybe you’re concerned that low-priority functions (that process the observability data you send via CloudWatch Logs) are eating up your quota of concurrent executions and can throttle high-priority functions (like the ones that serves user-facing REST APIs!). In this case, you might choose to publish observability data eagerly at the end of each invocation.

In part 2, we’ll look at some of the existing observability tools available to us, and how these aforementioned challenges should affect the way we evaluate them and when we should use them.

AWS Lambda – how best to manage shared code and shared infrastructure

In the last post I discussed the pros & cons of following the Single Responsibility Principle (SRP) when moving to the serverless paradigm.

One of the questions that popped up on both Twitter and Medium is “how do you deal with shared code?”. It is a FAQ whenever I speak at user groups or conferences about AWS Lambda, alongside “how do you deal with shared infrastructure that doesn’t clearly belong to any one particular service?”

So here are my thoughts on these two questions.

Again, I’m not looking to convince you one way or the other and I don’t believe there’s a “right” answer that’d work for everyone. This is simply me talking out loud and sharing my internal thought process with you, and hopefully getting you to ask the same questions of your own architecture.

As ever, if you disagree with my assessment or find flaws in my thinking, please let me know via the comments section below.

As you build out your system with all these little Lambda functions, there are no doubt going to be business logic, or utility code that you want to share and reuse amongst your Lambda functions.

When you have a group of functions that are highly cohesive and are organised into the same repo – like the functions that we created to implement the timeline feature in the Yubl app – then sharing code is easy, you just do it via a module inside the repo.

But to share code more generally between functions across service boundaries, you can do it through shared libraries, perhaps published as private NPM packages so they’re only available to your team.

Or, you can share business logic by encapsulating it into a service, and there are a couple of considerations you should make in choosing which approach to use.

Shared library vs Service


When you depend on a shared library, that dependency is declared explicitly – in the case of Node.js, it’s declared in the package.json.

When you depend on a service, that dependency is often not declared at all, and may be discovered only through logging, and perhaps explicit attempts at tracing, maybe using the AWS X-Ray service.


When it comes to deploying updates to this shared code, with a shared library you can publish a new version, but you still have to rely on the consumers of your shared library to update.

Whereas with a service, you as the owner of the service have the power to decide when to deploy the update, and you can even use techniques such as canary deployments or feature flags to roll out updates in a controlled, safe manner.


With libraries, you will have multiple active versions at the same time (as discussed above) depending on the upgrade and deployment schedule of the consumers. In fact, there’s really no way to avoid it entirely; even with the best efforts at a coordinated update, there will be a period of time where multiple versions are active at once.

With services, you have a lot more control, and you might choose to run multiple versions at the same time. This can be done via canary deployments, or by running multiple versions side by side, perhaps by putting the version of the API in the URL, as people often do.

There are multiple ways to version an API, but I don’t find any of them to be satisfactory. Sebastien Lambla did a good talk on this topic and went through several of these approaches and why they’re all bad, so check out his talk if you want to learn more about the perils of API versioning.

backward compatibility

With a shared library, you can communicate backward compatibility of updates using semantic versioning – where a MAJOR version update signifies a breaking change. If you follow semantic versioning with your releases then backward compatibility can be broken in a controlled, well communicated manner.

Most package managers support semantic versioning by letting the consumer decide how far automatic updates are allowed to go – e.g. accept MINOR and PATCH updates automatically, but never a new MAJOR version.
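In npm’s case, for example, the version ranges you declare in package.json decide how far those automatic updates can go – the scoped package name here is hypothetical:

{
  "dependencies": {
    "@my-org/shared-domain-lib": "^2.1.0",
    "lodash": "~4.17.4"
  }
}

The ^2.1.0 range picks up MINOR and PATCH updates within 2.x but never 3.0.0, whereas ~4.17.4 only picks up PATCH updates.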

With a service, if you roll out a breaking change then it’ll break anyone that depends on your service. This is where it ties back to versioning again, and as I already said, none of the approaches that are commonly practiced feel satisfactory to me. I have had this discussion with my teams many times in the past, and they always ended with the decision to “always maintain backward compatibility” as a general rule, unless the circumstances dictate that we have to break the rule and do something special.


With a shared library, you generally expose more than you need, especially if it’s for internal use. And even if you have taken the care to consider what should be part of the library’s public API, there’s always a way for the consumer to get at those internal APIs via reflection.

With a service, you are much more considerate towards what to expose via the service’s public API. You have to, because anything you share via the service’s public API is an explicit design decision that requires effort.

The internal workings of that service are also hidden from the consumer, and there’s no direct (and easy) way for consumers to access them, so there’s less risk of the consumers of our shared code accidentally depending on internal implementation details. The worst thing that can happen here is if the consumers of your API start to depend on those (accidentally leaked) implementation details as features…


When a library fails, your code fails, and it’s often loud & clear and you get the stack trace of what went wrong.

With a service, it may fail, or maybe it just didn’t respond in time before you stopped waiting for the response. As a consumer you often can’t distinguish a service being down from it being slow. When that happens, retries can become tricky as well, if the action you’re trying to perform would modify state and is not idempotent.

Partial failures are also very difficult to deal with, and often require elaborate patterns like the Saga pattern in order to roll back state changes that have already been introduced in the transaction.


Finally, and this is perhaps the most obvious, that calling a service introduces network latency, which is significantly higher than calling a method or function in a library.

Managing shared infrastructure

Another question that I get a lot is “how do you manage shared AWS resources like DynamoDB tables and Kinesis streams?”.

If you’re using the Serverless framework then you can manage these directly in your serverless.yml files, and add them as additional CloudFormation resources like below.
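A sketch of what that might look like for a DynamoDB table – the resource and table names are made up:

resources:
  Resources:
    UsersTable:
      Type: AWS::DynamoDB::Table
      Properties:
        TableName: users-${self:provider.stage}
        AttributeDefinitions:
          - AttributeName: userId
            AttributeType: S
        KeySchema:
          - AttributeName: userId
            KeyType: HASH
        ProvisionedThroughput:
          ReadCapacityUnits: 1
          WriteCapacityUnits: 1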

This is actually the approach I took in my video course AWS Lambda in Motion, but that is because the demo app I’m leading the students to build is a project with well defined start and end state.

But what works in a project like that won’t necessarily work when you’re building a product that will evolve continuously over time. In the context of building a product, there are some problems with this approach.

"sls remove" can delete user data

Since these resources are tied to the CloudFormation stack for the Serverless (as in, the framework) project, if you ever decide to delete the functions using the sls remove command then you’ll delete those resources too, along with any user data that you have in those resources.

Even if you don’t intentionally run sls remove against production, the thought that someone might one day accidentally do it is worrisome enough.

It’s one thing to lose the functions if that happens and experience downtime in the system; it’s quite another to lose all the production user data along with the functions, and potentially find yourself in a situation you can’t easily recover from…

You can – and you should – lock down IAM permissions so that developers can’t accidentally delete these resources in production, which goes a long way to mitigate against these accidents.

You should also leverage the new backup capability that DynamoDB offers.

For Kinesis streams, you should back up the source events in S3 using Kinesis Firehose. That way, you don’t even have to write any backup code yourself!

But even with all the backup options available, I still feel uneasy at the thought of tying these resources that store user data to the creation and deletion of the compute layer (i.e. the Lambda functions) that utilises them.

when ownership is not clear cut

The second problem with managing shared infrastructure in the serverless.yml is this: what do you do when the ownership of these resources is not clear cut?

By virtue of being shared resources, it’s not always clear which project should be responsible for managing these resources in its serverless.yml.

Kinesis streams, for example, can receive events from Lambda functions, applications running on EC2, or applications in your own datacenter. And since Kinesis uses a polling model, those events can in turn be processed by Lambda functions, or by consumer applications running on EC2 or in your own data centres.

They (Kinesis streams) exist as a way for you to notify others of events that have occurred in your system, and modern distributed systems are heterogeneous by design, to allow for greater flexibility and the ability to choose the right tradeoff in different circumstances.

Even without the complex case of multi-producer and multi-consumer Kinesis streams, the basic question of “should the consumer or the producer own the stream” is often enough to stop us in our tracks as there doesn’t (at least not to me) seem to be a clear winner here.

Manage shared AWS resources separately

One of the better ways I have seen – and have adopted myself – is to manage these shared AWS resources in a separate repository using either Terraform or CloudFormation templates depending on the expertise available in the team.

This seems to be the approach that many companies have adopted once their serverless architecture matured to the point where shared infrastructure starts to pop up.

But, on its own, it’s still not a great way of working, as it introduces other problems around your workflow.

For example, if those shared resources are managed by a separate infrastructure team, then it can create bottlenecks and friction between your development and infrastructure teams.

That said, by comparison I still think it’s better than managing those shared AWS resources with your serverless.yml for the reasons I mentioned.

If you know of other ways to manage shared infrastructure, then by all means let me know in the comments, or you can get in touch with me via twitter.

AWS Lambda – should you have few monolithic functions or many single-purposed functions?

A funny moment (at 38:50) happened during Tim Bray’s session (SRV306) at re:invent 2017, when he asked the audience if we should have many simple, single-purposed functions, or fewer monolithic functions, and the room was pretty much split in half.

Having been brought up on the SOLID principles, and especially the single responsibility principle (SRP), this was a moment that challenged my belief that following the SRP in the serverless world is a no-brainer.

That prompted this closer examination of the arguments from both sides.

Full disclosure, I am biased in this debate. If you find flaws in my thinking, or simply disagree with my views, please point them out in the comments.

What is a monolithic function?

By “monolithic functions”, I meant functions that have internal branching logic based on the invocation event and can do one of several things.

For example, you can have one function handle several HTTP endpoints and methods, and perform different actions based on path and method.

module.exports.handler = (event, context, cb) => {
  const path = event.path;
  const method = event.httpMethod;
  if (path === '/user' && method === 'GET') {
    // ... get user
  } else if (path === '/user' && method === 'DELETE') {
    // ... delete user
  } else if (path === '/user' && method === 'POST') {
    // ... create user
  } else {
    // ... other endpoints & methods
    cb(null, { statusCode: 404 });
  }
};
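The single-purposed alternative would instead split these into separate functions, each wired up to one endpoint and method – a sketch:

// get-user.js
module.exports.handler = (event, context, cb) => { /* get user */ };

// delete-user.js
module.exports.handler = (event, context, cb) => { /* delete user */ };

// create-user.js
module.exports.handler = (event, context, cb) => { /* create user */ };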

What is the real problem?

One can’t rationally reason about and compare solutions without first understanding the problem and what qualities are most desired in a solution.

And when I hear complaints such as:

having so many functions is hard to manage

I immediately wonder what manage entails. Is it to find specific functions you’re looking for? Is it to discover what functions you have? Does this become a problem when you have 10 functions or 100 functions? Or does it become a problem only when you have more developers working on them than you’re able to keep track of?

Drawing from my own experiences, the problem we’re dealing with has less to do with what functions we have, and more to do with what features and capabilities we possess through these functions.

After all, a Lambda function, like a Docker container, or an EC2 server, is just a conduit to deliver some business feature or capability you require.

You wouldn’t ask “do we have a get-user-by-facebook-id function?”, because that requires knowing what the function is called before you even know whether the capability exists and whether it’s captured by a Lambda function. Instead, you would probably ask “do we have a Lambda function that can find a user based on his/her facebook ID?”.

So the real problem is this: given that we have a complex system consisting of many features and capabilities, maintained by many teams of developers, how do we organize these features and capabilities into Lambda functions so that they’re optimised towards…

  • discoverability: how do I find out what features and capabilities exist in our system already, and through which functions?
  • debugging: how do I quickly identify and locate the code I need to look at to debug a problem? e.g. there are errors in system X’s logs, where do I find the relevant code to start debugging the system?
  • scaling the team: how do I minimise friction and allow me to grow the engineering team?

These are the qualities that are most important to me. With this knowledge, I can compare the 2 approaches and see which is best suited for me.

You might care about different qualities, for example, you might not care about scaling the team, but you really worry about the cost for running your serverless architecture. Whatever it might be, I think it’s always helpful to make those design goals explicit, and make sure they’re shared with and understood (maybe even agreed upon!) by your team.


Discoverability is by no means a new problem; according to Simon Wardley, it’s rather rampant in both government and the private sector, with most organisations lacking a systematic way for teams to share and discover each other’s work.

As mentioned earlier, what’s important here is the ability to find out what capabilities are available through your functions, rather than what functions are there.

An argument I often hear for monolithic functions, is that it reduces the no. of functions, which makes them easier to manage.

On the surface, this seems to make sense. But the more I think about it, the more it strikes me that the no. of functions would only be an impediment to our ability to manage our Lambda functions IF we try to manage them by hand rather than using the tools available to us already.

After all, if we are able to locate books by their content (“find me books on the subject of X”) in a huge physical space with 10s of thousands of books, how can we struggle to find Lambda functions when there are so many tools available to us?

With a simple naming convention, like the one that the Serverless framework enforces, we can quickly find related functions by prefix.

For example, if I want to find all the functions that are part of our user API, I can do that by searching for user-api.

With tagging, we can also catalogue functions across multiple dimensions, such as environment, feature name, what type of event source, the name of the author, and so on.

By default, the Serverless framework adds the STAGE tag to all of your functions. You can also add your own tags as well, see documentation on how to add tags. The Lambda management console also gives you a handy dropdown list of the available values when you try to search by a tag.
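With the Serverless framework, for example, you might add tags in serverless.yml at the provider level (applied to all functions) or per function – the tag names here are purely illustrative:

provider:
  name: aws
  tags:
    feature: user-api
    team: platform

functions:
  get-user:
    handler: functions/get-user.handler
    tags:
      author: yan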

If you have a rough idea of what you’re looking for, then the no. of functions is not going to be an impediment to your ability to discover what’s there.

On the other hand, the capabilities of the user-api are immediately obvious with single-purposed functions, where I can see from the relevant functions that I have the basic CRUD capabilities because there are corresponding functions for each.

With a monolithic function, however, it’s not immediately obvious, and I’ll have to either look at the code myself, or have to consult with the author of the function, which for me, makes for pretty poor discoverability.

Because of this, I will mark the monolithic approach down on discoverability.

Having more functions though, means there are more pages for you to scroll through if you just want to explore what functions are there rather than looking for anything specific.

Although, in my experience, with all the functions nicely clustered together by name prefix thanks to the naming convention the Serverless framework enforces, it’s actually quite nice to see what each group of functions can do rather than having to guess what goes on inside a monolithic function.

But, I guess it can be a pain to scroll through everything when you have thousands of functions. So, I’m going to mark single-purposed functions down only slightly for that. I think at that level of complexity, even if you reduce the no. of functions by packing more capabilities into each function, you will still suffer more from not being able to know the true capabilities of those monolithic functions at a glance.


In terms of debugging, the relevant question here is whether or not having fewer functions makes it easier to quickly identify and locate the code you need to look at to debug a problem.

Based on my experience, the trail of breadcrumbs that leads you from, say, an HTTP error or an error stack trace in the logs, to the relevant function and then the repo, is the same regardless of whether the function does one thing or many different things.

What will be different, is how you’d find the relevant code inside the repo for the problems you’re investigating.

A monolithic function that has more branching and in general does more things, would understandably take more cognitive effort to comprehend and follow through to the code that is relevant to the problem at hand.

For that, I’ll mark monolithic functions down slightly as well.


One of the early arguments that got thrown around for microservices is that it makes scaling easier, but that’s just not the case – if you know how to scale a system, then you can scale a monolith just as easily as you can scale a microservice.

I say that as someone who has built monolithic backend systems for games that had a million daily active users. Supercell, the parent company for my current employer, and creator of top grossing games like Clash of Clans and Clash Royale, have well over 100 million daily active users on their games and their backend systems for these games are all monoliths as well.

Instead, what we have learnt from tech giants like the Amazons, Netflixes and Googles of this world, is that a service-oriented style of architecture makes it easier to scale in a different dimension – our engineering team.

This style of architecture allows us to create boundaries within our system, around features and capabilities. In doing so it also allows our engineering teams to scale the complexity of what they build as they can more easily build on top of the work that others have created before them.

Take Google’s Cloud Datastore for example: the engineers working on that service were able to produce a highly sophisticated service by building on top of many layers of services, each providing a powerful layer of abstraction.

These service boundaries are what gives us that greater division of labour, which allows more engineers to work on the system by giving them areas where they can work in relative isolation. This way, they don’t constantly trip over each other with merge conflicts, and integration problems, and so on.

Michael Nygard also wrote a nice article recently that explains this benefit of boundaries and isolation in terms of how it helps to reduce the overhead of sharing mental models.

“if you have a high coherence penalty and too many people, then the team as a whole moves slower… It’s about reducing the overhead of sharing mental models.”

– Michael Nygard

Having lots of single-purposed functions is perhaps the pinnacle of that division of task, and something you lose a little when you move to monolithic functions. Although in practice, you probably won’t end up having so many developers working on the same project that you feel the pain, unless you really pack them in with those monolithic functions!

Also, restricting a function to doing just one thing helps limit how complex a function can become. To make something more complex you would instead compose these simple functions together via other means, such as with AWS Step Functions.

Once again, I’m going to mark monolithic functions down for losing some of that division of labour, and raising the complexity ceiling of a function.


As you can see, based on the criteria that are important to me, having many single-purposed functions is clearly the better way to go.

Like everyone else, I come preloaded with a set of predispositions and biases formed from my experiences, which quite likely do not reflect yours. I’m not asking you to agree with me, but to simply appreciate the process of working out the things that are important to you and your organization, and how to go about finding the right approach for you.

However, if you disagree with my line of thinking and the arguments I put forward for my selection criteria – discoverability, debugging, and scaling the team & complexity of the system – then please let me know via comments.