“Guys, we’re doing pagination wrong…”

These are words I have had to mutter quite a few times in my career, out of dissatisfaction with how pagination had been implemented on several projects.

Still, that dissatisfaction is nothing compared to how I feel when I occasionally have to ask “why is this API not paginated..?”

So, taking a break from my usual Serverless ramblings, let’s talk about pagination :-)

Unidirectional and Bidirectional Pagination

Generally speaking, I see two common types of pagination:

  • simple, unidirectional paging through a static set of results that is too long or inefficient to return in one go – e.g. a list of Twitter followers, or a list of Google search results
  • bidirectional paging through a feed or stream of some sort, where new results can be added after you received the first page of results – e.g. your Twitter timeline, or notifications

Avoid leaky abstraction

A common mistake I see is that the paginated API requires the caller to provide the “key” it uses to sort through the results, which creates a leaky abstraction. The caller must then understand the underlying mechanism the service uses to page its results – e.g. by timestamp, or alphabetical order.

DynamoDB’s Query API is a good example of this. To page through the query results, the caller must specify the ExclusiveStartKey in subsequent requests. However, the service does return the LastEvaluatedKey in the response too.

So, in practice, you can almost treat the LastEvaluatedKey as a token, or cursor, which you simply pass on in the next request. Except, it’s not just a token, it’s an actual sort key in the DynamoDB table, and the attribute names already gave the implementation details away anyway.
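To make the mechanics concrete, here’s a minimal sketch (using the Node.js aws-sdk DocumentClient; the table and attribute names are made up for illustration) of how a caller pages through a Query by feeding LastEvaluatedKey back in as ExclusiveStartKey:

import { DynamoDB } from 'aws-sdk';

const dynamodb = new DynamoDB.DocumentClient();

async function fetchAllResults(userId: string) {
  const items: DynamoDB.DocumentClient.ItemList = [];
  let exclusiveStartKey: DynamoDB.DocumentClient.Key | undefined;

  do {
    const resp = await dynamodb.query({
      TableName: 'timeline',                     // hypothetical table name
      KeyConditionExpression: 'userId = :userId',
      ExpressionAttributeValues: { ':userId': userId },
      Limit: 25,
      ExclusiveStartKey: exclusiveStartKey,      // undefined on the first page
    }).promise();

    items.push(...(resp.Items || []));
    exclusiveStartKey = resp.LastEvaluatedKey;   // the "cursor" DynamoDB hands back
  } while (exclusiveStartKey);

  return items;
}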

On its own, this is not a big deal. However, it often has the unfortunate knock-on effect of encouraging application developers to build their application-level pagination on top of this implementation detail. Except this time around, they’re not returning the LastEvaluatedKey in the response and the client is now responsible for tracking that piece of information.

Congratulations, the underlying mechanic your database uses to support pagination has now leaked all the way to your front end!

Make paging intent explicit and consistent

Another common trend I see is that you have to send the same request parameters to the paginated API over and over, for example:

  • max no. of results per page
  • the direction of pagination (if bidirectional)
  • the original query (in DynamoDB’s case, this includes a no. of attributes such as FilterExpression, KeyConditionExpression, ProjectionExpression and IndexName)

I won’t call this one a mistake as it is sometimes by design, but more often than not it strikes me as a consequence of a lack of design.

In all the paginated APIs I have encountered, the intended behaviour is always to fetch the next set of results for a query, not to start a different query midway through. That just wouldn’t make sense, and you probably couldn’t even call that pagination, more like navigation! I mean, when was the last time you started a DynamoDB query and then had to change any of the request parameters midway through paginating through the results?

That said, there are legitimate reasons for changing the direction of pagination from a previously received page. More on this when we discuss bidirectional paging further down the article.

Unidirectional paging with cursor

For unidirectional pagination, my preferred approach is to use a simple cursor. The important detail here is to make the cursor meaningless.

As far as the client is concerned, it’s just a blob the server returns in the response when there are more results to fetch. The client shouldn’t be able to derive any implementation details from it, and the only thing it can afford to do with this cursor is to send it along in the next request.

fig. 1 – flow of request & responses for a series of paginated requests

But how does the API know where to start fetching the next page from?

A simple way to do this is:

  1. create a JSON object to capture the data needed to fetch the next page – e.g. if you’re using DynamoDB, then this can be the request object for the next page (including the ExclusiveStartKey)
  2. base64 encode the JSON string
  3. return the base64 blob as cursor

When we receive the request to fetch the next page, we can apply the reverse process to get back the request object we created earlier.
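As a rough sketch (not necessarily how you’d do it in production), the encode/decode helpers could look like this, where the cursor simply wraps the DynamoDB request for the next page:

import { DynamoDB } from 'aws-sdk';

type NextPageRequest = DynamoDB.DocumentClient.QueryInput;

// capture the request needed to fetch the next page, and base64 encode it
function toCursor(nextPageRequest: NextPageRequest): string {
  const json = JSON.stringify(nextPageRequest);
  return Buffer.from(json, 'utf8').toString('base64');
}

// apply the reverse process when the client sends the cursor back
function fromCursor(cursor: string): NextPageRequest {
  const json = Buffer.from(cursor, 'base64').toString('utf8');
  return JSON.parse(json) as NextPageRequest;
}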

Isn’t that gonna leak even more information – e.g. that you’re using DynamoDB, the table name, the schema, etc. – if someone just base64 decodes your blob?

Absolutely, which is why you might also choose to encrypt the JSON first. You also don’t have to use the DynamoDB query request as the base.

Notice in both fig. 1 and fig. 2 the client only sends the cursor in the subsequent requests?

This is by design.

As I mentioned earlier, the client has already told us the query in the first request; the pagination mechanism should only provide a way to fetch subsequent pages of the results.

Which, to me, means it should not afford any other behaviour (there is that word again, read here to see how the idea of affordance applies to API design) and therefore should not require any other information besides the cursor from the previous response.

This in turn, means we need to capture the original query, or intent, in the cursor so we can construct the corresponding DynamoDB request. Or, we could just capture the actual DynamoDB request in the cursor which seems like a simple, practical solution here.

fig. 2 – interaction between client, API and DynamoDB

Bidirectional paging with cursor(s)

With bidirectional paging, you need to be able to page forward in time (when new tweets are added to your timeline) as well as backward (to fetch older tweets). So a simple string cursor would no longer suffice; instead we need two cursors, one for each direction, for example…

{
  "before": "ThlNjc5MjUwNDMzMA...",
  "after": "ADfaU5ODFmMWRiYQ..." 
}

Additionally, when paging forward, we still have to return a cursor even when there are no more results right now, as new results can be added to the feed later. So we should also include a pair of boolean flags in the response.

{
  "before": "ThlNjc5MjUwNDMzMA...",
  "hasBefore": true,
  "after": "ADfaU5ODFmMWRiYQ...",
  "hasAfter": true
}

When the client pages forward in time and receives hasAfter as false, it knows there are no more results available right now. It can therefore stop actively fetching the next page of results, and instead poll passively for new results periodically.
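To illustrate, here is a sketch of how the API might assemble that cursor object from a page of results; the Tweet type and the toCursor helper are placeholders, not anything from a real implementation:

interface Tweet { id: string; timestamp: number; }

interface CursorSet {
  before?: string;    // for fetching older results
  hasBefore: boolean;
  after?: string;     // for fetching newer results
  hasAfter: boolean;
}

function buildCursors(
  page: Tweet[],                // newest first
  hasOlderResults: boolean,
  hasNewerResults: boolean,
  toCursor: (boundary: Tweet, direction: 'before' | 'after') => string
): CursorSet {
  const newest = page[0];
  const oldest = page[page.length - 1];
  return {
    before: oldest ? toCursor(oldest, 'before') : undefined,
    hasBefore: hasOlderResults,
    // always return an after cursor when paging forward, since new results
    // can be added to the feed later on
    after: newest ? toCursor(newest, 'after') : undefined,
    hasAfter: hasNewerResults,
  };
}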


Let’s run through a simple example: imagine you’re fetching the tweets in your timeline, where the API returns the latest tweets first.

fig. 3 – paginate backward in time to fetch older data

  1. the client makes a first request
  2. API responds with a cursor object, hasAfter is false because the API has responded with the latest results, but hasBefore is true as there are older results available
  3. the client makes a second request and passes only the before cursor in the request, making its intention clear and unambiguous
  4. API responds with another cursor object where this time both hasBefore and hasAfter are true, given that we’re right in the middle of this stream of results
  5. the client makes a third and last request, again passing only the before cursor received from the previous response
  6. API responds with a cursor object where hasBefore is false because we have now received the oldest result available

Ok, now let’s run through another example, this time we’ll page forward in time instead.

fig. 4 – paginate forward in time to fetch newer data

  1. the client makes a first request
  2. API responds with a cursor object, hasAfter is false because the API has responded with the latest results, but hasBefore is true as there are older results available
  3. some time has passed, and more results have become available
  4. the client makes a second request, and passes only the after cursor in the request, making its intention clear and unambiguous
  5. API responds with only the newer results that the client has not received already, and cursor.hasAfter is false as these are the latest results available at this moment in time; should the client page backward (in time) from this response then it’ll receive the same results as the first response from the API

Now, let’s circle back to what I mentioned earlier regarding the occasional need to change direction midway through pagination.

The reason we need pagination is because it’s often impractical, inefficient and in some cases impossible to return all available results for a query – e.g. at the time of writing Katy Perry has 108M Twitter followers, trying to retrieve all her followers in one request-response cycle would have crashed both the server and the client app.

Besides limiting how much data can be returned in one request-response cycle, we also need to place an upper bound on how much data the client app would cache in order to protect the user experience and prevent the client app from crashing.

That means, at some point, as the user keeps scrolling through older tweets, the client would need to start dropping data it has already fetched or risk running out of memory. Which means, when the user scrolls back up to see the latest tweets, the client would need to re-fetch pages that had been dropped, hence reversing the original direction of the pagination.

Fortunately, the scheme outlined above is flexible enough and allows you to do just that. Every page of results has an associated cursor that allows you to fetch the next page in either direction. So, in the case where you need to re-fetch a dropped page, it’s as simple as making a paginated request with the after cursor of the latest page you have cached.

Dealing with “gaps”

Staying with the Twitter example: if you open up the Twitter mobile app after some time, you’ll see the tweets that have already been cached, but the app also recognizes that so much time has passed that it’s not feasible to paginate from the cached data all the way to the latest.

Instead, the client would fetch the latest tweets with a non-paginated request. As you scroll down, the client can automatically fetch older pages as per fig. 3 and gradually fill in the gap until it joins up with the cached data.

The behaviour of the Twitter mobile app has changed over time, and another tactic I have seen is to place a visual (clickable) marker for the missing tweets in the timeline. This makes it an explicit action by the user to start paging through older tweets to fill in the gap.

So there you have it, a simple and effective way to implement both unidirectional and bidirectional paginated APIs, hope you have found it useful!

The problems with DynamoDB Auto Scaling and how it might be improved

TL;DR – AWS announced the long awaited auto scaling capability for DynamoDB, but we found it takes too long to scale up and doesn’t scale up aggressively enough as it’s held back by using consumed capacity as scaling metric rather than actual request count.

Here at Space Ape Games we developed an in-house tech to auto scale DynamoDB throughput and have used it successfully in production for a few years. It’s even integrated with our LiveOps tooling and scales up our DynamoDB tables according to the schedule of live events. This way, our tables are always provisioned just ahead of that inevitable spike in traffic at the start of an event.

Auto scaling DynamoDB is a common problem for AWS customers, I have personally implemented similar tech to deal with this problem at two previous companies. Whilst at Yubl, I even applied the same technique to auto scale Kinesis streams too.

When AWS announced DynamoDB Auto Scaling we were excited. However, the blog post that accompanied the announcement illustrated two problems:

  • the reaction time to scaling up is slow (10–15 mins)
  • it did not scale sufficiently to maintain the 70% utilization level

Notice the high no. of throttled operations despite the scaling activity. If you were scaling the table manually, would you have settled for this result?

It looks as though the author’s test did not match the kind of workload that DynamoDB Auto Scaling is designed to accommodate.

In our case, we also have a high write-to-read ratio (typically around 1:1) because every action the players perform in a game changes their state in some way. So unfortunately we can’t use DAX as a get-out-of-jail free card.

How DynamoDB auto scaling works

When you modify the auto scaling settings on a table’s read or write throughput, DynamoDB automatically creates/updates CloudWatch alarms for that table – four for writes and four for reads.

As you can see from the screenshot below, DynamoDB auto scaling uses CloudWatch alarms to trigger scaling actions. When the consumed capacity units breach the utilization level on the table (which defaults to 70%) for 5 consecutive minutes, it scales up the corresponding provisioned capacity units.

Problems with the current system, and how it might be improved

From our own tests we found DynamoDB’s lacklustre performance at scaling up is rooted in 2 problems:

  1. The CloudWatch alarms require 5 consecutive threshold breaches. When you take into account the latency in CloudWatch metrics (which are typically a few mins behind), it means scaling actions occur up to 10 mins after the specified utilization level is first breached. This reaction time is too slow.
  2. The new provisioned capacity units are calculated based on consumed capacity units rather than the actual request count. The consumed capacity units are themselves constrained by the provisioned capacity units, even though it’s possible to temporarily exceed the provisioned capacity units with burst capacity. What this means is that once you’ve exhausted the saved burst capacity, the actual request count can start to outpace the consumed capacity units, and scaling up is not able to keep pace with the increase in actual request count. We will see the effect of this in the results from the control group later.

Based on these observations, we hypothesize that you can make two modifications to the system to improve its effectiveness:

  1. trigger scaling up after 1 threshold breach instead of 5, which is in line with the mantra of “scale up early, scale down slowly”.
  2. trigger scaling activity based on actual request count instead of consumed capacity units, and calculate the new provisioned capacity units using actual request count as well.

As part of this experiment, we also prototyped these changes (by hijacking the CloudWatch alarms) to demonstrate their improvement.

Testing Methodology

The most important thing for this test is a reliable and reproducible way of generating the desired traffic patterns.

To do that, we have a recursive function that makes BatchPut requests against the DynamoDB table under test every second. The items-per-second rate is calculated from the elapsed time (t) in seconds, which gives us a lot of flexibility to shape the traffic pattern we want.

Since a Lambda function can only run for a max of 5 mins, when context.getRemainingTimeInMillis() is less than 2000 the function will recurse and pass the last recorded elapsed time (t) in the payload for the next invocation.
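For illustration, a stripped-down version of such a traffic generator might look like the following; the table name, traffic-shaping function and self-invocation payload are all assumptions made for this sketch:

import { DynamoDB, Lambda } from 'aws-sdk';

const dynamodb = new DynamoDB.DocumentClient();
const lambda = new Lambda();

// shape the traffic pattern as a function of the elapsed time (t) in seconds
const writesPerSecond = (t: number) => Math.round(25 + 275 * Math.min(t / 2700, 1));

export const handler = async (event: { t?: number }, context: any) => {
  let t = event.t || 0;

  while (context.getRemainingTimeInMillis() > 2000) {
    const startedAt = Date.now();
    const requests = Array.from({ length: writesPerSecond(t) }, (_, i) => ({
      PutRequest: { Item: { id: `${t}-${i}`, payload: 'x'.repeat(100) } },
    }));

    // BatchWrite accepts at most 25 items per call, so write in chunks
    for (let i = 0; i < requests.length; i += 25) {
      await dynamodb.batchWrite({
        RequestItems: { 'load-test-table': requests.slice(i, i + 25) },
      }).promise();
    }

    // pace the loop to roughly one batch of writes per second
    const elapsed = Date.now() - startedAt;
    if (elapsed < 1000) await new Promise(r => setTimeout(r, 1000 - elapsed));
    t += 1;
  }

  // recurse: pass the last recorded elapsed time (t) to the next invocation
  await lambda.invoke({
    FunctionName: context.functionName,
    InvocationType: 'Event',
    Payload: JSON.stringify({ t }),
  }).promise();
};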

The result is a continuous, smooth traffic pattern you see below.

We tested with 2 traffic patterns we see regularly.

Bell Curve

This should be a familiar traffic pattern for most – a slow & steady buildup of traffic from the trough to the peak, followed by a faster drop off as users go to sleep. After a period of steady traffic throughout the night things start to pick up again the next day.

For many of us whose user base is concentrated in the North America region, the peak is usually around 3–4am UK time – the more reason we need DynamoDB Auto Scaling to do its job and not wake us up!

This traffic pattern is characterised by a) steady traffic at the trough, b) slow & steady build up towards the peak, c) fast drop off towards the trough, and repeat.


Top Heavy

This sudden burst of traffic is usually precipitated by an event – a marketing campaign, a promotion by the app store, or in our case a scheduled LiveOps event.

In most cases these events are predictable and we scale up DynamoDB tables ahead of time via our automated tooling. However, in the unlikely event of an unplanned burst of traffic (and it has happened to us a few times), a good auto scaling system should scale up quickly and aggressively to minimise the disruption to our players.

This pattern is characterised by a) a sharp climb in traffic, b) a slow & steady decline, c) staying at a steady level until the anomaly finishes and traffic goes back to the Bell Curve again.


We tested these traffic patterns against several utilization level settings (default is 70%) to see how it handles them. We measured the performance of the system by:

  • the % of successful requests (ie. consumed capacity / request count)
  • the total no. of throttled requests during the test

These results will act as our control group.

We then tested the same traffic patterns against the 2 hypothetical auto scaling changes we proposed above.

To prototype the proposed changes we hijacked the CloudWatch alarms created by DynamoDB auto scaling using CloudWatch events.

When a PutMetricAlarm API call is made, our change_cw_alarm function is invoked and replaces the existing CloudWatch alarms with the relevant changes – i.e. setting EvaluationPeriods to 1 for hypothesis 1.

To avoid an invocation loop, the Lambda function only makes changes to a CloudWatch alarm if its EvaluationPeriods has not already been set to 1.
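A simplified sketch of what such a change_cw_alarm function might look like is below; the event shape follows the CloudTrail-via-CloudWatch-Events format, and the loop guard and field mappings are assumptions rather than the exact code used in the experiment:

import { CloudWatch } from 'aws-sdk';

const cloudwatch = new CloudWatch();

export const handler = async (event: any) => {
  const params = event.detail.requestParameters;

  // guard against an invocation loop: skip alarms we've already modified
  if (params.evaluationPeriods === 1) {
    return;
  }

  await cloudwatch.putMetricAlarm({
    AlarmName: params.alarmName,
    AlarmActions: params.alarmActions,
    Namespace: params.namespace,
    MetricName: params.metricName,
    Statistic: params.statistic,
    Dimensions: (params.dimensions || []).map((d: any) => ({ Name: d.name, Value: d.value })),
    Period: params.period,
    Threshold: params.threshold,
    ComparisonOperator: params.comparisonOperator,
    EvaluationPeriods: 1,   // scale after a single threshold breach
  }).promise();
};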


The change_cw_alarm function changed the breach threshold for the CloudWatch alarms to 1 min.


For hypothesis 2, we have to take over the responsibility of scaling up the table, as we need to calculate the new provisioned capacity units using a custom metric that tracks the actual request count. This is why the AlarmActions for the CloudWatch alarm are also overridden here.
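For illustration, the scale-up function for hypothesis 2 could be sketched along these lines; the custom metric namespace/name, the SNS message handling and the capacity formula are all assumptions:

import { CloudWatch, DynamoDB } from 'aws-sdk';

const cloudwatch = new CloudWatch();
const dynamodb = new DynamoDB();

const UTILIZATION_TARGET = 0.7;

export const handler = async (event: any) => {
  const alarm = JSON.parse(event.Records[0].Sns.Message);
  const tableName = alarm.Trigger.Dimensions
    .find((d: any) => d.name === 'TableName').value;

  // look at the actual request count (a custom metric) over the last 5 mins
  const end = new Date();
  const start = new Date(end.getTime() - 5 * 60 * 1000);
  const stats = await cloudwatch.getMetricStatistics({
    Namespace: 'Custom/DynamoDB',          // hypothetical custom namespace
    MetricName: 'WriteRequestCount',       // hypothetical custom metric
    Dimensions: [{ Name: 'TableName', Value: tableName }],
    StartTime: start,
    EndTime: end,
    Period: 60,
    Statistics: ['Sum'],
  }).promise();

  const peakPerMinute = Math.max(0, ...(stats.Datapoints || []).map(d => d.Sum || 0));
  const newWriteCapacity = Math.ceil(peakPerMinute / 60 / UTILIZATION_TARGET);

  const table = await dynamodb.describeTable({ TableName: tableName }).promise();
  const current = table.Table!.ProvisionedThroughput!;

  if (newWriteCapacity > (current.WriteCapacityUnits || 0)) {
    await dynamodb.updateTable({
      TableName: tableName,
      ProvisionedThroughput: {
        ReadCapacityUnits: current.ReadCapacityUnits || 1,
        WriteCapacityUnits: newWriteCapacity,
      },
    }).promise();
  }
};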

The SNS topic is subscribed to a Lambda function which scales up the throughput of the table.


Result (Bell Curve)

The test is set up as follows:

  1. table starts off with 50 write capacity units
  2. traffic holds steady for 15 mins at 25 writes/s
  3. traffic then increases to peak level (300 writes/s) at a steady rate over the next 45 mins
  4. traffic drops off back to 25 writes/s at a steady rate over the next 15 mins
  5. traffic holds steady at 25 writes/s

All the units in the diagrams are SUM/min, which is how CloudWatch tracks ConsumedWriteCapacityUnits and WriteThrottleEvents; I had to normalise ProvisionedWriteCapacityUnits (which is tracked as a per-second unit) to make them consistent.

Let’s start by seeing how the control group (vanilla DynamoDB auto scaling) performed at different utilization levels from 30% to 80%.

I’m not sure why the total consumed units and total request count metrics didn’t match exactly when the utilization is between 30% and 50%, but seeing as there were no throttled events I’m going to put that difference down to inaccuracies in CloudWatch.


I make several observations from these results:

  1. At 30%-50% utilization levels, write ops are never throttled – this is what we want to see in production.
  2. At the 60% utilization level, the slow reaction time (problem 1) caused writes to be throttled early on as the system adjusted to the steady increase in load, but it was eventually able to adapt.
  3. At the 70% and 80% utilization levels, things really fell apart. The growth in the actual request count outpaced the growth of consumed capacity units, and more and more write ops were throttled as the system failed to adapt to the new level of actual utilization (as opposed to the “allowed” utilization measured by consumed capacity units, i.e. problem 2).

Hypothesis 1 : scaling after 1 min breach

Some observations:

  1. At the 30%-50% utilization levels, there’s no difference in performance.
  2. At the 60% utilization level, the early throttled writes we saw in the control group are now addressed because we decreased the reaction time of the system.
  3. At the 70%-80% utilization levels, there is negligible difference in performance. This is to be expected, as the poor performance in the control group is caused by problem 2, so improving the reaction time alone is unlikely to significantly improve performance in these cases.

Hypothesis 2 : scaling after 1 min breach on actual request count

Scaling on actual request count and using actual request count to calculate the new provisioned capacity units yields amazing results. There were no throttled events at 30%-70% utilization levels.

Even at 80% utilization level both the success rate and total no. of throttled events have improved significantly.

This is an acceptable level of performance for an autoscaling system, one that I’d be happy to use in a production environment. That said, I would still err on the side of caution and choose a utilization level at or below 70% to give the table enough headroom to deal with sudden spikes in traffic.

Results (Top Heavy)

The test is set up as follows:

  1. table starts off with 50 write capacity units
  2. traffic holds steady for 15 mins at 25 writes/s
  3. traffic then jumps to peak level (300 writes/s) at a steady rate over the next 5 mins
  4. traffic then decreases at a rate of 3 writes/s per minute

Once again, let’s start by looking at the performance of the control group (vanilla DynamoDB auto scaling) at various utilization levels.

Some observations from the results above:

  1. At the 30%-60% utilization levels, most of the throttled writes can be attributed to the slow reaction time (problem 1). Once the table started to scale up, the no. of throttled writes quickly decreased.
  2. At 70%-80% utilization levels, the system also didn’t scale up aggressively enough (problem 2). Hence we experienced throttled writes for much longer, resulting in a much worse performance overall.

Hypothesis 1 : scaling after 1 min breach

Some observations:

  1. Across the board the performance has improved, especially at the 30%-60% utilization levels.
  2. At 70%-80% utilization levels we’re still seeing the effect of problem 2 – not scaling up aggressively enough. As a result, there’s still a long tail to the throttled write ops.

Hypothesis 2 : scaling after 1 min breach on actual request count

Similar to what we observed with the Bell Curve traffic pattern, this implementation is significantly better at coping with sudden spikes in traffic at all utilization levels tested.

Even at 80% utilization level (which really doesn’t leave you with a lot of head room) an impressive 94% of write operations succeeded (compared with 73% recorded by the control group). Whilst there is still a significant no. of throttled events, it compares favourably against the 500k+ count recorded by the vanilla DynamoDB auto scaling.

Conclusions

I like DynamoDB, and I would like to use its auto scaling capability out of the box but it just doesn’t quite match my expectations at the moment. I hope someone from AWS is reading this, and that this post provides sufficient proof (as you can see from the data below) that it can be vastly improved with relatively small changes.

Feel free to play around with the demo, all the code is available here.

Beware of dilution of DynamoDB throughput due to excessive scaling

TL;DR – The no. of partitions in a DynamoDB table goes up in response to increased load or storage size, but it never comes back down, ever.

DynamoDB is pretty great, but as I have seen this particular problem at 3 different companies – Gamesys, JUST EAT, and now Space Ape Games – I think it’s a behaviour that more folks should be aware of.

Credit to AWS, they have regularly talked about the formula for working out the no. of partitions at DynamoDB Deep Dive sessions.

However, they often forget to mention that DynamoDB will not decrease the no. of partitions when you reduce your throughput units. It’s a crucial detail that is badly under-represented in a lengthy Best Practice guide.

Consider the following scenario:

  • you dial up the throughput for a table because there’s a sudden spike in traffic or you need the extra throughput to run an expensive scan
  • the extra throughput causes DynamoDB to increase the no. of partitions
  • you dial down the throughput to previous levels, but now you notice that some requests are throttled even when you have not exceeded the provisioned throughput on the table

This happens because there are fewer read and write throughput units per partition than before, due to the increased no. of partitions. It translates to a higher likelihood of exceeding the read/write throughput on a per-partition basis (even if you’re still under the throughput limits on the table overall).

When this dilution of throughput happens you can:

  1. migrate to a new table
  2. specify higher table-level throughput to boost the throughput units per partition back to previous levels

Given the difficulty of table migrations, most folks would opt for option 2, which is how JUST EAT ended up with a table with 3000+ write throughput units despite consuming closer to 200 write units/s.

In conclusion, you should think very carefully before drastically scaling up a DynamoDB table in response to temporary needs; it can have long-lasting cost implications.

AWS Lambda – build yourself a URL shortener in 2 hours

An interesting requirement came up at work this week where we discussed potentially having to run our own URL Shortener because the Universal Links mechanism (in iOS 9 and above) requires a JSON manifest at

https://domain.com/apple-app-site-association

Since the OS doesn’t follow redirects this manifest has to be hosted on the URL shortener’s root domain.

Owing to a limitation on AppsFlyer’s side, it’s currently not able to shorten links when you have Universal Links configured for your app. Whilst we could switch to another vendor, that would mean more work for our (already stretched) client devs, and we really like AppsFlyer’s support for attributions.

Which brings us back to the question

“should we build a URL shortener?”

swiftly followed by

“how hard can it be to build a scalable URL shortener in 2017?”

Well, turns out it wasn’t hard at all 

Lambda FTW

For this URL shortener we’ll need several things:

  1. a GET /{shortUrl} endpoint that will redirect you to the original URL
  2. a POST / endpoint that will accept an original URL and return the shortened URL
  3. an index.html page where someone can easily create short URLs
  4. a GET /apple-app-site-association endpoint that serves a static JSON response

all of which can be accomplished with API Gateway + Lambda.

Overall, this is the project structure I ended up with:

  • using the Serverless framework’s aws-nodejs template
  • each of the above endpoints has a corresponding handler function
  • the index.html file is in the static folder
  • the test cases are written in such a way that they can be used both as integration as well as acceptance tests
  • there’s a build.sh script which facilitates running
    • integration tests, eg ./build.sh int-test {env} {region} {aws_profile}
    • acceptance tests, eg ./build.sh acceptance-test {env} {region} {aws_profile}
    • deployment, eg ./build.sh deploy {env} {region} {aws_profile}

GET /apple-app-site-association endpoint

Seeing as this is a static JSON blob, it makes sense to precompute the HTTP response and return it every time.
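As a minimal sketch (the manifest contents below are placeholders), the handler can build the response once at module load time and return the same object on every invocation:

const appSiteAssociation = {
  applinks: {
    apps: [],
    details: [{ appID: 'TEAMID.com.example.app', paths: ['*'] }],  // placeholder values
  },
};

// precomputed once, reused for every request
const response = {
  statusCode: 200,
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(appSiteAssociation),
};

export const handler = async () => response;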

POST / endpoint

For an algorithm to shorten URLs, you can find a very simple and elegant solution on StackOverflow. All you need is an auto-incremented ID, like the ones you normally get with RDBMS.

However, I find DynamoDB a more appropriate DB choice here because:

  • it’s a managed service, so no infrastructure for me to worry about
  • OPEX over CAPEX, man!
  • I can scale read & write throughput elastically to match the utilization level and handle any spikes in traffic

But DynamoDB has no such concept as an auto-incremented ID, which the algorithm needs. Instead, you can use an atomic counter to simulate an auto-incremented ID (at the expense of an extra write unit per request).
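A sketch of how that might look (the table name, attribute names and domain are assumptions): an atomic counter item provides the incrementing ID, which is then base62-encoded into the short URL token.

import { DynamoDB } from 'aws-sdk';

const dynamodb = new DynamoDB.DocumentClient();
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// encode an integer ID as a short base62 token
function base62(n: number): string {
  let s = '';
  do {
    s = ALPHABET[n % 62] + s;
    n = Math.floor(n / 62);
  } while (n > 0);
  return s;
}

export const handler = async (event: { body: string }) => {
  const { url } = JSON.parse(event.body);

  // atomic counter: ADD 1 to a dedicated counter item and read back the new value
  const counter = await dynamodb.update({
    TableName: 'short-urls',
    Key: { id: '__counter__' },
    UpdateExpression: 'ADD #count :one',
    ExpressionAttributeNames: { '#count': 'count' },
    ExpressionAttributeValues: { ':one': 1 },
    ReturnValues: 'UPDATED_NEW',
  }).promise();

  const token = base62(counter.Attributes!.count as number);
  await dynamodb.put({
    TableName: 'short-urls',
    Item: { id: token, longUrl: url },
  }).promise();

  return {
    statusCode: 200,
    body: JSON.stringify({ shortUrl: `https://example.com/${token}` }),
  };
};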

GET /{shortUrl} endpoint

Once we have the mapping in a DynamoDB table, the redirect endpoint is a simple matter of fetching the original URL and returning it as part of the Location header.

Oh, and don’t forget to return the appropriate HTTP status code, in this case a 308 Permanent Redirect.
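A minimal sketch of that handler, matching the table layout assumed in the shorten sketch above:

import { DynamoDB } from 'aws-sdk';

const dynamodb = new DynamoDB.DocumentClient();

export const handler = async (event: { pathParameters: { shortUrl: string } }) => {
  const resp = await dynamodb.get({
    TableName: 'short-urls',
    Key: { id: event.pathParameters.shortUrl },
  }).promise();

  if (!resp.Item) {
    return { statusCode: 404, body: 'not found' };
  }

  return {
    statusCode: 308,   // Permanent Redirect
    headers: { Location: resp.Item.longUrl },
    body: '',
  };
};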

GET / index page

Finally, for the index page, we’ll need to return some HTML instead (and a different content-type to go with the HTML).

I decided to put the HTML file in a static folder, which is loaded and cached the first time the function is invoked.
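Something along these lines would do the trick (the folder layout follows the project structure described above):

import * as fs from 'fs';
import * as path from 'path';

let html: string | undefined;

export const handler = async () => {
  if (!html) {
    // load and cache the page the first time the function is invoked
    html = fs.readFileSync(path.join(__dirname, 'static', 'index.html'), 'utf8');
  }

  return {
    statusCode: 200,
    headers: { 'Content-Type': 'text/html' },
    body: html,
  };
};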

Getting ready for production

Fortunately I have had plenty of practice getting Lambda functions to production readiness, and for this URL shortener we will need to:

  • configure auto-scaling parameters for the DynamoDB table (we have an internal system for managing the auto-scaling side of things)
  • turn on caching in API Gateway for the production stage

Future Improvements

If you put in the same URL multiple times you’ll get back different short-urls; one optimization (for storage and caching) would be to return the same short-url instead.

To accomplish this, you can:

  1. add GSI to the DynamoDB table on the longUrl attribute to support efficient reverse lookup
  2. in the shortenUrl function, query the GSI to find existing short url(s)

I think it’s better to add a GSI than to create a new table here because it avoids having “transactions” that span across multiple tables.
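As a sketch, the reverse lookup could be a simple Query against the GSI (the index name here is hypothetical, and the table layout follows the earlier sketches):

import { DynamoDB } from 'aws-sdk';

const dynamodb = new DynamoDB.DocumentClient();

async function findExistingShortUrl(longUrl: string): Promise<string | undefined> {
  const resp = await dynamodb.query({
    TableName: 'short-urls',
    IndexName: 'longUrl-index',                    // hypothetical GSI name
    KeyConditionExpression: 'longUrl = :longUrl',
    ExpressionAttributeValues: { ':longUrl': longUrl },
    Limit: 1,
  }).promise();

  return resp.Items && resp.Items.length > 0 ? (resp.Items[0].id as string) : undefined;
}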

Useful Links

Slides and recording of my Lambda talk at LeetSpeak 2016