Yubl’s road to Serverless architecture – Part 1

Note: see here for the rest of the series.

 

Since Yubl’s closure quite a few people have asked about the serverless architecture we ended up with and some of the things we have learnt along the way.

As such, this is the first of a series of posts in which I’ll share some of the lessons we learnt. However, bear in mind the pace of change in this space: some of the challenges we encountered might have been solved by the time you read this.

ps. many aspects of this series are already covered in a talk I gave on AWS Lambda at LeetSpeak this year; you can find the slides and recording of the talk here.

 

From A Monolithic Beginning

Back when I joined Yubl in April I inherited a monolithic Node.js backend running on EC2 instances, with MongoLab (hosted MongoDB) and CloudAMQP (hosted RabbitMQ) thrown into the mix.

[Diagram: yubl-monolith]

There were numerous problems with the legacy system. Some could be rectified with incremental changes (eg. blue-green deployment), but others required a rethink at an architectural level. Although things looked really simple on paper (at the architecture diagram level), all the complexities were hidden inside each of these 3 services and boy, there were complexities!

My first tasks were to work with the ops team to improve the existing deployment pipeline and to draw up a list of characteristics we’d want from our architecture:

  • able to do small, incremental deployments
  • deployments should be fast, and require no downtime
  • no lock-step deployments
  • features can be deployed independently
  • features are loosely coupled through messages
  • minimise cost for unused resources
  • minimise ops effort

From here we decided on a service-oriented architecture, and AWS Lambda seemed the perfect tool for the job given the workloads we had:

  • lots of APIs, all HTTPS, no ultra-low latency requirement
  • lots of background tasks, many of which have soft real-time requirements (eg. distributing a post to followers’ timelines)

 

To a Serverless End

Suffice it to say that we knew the migration was going to be a long road with many challenges along the way, and we wanted to do it incrementally and gradually increase the speed of delivery as we went.

“The lead time to someone saying thank you is the only reputation metric that matters”

– Dan North

The first step of the migration was to make the legacy systems publish state changes in the system (eg. user joined, user A followed user B, etc.) so that we could start building new features on top of the legacy systems.

To do this, we updated the legacy systems to publish events to Kinesis streams.
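As a rough illustration of what publishing a state change might look like, here is a minimal Python sketch (the legacy system itself was Node.js). The event shape, the `yubl-events` stream name and the `InMemoryStream` stand-in are all assumptions made for illustration, not Yubl's actual schema. With boto3, the stand-in would be replaced by a Kinesis client, whose `put_record` call takes the same `StreamName`, `Data` and `PartitionKey` keyword arguments.

```python
import json
import time
import uuid


class InMemoryStream:
    """Hypothetical stand-in for a Kinesis client, so the sketch runs offline."""

    def __init__(self):
        self.records = []

    def put_record(self, StreamName, Data, PartitionKey):
        # Mirrors the keyword arguments of boto3's Kinesis put_record call.
        self.records.append(
            {"StreamName": StreamName, "Data": Data, "PartitionKey": PartitionKey}
        )


def publish_event(client, stream_name, event_type, user_id, payload):
    """Wrap a state change in an envelope and publish it to the stream."""
    event = {
        "eventId": str(uuid.uuid4()),  # lets consumers de-duplicate retries
        "eventType": event_type,       # e.g. "user-joined", "user-followed"
        "userId": user_id,
        "timestamp": int(time.time() * 1000),
        "payload": payload,
    }
    client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        # Records sharing a partition key land on the same shard, in order.
        PartitionKey=user_id,
    )
    return event


stream = InMemoryStream()
publish_event(stream, "yubl-events", "user-followed", "user-a", {"followed": "user-b"})
```

Partitioning by user ID is only one possible choice; it trades even shard utilisation for per-user ordering.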

 

Our general strategy is:

  • build new features on top of these events, which usually have their own data stores (eg. DynamoDB, CloudSearch, S3, BigQuery, etc.) together with background processing pipelines and APIs
  • extract existing features/concepts from the legacy system into services that will run side-by-side
    • these new services will initially be backed by the same shared MongoLab database
    • other services (including the legacy ones) are updated to use hand-crafted API clients to access the encapsulated resources via the new APIs rather than hitting the shared MongoLab database directly
    • once all access to these resources is done via the new APIs, data migration (usually to DynamoDB tables) will commence behind the scenes
  • wherever possible, requests to existing API endpoints are forwarded to the new APIs so that we don’t have to wait for the iOS and Android apps to be updated (which can take weeks) and can start reaping the benefits earlier
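The forwarding step in the last bullet can be pictured as a small routing table inside the legacy API. The sketch below is a Python illustration under assumed names (the route prefixes and hostnames are hypothetical); it only decides where a request should go and leaves the actual HTTP call out.

```python
# Hypothetical mapping from legacy path prefixes to the new APIs that own them.
MIGRATED_ROUTES = {
    "/users/": "https://user-api.example.com",
    "/timeline/": "https://timeline-api.example.com",
}


def resolve_upstream(path):
    """Return the URL to forward to, or None if the legacy code still owns it."""
    # Check longer prefixes first so the most specific route wins.
    for prefix in sorted(MIGRATED_ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return MIGRATED_ROUTES[prefix] + path
    return None


print(resolve_upstream("/users/42/followers"))
print(resolve_upstream("/posts/7"))
```

Adding an entry to the table moves an endpoint over to a new service without the mobile apps having to change the URLs they call.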

 

After 6 months of hard work, my team of 6 backend engineers (including myself) had drastically transformed our backend infrastructure. Amazon was very impressed by the work we were doing with Lambda and was in the process of writing up a case study of our work when Yubl was shut down at the whim of our major shareholder.

Here’s an almost complete picture of the architecture we ended up with (some details are omitted for brevity and clarity).

[Diagram: overall architecture]

Some interesting stats:

  • 170 Lambda functions running in production
  • roughly 1GB of total deployment package size (after Janitor Lambda cleans up unreferenced versions)
  • Lambda cost was around 5% of what we paid for EC2 for a comparable amount of compute
  • the no. of production deployments increased from 9/month in April to 155 in September
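To give a feel for what cleaning up unreferenced versions involves, here is a small Python sketch of the selection step only. It assumes "unreferenced" means a published version that no alias points at, which is my reading rather than a description of how the Janitor Lambda was actually built; the function name and inputs are hypothetical.

```python
def unreferenced_versions(all_versions, aliased_versions):
    """Return the published versions that no alias points at.

    all_versions: version identifiers as Lambda reports them, e.g. ["$LATEST", "1", "2"]
    aliased_versions: versions still referenced by at least one alias
    """
    referenced = set(aliased_versions)
    # $LATEST is not a published version and is never a deletion candidate.
    numbered = sorted((v for v in all_versions if v != "$LATEST"), key=int)
    return [v for v in numbered if v not in referenced]


print(unreferenced_versions(["$LATEST", "1", "2", "3", "4"], {"2", "4"}))
# prints ['1', '3']
```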

 

For the rest of the series I’ll drill down into specific features, how we utilised various AWS services, and how we tackled the challenges of:

  • centralised logging
  • centralised configuration management
  • distributed tracing with correlation IDs for Lambda functions
  • keeping Lambda functions warm to avoid the cold start penalty
  • auto-scaling AWS resources that do not scale dynamically
  • automatically cleaning up old Lambda function versions
  • securing sensitive data (eg. MongoDB connection string, service credentials, etc.)

I can also explain our strategy for testing, and running/debugging functions locally, and so on. If there’s anything you’d like me to cover in particular, please leave a comment and let me know.

 

Links

Slides and recording of my Lambda talk at LeetSpeak 2016

New releases – DynamoDB.SQL and Darkseid

Hi, just a quick update on two of my libraries aimed at making AWS easier to work with from .NET.

 

DynamoDB.SQL

DynamoDB.SQL is a SQL-like external DSL for querying & scanning data in Amazon DynamoDB. Version 3.0.0 has been released, which moves away from the monolithic .NET AWSSDK (v2.x.x) and onto the DynamoDB-specific package.

You can continue to use v2.x packages for DynamoDB.SQL; I’ll apply any bug fixes to both v2.x and v3.x packages. However, any new features in the future – such as support for the mid-level Table abstractions in the AWSSDK – will be added to v3.x only.

Over the coming weeks and months, I’ll continue the effort to migrate my fleet of AWS-related tools and libraries to the service-specific packages.

 

Darkseid

Darkseid is a producer library for Amazon Kinesis. It works hand-in-hand with ReactoKinesix, which provides the consumer side of the story.

Courtesy of Dustin’s PR, version 0.3.0 has been released, which adds synchronous methods for pushing events to Kinesis (so that services that aren’t ready to go async all the way can still integrate with Kinesis using Darkseid).

 

That’s it, folks, hope you all had a nice weekend. If you haven’t seen Deadpool yet, you should, it is amazing.

Introducing log4net.Kinesis, a log4net appender for Amazon Kinesis

Just under three weeks ago, Amazon announced the public availability of their new Kinesis service, a service which is designed to allow real-time processing of streaming big data.

As an experiment I have put together a simple, actor-based custom appender for log4net which allows you to publish your log messages into a configured Kinesis stream. You can then have another cluster of machines fetch the data from the stream and do whatever processing or aggregation you’d like.
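For readers who haven't come across the actor-based approach, the general shape is a mailbox plus a single worker that drains it. The sketch below shows that shape using Python's standard `logging` module rather than log4net and F#, so it is an analogy and not the appender's actual code; `MailboxHandler` and the `send` callable (which stands in for the call that pushes a message to Kinesis) are hypothetical names.

```python
import logging
import queue
import threading


class MailboxHandler(logging.Handler):
    """Logging handler that hands messages to a single background worker."""

    _STOP = object()  # sentinel telling the worker to exit

    def __init__(self, send):
        super().__init__()
        self._send = send              # callable that ships one formatted message
        self._mailbox = queue.Queue()  # the "actor" mailbox
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def emit(self, record):
        # Runs on the caller's thread: enqueue and return immediately.
        self._mailbox.put(self.format(record))

    def _drain(self):
        # Only this thread calls send, so send needs no locking of its own.
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            try:
                self._send(message)
            except Exception:
                pass  # a real appender would retry or report the failure

    def close(self):
        # Messages queued before the sentinel are still delivered.
        self._mailbox.put(self._STOP)
        self._worker.join()
        super().close()


sent = []
logger = logging.getLogger("kinesis-sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = MailboxHandler(sent.append)
logger.addHandler(handler)
logger.info("user %s joined", "alice")
handler.close()
```

Because callers only ever enqueue, logging never blocks on the network, and because one worker owns the sending side, thread safety largely comes for free.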

You can download and install the appender from Nuget here or check out the source code here.

The implementation is done in F# in 100 lines of code, and as you can see is very simple, easy to reason about, fully asynchronous and thread-safe.

 

Once you have pushed your log messages into the stream, you’ll need to use the AWSSDK to fetch the data and process it. For Java, there’s a client application which takes care of most of the heavy lifting – e.g. tracking your progress, handling failovers and load balancing. Unfortunately, at the time of writing, there’s no equivalent client application in the current version of the .NET AWSSDK.

So to help make it easier for us .NET folks to build real-time data processing applications on top of Amazon Kinesis, I have started an Rx-based .NET client library called ReactoKinesiX (I really wanted to get RX into the name!), more details to follow.

 

I think the introduction of Kinesis is very exciting and opens up many possibilities, and with the current pricing model it also represents a very cost-effective alternative to some of the other competing and more polished services out there.