You can become a serverless blackbelt. Enrol to my 4-week online workshop Production-Ready Serverless and gain hands-on experience building something from scratch using serverless technologies. At the end of the workshop, you should have a broader view of the challenges you will face as your serverless architecture matures and expands. You should also have a firm grasp on when serverless is a good fit for your system as well as common pitfalls you need to avoid. Sign up now and get 15% discount with the code yanprs15!
part 1: overview <- you’re here
part 2: testing and CI/CD
part 3: ops
Since Yubl’s closure quite a few people have asked about the serverless architecture we ended up with and some of the things we have learnt along the way.
As such, this is the first of a series of posts where I’d share some of the lessons we learnt. However, bear in mind the pace of change in this particular space so some of the challenges/problems we encountered might have been solved by the time you read this.
ps. many aspects of this series is already covered in a talk I gave on Amazon Lambda at Leetspeak this year, you can find the slides and recording of the talk here.
From A Monolithic Beginning
Back when I joined Yubl in April I inherited a monolithic Node.js backend running on EC2 instances, with MongoLab (hosted MongoDB) and CloudAMQP (hosted RabbitMQ) thrown into the mix.
There were numerous problems with the legacy system, some could be rectified with incremental changes (eg. blue-green deployment) but others required a rethink at an architectural level. Although things look really simple on paper (at the architecture diagram level), all the complexities are hidden inside each of these 3 services and boy, there were complexities!
My first tasks were to work with the ops team to improve the existing deployment pipeline and to draw up a list of characteristics we’d want from our architecture:
- able to do small, incremental deployments
- deployments should be fast, and requires no downtime
- no lock-step deployments
- features can be deployed independently
- features are loosely coupled through messages
- minimise cost for unused resources
- minimise ops effort
From here we decided on a service-oriented architecture, and Amazon Lambda seemed the perfect tool for the job given the workloads we had:
- lots of APIs, all HTTPS, no ultra-low latency requirement
- lots of background tasks, many of which has soft-realtime requirement (eg. distributing post to follower’s timeline)
To a Serverless End
It’s suffice to say that we knew the migration was going to be a long road with many challenges along the way, and we wanted to do it incrementally and gradually increase the speed of delivery as we go.
“The lead time to someone saying thank you is the only reputation metric that matters”
– Dan North
The first step of the migration was to make the legacy systems publish state changes in the system (eg. user joined, user A followed user B, etc.) so that we can start building new features on top of the legacy systems.
To do this, we updated the legacy systems to publish events to Kinesis streams.
Our general strategy is:
- build new features on top of these events, which usually have their own data stores (eg. DynamoDB, CloudSearch, S3, BigQuery, etc.) together with background processing pipelines and APIs
- extract existing features/concepts from the legacy system into services that will run side-by-side
- these new services will initially be backed by the same shared MongoLab database
- other services (including the legacy ones) are updated to use hand-crafted API clients to access the encapsulated resources via the new APIs rather than hitting the shared MongoLab database directly
- once all access to these resources are done via the new APIs, data migration (usually to DynamoDB tables) will commence behind the scenes
- wherever possible, requests to existing API endpoints are forwarded to the new APIs so that we don’t have to wait for the iOS and Android apps to be updated (which can take weeks) and can start reaping the benefits earlier
After 6 months of hard work, my team of 6 backend engineers (including myself) have drastically transformed our backend infrastructure. Amazon was very impressed by the work we were doing with Lambda and in the process of writing up a case study of our work when Yubl was shut down at the whim of our major shareholder.
Here’s an almost complete picture of the architecture we ended up with (some details are omitted for brevity and clarity).
Some interesting stats:
- 170 Lambda functions running in production
- roughly 1GB of total deployment package size (after Janitor Lambda cleans up unreferenced versions)
- Lambda cost was around 5% of what we pay for EC2 for a comparable amount of compute
- the no. of production deployments increased from 9/month in April to 155 in September
For the rest of the series I’ll drill down into specific features, how we utilised various AWS services, and how we tackled the challenges of:
- centralised logging
- centralised configuration management
- distributed tracing with correlation IDs for Lambda functions
- keeping Lambda functions warm to avoid coldstart penalty
- auto-scaling AWS resources that do not scale dynamically
- automatically clean up old Lambda function versions
- securing sensitive data (eg. mongodb connection string, service credentials, etc.)
I can also explain our strategy for testing, and running/debugging functions locally, and so on. If there’s anything you’d like me to cover in particular, please leave a comment and let me know.
- Slides and Recording of my Amazon Lambda talk at Leetspeak
- Janitor-Lambda function to clean up old deployment packages
Hi, I’m Yan. I’m an AWS Serverless Hero and I help companies go faster for less by adopting serverless technologies successfully.
Are you struggling with serverless or need guidance on best practices? Do you want someone to review your architecture and help you avoid costly mistakes down the line? Whatever the case, I’m here to help.
Skill up your serverless game with this hands-on workshop.
My 4-week Production-Ready Serverless online workshop is back!
This course takes you through building a production-ready serverless web application from testing, deployment, security, all the way through to observability. The motivation for this course is to give you hands-on experience building something with serverless technologies while giving you a broader view of the challenges you will face as the architecture matures and expands.
We will start at the basics and give you a firm introduction to Lambda and all the relevant concepts and service features (including the latest announcements in 2020). And then gradually ramping up and cover a wide array of topics such as API security, testing strategies, CI/CD, secret management, and operational best practices for monitoring and troubleshooting.
If you enrol now you can also get 15% OFF with the promo code “yanprs15”.
Check out my new podcast Real-World Serverless where I talk with engineers who are building amazing things with serverless technologies and discuss the real-world use cases and challenges they face. If you’re interested in what people are actually doing with serverless and what it’s really like to be working with serverless day-to-day, then this is the podcast for you.
Check out my new course, Learn you some Lambda best practice for great good! In this course, you will learn best practices for working with AWS Lambda in terms of performance, cost, security, scalability, resilience and observability. We will also cover latest features from re:Invent 2019 such as Provisioned Concurrency and Lambda Destinations. Enrol now and start learning!
Check out my video course, Complete Guide to AWS Step Functions. In this course, we’ll cover everything you need to know to use AWS Step Functions service effectively. There is something for everyone from beginners to more advanced users looking for design patterns and best practices. Enrol now and start learning!
Here is a complete list of all my posts on serverless and AWS Lambda. In the meantime, here are a few of my most popular blog posts.
- All you need to know about caching for serverless applications
- Choreography vs Orchestration in the land of serverless
- Are Lambda-to-Lambda calls really so bad?
- Lambda optimization tip – enable HTTP keep-alive
- You are wrong about serverless and vendor lock-in
- You are thinking about serverless costs all wrong
- Check-list for going live with API Gateway and Lambda
- How to choose the right API Gateway auth method
- AWS Lambda – should you have few monolithic functions or many single-purposed functions?
- Guys, we’re doing pagination wrong
- Top 10 Serverless framework best practices
- I left full-time employment, here’s what happened since
- How to break the “senior engineer” career ceiling
- My advice to junior developers