You can become a serverless blackbelt. Enrol to my 4-week online workshop Production-Ready Serverless and gain hands-on experience building something from scratch using serverless technologies. At the end of the workshop, you should have a broader view of the challenges you will face as your serverless architecture matures and expands. You should also have a firm grasp on when serverless is a good fit for your system as well as common pitfalls you need to avoid. Sign up now and get 15% discount with the code yanprs15!
There were a couple of micro-services related talks at this year’s edition of CraftConf. The first was by Adrian Trenaman of Gilt, who talked about their journey from a monolithic architecture to micro-services, and from self-managed datacentres to the cloud.
From Monolith to Micro-Services
They started off with a Ruby on Rails monolithic architecture in 2007, but quickly grew to a $400M business within 4 years.
With that success came the scalability challenge, one that they couldn’t meet with their existing architecture.
It was at this point that they started to split up their monolith.
A number of smaller, but still somewhat monolithic services were created – product service, user service, etc. But most of the services were still sharing the same Postgres database.
The cart service was the only one that had its own dedicated datastore, and this proved to be a valuable lesson to them. In order to evolve services independently you need to severe the hidden coupling that occurs when services share the same database.
Adrian also pointed out that, whilst some of the backend services have been split out, the UI applications (JSP pages) were still monoliths. Lots business logic such as promos and offers were hardcoded and hard/slow to change.
Another pervasive issue is that services are all loosely typed – everything’s a map. This introduced plenty of confusion around what’s in the map. This was because lots of their developers still hadn’t made the mental transition to working in a statically typed language.
Fast forward to 2015, and they’re now creating micro-services at a much faster rate using primarily Scala and the Play framework.
They have left the original services alone as they’re well understood and haven’t had to change frequently.
The front end application has been broken up into a set of granular, page-specific applications.
There has also been significant cultural changes:
- emergent architecture design rather than top-down
- technology decisions are driven by KPI where appropriate; or simple goals where there is no measurable KPIs
- fail fast, and be open about it so that lessons can be shared amongst the organization; they even hold regular meetings to allow people to openly discuss their failures
To the Cloud
Alongside the transition to micro-services, Gilt also migrated to AWS using a hybrid approach via Amazon VPC.
Every team has its own AWS account as well as a budget. However, organization of teams can change from time to time, but with this setup it’s difficult to move services around different AWS accounts.
One thing to note about Gilt’s migration to micro-services is that it took a long time and they have taken an incremental approach.
Adrian explained this as down to them prioritizing getting into market and extracting real business values over technical evolution.
- it lessens inter-dependencies between teams
- faster code-to-production
- allows lots of initiatives in parallel
- allows different language/framework to be used by each team
- allows graceful degradation of services
- allows code to be easily disposable – easy to innovate, fail and move on; Greg Young also touched on this in his recent talk on optimizing for deletability
Challenges with Micro-Services
Adrian listed 7 challenges his team came across in their journey.
They found it hard to maintain staging environments across multiple teams and services. Instead, they have come to believe that testing in production is the best way to go, and it’s therefore necessary to invest in automation tools to aid with doing canary releases.
I once heard Ben Christensen of Netflix talk about the same thing, that Netflix too has come to realize that the only way to test a complex micro-services architecture is to test it in production.
That said, I’m sure both Netflix and Gilt still have basic tests to catch the obvious bugs before they release anything into the wild. But these tests would not sufficiently test the interaction between the services (something Dan North and Jessica Kerr covered in their opening keynote Complexity is Outside the Code).
To reduce the risk involved with testing in production, you should at least have:
- canary release mechanism to limit impact of bad release and;
- minimize time for roll back;
- minimize time to discovery for problems by having granular metrics (see ex-Netflix architect Adrian Cockcroft’s talk at Monitorama 14 on this)
Who owns the service, and what happens if the person who created the service moves onto something else?
Gilt’s decision is to have teams and departments own the service, rather than individual staff.
Gilt is building tools over Docker to give them elastic and fast provisioning. This is kinda interesting, because there are already a number of such tools/services available, such as:
It’s not clear what are missing from these that is driving Gilt to build their own tools.
They also make sure that the clients are totally decoupled from the service, and are dumb and have zero dependencies.
Audit & Alerting
In order to give your engineers full autonomy in production and still stay compliant, you need good auditing and alerting capabilities.
Gilt built a smart alerting tool called CAVE, which alerts you when system invariants – e.g. number of orders shipped to the US in 5 min interval should be greater than 0 – have been violated.
Monitoring of micro-services is an interesting topic because once again, the traditional approach to monitoring – alert me when CPU/network or latencies go above threshold – is no longer sufficient because of the complex effects of causality that runs through your inter-dependent services.
Instead, as Richard Rodger pointed out in his Measuring Micro-Services talk at Codemotion Rome this year, it’s far more effective to identify and monitor emerging properties instead.
I didn’t get a clear sense of how it works, but Adrian mentioned that Gilt has looked to lambda architecture for their critical path code, and are doing pre-computation and real-time push updates to reduce the number of IO calls that occur between services.
Since they have many databases, it means their data analysts would need access to all the databases in order to run their reports. To simplify things, they put their analytics data into a queue which is then written to S3.
Whilst Adrian didn’t divulge on what technology Gilt is using for analytics, there are quite a number of tools/services available to you on the market.
Based on our experience at Gamesys Social where we have been using Google BigQuery for a number of years, I strongly recommend that you take it into consideration. It’s a managed service that allows you to run SQL-like, ad-hoc queries on Exabyte-size datasets and get results back in seconds.
At this point we have around 100TB of analytics data at Gamesys Social and our data analysts are able to run their queries everyday and analyse 10s of TBs of data without any help from the developers.
BigQuery is just one of many interesting technologies that we use at Gamesys Social, so if you’re interested in working with us, we’re hiring for a functional programmer right now.
- CraftConf 15 takeaways
- Greg Young – the art of destroying software
- Takeaways from Adam Tornhill – Code as Crime Scene at QCON London
- Canary Release
- Adrian Cockcroft @ Monitorama PDX 2014
- A false choice : Microservices or Monoliths
- Microservices – not a free lunch!
- Richard Rodger – Measuring micro-services
Hi, I’m Yan. I’m an AWS Serverless Hero and I help companies go faster for less by adopting serverless technologies successfully.
Are you struggling with serverless or need guidance on best practices? Do you want someone to review your architecture and help you avoid costly mistakes down the line? Whatever the case, I’m here to help.
Skill up your serverless game with this hands-on workshop.
My 4-week Production-Ready Serverless online workshop is back!
This course takes you through building a production-ready serverless web application from testing, deployment, security, all the way through to observability. The motivation for this course is to give you hands-on experience building something with serverless technologies while giving you a broader view of the challenges you will face as the architecture matures and expands.
We will start at the basics and give you a firm introduction to Lambda and all the relevant concepts and service features (including the latest announcements in 2020). And then gradually ramping up and cover a wide array of topics such as API security, testing strategies, CI/CD, secret management, and operational best practices for monitoring and troubleshooting.
If you enrol now you can also get 15% OFF with the promo code “yanprs15”.
Check out my new podcast Real-World Serverless where I talk with engineers who are building amazing things with serverless technologies and discuss the real-world use cases and challenges they face. If you’re interested in what people are actually doing with serverless and what it’s really like to be working with serverless day-to-day, then this is the podcast for you.
Check out my new course, Learn you some Lambda best practice for great good! In this course, you will learn best practices for working with AWS Lambda in terms of performance, cost, security, scalability, resilience and observability. We will also cover latest features from re:Invent 2019 such as Provisioned Concurrency and Lambda Destinations. Enrol now and start learning!
Check out my video course, Complete Guide to AWS Step Functions. In this course, we’ll cover everything you need to know to use AWS Step Functions service effectively. There is something for everyone from beginners to more advanced users looking for design patterns and best practices. Enrol now and start learning!
Here is a complete list of all my posts on serverless and AWS Lambda. In the meantime, here are a few of my most popular blog posts.
- All you need to know about caching for serverless applications
- Lambda optimization tip – enable HTTP keep-alive
- You are wrong about serverless and vendor lock-in
- You are thinking about serverless costs all wrong
- Just how expensive is the full AWS SDK?
- Check-list for going live with API Gateway and Lambda
- How to choose the right API Gateway auth method
- CloudFormation protip: use !Sub instead of !Join
- AWS Lambda – should you have few monolithic functions or many single-purposed functions?
- Guys, we’re doing pagination wrong
- Top 10 Serverless framework best practices
- How to break the “senior engineer” career ceiling
- My advice to junior developers