This is something I’ve mentioned in my recent AOP talks, and I think it’s worthy of a wider audience as it can be very useful to anyone who’s obsessed with performance as I am.
At iwi, we take performance very seriously and are always looking to improve the performance of our applications. In order for us to identify the problem areas and focus our efforts on the big wins we first need a way to measure and monitor the individual performance of the different components inside our system, sometimes down to a method level.
Fortunately, with the help of AOP and AWS CloudWatch we’re able to get a pseudo-realtime view on how frequently a method is executed and how much time it takes to execute, down to one minute intervals:
With this information, I can quickly identify methods that are the worst offenders and focus my profiling and optimization efforts around those particular methods/components.
Whilst I cannot disclose any implementation details in this post, it is my hope that it’ll be sufficient to give you an idea of how you might be able to implement a similar mechanism.
A while back I posted about a simple attribute for watching method executing time and logging warning messages when a method takes longer than some pre-defined threshold.
Now, it’s possible and indeed easy to modify this simple attribute to instead keep track of the execution times and bundle them up into average/min/max values for a given minute. You can then publish these minute-by-minute metrics to AWS CloudWatch from each virtual instance and let the CloudWatch service itself handle the task of aggregating all the data-points.
By encapsulating the logic of measuring execution time into an attribute, you can start measuring a particular method by simply applying the attribute to that method. Alternatively, PostSharp supports pointcut and lets you multicast an attribute to many methods at once, and allows you to filter the method target by name as well as visibility level. It is therefore possible for you to start measuring and publishing the execution time of ALL public methods in a class/assembly with only one line of code!
The CloudWatch service should be familiar to anyone who has used AWS EC2 before, it’s a monitoring service primarily for AWS cloud resources (virtual instances, load balancers, etc.) but it also allows you to publish your own data about your application. Even if your application is not being hosted inside AWS EC2, you can still make use of the CloudWatch service as long as you have an AWS account and a valid AWS access key and secret.
Once published, you can visualize your data inside the AWS web console, depending on the type of data you’re publishing there are a number of different ways you can view them – Average, Min, Max, Sum, Count, etc.
Note that AWS only keeps up to two weeks worth of data, so if you want to keep the data for longer you’ll have to query and store the data yourself. For instance, it makes sense to keep a history of hourly averages for the method execution times you’re tracking so that in the future, you can easily see where and when a particular change has impacted the performance of those methods. After all, storage is cheap and even with thousands of data points you’ll only be storing that many rows per hour.
I’m an AWS Serverless Hero and the author of Production-Ready Serverless. I have run production workload at scale in AWS for nearly 10 years and I have been an architect or principal engineer with a variety of industries ranging from banking, e-commerce, sports streaming to mobile gaming. I currently work as an independent consultant focused on AWS and serverless.
Here is a complete list of all my posts on serverless and AWS Lambda. In the meantime, here are a few of my most popular blog posts.
- Lambda optimization tip – enable HTTP keep-alive
- You are thinking about serverless costs all wrong
- Many faced threats to Serverless security
- We can do better than percentile latencies
- I’m afraid you’re thinking about AWS Lambda cold starts all wrong
- Yubl’s road to Serverless
- AWS Lambda – should you have few monolithic functions or many single-purposed functions?
- AWS Lambda – compare coldstart time with different languages, memory and code sizes
- Guys, we’re doing pagination wrong