Check out my new course Learn you some Lambda best practice for great good! and learn the best practices for performance, cost, security, resilience, observability and scalability.
I stumbled upon this interesting question on StackOverflow today, Jon Harrop’s answer mentions a significant overhead in adding and iterating over a SortedDictionary and Map compared to using simple arrays.
Thinking about it, this makes sense, the SortedDictionary class sorts its constituent key-value pairs by key, which will naturally incur some performance overhead.
F#’s Map construct on the other hand, is immutable, and adding an item to a Map returns the resulting Map – a new instance of Map which includes all the items from the original Map instance plus the newly added item. As you can imagine, this means copying over a lot of data when you’re working with a large map which is an obvious performance hit.
This is a similar problem to using List.append ( or equivalently using the @ operator ) on lists as it also involves copying the data in the first list, more on that on another post.
Anyhow, the question piqued my interest and I had to test it out and get some quantitative numbers for myself, and I was also interested in seeing how the standard Dictionary class does compared to the rest. :-)
The test code is very simple, feel free to take a look here and let me know if them are unfair in any way. In short, the test was to add 1,000,000 items and then iterate over them with each type of construct and record the time each step took.
The results are below, the times are recorded in seconds, averaged over 5 runs.
Aside from the fact that the Map construct did particularly poorly in these tests, it was interesting to see that initializing a Dictionary instance with sufficient capacity to begin with allowed it to perform twice as fast!
To understand where that performance boost came from, you need to understand that a Dictionary uses an internal array of entry objects (see below) to keep track of what’s in the dictionary:
When that internal array fills up, it replaces the array with a bigger array and the size of the new array is, roughly speaking, the smallest prime number that’s >= current capacity times 2, even though the implementation only uses a cached array of 72 prime numbers 3, 7, 11, 17, 23, 29, 37, 47, … 7199369.
So when I initialized a Dictionary without specifying its capacity (hence capacity = 0) and proceed to add 1 million items it will have had to resize its internal array 18 times, causing more overhead with each resize.
Again, these results should be taken at face value only, it doesn’t mean that you should never use Map because it’s slower than the other structures for additions and iterations, or that you should start replacing your dictionaries with arrays…
Instead, use the right tool for the right job.
If you’ve got a set of static data (such as configuration data that’s loaded when your application starts up) you need to look up by key frequently, a Map is as good a choice as any, its immutability in this case ensures that the static data cannot be modified by mistake and has little impact to performance as you never need to mutate it once initialized.
I specialise in rapidly transitioning teams to serverless and building production-ready services on AWS.
Are you struggling with serverless or need guidance on best practices? Do you want someone to review your architecture and help you avoid costly mistakes down the line? Whatever the case, I’m here to help.
Check out my new podcast Real-World Serverless where I talk with engineers who are building amazing things with serverless technologies and discuss the real-world use cases and challenges they face. If you’re interested in what people are actually doing with serverless and what it’s really like to be working with serverless day-to-day, then this is the podcast for you.
Check out my new course, Learn you some Lambda best practice for great good! In this course, you will learn best practices for working with AWS Lambda in terms of performance, cost, security, scalability, resilience and observability. We will also cover latest features from re:Invent 2019 such as Provisioned Concurrency and Lambda Destinations. Enrol now and start learning!
Check out my video course, Complete Guide to AWS Step Functions. In this course, we’ll cover everything you need to know to use AWS Step Functions service effectively. There is something for everyone from beginners to more advanced users looking for design patterns and best practices. Enrol now and start learning!
Are you working with Serverless and looking for expert training to level-up your skills? Or are you looking for a solid foundation to start from? Look no further, register for my Production-Ready Serverless workshop to learn how to build production-grade Serverless applications!
Here is a complete list of all my posts on serverless and AWS Lambda. In the meantime, here are a few of my most popular blog posts.
- Lambda optimization tip – enable HTTP keep-alive
- You are wrong about serverless and vendor lock-in
- You are thinking about serverless costs all wrong
- Just how expensive is the full AWS SDK?
- Many faced threats to Serverless security
- We can do better than percentile latencies
- Yubl’s road to Serverless
- AWS Lambda – should you have few monolithic functions or many single-purposed functions?
- AWS Lambda – compare coldstart time with different languages, memory and code sizes
- Guys, we’re doing pagination wrong
- Top 10 Serverless framework best practices