I stumbled upon this interesting question on StackOverflow today. Jon Harrop's answer mentions a significant overhead in adding to and iterating over a SortedDictionary and a Map compared to using simple arrays.
Thinking about it, this makes sense: the SortedDictionary class keeps its key-value pairs sorted by key, which naturally incurs some performance overhead.
F#'s Map construct, on the other hand, is immutable, and adding an item to a Map returns a new Map instance containing all the items from the original plus the newly added one. The new Map doesn't copy everything, though: Map is implemented as a balanced binary tree, so the new instance shares most of its nodes with the old one and only the nodes on the path to the new item (O(log n) of them) are rebuilt. Even so, allocating those nodes and rebalancing on every add costs a lot more than writing into an array slot, and it adds up when you're building a large map.
List.append (or equivalently the @ operator) on lists has a related but steeper cost, as it copies all the data in the first list; more on that in another post.
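To make the "adding returns a new instance" behaviour concrete, here is a minimal sketch. The names `original`, `extended`, `front`, `back` and `joined` are just for illustration.

```fsharp
// Map.add never mutates: it returns a new Map and leaves the old one alone.
let original = Map.ofList [ 1, "one"; 2, "two" ]
let extended = Map.add 3 "three" original

printfn "%d" original.Count            // 2
printfn "%d" extended.Count            // 3
printfn "%b" (original.ContainsKey 3)  // false

// List.append (or @) likewise returns a new list; the cells of the
// first list are copied, while the second list is reused as the tail.
let front = [ 1; 2; 3 ]
let back = [ 4; 5 ]
let joined = front @ back

printfn "%A" joined                    // [1; 2; 3; 4; 5]
printfn "%A" front                     // [1; 2; 3]
```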
Anyhow, the question piqued my interest and I had to test it out and get some quantitative numbers for myself, and I was also interested in seeing how the standard Dictionary class does compared to the rest. :-)
The test code is very simple, feel free to take a look here and let me know if the tests are unfair in any way. In short, the test was to add 1,000,000 items and then iterate over them with each type of construct, recording the time each step took.
The results are below; the times are in seconds, averaged over 5 runs.
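For a rough idea of the shape of such a test, here is a minimal sketch. It is not the linked test code; the helpers `time` and `bench` are hypothetical names. It runs each case once rather than averaging over 5 runs, and it goes through the `IDictionary` interface for brevity, which may itself skew the numbers slightly.

```fsharp
open System.Collections.Generic
open System.Diagnostics

let n = 1000000

// Runs f once and returns its result together with the elapsed seconds.
let time f =
    let sw = Stopwatch.StartNew()
    let result = f ()
    result, sw.Elapsed.TotalSeconds

// Adds n items to a dictionary-like collection, then iterates over them.
let bench name (create: unit -> IDictionary<int, int>) =
    let d, addSecs = time (fun () ->
        let d = create ()
        for i in 1 .. n do d.Add(i, i)
        d)
    let _, iterSecs = time (fun () ->
        let mutable sum = 0L
        for kv in d do sum <- sum + int64 kv.Value
        sum)
    printfn "%-30s add %.3fs, iterate %.3fs" name addSecs iterSecs

bench "Dictionary" (fun () -> Dictionary<int, int>() :> IDictionary<_, _>)
bench "Dictionary (capacity = n)" (fun () -> Dictionary<int, int>(n) :> IDictionary<_, _>)
bench "SortedDictionary" (fun () -> SortedDictionary<int, int>() :> IDictionary<_, _>)

// Map is immutable, so "adding" means folding into a new Map each time.
let m, mapAddSecs = time (fun () ->
    Seq.fold (fun acc i -> Map.add i i acc) Map.empty (seq { 1 .. n }))
let _, mapIterSecs = time (fun () ->
    let mutable sum = 0L
    for kv in m do sum <- sum + int64 kv.Value
    sum)
printfn "%-30s add %.3fs, iterate %.3fs" "Map" mapAddSecs mapIterSecs
```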
Aside from the fact that the Map construct did particularly poorly in these tests, it was interesting to see that initializing a Dictionary instance with sufficient capacity to begin with allowed it to perform twice as fast!
To understand where that performance boost came from, you need to understand that a Dictionary uses an internal array of entry objects (see below) to keep track of what’s in the dictionary:
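As a rough sketch, an entry looks something like this when transliterated into F#. The real type is a private C# struct inside Dictionary, so the field names and comments here are my approximation, not the actual source.

```fsharp
// Approximate F# rendering of the entry Dictionary keeps in its internal array.
[<Struct>]
type Entry<'TKey, 'TValue> =
    { HashCode: int    // cached hash code of the key
      Next: int        // index of the next entry in the same bucket, -1 if none
      Key: 'TKey
      Value: 'TValue }

// A hypothetical entry for the pair ("answer", 42) with no successor in its bucket.
let sample = { HashCode = "answer".GetHashCode(); Next = -1; Key = "answer"; Value = 42 }
printfn "%s -> %d" sample.Key sample.Value
```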
When that internal array fills up, it is replaced with a bigger array. The size of the new array is, roughly speaking, the smallest prime number that's >= twice the current capacity, although in practice the implementation picks it from a cached array of 72 prime numbers: 3, 7, 11, 17, 23, 29, 37, 47, … 7199369.
So when I initialized a Dictionary without specifying its capacity (hence capacity = 0) and proceeded to add 1 million items, it had to resize its internal array 18 times, with each resize having more entries to copy than the last.
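The growth pattern can be approximated with a few lines of code. This sketch follows the "roughly speaking" rule (smallest prime >= twice the current capacity), not the cached prime table, so the exact capacities and the number of steps it reports can differ a little from what Dictionary really does. The function names and the starting capacity of 3 (the first prime in the table) are assumptions for illustration.

```fsharp
// Trial-division primality check, good enough for numbers of this size.
let isPrime n =
    if n < 2 then false
    elif n % 2 = 0 then n = 2
    else
        let limit = int (sqrt (float n))
        seq { 3 .. 2 .. limit } |> Seq.forall (fun d -> n % d <> 0)

// Smallest prime >= n.
let rec nextPrime n = if isPrime n then n else nextPrime (n + 1)

// Successive capacities after each resize, starting from `initial`,
// until the capacity can hold `target` items.
let capacities initial target =
    initial
    |> List.unfold (fun cap ->
        if cap >= target then None
        else
            let next = nextPrime (2 * cap)
            Some (next, next))

let steps = capacities 3 1000000
printfn "resizes: %d, final capacity: %d" (List.length steps) (List.last steps)
```

Passing the expected size up front, e.g. `Dictionary<int, int>(1000000)`, skips all of these intermediate allocations and copies.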
Again, these results should be taken with a grain of salt: they don't mean that you should never use Map because it's slower than the other structures for additions and iterations, or that you should start replacing your dictionaries with arrays…
Instead, use the right tool for the right job.
If you've got a set of static data (such as configuration data that's loaded when your application starts up) that you need to look up by key frequently, a Map is as good a choice as any. Its immutability ensures that the static data cannot be modified by mistake, and it has little impact on performance since you never need to change the data once it's initialized.
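As a minimal sketch of that scenario, the keys and values below are made up for illustration:

```fsharp
// Hypothetical configuration data, built once at start-up and never changed.
let config =
    Map.ofList
        [ "connectionString", "Server=localhost;Database=demo"
          "timeoutSeconds", "30"
          "logLevel", "Info" ]

// Look-ups return an option, so a missing key has to be handled explicitly.
let timeout =
    config
    |> Map.tryFind "timeoutSeconds"
    |> Option.map int
    |> Option.defaultValue 60

printfn "timeout = %d" timeout                      // timeout = 30
printfn "%A" (Map.tryFind "missingKey" config)      // None
```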