Performance Test — SortedDictionary vs Dictionary vs Map vs Array

I stum­bled upon this inter­est­ing ques­tion on Stack­Over­flow today, Jon Harrop’s answer men­tions a sig­nif­i­cant over­head in adding and iter­at­ing over a Sorted­Dic­tionary and Map com­pared to using sim­ple arrays.

Think­ing about it, this makes sense, the Sorted­Dic­tionary class sorts its con­stituent key-val­ue pairs by key, which will nat­u­ral­ly incur some per­for­mance over­head.

F#‘s Map con­struct on the oth­er hand, is immutable, and adding an item to a Map returns the result­ing Map – a  new instance of Map which includes all the items from the orig­i­nal Map instance plus the new­ly added item. As you can imag­ine, this means copy­ing over a lot of data when you’re work­ing with a large map which is an obvi­ous per­for­mance hit.

This is a sim­i­lar prob­lem to using List.append ( or equiv­a­lent­ly using the @ oper­a­tor ) on lists as it also involves copy­ing the data in the first list, more on that on anoth­er post.

Any­how, the ques­tion piqued my inter­est and I had to test it out and get some quan­ti­ta­tive num­bers for myself, and I was also inter­est­ed in see­ing how the stan­dard Dic­tio­nary class does com­pared to the rest. :-)

The test code is very sim­ple, feel free to take a look here and let me know if them are unfair in any way. In short, the test was to add 1,000,000 items and then iter­ate over them with each type of con­struct and record the time each step took.

The results are below, the times are record­ed in sec­onds, aver­aged over 5 runs.

image

Aside from the fact that the Map con­struct did par­tic­u­lar­ly poor­ly in these tests, it was inter­est­ing to see that ini­tial­iz­ing a Dic­tio­nary instance with suf­fi­cient capac­i­ty to begin with allowed it to per­form twice as fast!

To under­stand where that per­for­mance boost came from, you need to under­stand that a Dic­tio­nary uses an inter­nal array of entry objects (see below) to keep track of what’s in the dic­tio­nary:

image

When that inter­nal array fills up, it replaces the array with a big­ger array and the size of the new array is, rough­ly speak­ing, the small­est prime num­ber that’s >= cur­rent capac­i­ty times 2, even though the imple­men­ta­tion only uses a cached array of 72 prime num­bers 3, 7, 11, 17, 23, 29, 37, 47, … 7199369.

So when I ini­tial­ized a Dic­tio­nary with­out spec­i­fy­ing its capac­i­ty (hence capac­i­ty = 0) and pro­ceed to add 1 mil­lion items it will have had to resize its inter­nal array 18 times, caus­ing more over­head with each resize.

Closing thoughts…

Again, these results should be tak­en at face val­ue only, it doesn’t mean that you should nev­er use Map because it’s slow­er than the oth­er struc­tures for addi­tions and iter­a­tions, or that you should start replac­ing your dic­tio­nar­ies with arrays…

Instead, use the right tool for the right job.

If you’ve got a set of sta­t­ic data (such as con­fig­u­ra­tion data that’s loaded when your appli­ca­tion starts up) you need to look up by key fre­quent­ly, a Map is as good a choice as any, its immutabil­i­ty in this case ensures that the sta­t­ic data can­not be mod­i­fied by mis­take and has lit­tle impact to per­for­mance as you nev­er need to mutate it once ini­tial­ized.