Yan Cui
I help clients go faster for less using serverless technologies.
I have heard a few people argue that, for performance-critical code, you should prefer arrays over other collections (such as F#'s lists) because they benefit from sequential reads (which are faster than seeks) and offer better memory locality.
To test that theory, I wanted to see whether there is any difference in how fast you can iterate through an array versus a list in F#, and how much faster you can map over an array compared to a list.
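The original benchmark code is not included here. Below is a minimal sketch of this kind of comparison. It assumes a single Stopwatch-timed run per operation with no warm-up, so JIT and GC noise land in the measurements. The `time` helper and the size `n` are hypothetical. The comments on this post mention a loop size of 10,000,000; the sketch uses a smaller value to keep memory use modest.

```fsharp
open System.Diagnostics

// Hypothetical helper: runs the action once and prints the elapsed wall-clock time.
// A single run includes JIT and GC noise, so treat the numbers as rough.
let time (label : string) (action : unit -> unit) =
    let sw = Stopwatch.StartNew()
    action ()
    sw.Stop()
    printfn "%s: %dms" label sw.ElapsedMilliseconds

let incr x = x + 1

// Assumed size; scale it up for more stable timings.
let n = 1000000
let arr = [| 1 .. n |]
let lst = [ 1 .. n ]

// Iteration: module-specific functions vs the generic Seq.iter.
time "Array.iter" (fun () -> Array.iter ignore arr)
time "List.iter" (fun () -> List.iter ignore lst)
time "Seq.iter (array)" (fun () -> Seq.iter ignore arr)
time "Seq.iter (list)" (fun () -> Seq.iter ignore lst)

// Mapping: Seq.map is lazy, so convert back to an array/list to do equivalent work.
time "Array.map" (fun () -> Array.map incr arr |> ignore)
time "List.map" (fun () -> List.map incr lst |> ignore)
time "Seq.map (array)" (fun () -> arr |> Seq.map incr |> Seq.toArray |> ignore)
time "Seq.map (list)" (fun () -> lst |> Seq.map incr |> Seq.toList |> ignore)
```

For anything more than a rough comparison, a dedicated benchmarking harness with warm-up and repeated runs will give more trustworthy numbers than a single Stopwatch reading.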
The result is a little surprising. Whilst I wasn't expecting a massive difference in iterating through the two types of collection, I didn't think mapping over a list would be quite so slow in comparison. I knew that constructing a list is much heavier than constructing an array, but I didn't think it'd take 22x as long in this case.
What was even more surprising was how much slower the Seq.iter and Seq.map functions are compared to their Array and List module equivalents! According to John Palmer, this is because:
Once you call into Seq you lose the type information: moving to the next element in the list requires a call to IEnumerator.MoveNext. Compare that to Array, where you just increment an index, and List, where you can just dereference a pointer. Essentially, you are getting an extra function call for each element in the list. The conversions back to List and Array also slow the code down for similar reasons.
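The three ways of walking a collection that the quote contrasts can be written out by hand as a sketch. These are illustrative functions with hypothetical names, not FSharp.Core's own implementation of Seq.iter, Array.iter or List.iter.

```fsharp
// Generic seq path: every element costs a MoveNext() call and a Current read,
// both dispatched through the IEnumerator<'T> interface.
let sumViaEnumerator (xs : seq<int>) =
    use e = xs.GetEnumerator()
    let mutable total = 0
    while e.MoveNext() do
        total <- total + e.Current
    total

// Array path: just an index increment and an array element load.
let sumViaIndex (arr : int[]) =
    let mutable total = 0
    for i in 0 .. arr.Length - 1 do
        total <- total + arr.[i]
    total

// List path: match on the cons cell and follow the reference to the tail.
let rec sumViaList (acc : int) (lst : int list) =
    match lst with
    | [] -> acc
    | head :: tail -> sumViaList (acc + head) tail
```

sumViaEnumerator accepts an array or a list, but only because it forgets which one it was given. That lost type information is the cost the quote describes.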
Update 2012/06/04:
As a workaround, you COULD shadow the Seq module with iter and map functions that add simple type checks and, when the source is an array or a list, simply call the corresponding function in the Array or List module instead.
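The original shadowing code is not reproduced here. Below is a minimal sketch of the idea. It assumes the module is declared in your own code before the call sites, so that Seq.iter and Seq.map resolve to these definitions while every other Seq function (Seq.take, Seq.toList and so on) still comes from FSharp.Core. The fallback calls use the fully qualified `Microsoft.FSharp.Collections.Seq` path to reach the built-in versions.

```fsharp
// Hypothetical shadowing module: dispatches on the runtime type of the source.
module Seq =
    let iter (f : 'T -> unit) (source : seq<'T>) =
        match source with
        | :? ('T[]) as arr -> Array.iter f arr        // index-based loop
        | :? ('T list) as lst -> List.iter f lst      // walks the cons cells directly
        | _ -> Microsoft.FSharp.Collections.Seq.iter f source

    let map (f : 'T -> 'U) (source : seq<'T>) : seq<'U> =
        match source with
        // Note: these two branches are EAGER, building the whole result up front.
        | :? ('T[]) as arr -> Array.map f arr :> seq<'U>
        | :? ('T list) as lst -> List.map f lst :> seq<'U>
        | _ -> Microsoft.FSharp.Collections.Seq.map f source
```

For arrays and lists, this map no longer defers any work, unlike the built-in Seq.map.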
Whilst this approach will work to a certain extent, you should be careful about which functions you shadow. For instance, it's not safe to shadow Seq.map, because it can be used in conjunction with other functions such as Seq.takeWhile or Seq.take. With the built-in implementation, a line of code such as:
arr |> Seq.map incr |> Seq.take 3
will not map over every element in the source array.
With a shadowed version of Seq.map, however, this would first create a new array by applying the mapper function to every element in the source array, and then discard all but the first three elements of the new array. As you can imagine, this is far less efficient, requires much more memory (for the new array), and in most cases defeats the purpose of using Seq module functions.
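The difference is easy to observe by counting how many times the mapper runs. This sketch uses the built-in Seq.map for the lazy case and Array.map to stand in for an eager, shadowed map. The counting `incr` and the `calls` ref cell are hypothetical additions for illustration.

```fsharp
// Counts mapper invocations so the laziness (or not) of each pipeline is visible.
let calls = ref 0
let incr x =
    calls.Value <- calls.Value + 1
    x + 1

let arr = [| 1 .. 10 |]

// Built-in Seq.map is lazy: Seq.take 3 pulls only three elements through the mapper.
let firstThree = arr |> Seq.map incr |> Seq.take 3 |> Seq.toList
let lazyCalls = calls.Value      // 3

// Eager mapping (what an Array.map-based shadow does): all ten elements are mapped first.
calls.Value <- 0
let firstThreeEager = arr |> Array.map incr |> Seq.take 3 |> Seq.toList
let eagerCalls = calls.Value     // 10
```

Both pipelines produce [2; 3; 4]. The eager one simply does, and allocates, more than the result needs.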
Hi,
I don't think you're being fair to Seq in combination with List. Since the List is immutable, there's no need to materialize it again in the last step "|> Seq.toList". You can just pass around the lazy Seq over the immutable list and be sure that you will always get the same result.
The Array does not have that luxury, but this may not be an issue in most cases if there's no concurrency involved.
@Gert-Jan – without the last Seq.toList step you will get a sequence back as opposed to a list, which is not the same as ‘lst |> List.map incr’ which returns a new list where each element is mapped from the source list.
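The distinction can be sketched with another counting mapper, which is hypothetical and added only for illustration. Without Seq.toList, `lst |> Seq.map incr` is a lazy `seq<int>` that re-runs the mapper every time it is enumerated. `List.map` builds a new `int list` once.

```fsharp
let calls = ref 0
let incr x =
    calls.Value <- calls.Value + 1
    x + 1

let lst = [ 1; 2; 3 ]

// A lazy seq<int>, not an int list: nothing has been mapped yet.
let mapped : seq<int> = lst |> Seq.map incr
let callsBefore = calls.Value    // 0

// Each enumeration re-applies the mapper to the source list.
let total1 = Seq.sum mapped
let total2 = Seq.sum mapped
let callsAfter = calls.Value     // 6

// List.map produces a concrete list in a single pass.
let mappedList : int list = lst |> List.map incr
```

Because the source list is immutable, the repeated enumerations agree, as the previous comment points out. The lazy version still repays the mapping cost on every pass.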
The test was purely intended to show if there’s any difference in how fast simple iter and map operations are performed on array and lists, and when using the Seq.iter and Seq.map functions.
Nice tests! But your conclusion is misleading.
Your loop size is huge: 10,000,000! So the actual overhead for a single iteration is 10 million times less.
If you take a reasonable loop size, say 1000, then your timings are 10,000 times less.
So, all things considered, your tests actually show that the relative overhead of Seq, List and Array is negligible.
@ray – it's more about relative cost; the actual impact depends on how your application uses these collections, of course. The difference is negligible if you don't regularly deal with large collections or with many, many small collections.