Performance Test — Prime numbers with LINQ vs PLINQ vs F#

Hav­ing spent quite a bit of time cod­ing in F# recent­ly I have thor­ough­ly enjoyed the expe­ri­ence of cod­ing in a func­tion­al style and come to real­ly like the fact you can do so much with so lit­tle code.

One of the counter-claims against F# has always been the con­cerns over per­for­mance in the most per­for­mance crit­i­cal appli­ca­tions, and with that in mind I decid­ed to do a lit­tle exper­i­ment of my own using C# (LINQ & PLINQ) and F# to gen­er­ate all the prime num­bers under a giv­en max val­ue.

The LINQ and PLINQ meth­ods in C# look some­thing like this:

private static void DoCalcSequentially(int max)
{
    var numbers = Enumerable.Range(3, max-3);
    var query =
        numbers
            .Where(n => Enumerable.Range(2, (int)Math.Sqrt(n))
            .All(i => n % i != 0));
    query.ToArray();
}

private static void DoCalcInParallel(int max)
{
    var numbers = Enumerable.Range(3, max-3);
    var parallelQuery =
        numbers
            .AsParallel()
            .Where(n => Enumerable.Range(2, (int)Math.Sqrt(n))
            .All(i => n % i != 0));
    parallelQuery.ToArray();
}

The F# ver­sion on the oth­er hand uses the fair­ly opti­mized algo­rithm I had been using in most of my project euler solu­tions:

let mutable primeNumbers = [2]

// generate all prime numbers under <= this max
let getPrimes max =
    // only check the prime numbers which are <= the square root of the number n
    let hasDivisor n =
        primeNumbers
        |> Seq.takeWhile (fun n' -> n' <= int(sqrt(double(n))))
        |> Seq.exists (fun n' -> n % n' = 0)

    // only check odd numbers <= max
    let potentialPrimes = Seq.unfold (fun n -> if n > max then None else Some(n, n+2)) 3
    // populate the prime numbers list
    for n in potentialPrimes do if not(hasDivisor n) then primeNumbers <- primeNumbers @ [n]

    primeNumbers

Here’s the aver­age exe­cu­tion time in mil­lisec­onds for each of these meth­ods over 3 runs for max = 1000, 10000, 100000, 1000000:

image

Have to admit this doesn’t make for a very com­fort­able reading…on aver­age the F# ver­sion, despite being opti­mized, runs over 3 – 6 times as long as the stan­dard LINQ ver­sion! The PLINQ ver­sion on the oth­er hand, is slow­er in com­par­i­son to the stan­dard LINQ ver­sion when the set of data is small as the over­head of par­ti­tion­ing, col­lat­ing and coor­di­nat­ing the extra threads actu­al­ly slows things down, but on a larg­er dataset the ben­e­fit of par­al­lel pro­cess­ing starts to shine through.

UPDATE 13/11/2010:

Thanks for Jaen’s com­ment, the cause for the F# ver­sion of the code to be much slow­er is because of this line:

primeNumbers <- primeNumbers @ [n]

because a new list is con­struct­ed every time and all ele­ments from the pre­vi­ous list copied over.

Unfor­tu­nate­ly, there’s no way to add an ele­ment to an exist­ing List or Array in F# with­out get­ting a new list back (at least I don’t know of a way to do this), so to get around this per­for­mance hand­i­cap the eas­i­est way is to make the prime num­bers list a gener­ic List instead (yup, luck­i­ly you are free to use CLR types in F#):

open System.Collections.Generic

// initialize the prime numbers list with 2
let mutable primeNumbers = new List<int>()
primeNumbers.Add(2)

// as before
...

    // populate the prime numbers list
    for n in potentialPrimes do if not(hasDivisor n) then primeNumbers.Add(n)

    primeNumbers

With this change, the per­for­mance of the F# code is now com­pa­ra­ble to that of the stan­dard LINQ ver­sion.