Performance Test – Delegates vs Methods

Quite some time ago I was asked to cover a C# developer interview for one of my neighbouring teams, and on the sheet of questions the original interviewer wanted me to ask was this question:

Q. Why should you NEVER use delegates?

I thought: “Well, at least I can rule myself out for the role!”

To my mind, if using delegates complicates your code or makes it less readable/maintainable then you shouldn’t use them, but to say NEVER use delegates? It just doesn’t make sense…

Puzzled, I asked: “Why not?”

“Because it’s an order of magnitude slower than using methods”

Sounds like he’d had his fingers burnt in the past, but still, it goes against everything I have experienced with delegates…

“At least it is in the case of .Net 1.1”

Ahh, a little more context makes it that much more believable! But mind you, this was 2009, .Net 3.5 had been out for a while, and surely whatever performance issue there was with delegates would have been fixed long ago…

The Test

So, some two years later, I decided to put together a quick test to see if there is any difference in performance between invoking a delegate and invoking a method in C# 4. The test is simple: given an empty delegate and an empty method (see below), which takes longer to invoke 10,000 times in a row?

Action MyDelegate = () => {};

void MyMethod() {}

It’s worth noting that the test code was run in a debug, non-optimized build (so no compiler inlining).
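The post doesn’t include the harness itself; a minimal sketch of one (using Stopwatch; the class name and structure are my own, not from the original test) might look like this:

```csharp
using System;
using System.Diagnostics;

class DelegateVsMethodTest
{
    static readonly Action MyDelegate = () => { };
    static void MyMethod() { }

    static void Main()
    {
        const int iterations = 10000;

        // time 10,000 delegate invocations
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            MyDelegate();
        }
        sw.Stop();
        Console.WriteLine("Delegate: {0} ticks", sw.ElapsedTicks);

        // time 10,000 direct method invocations
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            MyMethod();
        }
        sw.Stop();
        Console.WriteLine("Method:   {0} ticks", sw.ElapsedTicks);
    }
}
```
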

The Result

Somewhat surprisingly, invoking the delegate proved to be faster, averaging 269 ticks over 5 runs, whereas invoking the method took an average of 365 ticks!

No covariance for value types

For a while now I’ve been wondering why C#’s support for covariance does not cover value types, both in array covariance and in the generic variance introduced in C# 4:

void Main()
{
    int i = 0;
    string str = "hello world";

    TestMethod(i);       // legal
    TestMethod(str);     // legal
    TestMethod2(Enumerable.Empty<int>());           // illegal
    TestMethod2(Enumerable.Empty<string>());        // legal

    Console.WriteLine(i is object);                 // true
    Console.WriteLine(new int[0] is object[]);      // false
    Console.WriteLine(new string[0] is object[]);   // true
    Console.WriteLine(new uint[0] is int[]);        // false
}

public void TestMethod(object obj)
{
    Console.WriteLine(obj);
}

public void TestMethod2(IEnumerable<object> objs)
{
    Console.WriteLine(objs.Count());
}

Until I stumbled upon this old post by Eric Lippert on the topic of array covariance, which essentially points to a disagreement between the C# and CLI specifications on the rule for array covariance:

CLI: "if X is assignment compatible with Y then X[] is assignment compatible with Y[]"

C#: "if X is a reference type implicitly convertible to reference type Y then X[] is implicitly convertible to Y[]"

Whilst this doesn’t directly address the generics case with IEnumerable<out T>, one would expect the two to follow the same rule; otherwise you’d end up with different rules for int[] and IEnumerable<int>, where (new int[0] is IEnumerable<int>) == true… now that would be weird!
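As a practical aside (my own addition, not from Eric’s post): a value-type sequence can still be passed to a method expecting IEnumerable<object> by boxing each element explicitly, e.g. with Enumerable.Cast:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class CovarianceWorkaround
{
    static void TestMethod2(IEnumerable<object> objs)
    {
        Console.WriteLine(objs.Count());
    }

    static void Main()
    {
        // IEnumerable<int> is not covariantly convertible to IEnumerable<object>,
        // but Cast<object>() boxes each element, producing a legal IEnumerable<object>.
        TestMethod2(new[] { 1, 2, 3 }.Cast<object>()); // prints 3
    }
}
```

The boxing makes the cost of the conversion explicit, which is precisely why the language doesn’t do it silently for you.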

References:

Eric Lippert – Why is covariance of value-typed arrays inconsistent?

Question on StackOverflow – why does my C# array lose type sign information when cast to object?

Interesting observation on C# 4’s optional parameter

The other day I made an interesting observation about optional parameters in C# 4: if you declare a parameter as optional on an interface, you don’t actually have to make that parameter optional on any implementing class:

public interface MyInterface
{
    void TestMethod(bool flag = false);
}

public class MyClass : MyInterface
{
    public void TestMethod(bool flag)
    {
        Console.WriteLine(flag);
    }
}

Which means you won’t be able to use the implementing class and the interface interchangeably:

var obj = new MyClass();
obj.TestMethod(); // compiler error

var obj2 = new MyClass() as MyInterface;
obj2.TestMethod(); // prints False

Naturally, this begs the question: why doesn’t the compiler force the implementation to match the default value specified by the contract?

Luckily, my subsequent question on StackOverflow was answered by Eric Lippert of the C# compiler team. Rather than waste time and effort repeating what’s already been said, check out his answer; it makes the rationale clear, and shows why it would be impractical and inconvenient for the compiler to behave differently.
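A further wrinkle worth noting (my own example, following the same rationale): because the default value is baked into the call site based on the static type of the receiver, an implementing class is even free to declare a different default from the interface:

```csharp
using System;

public interface MyInterface
{
    void TestMethod(bool flag = false);
}

public class MyClass : MyInterface
{
    // the class declares a *different* default from the interface
    public void TestMethod(bool flag = true)
    {
        Console.WriteLine(flag);
    }
}

class Program
{
    static void Main()
    {
        new MyClass().TestMethod();                // prints True  (class's default applies)
        ((MyInterface)new MyClass()).TestMethod(); // prints False (interface's default applies)
    }
}
```
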

References:

My question on StackOverflow

Article on positives and pitfalls of using optional parameters

Performance Test – Prime numbers with LINQ vs PLINQ vs F#

Having spent quite a bit of time coding in F# recently, I have thoroughly enjoyed coding in a functional style, and have come to really like how much you can do with so little code.

One of the counter-claims against F# has always been concern over its performance in the most performance-critical applications, and with that in mind I decided to do a little experiment of my own, using C# (LINQ & PLINQ) and F# to generate all the prime numbers under a given max value.

The LINQ and PLINQ methods in C# look something like this:

private static void DoCalcSequentially(int max)
{
    var numbers = Enumerable.Range(3, max-3);
    var query =
        numbers
            .Where(n => Enumerable.Range(2, (int)Math.Sqrt(n))
            .All(i => n % i != 0));
    query.ToArray();
}

private static void DoCalcInParallel(int max)
{
    var numbers = Enumerable.Range(3, max-3);
    var parallelQuery =
        numbers
            .AsParallel()
            .Where(n => Enumerable.Range(2, (int)Math.Sqrt(n))
            .All(i => n % i != 0));
    parallelQuery.ToArray();
}
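The post doesn’t show the timing code itself; a minimal self-contained sketch of a harness for these two methods (Stopwatch, averaging three runs per max value, mirroring the methodology used in this post; the harness structure is my own) could look like this:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class PrimeBenchmark
{
    static void DoCalcSequentially(int max)
    {
        var numbers = Enumerable.Range(3, max - 3);
        var query =
            numbers.Where(n => Enumerable.Range(2, (int)Math.Sqrt(n))
                   .All(i => n % i != 0));
        query.ToArray();
    }

    static void DoCalcInParallel(int max)
    {
        var numbers = Enumerable.Range(3, max - 3);
        var parallelQuery =
            numbers.AsParallel()
                   .Where(n => Enumerable.Range(2, (int)Math.Sqrt(n))
                   .All(i => n % i != 0));
        parallelQuery.ToArray();
    }

    // average elapsed milliseconds over the given number of runs
    static double AverageMs(Action<int> calc, int max, int runs)
    {
        long total = 0;
        for (int r = 0; r < runs; r++)
        {
            var sw = Stopwatch.StartNew();
            calc(max);
            sw.Stop();
            total += sw.ElapsedMilliseconds;
        }
        return (double)total / runs;
    }

    static void Main()
    {
        foreach (var max in new[] { 1000, 10000, 100000, 1000000 })
        {
            Console.WriteLine("max = {0,7}: LINQ {1:F0} ms, PLINQ {2:F0} ms",
                max,
                AverageMs(DoCalcSequentially, max, 3),
                AverageMs(DoCalcInParallel, max, 3));
        }
    }
}
```
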

The F# version, on the other hand, uses the fairly optimized algorithm I had been using in most of my Project Euler solutions:

let mutable primeNumbers = [2]

// generate all prime numbers <= this max
let getPrimes max =
    // only check the prime numbers which are <= the square root of the number n
    let hasDivisor n =
        primeNumbers
        |> Seq.takeWhile (fun n' -> n' <= int(sqrt(double(n))))
        |> Seq.exists (fun n' -> n % n' = 0)

    // only check odd numbers <= max
    let potentialPrimes = Seq.unfold (fun n -> if n > max then None else Some(n, n+2)) 3
    // populate the prime numbers list
    for n in potentialPrimes do if not(hasDivisor n) then primeNumbers <- primeNumbers @ [n]

    primeNumbers

Here’s the average execution time in milliseconds for each of these methods over 3 runs for max = 1000, 10000, 100000, 1000000:

[Table: average execution time in milliseconds for the LINQ, PLINQ and F# versions at each max value]

I have to admit this doesn’t make for very comfortable reading… on average the F# version, despite being optimized, takes three to six times as long as the standard LINQ version! The PLINQ version, on the other hand, is slower than the standard LINQ version when the dataset is small, as the overhead of partitioning, collating and coordinating the extra threads outweighs the gains, but on larger datasets the benefit of parallel processing starts to shine through.

UPDATE 13/11/2010:

Thanks to Jaen’s comment, it’s now clear that the F# version of the code is much slower because of this line:

primeNumbers <- primeNumbers @ [n]

because a new list is constructed every time, and all elements of the previous list are copied over.

Unfortunately, there’s no way to append an element to an F# list or array without getting a new collection back (F# lists are immutable, and the @ operator copies the entire left-hand list, making the loop quadratic), so the easiest way around this performance handicap is to make the prime numbers list a generic List<int> instead (yup, luckily you are free to use CLR types in F#):

open System.Collections.Generic

// initialize the prime numbers list with 2
let mutable primeNumbers = new List<int>()
primeNumbers.Add(2)

// as before
...

    // populate the prime numbers list
    for n in potentialPrimes do if not(hasDivisor n) then primeNumbers.Add(n)

    primeNumbers

With this change, the performance of the F# code is now comparable to that of the standard LINQ version.

ThreadStatic vs ThreadLocal<T>

Occasionally you might want to make the value of a static or instance field local to a thread (i.e. each thread holds an independent copy of the field). What you need in this case is thread-local storage.

In C#, there are two main ways to do this.

ThreadStatic

You can mark a field with the ThreadStatic attribute:

[ThreadStatic]
public static int _x;
…
Enumerable.Range(1, 10).Select(i => new Thread(() => Console.WriteLine(_x++))).ToList()
          .ForEach(t => t.Start()); // prints 0 ten times

Whilst this is the easiest way to implement thread-local storage in C#, it’s important to understand its limitations:

  • the ThreadStatic attribute doesn’t work with instance fields; it compiles and runs, but does nothing

[ThreadStatic]
public int _x;
…
Enumerable.Range(1, 10).Select(i => new Thread(() => Console.WriteLine(_x++))).ToList()
          .ForEach(t => t.Start()); // prints 0, 1, 2, … 9
  • fields always start with the default value for their type (any field initializer runs only once, on the first thread)

[ThreadStatic]
public static int _x = 1;
…
Enumerable.Range(1, 10).Select(i => new Thread(() => Console.WriteLine(_x++))).ToList()
          .ForEach(t => t.Start()); // prints 0 ten times

ThreadLocal<T>

.Net 4 has introduced a new class specifically for the thread-local storage of data: the ThreadLocal<T> class:

private readonly ThreadLocal<int> _localX = new ThreadLocal<int>(() => 1);
…
Enumerable.Range(1, 10).Select(i => new Thread(() => Console.WriteLine(_localX.Value++))).ToList()
          .ForEach(t => t.Start()); // prints 1 ten times

There are some bonuses to using the ThreadLocal<T> class:

  • values are lazily evaluated; the factory function runs on the first access from each thread
  • you have more control over the initialization of the field, and are able to initialize it with a non-default value
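The two bullet points above can be demonstrated with a small, self-contained sketch (my own, not from the original post), counting how many times the factory actually runs:

```csharp
using System;
using System.Threading;

class ThreadLocalDemo
{
    static int _factoryCalls;

    // the factory runs lazily, once per thread, on the first access to .Value
    static readonly ThreadLocal<int> _localX = new ThreadLocal<int>(() =>
    {
        Interlocked.Increment(ref _factoryCalls);
        return 1; // a non-default initial value, unlike ThreadStatic
    });

    static void Main()
    {
        var threads = new Thread[3];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() => Console.WriteLine(_localX.Value)); // each thread prints 1
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();

        // the main thread never touched _localX, so the factory ran exactly
        // once per worker thread
        Console.WriteLine(_factoryCalls); // prints 3
    }
}
```
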

Summary

As you can see, ThreadLocal<T> has some clear advantages over ThreadStatic, though using .Net 4-only features like ThreadLocal<T> means you have to target your project at the .Net 4 framework, and it is therefore not backward compatible with previous versions of the framework.

It’s also worth noting that besides ThreadLocal<T> and ThreadStatic, you can also use Thread.GetData and Thread.SetData to fetch and store thread-specific data from and to a named LocalDataStoreSlot, though this is usually more cumbersome…