LINQ – Some pitfalls to look out for

As great as LINQ is and how it has transformed the way we interact with data in .Net to the point many of us wonder how we had managed without it all this time! There are however, some pitfalls one can fall into, especially with the concept of delayed execution which is easily the most misunderstood aspect of LINQ.

Despite feeling pretty competent at LINQ and having grasped a decent understanding of how delay execution works, I still find myself making the odd mistakes which resulted in bugs that are hard to detect as they don’t tend to fail very loudly. So as a reminder to myself and anyone who’s had similar experience with these WTF bugs, here’s some pitfalls to look out for.

Before we start, here’s a simple Person class which we will reuse over and over:

   1: public class Person

   2: {

   3:     public string Name { get; set; }

   4:     public int Age { get; set; }

   5: }

Exhibit 1 – modifying items in an IEnumerable

As you’re passing IEnumerable objects around in your code, adding projection to projections, this might be a pattern which you have witnessed before:

   1: public void ExhibitOne()

   2: {

   3:     var names = new[] { "yan", "yinan" };     // define some names

   4:     var persons = InitializePersons(names);

   5:

   6:     foreach (var person in persons)

   7:     {

   8:         Console.WriteLine("Name: {0}, Age: {1}", person.Name, person.Age);

   9:     }

  10: }

  11:

  12: public IEnumerable<Person> InitializePersons(IEnumerable<string> names)

  13: {

  14:     // project the name strings to Person objects

  15:     var persons = names.Select(n => new Person { Name = n });

  16:

  17:     // set their age and return the Person objects

  18:     SetAge(persons);

  19:

  20:     return persons;

  21: }

  22:

  23: public void SetAge(IEnumerable<Person> persons)

  24: {

  25:     // set each person's age to 28

  26:     foreach (var person in persons)

  27:     {

  28:         person.Age = 28;

  29:     }

  30: }

Any guess on the output of the ExhibitOne method?

image

Not what you were expecting?

What had happened here is that the loops in both ExhibitOne and SetAge use a projection (from string to Person object) of the names array, a projection that is evaluated and its items fetched at the point when it’s actually needed. As a result, both loops loop through a new set of Person objects created by this line:

   1: // project the name strings to Person objects

   2: var persons = names.Select(n => new Person { Name = n });

hence why the work SetAge had done is not reflected in the ExhibitOne method.

Remedy

The fix here is simple, simply ‘materialize’ the Person objects in the InitializePersons method before passing them onto the SetAge method so that when you modify the Person objects in the array you’re modifying the same objects that will be passed back to the ExhibitOne method:

   1: public IEnumerable<Person> InitializePersons(IEnumerable<string> names)

   2: {

   3:     // project the name strings to Person objects

   4:     var persons = names.Select(n => new Person { Name = n }).ToArray();

   5:

   6:     // set their age and return the Person objects

   7:     SetAge(persons);

   8:

   9:     return persons;

  10: }

This outputs:

image

Verdict

Whilst this will generate the expected result IF the persons parameter passed to the SetAge method is an array or list of some sort, it does leave room for things to go wrong and when they do it’s a pain to debug as the defect might manifest itself in all kinds of strange ways.

Therefore I would strongly suggest that anytime you find yourself iterating through an IEnumerable collection and modifying its elements you should substitute the IEnumerable type with either an array or list.

Exhibit 2 – dangerous overloads

As you’re building libraries it’s often useful to provide overloads to cater for single item as well as collections, for example:

   1: public void ExhibitTwo()

   2: {

   3:     var persons = new[] {

   4:                         new Person { Name = "Yan", Age = 28 },

   5:                         new Person { Name = "Yinan", Age = 28 },

   6:                     };

   7:     Print(persons); // IEnumerable T?

   8:     Print(persons.ToList()); // IEnumerable T?

   9: }

  10:

  11: public void Print<T>(IEnumerable<T> items)

  12: {

  13:     Console.WriteLine("IEnumerable T");

  14: }

  15:

  16: public void Print<T>(T item)

  17: {

  18:     Console.WriteLine("Single T");

  19: }

This outputs:

image

WTF? Yes I hear you, but C#’s overload resolution algorithm determines that Person[] and List<Person> are better matched to T than IEnumerable<T> in these cases because no implicit casting is required (which the IEnumerable<T> overload does in order to make them match IEnumerable<Person>).

Remedy

   1: Print(persons.AsEnumerable());

This prints

image

Alternatively, if you were to add further overloads for array and lists:

   1: public void Print<T>(IEnumerable<T> items)

   2: {

   3:     Console.WriteLine("IEnumerable T");

   4: }

   5:

   6: public void Print<T>(List<T> items)

   7: {

   8:     Console.WriteLine("List T");

   9: }

  10:

  11: public void Print<T>(T[] items)

  12: {

  13:     Console.WriteLine("Array T");

  14: }

  15:

  16: public void Print<T>(T item)

  17: {

  18:     Console.WriteLine("Single T");

  19: }

then the right methods will be called:

   1: public void ExhibitTwo()

   2: {

   3:     var persons = new[] {

   4:                         new Person { Name = "Yan", Age = 28 },

   5:                         new Person { Name = "Yinan", Age = 28 },

   6:                     };

   7:     Print(persons);                 // Array T

   8:     Print(persons.ToList());        // List T

   9:     Print(persons.AsEnumerable());  // prints IEnumerable T

  10:     Print(persons.First());         // prints Single T

  11: }

The obvious downside here is that you need to provide an overload for EVERY collection type which is far from ideal!

Verdict

Obviously, to expect the callers to always remember to case their collection as an enumerable is unrealistic, in my opinion, it’s always better to leave as little room for confusion as possible and therefore the approach I’d recommend is:

  • rename the methods so that it’s clear to the caller which method they should be calling, i.e.
   1: public void Print<T>(T item)

   2: {

   3:     Console.WriteLine("Single T");

   4: }

   5:

   6: public void PrintBatch<T>(IEnumerable<T> items)

   7: {

   8:     Console.WriteLine("IEnumerable T");

   9: }

or

   1: public void SinglePrint<T>(T item)

   2: {

   3:     Console.WriteLine("Single T");

   4: }

   5:

   6: public void Print<T>(IEnumerable<T> items)

   7: {

   8:     Console.WriteLine("IEnumerable T");

   9: }

Exhibit 3 – Enumerable.Except returns distinct items

I have already covered this topic earlier, read more about it here.

Verdict

One to remember, Enumerable.Except performs a set operation and a set by definition contains only distinct items, just keep this simple rule in mind next time you use it.

Exhibit 4 – Do you know your T from your object?

This one should be rare, but interestingly nonetheless:

   1: public void ExhibitFour()

   2: {

   3:     var persons = new[] {

   4:                         new Person { Name = "Yan", Age = 28 },

   5:                         new Person { Name = "Yinan", Age = 28 },

   6:                     };

   7:

   8:     FirstMethod(persons);

   9: }

  10:

  11: public void FirstMethod(IEnumerable<object> items)

  12: {

  13:        SecondMethod(items.ToList());

  14: }

  15:

  16: public void SecondMethod<T>(IList<T> items)

  17: {

  18:        Console.WriteLine(typeof(T));

  19: }

What do you think this code prints? Object or Person? The answer is … Object!

How is it that typeof(T) returns Object instead of Person? The reason is simple actually, it’s because items.ToList() returns List<object> as the compile time time of items is IEnumerable<object> instead of IEnumerable<Person>.

Remedy

Turn FirstMethod into a generic method and everything will flow naturally:

   1: public void FirstMethod<T>(IEnumerable<T> items)

   2: {

   3:        SecondMethod(items.ToList());

   4: }

References:

Eric Lippert’s post on Generics != Templates

My question on StackOverflow regards overload resolution

  • Alex

    Thanks for the interesting post, I learned a couple of things.