LINQ is great, and it has transformed the way we interact with data in .NET to the point that many of us wonder how we managed without it all this time. There are, however, some pitfalls one can fall into, especially with deferred (delayed) execution, which is easily the most misunderstood aspect of LINQ.
Despite feeling pretty competent at LINQ and having a decent understanding of how deferred execution works, I still find myself making the odd mistake, resulting in bugs that are hard to detect as they don’t tend to fail very loudly. So as a reminder to myself and anyone who’s had similar experiences with these WTF bugs, here are some pitfalls to look out for.
Before we start, here’s a simple Person class which we will reuse over and over:
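A minimal sketch of such a class, assuming it needs nothing more than a name and an age (the `Name` and `Age` property names are assumptions):

```csharp
public class Person
{
    // Both properties are illustrative; Age defaults to 0 until it is set.
    public string Name { get; set; }
    public int Age { get; set; }
}
```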
Exhibit 1 – modifying items in an IEnumerable
As you’re passing IEnumerable objects around in your code, adding projection to projections, this might be a pattern which you have witnessed before:
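Something along these lines, as a sketch: the method names `ExhibitOne`, `InitializePersons` and `SetAge` come from this post, while the method bodies, the sample names and the age value are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class ExhibitOneSketch
{
    public static IEnumerable<Person> InitializePersons(IEnumerable<string> names)
    {
        // Deferred: no Person is created until something enumerates 'persons'.
        var persons = names.Select(n => new Person { Name = n });
        SetAge(persons);
        return persons;
    }

    public static void SetAge(IEnumerable<Person> persons)
    {
        foreach (var person in persons)
        {
            person.Age = 28; // arbitrary sample value
        }
    }

    public static void ExhibitOne()
    {
        var names = new[] { "Nancy", "Andrew", "Janet" };
        foreach (var person in InitializePersons(names))
        {
            Console.WriteLine("{0} is {1}", person.Name, person.Age);
        }
    }
}
```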
Any guess on the output of the ExhibitOne method?
Not what you were expecting?
What happened here is that the loops in both ExhibitOne and SetAge use a projection (from string to Person object) of the names array, and a projection is only evaluated, and its items fetched, at the point when it’s actually needed. As a result, each loop iterates over a brand new set of Person objects created by the projection in InitializePersons, which is why the work SetAge did is not reflected in the ExhibitOne method.
The fix here is simple: ‘materialize’ the Person objects in the InitializePersons method before passing them on to the SetAge method, so that when you modify the Person objects in the array you’re modifying the same objects that will be passed back to the ExhibitOne method:
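A sketch of that fix, assuming `ToArray()` as the materializing call (`ToList()` would do equally well); the method bodies and the age value are assumptions:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class ExhibitOneFixed
{
    public static IEnumerable<Person> InitializePersons(IEnumerable<string> names)
    {
        // ToArray() runs the projection once, so the Person objects exist from here on.
        var persons = names.Select(n => new Person { Name = n }).ToArray();
        SetAge(persons);
        return persons; // the very same objects SetAge has just modified
    }

    public static void SetAge(IEnumerable<Person> persons)
    {
        foreach (var person in persons)
        {
            person.Age = 28; // arbitrary sample value
        }
    }
}
```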
Whilst this will generate the expected result IF the persons parameter passed to the SetAge method is an array or list of some sort, it does leave room for things to go wrong and when they do it’s a pain to debug as the defect might manifest itself in all kinds of strange ways.
Therefore I would strongly suggest that anytime you find yourself iterating through an IEnumerable collection and modifying its elements you should substitute the IEnumerable type with either an array or list.
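One way to express that in the signature, as a sketch: `IList<Person>` is my choice here (a `Person[]` or `List<Person>` parameter would work too), and the class name `AgeSetter` is hypothetical.

```csharp
using System.Collections.Generic;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class AgeSetter
{
    // IList<Person> accepts arrays and List<Person>, but not a lazy Select(...)
    // projection, so a deferred sequence can no longer be passed in by accident.
    public static void SetAge(IList<Person> persons)
    {
        foreach (var person in persons)
        {
            person.Age = 28; // arbitrary sample value
        }
    }
}
```

With this signature, passing the result of `names.Select(...)` directly is a compile-time error, which forces the caller to call `ToArray()` or `ToList()` first.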
Exhibit 2 – dangerous overloads
As you’re building libraries it’s often useful to provide overloads to cater for single item as well as collections, for example:
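A sketch of such a pair of overloads and what happens when you call them; the method name `Do` and the returned strings are assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class Overloads
{
    public static string Do<T>(T item)
    {
        return "single item";
    }

    public static string Do<T>(IEnumerable<T> items)
    {
        return "collection";
    }

    public static void Run()
    {
        Console.WriteLine(Overloads.Do(new Person()));       // single item
        Console.WriteLine(Overloads.Do(new List<Person>())); // single item
        Console.WriteLine(Overloads.Do(new Person[0]));      // single item

        // Only an argument whose compile-time type is IEnumerable<Person>
        // binds to the collection overload.
        IEnumerable<Person> persons = new List<Person>();
        Console.WriteLine(Overloads.Do(persons));            // collection
    }
}
```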
WTF? Yes, I hear you, but C#’s overload resolution algorithm determines that a List<Person> (or Person[]) argument is a better match for T than for IEnumerable<T> in these cases: binding it to T requires no implicit conversion, whereas the IEnumerable<T> overload needs one (from List<Person> to IEnumerable<Person>) in order to match. A single Person only ever fits T.
Alternatively, if you were to add further overloads for arrays and lists, then the right methods will be called.
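A sketch of what those extra overloads could look like, again assuming a method called `Do` that reports which overload was picked (both are illustrative choices):

```csharp
using System.Collections.Generic;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class MoreOverloads
{
    public static string Do<T>(T item) { return "single item"; }

    public static string Do<T>(IEnumerable<T> items) { return "collection"; }

    // T[] and List<T> are more specific than a bare T, so they win the tie.
    public static string Do<T>(T[] items) { return "array"; }

    public static string Do<T>(List<T> items) { return "list"; }
}
```

Here `MoreOverloads.Do(new Person[0])` picks the array overload and `MoreOverloads.Do(new List<Person>())` picks the list overload, but any collection type without its own overload, such as `HashSet<Person>`, still ends up in the single-item overload.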
The obvious downside here is that you need to provide an overload for EVERY collection type which is far from ideal!
Obviously, expecting callers to always remember to cast their collection to an enumerable is unrealistic. In my opinion, it’s always better to leave as little room for confusion as possible, and therefore the approach I’d recommend is:
- rename the methods so that it’s clear to the caller which method they should be calling, i.e.
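For instance, as a sketch (the names `DoOne` and `DoMany` are hypothetical; any pair of names that makes the intent obvious will do):

```csharp
using System.Collections.Generic;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class RenamedMethods
{
    // No overloading, so overload resolution has nothing to get wrong.
    public static string DoOne<T>(T item) { return "single item"; }

    public static string DoMany<T>(IEnumerable<T> items) { return "collection"; }
}
```

A call such as `RenamedMethods.DoMany(new List<Person>())` now always treats its argument as a collection, with no cast required.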
Exhibit 3 – Enumerable.Except returns distinct items
I have already covered this topic earlier, read more about it here.
The one thing to remember: Enumerable.Except performs a set operation, and a set by definition contains only distinct items. Keep this simple rule in mind next time you use it.
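A small sketch of the behaviour with made-up integer data, alongside a `Where`-based filter (my suggestion, not something from the earlier post) for when duplicates need to survive:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExceptSketch
{
    // Set difference: duplicates in 'first' are collapsed as well.
    public static int[] SetDifference(IEnumerable<int> first, IEnumerable<int> second)
    {
        return first.Except(second).ToArray();
    }

    // Plain filter: keeps every occurrence that is not found in 'second'.
    public static int[] FilterOut(IEnumerable<int> first, IEnumerable<int> second)
    {
        var excluded = new HashSet<int>(second);
        return first.Where(x => !excluded.Contains(x)).ToArray();
    }
}
```

Given `{ 1, 1, 2, 2, 3, 4 }` and `{ 3 }`, `SetDifference` yields `1, 2, 4`, whereas `FilterOut` yields `1, 1, 2, 2, 4`.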
Exhibit 4 – Do you know your T from your object?
This one should be rare, but it’s interesting nonetheless:
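A sketch of the kind of code in question: `FirstMethod` is the name used in this post, while `SecondMethod`, the class name and the method bodies are assumptions. It relies on generic covariance (C# 4 and later) to pass a `List<Person>` where an `IEnumerable<object>` is expected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class ExhibitFour
{
    public static string FirstMethod(IEnumerable<object> items)
    {
        // T for SecondMethod is inferred from the compile-time type of items.ToList().
        return SecondMethod(items.ToList());
    }

    public static string SecondMethod<T>(List<T> list)
    {
        return typeof(T).Name;
    }

    public static void Run()
    {
        var persons = new List<Person> { new Person { Name = "Nancy" } };
        Console.WriteLine(FirstMethod(persons));
    }
}
```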
What do you think this code prints? Object or Person? The answer is … Object!
How is it that typeof(T) returns Object instead of Person? The reason is actually simple: items.ToList() returns List<object>, because the compile-time type of items is IEnumerable<object> rather than IEnumerable<Person>.
Turn FirstMethod into a generic method and everything will flow naturally:
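A sketch of the generic version, assuming a helper called `SecondMethod` that reports `typeof(T)` (that name and the method bodies are assumptions):

```csharp
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class ExhibitFourGeneric
{
    // T is now inferred at the call site, so List<Person> gives T = Person.
    public static string FirstMethod<T>(IEnumerable<T> items)
    {
        return SecondMethod(items.ToList());
    }

    public static string SecondMethod<T>(List<T> list)
    {
        return typeof(T).Name;
    }
}
```

Calling `ExhibitFourGeneric.FirstMethod(new List<Person>())` now reports `Person`.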