Eric Lippert on the decision to omit Enumerable.ForEach

Found another interesting post on Eric Lippert’s blog, this one explaining the rationale behind why there’s no built-in Enumerable.ForEach extension method, a method which I, and no doubt many others, had decided to implement ourselves.
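For context, the home-grown version most of us ended up writing looks something like this (a minimal sketch, assuming the usual System and System.Collections.Generic usings; the EnumerableExtensions class name is my own):

public static class EnumerableExtensions
{
    // apply the given action to every item in the sequence, purely for its side effects
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (T item in source)
        {
            action(item);
        }
    }
}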

As he explains, there are two main philosophical reasons why he’s against such an extension method:

"The first reason is that doing so violates the functional programming principles that all the other sequence operators are based upon. Clearly the sole purpose of a call to this method is to cause side effects."

"The second reason is that doing so adds zero new representational power to the language. Doing this lets you rewrite this perfectly clear code:

foreach(Foo foo in foos){ statement involving foo; }

into this code:

foos.ForEach((Foo foo)=>{ statement involving foo; });

which uses almost exactly the same characters in slightly different order. And yet the second version is harder to understand, harder to debug, and introduces closure semantics, thereby potentially changing object lifetimes in subtle ways."

Well, that clears a few things up, coming straight from one of the guys on the C# compiler team.

References:

Eric Lippert – foreach vs ForEach

LINQ – Some pitfalls to look out for

As great as LINQ is, and as much as it has transformed the way we interact with data in .Net (to the point where many of us wonder how we ever managed without it), there are some pitfalls one can fall into, especially around the concept of delayed execution, which is easily the most misunderstood aspect of LINQ.

Despite feeling pretty competent with LINQ and having a decent understanding of how delayed execution works, I still find myself making the odd mistake which results in bugs that are hard to detect as they don’t tend to fail very loudly. So as a reminder to myself, and to anyone who’s had similar experiences with these WTF bugs, here are some pitfalls to look out for.

Before we start, here’s a simple Person class which we will reuse over and over:

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

Exhibit 1 – modifying items in an IEnumerable

As you’re passing IEnumerable objects around in your code, adding projections on top of projections, this might be a pattern you have witnessed before:

public void ExhibitOne()
{
    var names = new[] { "yan", "yinan" };     // define some names
    var persons = InitializePersons(names);

    foreach (var person in persons)
    {
        Console.WriteLine("Name: {0}, Age: {1}", person.Name, person.Age);
    }
}

public IEnumerable<Person> InitializePersons(IEnumerable<string> names)
{
    // project the name strings to Person objects
    var persons = names.Select(n => new Person { Name = n });

    // set their age and return the Person objects
    SetAge(persons);

    return persons;
}

public void SetAge(IEnumerable<Person> persons)
{
    // set each person's age to 28
    foreach (var person in persons)
    {
        person.Age = 28;
    }
}

Any guess on the output of the ExhibitOne method?

Name: yan, Age: 0
Name: yinan, Age: 0

Not what you were expecting?

What has happened here is that the loops in both ExhibitOne and SetAge iterate over a projection (from string to Person object) of the names array, and that projection is only evaluated, with its items fetched, at the point where it’s actually enumerated. As a result, each loop iterates over a new set of Person objects created by this line:

// project the name strings to Person objects
var persons = names.Select(n => new Person { Name = n });

which is why the work done in SetAge is not reflected back in the ExhibitOne method.
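If you want to see the deferred execution for yourself, a quick way (a sketch of my own, not from the original example) is to put a side effect inside the projection and enumerate the sequence twice:

var names = new[] { "yan", "yinan" };
var persons = names.Select(n =>
{
    Console.WriteLine("Creating Person for {0}", n); // runs on EVERY enumeration
    return new Person { Name = n };
});

foreach (var person in persons) { } // prints "Creating Person for ..." twice
foreach (var person in persons) { } // prints them again - brand new Person objects each time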

Remedy

The fix here is simple: ‘materialize’ the Person objects in the InitializePersons method before passing them on to the SetAge method, so that when you modify the Person objects in the array you’re modifying the same objects that will be passed back to the ExhibitOne method:

public IEnumerable<Person> InitializePersons(IEnumerable<string> names)
{
    // project the name strings to Person objects
    var persons = names.Select(n => new Person { Name = n }).ToArray();

    // set their age and return the Person objects
    SetAge(persons);

    return persons;
}

This outputs:

Name: yan, Age: 28
Name: yinan, Age: 28

Verdict

Whilst this will generate the expected result IF the persons parameter passed to the SetAge method is an array or a list of some sort, it does leave room for things to go wrong, and when they do it’s a pain to debug, as the defect might manifest itself in all kinds of strange ways.

Therefore I would strongly suggest that any time you find yourself iterating through an IEnumerable collection and modifying its elements, you substitute the IEnumerable type with either an array or a list.

Exhibit 2 – dangerous overloads

As you’re building libraries it’s often useful to provide overloads to cater for a single item as well as a collection of items, for example:

public void ExhibitTwo()
{
    var persons = new[] {
                        new Person { Name = "Yan", Age = 28 },
                        new Person { Name = "Yinan", Age = 28 },
                    };
    Print(persons);          // IEnumerable T?
    Print(persons.ToList()); // IEnumerable T?
}

public void Print<T>(IEnumerable<T> items)
{
    Console.WriteLine("IEnumerable T");
}

public void Print<T>(T item)
{
    Console.WriteLine("Single T");
}

This outputs:

Single T
Single T

WTF? Yes, I hear you, but C#’s overload resolution algorithm determines that Person[] and List<Person> are better matches for T than for IEnumerable<T> in these cases, because no implicit conversion is required (whereas the IEnumerable<T> overload requires an implicit conversion to IEnumerable<Person>).

Remedy

Print(persons.AsEnumerable());

This prints:

IEnumerable T

Alternatively, if you were to add further overloads for arrays and lists:

public void Print<T>(IEnumerable<T> items)
{
    Console.WriteLine("IEnumerable T");
}

public void Print<T>(List<T> items)
{
    Console.WriteLine("List T");
}

public void Print<T>(T[] items)
{
    Console.WriteLine("Array T");
}

public void Print<T>(T item)
{
    Console.WriteLine("Single T");
}

then the right methods will be called:

public void ExhibitTwo()
{
    var persons = new[] {
                        new Person { Name = "Yan", Age = 28 },
                        new Person { Name = "Yinan", Age = 28 },
                    };
    Print(persons);                 // prints Array T
    Print(persons.ToList());        // prints List T
    Print(persons.AsEnumerable());  // prints IEnumerable T
    Print(persons.First());         // prints Single T
}

The obvious downside here is that you need to provide an overload for EVERY collection type, which is far from ideal!

Verdict

Obviously, expecting the callers to always remember to cast their collection to an enumerable is unrealistic. In my opinion it’s always better to leave as little room for confusion as possible, so the approach I’d recommend is to:

  • rename the methods so that it’s clear to the caller which method they should be calling, i.e.
public void Print<T>(T item)
{
    Console.WriteLine("Single T");
}

public void PrintBatch<T>(IEnumerable<T> items)
{
    Console.WriteLine("IEnumerable T");
}

or

public void SinglePrint<T>(T item)
{
    Console.WriteLine("Single T");
}

public void Print<T>(IEnumerable<T> items)
{
    Console.WriteLine("IEnumerable T");
}

Exhibit 3 – Enumerable.Except returns distinct items

I have already covered this topic earlier; you can read more about it here.

Verdict

One to remember: Enumerable.Except performs a set operation, and a set by definition contains only distinct items. Just keep this simple rule in mind next time you use it.

Exhibit 4 – Do you know your T from your object?

This one should be rare, but it’s interesting nonetheless:

public void ExhibitFour()
{
    var persons = new[] {
                        new Person { Name = "Yan", Age = 28 },
                        new Person { Name = "Yinan", Age = 28 },
                    };

    FirstMethod(persons);
}

public void FirstMethod(IEnumerable<object> items)
{
    SecondMethod(items.ToList());
}

public void SecondMethod<T>(IList<T> items)
{
    Console.WriteLine(typeof(T));
}

What do you think this code prints? Object or Person? The answer is … Object!

How is it that typeof(T) returns Object instead of Person? The reason is actually simple: items.ToList() returns a List<object>, because the compile-time type of items is IEnumerable<object> rather than IEnumerable<Person>, and generic type inference works against the compile-time type, not the runtime type.

Remedy

Turn FirstMethod into a generic method and everything will flow naturally:

public void FirstMethod<T>(IEnumerable<T> items)
{
    SecondMethod(items.ToList());
}

References:

Eric Lippert’s post on Generics != Templates

My question on StackOverflow regarding overload resolution

Enumerable.Except returns distinct items

This one took me by surprise a little, but there’s something about the Enumerable.Except method which you ought to know about. At first you might have assumed (as I did) that, given an array as the base set, calling the Except extension method with a second array returns all the items in the first array which do not appear in the second array.

Well, not exactly, for instance:

var list1 = new[] { 1, 1, 1, 1 };
var list2 = Enumerable.Empty<int>();

var result = list1.Except(list2); // returns { 1 } instead of the expected { 1, 1, 1, 1 }

var list3 = new[] { 1, 1, 2, 2, 3 };
var list4 = new[] { 1 };

var result2 = list3.Except(list4); // returns { 2, 3 } instead of the expected { 2, 2, 3 }

Does it surprise you that only distinct instances of the original array are returned by the Except method? Whilst I haven’t found any official Microsoft documentation/article on why Except behaves this way, the best explanation I have heard so far is that Except performs a set operation; according to MSDN:

Produces the set difference of two sequences by using the default equality comparer to compare values.

And a Set in the mathematical sense is defined as:

A set is a collection of distinct objects, considered as an object in its own right
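So if you do want to keep the duplicates from the first sequence, one alternative (a sketch of my own, not something from the MSDN docs) is to filter with Where against the second sequence instead of calling Except:

var list3 = new[] { 1, 1, 2, 2, 3 };
var list4 = new[] { 1 };

// Where keeps the duplicates in the source sequence, unlike Except
var result3 = list3.Where(x => !list4.Contains(x)); // returns { 2, 2, 3 }

// for a large second sequence, a HashSet makes each lookup O(1)
var exclusions = new HashSet<int>(list4);
var result4 = list3.Where(x => !exclusions.Contains(x)); // also returns { 2, 2, 3 }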

Functional programming with Linq – Enumerable.SequenceEqual

Yet another useful method on the Enumerable class, the SequenceEqual method does exactly what it says on the tin and tells you whether or not two sequences are of equal length and their corresponding elements are equal according to either the default or supplied equality comparer:

var list1 = new List<int>() {0 ,1 ,2, 3, 4, 5, 6 };
var list2 = new List<int>() {0 ,1 ,2, 3, 4, 5, 6 };
var list3 = new List<int>() {6 ,5 ,4, 3, 2, 1, 0 };

list1.SequenceEqual(list2); // returns true
list1.SequenceEqual(list3); // returns false

As you know, for reference types the default equality comparer compares the references themselves, hence:

class Pet
{
    public string Name { get; set; }
    public int Age { get; set; }
}
…
Pet pet1 = new Pet { Name = "Turbo", Age = 2 };
Pet pet2 = new Pet { Name = "Peanut", Age = 8 };

// Create two lists of pets.
var pets1 = new List<Pet> { pet1, pet2 };
var pets2 = new List<Pet> { pet1, pet2 };
var test1 = pets1.SequenceEqual(pets2); // returns true

var pets3 = new List<Pet> { pet1, new Pet { Name = "Peanut", Age = 8 } };
var test2 = pets1.SequenceEqual(pets3); // returns false

There are a number of ways you can get around this, including:

  • make Pet a value type, i.e. a struct
  • make Pet implement the IEquatable<Pet> interface
  • create an IEqualityComparer<Pet> and use the SequenceEqual overload which takes an equality comparer (see the sketch below)
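For instance, the equality comparer route might look something like this (a minimal sketch; the PetComparer name is my own):

class PetComparer : IEqualityComparer<Pet>
{
    public bool Equals(Pet x, Pet y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Name == y.Name && x.Age == y.Age;
    }

    public int GetHashCode(Pet pet)
    {
        return pet == null ? 0 : (pet.Name ?? string.Empty).GetHashCode() ^ pet.Age;
    }
}
…
var test3 = pets1.SequenceEqual(pets3, new PetComparer()); // now returns true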

Here is an interesting usage of the SequenceEqual method to help find duplicates in a list of lists (see this StackOverflow question) as provided by Judah Himango:

var lists = new List<List<int>>()
{
    new List<int>() {0 ,1, 2, 3, 4, 5, 6 },
    new List<int>() {0 ,1, 2, 3, 4, 5, 6 },
    new List<int>() {0 ,1, 4, 2, 4, 5, 6 },
    new List<int>() {0 ,3, 2, 5, 1, 6, 4 }
};

var duplicates = from list in lists
                 where lists.Except(new[] { list }).Any(l => l.SequenceEqual(list))
                 select list;

Enumerable.Append and Enumerable.Prepend extension methods

Another gem I found on StackOverflow today (I’ve been spending a lot of time there these last couple of days..), this time in the form of a question on how to append or prepend a single value to an IEnumerable<T>.

Greg provided an elegant solution to this particular problem, and here’s his answer:

public static IEnumerable<T> Append<T>(this IEnumerable<T> source, T item)
{
    foreach (T i in source)
        yield return i;

    yield return item;
}

public static IEnumerable<T> Prepend<T>(this IEnumerable<T> source, T item)
{
    yield return item;

    foreach (T i in source)
        yield return i;
}

Which you can use like this:

var all = GetHeaders().Append(GetData());

As others have stated, what’s so nice about this solution is that it does not mutate the original collection; instead it generates a new IEnumerable<T>.
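To illustrate that point, here’s a quick sketch of my own using the two extension methods above:

var numbers = new List<int> { 1, 2, 3 };

var withZero = numbers.Prepend(0); // lazily yields 0, 1, 2, 3
var withFour = numbers.Append(4);  // lazily yields 1, 2, 3, 4

Console.WriteLine(numbers.Count);              // still 3 - the original list is untouched
Console.WriteLine(string.Join(",", withZero)); // 0,1,2,3
Console.WriteLine(string.Join(",", withFour)); // 1,2,3,4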