LINQ – Lambda Expression vs Query Expression

As you’re probably already aware, LINQ comes in two flavours: method calls that take lambda expressions, and SQL-like query expressions:

Func<int, bool> isEven = i => i % 2 == 0;
int[] ints = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };

// using Query expression
var evensQuery = from i in ints where isEven(i) select i;
// using Lambda expression
var evensLambda = ints.Where(isEven);

Both yield the same result because the compiler translates query expressions into the equivalent method calls and lambda expressions before compiling them. So performance-wise, there’s no meaningful difference between the two.
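
To make that translation concrete, here is a minimal sketch of roughly what the compiler does with the query expression above. The names TranslationDemo, ViaQuery and ViaMethods are made up for this illustration, and the "translated" form is written by hand rather than taken from compiler output.

```csharp
using System;
using System.Linq;

static class TranslationDemo
{
    static readonly Func<int, bool> IsEven = i => i % 2 == 0;

    // Query expression form.
    public static int[] ViaQuery(int[] ints) =>
        (from i in ints where IsEven(i) select i).ToArray();

    // Roughly the translated form: the where clause becomes a call to
    // Where() taking a lambda, and the trivial "select i" is dropped.
    public static int[] ViaMethods(int[] ints) =>
        ints.Where(i => IsEven(i)).ToArray();

    public static void Main()
    {
        var ints = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        Console.WriteLine(string.Join(", ", ViaQuery(ints)));   // 2, 4, 6, 8
        Console.WriteLine(string.Join(", ", ViaMethods(ints))); // 2, 4, 6, 8
    }
}
```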

Which one you should use is mostly a matter of personal preference. Many people prefer lambda expressions because they’re shorter and more concise, but having worked extensively with SQL, I personally prefer the query syntax. That said, it’s important to bear in mind that there are situations where one is better suited than the other.

Joins

Here’s an example of how you can join two sequences together using lambda and query expressions:

class Person
{
    public string Name { get; set; }
}
class Pet
{
    public string Name { get; set; }
    public Person Owner { get; set; }
}

void Main()
{
    var magnus = new Person { Name = "Hedlund, Magnus" };
    var terry = new Person { Name = "Adams, Terry" };
    var charlotte = new Person { Name = "Weiss, Charlotte" };
    var barley = new Pet { Name = "Barley", Owner = terry };
    var boots = new Pet { Name = "Boots", Owner = terry };
    var whiskers = new Pet { Name = "Whiskers", Owner = charlotte };
    var daisy = new Pet { Name = "Daisy", Owner = magnus };
    var people = new List<Person> { magnus, terry, charlotte };
    var pets = new List<Pet> { barley, boots, whiskers, daisy };

    // using lambda expression
    var lambda = people.Join(pets,              // inner sequence
                             person => person,  // outer sequence key
                             pet => pet.Owner,  // inner sequence key
                             (person, pet) =>
                                 new { OwnerName = person.Name, Pet = pet.Name });

    // using query expression
    var query = from person in people
                join pet in pets on person equals pet.Owner
                select new { OwnerName = person.Name, Pet = pet.Name };
}

Again, both yield the same result and there is no performance penalty associated with either, but it’s easy to see why the query syntax is far more readable and expressive of your intent here than the lambda expression!

Lambda-Only Functions

A number of methods are only available as method calls and have no query expression equivalent: Single(), Take(), Skip() and First(), to name just a few. You can, however, mix and match the two by calling these lambda-only methods at the end of the query:

// mix and match query and Lambda syntax
var query = (from person in people
             join pet in pets on person equals pet.Owner
             select new { OwnerName = person.Name, Pet = pet.Name }).Skip(1).Take(2);

As this reduces the readability of your code, it’s generally better to first assign the result of the query expression to a variable and then call the lambda-only methods on that variable:

var query = from person in people
            join pet in pets on person equals pet.Owner
            select new { OwnerName = person.Name, Pet = pet.Name };

var result = query.Skip(1).Take(2);

Both versions return the same result because of deferred execution: the query is not executed against the underlying list until you iterate through the result variable. And because query expressions are translated to method calls before being compiled, there is no performance penalty either. BUT if you don’t want deferred execution, or need to use one of the aggregate functions such as Average() or Sum() (which execute the query immediately), you should be aware that the underlying sequence may be modified between the assignments to query and result. In that case, I’d argue it’s best to use lambda expressions from the start, or to append the lambda-only methods directly to the query expression.
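
The effect of a modification between the two assignments can be seen in a small sketch. The list contents and the DeferredExecutionDemo name are invented for this example; ToList() is used here as one way of forcing the query to run straight away.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredExecutionDemo
{
    public static (int deferredCount, int snapshotCount) Run()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5 };

        // Nothing is executed here; 'query' only describes the work.
        var query = from n in numbers where n > 1 select n;

        // ToList() forces execution now and takes a snapshot: 2, 3, 4, 5.
        var snapshot = query.ToList();

        // The underlying list changes between the two assignments.
        numbers.Add(6);

        // 'result' is still deferred, so it sees the new element when iterated.
        var result = query.Skip(1).Take(10);

        return (result.Count(), snapshot.Skip(1).Take(10).Count());
    }

    public static void Main()
    {
        var (deferred, snapshot) = Run();
        Console.WriteLine(deferred);  // 4 (3, 4, 5, 6)
        Console.WriteLine(snapshot);  // 3 (3, 4, 5)
    }
}
```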

Functional programming with Linq – IEnumerable.Aggregate

As I was learning functional programming with F# I came across the List.reduce function which iterates through a list and builds up an accumulator value by running another function against each element in the list.

Back to the more familiar C# territory, LINQ has introduced some functional features to C# and one of these is the Aggregate function on IEnumerable<T> which works in the same way as the List.reduce function.

In the following example, you can use the Aggregate function to build up a comma-separated string from a list of strings:

var strings = new List<string> { "Jack", "Jill", "Jim", "Joe", "Jane" };
// this returns "Jack, Jill, Jim, Joe, Jane"
var comSeparatedStrings = strings.Aggregate((acc, item) => acc + ", " + item);
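
One thing worth knowing is that, like List.reduce, this overload of Aggregate uses the first element as the initial accumulator and so throws on an empty sequence. The sketch below contrasts it with the seeded overload, which is closer to F#'s List.fold. The AggregateDemo, JoinNames and TotalLength names are only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AggregateDemo
{
    // Unseeded overload: the first element is the initial accumulator,
    // so an empty sequence throws InvalidOperationException.
    public static string JoinNames(IEnumerable<string> names) =>
        names.Aggregate((acc, item) => acc + ", " + item);

    // Seeded overload: starts from 0, so an empty sequence returns the seed.
    public static int TotalLength(IEnumerable<string> names) =>
        names.Aggregate(0, (acc, item) => acc + item.Length);

    public static void Main()
    {
        var strings = new List<string> { "Jack", "Jill", "Jim", "Joe", "Jane" };
        Console.WriteLine(JoinNames(strings));               // Jack, Jill, Jim, Joe, Jane
        Console.WriteLine(TotalLength(strings));             // 18
        Console.WriteLine(TotalLength(new List<string>()));  // 0
    }
}
```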

Controlling Type conversion in C#

Ever run into a situation where your application uses a type for its internal workings, but occasionally needs to convert that type into another just so it can be passed to another application which doesn’t understand some of the base types we have in the .Net space?

Consider the example below, where the Player is a type in my problem domain, but I need to communicate information about the player with a non-.Net client which doesn’t have the Guid type, so I need to have a PlayerDTO type used exclusively for message passing:

public sealed class Player
{
    public Player(string name, long score)
    {
        Name = name;
        Score = score;
        ID = Guid.NewGuid();
    }

    public string Name { get; private set; }
    public Guid ID { get; private set; }
    public long Score { get; private set; }
}

public sealed class PlayerDTO
{
    public PlayerDTO(string name, long score, string id)
    {
        Name = name;
        Score = score;
        ID = id;
    }

    public string Name { get; private set; }
    public string ID { get; private set; } // client is not .Net based so no Guid there
    public long Score { get; private set; }
}

The problem here is that there is no easy way to convert the Player type to PlayerDTO. Every time I want to create a new PlayerDTO object I need to manually copy the values from Player into PlayerDTO’s constructor, and my application needs to know that it has to convert Player.ID into a string.

A cleaner solution here is to define either an implicit or an explicit conversion operator so you can simply cast a Player object into a PlayerDTO object:

// Define an explicit conversion operator to allow easy conversion from a Player to a PlayerDTO object
public static explicit operator PlayerDTO(Player player)
{
    return new PlayerDTO(player.Name, player.Score, player.ID.ToString());
}
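
For context, a user-defined conversion has to be declared inside either the source or the target type. The sketch below places it in Player, which is an assumption since the post doesn't say where it lives; ConversionDemo is a made-up name for the usage part.

```csharp
using System;

public sealed class PlayerDTO
{
    public PlayerDTO(string name, long score, string id)
    {
        Name = name;
        Score = score;
        ID = id;
    }

    public string Name { get; private set; }
    public string ID { get; private set; }
    public long Score { get; private set; }
}

public sealed class Player
{
    public Player(string name, long score)
    {
        Name = name;
        Score = score;
        ID = Guid.NewGuid();
    }

    public string Name { get; private set; }
    public Guid ID { get; private set; }
    public long Score { get; private set; }

    // Declared in the source type here; declaring it in PlayerDTO
    // instead would work equally well.
    public static explicit operator PlayerDTO(Player player)
    {
        return new PlayerDTO(player.Name, player.Score, player.ID.ToString());
    }
}

public static class ConversionDemo
{
    public static void Main()
    {
        var player = new Player("Jack", 100);
        var dto = (PlayerDTO)player; // invokes the operator above
        Console.WriteLine(dto.Name + " " + dto.Score + " " + dto.ID);
    }
}
```

Making the operator explicit keeps the conversion visible at the call site; an implicit operator would let a Player be passed wherever a PlayerDTO is expected without a cast.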

Furthermore, this allows you to easily convert an array of Player objects into an array of PlayerDTO objects:

// use Array.ConvertAll
PlayerDTO[] playerDTOs = Array.ConvertAll(players, p => (PlayerDTO) p);
// use Linq
playerDTOs = players.Select(p => (PlayerDTO) p).ToArray();

No Interface types allowed

One thing to bear in mind when using the implicit/explicit operators is that you can’t define conversions to or from interface types; the reason is specified in Section 10.9.3 of the C# spec.

In short, defining an implicit or explicit conversion between reference types gives the user the expectation that there will be a change in reference; after all, the same reference cannot be both types. On the other hand, the user does not have the same expectation for conversions between reference types and interface types, since a cast to an interface is normally expected to return the same object.

In these cases, you still have other ways to make type conversion cleaner and centralised, for example:

Create a FromIPlayer method


playerDTOs = iPlayers.Select(p => PlayerDTO.FromIPlayer(p)).ToArray();
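
The method itself could look something like the sketch below. The IPlayer interface is not shown anywhere in this post, so its shape here (Name, ID, Score) is an assumption modelled on the Player class, and FactoryDemo is a made-up name.

```csharp
using System;
using System.Linq;

// Hypothetical interface; its members are assumed from the Player class.
public interface IPlayer
{
    string Name { get; }
    Guid ID { get; }
    long Score { get; }
}

public sealed class Player : IPlayer
{
    public Player(string name, long score)
    {
        Name = name;
        Score = score;
        ID = Guid.NewGuid();
    }

    public string Name { get; private set; }
    public Guid ID { get; private set; }
    public long Score { get; private set; }
}

public sealed class PlayerDTO
{
    public PlayerDTO(string name, long score, string id)
    {
        Name = name;
        Score = score;
        ID = id;
    }

    public string Name { get; private set; }
    public string ID { get; private set; }
    public long Score { get; private set; }

    // Static factory method standing in for the conversion operator,
    // which cannot be defined for an interface type.
    public static PlayerDTO FromIPlayer(IPlayer player)
    {
        return new PlayerDTO(player.Name, player.Score, player.ID.ToString());
    }
}

public static class FactoryDemo
{
    public static void Main()
    {
        IPlayer[] iPlayers = { new Player("Jack", 100), new Player("Jill", 200) };
        PlayerDTO[] playerDTOs = iPlayers.Select(p => PlayerDTO.FromIPlayer(p)).ToArray();
        Console.WriteLine(playerDTOs.Length); // 2
    }
}
```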

Put all conversions in a PlayerConvert static class

If you have lots of types which can be converted to many other types, then it might be cleaner to put all the conversions into one static class, like the System.Convert class. You then use it like this:


playerDTOs = iPlayers.Select(p => PlayerConvert.ToPlayerDTO(p)).ToArray();
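
A minimal sketch of such a class follows. As before, the IPlayer interface is an assumed shape, and SimplePlayer, ToPlayerDTOs and ConvertDemo are hypothetical names added purely so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical interface; its members are assumed from the Player class.
public interface IPlayer
{
    string Name { get; }
    Guid ID { get; }
    long Score { get; }
}

// Hypothetical implementation, only here to make the sketch runnable.
public sealed class SimplePlayer : IPlayer
{
    public string Name { get; set; }
    public Guid ID { get; set; }
    public long Score { get; set; }
}

public sealed class PlayerDTO
{
    public PlayerDTO(string name, long score, string id)
    {
        Name = name;
        Score = score;
        ID = id;
    }

    public string Name { get; private set; }
    public string ID { get; private set; }
    public long Score { get; private set; }
}

// All conversions live in one place, in the spirit of System.Convert.
public static class PlayerConvert
{
    public static PlayerDTO ToPlayerDTO(IPlayer player)
    {
        return new PlayerDTO(player.Name, player.Score, player.ID.ToString());
    }

    // Convenience method for converting a whole sequence at once.
    public static PlayerDTO[] ToPlayerDTOs(IEnumerable<IPlayer> players)
    {
        return players.Select(p => ToPlayerDTO(p)).ToArray();
    }
}

public static class ConvertDemo
{
    public static void Main()
    {
        var iPlayers = new List<IPlayer>
        {
            new SimplePlayer { Name = "Jack", ID = Guid.NewGuid(), Score = 100 },
            new SimplePlayer { Name = "Jill", ID = Guid.NewGuid(), Score = 200 }
        };
        var playerDTOs = iPlayers.Select(p => PlayerConvert.ToPlayerDTO(p)).ToArray();
        Console.WriteLine(playerDTOs.Length); // 2
    }
}
```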

References:

StackOverflow question on why you can't use interface types with implicit/explicit operator

LINQ – choosing between Concat() and Union()

In LINQ to Objects, there are two ways you can combine two sequences into one: Concat() or Union(). As I was wondering how the two differ, I came across this post:

http://weblogs.asp.net/fbouma/archive/2009/03/04/choose-concat-over-union-if-possible.aspx

The main thing to take away from this article is:

“If you care about the duplicates, Union() is necessary. However, in the case where you can’t have duplicates in the second sequence or you don’t care, Concat() is a better choice.”

Thinking in T-SQL terms:

Concat() = UNION ALL

Union() = UNION
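
A small sketch makes the difference visible. The sample arrays and the ConcatVsUnionDemo name are made up for this example.

```csharp
using System;
using System.Linq;

static class ConcatVsUnionDemo
{
    // Concat simply appends the second sequence to the first.
    public static int[] Concatenated(int[] a, int[] b) => a.Concat(b).ToArray();

    // Union yields each distinct element once, in order of first appearance.
    public static int[] Unioned(int[] a, int[] b) => a.Union(b).ToArray();

    public static void Main()
    {
        var first = new[] { 1, 2, 3 };
        var second = new[] { 3, 4, 5 };
        Console.WriteLine(string.Join(", ", Concatenated(first, second))); // 1, 2, 3, 3, 4, 5
        Console.WriteLine(string.Join(", ", Unioned(first, second)));      // 1, 2, 3, 4, 5
    }
}
```

Union() has to keep track of every element it has already returned in order to filter out duplicates, which is extra work that Concat() avoids.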

Joining two sequences using Union()

If by some chance you’re looking to combine two lists that may contain duplicates, and you don’t want duplicates in the resulting list, see this question on StackOverflow and Jon Skeet’s answer for a nice and clean way to do this:

http://stackoverflow.com/questions/590991/merging-two-ienumerablets

Under the cover of i4o

I did some performance optimization work a little while back, and one of the changes which yielded a significant result was migrating some server side components (which are CPU intensive and perform a large number of loops) from ADO.NET DataSets to POCOs (plain old CLR objects).

The looping was then done using LINQ to Objects, and I discovered a nice little extension to LINQ called i4o – which stands for Index for Objects – to help make the loops faster. However, I wasn’t able to observe any difference in performance, which contradicts the findings on Aaron’s Technology Musing.

Digging a little deeper into the i4o source code (admittedly I didn’t do this myself, credit to Mike Barker for doing this!), it turns out that there are a number of drawbacks in i4o which aren’t immediately obvious or mentioned anywhere in the documentation. The biggest problem for us was that it only supports equality comparison, which means it would simply ignore the index you have on the MatchID property if you try to run this query:

var result = from m in Matches where m.MatchID >= 1 select m;

but it’ll use the index on MatchID if you run this query instead:

var result = from m in Matches where m.MatchID == 1 select m;
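
To see why an index might only help with equality, here is a sketch that assumes a hash-based index. It is NOT i4o's code or API: the Match class, HashIndexDemo and the use of ToLookup() are stand-ins chosen to illustrate the general idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type with the MatchID property used in the queries above.
public class Match
{
    public int MatchID { get; set; }
}

static class HashIndexDemo
{
    // Equality: a single hash lookup finds the matching bucket directly.
    public static List<Match> EqualTo(ILookup<int, Match> index, int id) =>
        index[id].ToList();

    // Range: hash buckets have no ordering, so the index cannot help and
    // every element still has to be checked.
    public static List<Match> AtLeast(IEnumerable<Match> matches, int id) =>
        matches.Where(m => m.MatchID >= id).ToList();

    public static void Main()
    {
        var matches = Enumerable.Range(1, 5)
                                .Select(i => new Match { MatchID = i })
                                .ToList();

        // Stand-in for an index on MatchID.
        var index = matches.ToLookup(m => m.MatchID);

        Console.WriteLine(EqualTo(index, 1).Count);   // 1
        Console.WriteLine(AtLeast(matches, 1).Count); // 5
    }
}
```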

The conclusion?

i4o is an awesome tool that can turbo boost your LINQ queries, but ONLY put indices on properties that you will be doing equality comparisons on in your queries. Otherwise you’ll just be wasting memory holding indices which will never be used at all.