Feb 11th – Scott Guthrie Talks Shop IV

This is an upcoming web seminar with Scott Gu, organised by the good folks at Linked .Net User Group (http://www.lidnug.org).

I attended one of their previous presentations with Joe Albahari on the new features in C# 4.0 (the recorded presentation can be viewed here) and thoroughly enjoyed that! A FREE web seminar with someone as influential in the .Net space as Scott Gu’s gotta be good!

Who those of you who doesn’t know who Scott Gu is, he’s a VP and one of the founding members of the Microsoft’s .Net platform and runs the development teams responsible for Visual Studio developer tools and technologies for building web applications (e.g. ASP.NET MVC).

(Unconfirmed story!) Whilst still an intern, he developed the original ASP.NET platform with another senior developer at Microsoft, just to give you some idea about who the man is :-)

The link to the event is:

https://www.livemeeting.com/cc/microsoft/join?id=G2K4BH&role=attend&pw=PN6.%3CQ%5Drb

and to find out what time it starts in your time zone, go here:

http://www.timeanddate.com/worldclock/fixedtime.html?month=2&day=11&year=2010&hour=10&min=0&sec=0&p1=256

In the meantime, check out Scott Gu’s excellent blog!

Garbage Spewers

I spent a bit of time on Ayende’s blog today, finally catching up with a series of performance-related blog posts he made whilst working on the .Net profiler (the man’s a living legend, the quality AND quantity of his posts is without equal!)

Whilst reading through the various posts, I came across an unfamiliar term – garbage spewers – in patterns for reducing memory usage. To quote from the blog:

Garbage spewers are pieces of code that allocate a lot of memory that will have to be freed soon afterward.

Ayende highlight a common case of garbage spewers where you continuously concatenate a string using +=:

public string Concat(string[] items)
{
   string result = "";
   foreach(var item in items)
      results += item;
   return result;
}

As you know, string is an immutable type in .Net, that is, a type that cannot be updated once it’s been created. Therefore every time results += item is run a new string variable has to be created to hold the concatenated value of results and item, and the reference pointer stored in the results variable is updated to point to the newly created string.

As Ayende pointed out, this loop can consume a lot of memory which will be cleaned up eventually at the expense of a performance hit as this puts more pressure on the GC.

Other common cases involve loading and converting Data Transfer Objects from the database into various different forms and consuming a lot of memory unnecessarily along the way.

Threading – Using ReaderWriterLockSlim

When dealing with concurrency/threading issues in .Net, the normal approach is to use lock() to lock a dedicated sync object like this:

private static readonly object padlock = new object();
…
lock(padlock)
{
     // enter critical section here
}

This is an efficient, simple and well proven way to get thread-safety in .Net and is probably all you’ll ever need in your project. However, as this approach ensures only one thread can enter the critical section at any one time it’s not ideal for managing access to a shared resource because all read and write operations must be made sequentially and this results in a performance hit on your application. Ideally all threads should be able to read the shared resource simultaneously but only one thread can update it and no other threads can read from the resource whilst the update is happening.

.Net’s answer to this used to be the ReaderWriterLock, which since .Net 3.5 has been superseded by the better, more lightweight and performing ReaderWriterLockSlim which Microsoft is recommending for all new development.

The ReaderWriterLockSlim has three modes: Read, Ungradeable and Write:

  • many threads can enter Read lock simultaneously
  • only one thread can enter Ungradeable lock but other threads can still enter Read lock
  • only one thread can enter Write lock and no other thread can enter any lock

In practice though, almost all threading problems can be solved using read and write locks alone. To enter read/write lock, you need something like this:

private static readonly ReaderWriteLockSlim rwlock = new ReaderWriterLockSlim();
…
rwlock.EnterReadLock()
try
{
     // do something here
}
finally
{
     // don't forget to release the lock afterwards!
     rwlock.ExitReadLock();
}

As you can see, this can seriously clutter your code if you require read/write locks in many places! I saw this post on StackOverflow quite some time back and really like the pattern that Marc has presented:

http://stackoverflow.com/questions/170028/how-would-you-simplfy-entering-and-exiting-a-readerwriterlock

In his answer he showed how you can simplify the code needed to enter a read lock, here’s the full implementation you can use in your project:

public static class ReaderWriterLockSlimExtensions
{
     private sealed class ReadLockToken : IDisposable
     {
          private ReaderWriterLockSlim _sync;
          public ReadLockToken(ReaderWriterLockSlim sync)
          {
               _sync = sync;
               sync.EnterReadLock();
          }
          public void Dispose()
          {
               if (_sync != null)
               {
                    _sync.ExitReadLock();
                    _sync = null;
               }
          }
     }
     private sealed class WriteLockToken : IDisposable
     {
          private ReaderWriterLockSlim _sync;
          public WriteLockToken(ReaderWriterLockSlim sync)
          {
               _sync = sync;
               sync.EnterWriteLock();
          }
          public void Dispose()
          {
               if (_sync != null)
               {
                    _sync.ExitWriteLock();
                    _sync = null;
               }
          }
     }

     public static IDisposable Read(this ReaderWriterLockSlim obj)
     {
          return new ReadLockToken(obj);
     }
     public static IDisposable Write(this ReaderWriterLockSlim obj)
     {
          return new WriteLockToken(obj);
     }
}

And to use it in your code, this is all you’ll need:

using (_sync.Read())
{
     // do reading here
}
using (_sync.Write())
{
     // do writing here
}

Further Readings:

Jon Skeet’s tutorial on Multi-Threading in .Net

Ayande has a post on using the UpgradeableReadLock

LINQ – choosing between Concat() and Union()

In Linq To Objects, there are two ways you can join two sequences together, using either Concat() or Union(), and as I was wondering how the two differs I came across this post:

http://weblogs.asp.net/fbouma/archive/2009/03/04/choose-concat-over-union-if-possible.aspx

The main thing to take away from this article is:

“If you care about the duplicates, Union() is necessary. However, in the case where you can’t have duplicates in the second sequence or you don’t care, Concat() is a better choice.”

Thinking in T-SQL terms:

Concat() = UNION ALL

Union() = UNION

Joining two sequences using Union()

If by some chance you’re looking to join two potentially duplicated lists together, and you don’t want duplicates in the resulting list, see this question on StackOverflow and see Jon Skeet’s answer for a nice and clean way to do this:

http://stackoverflow.com/questions/590991/merging-two-ienumerablets

Learning F# – Part 4

Disclaimer: I do not claim credit for the code examples and much of the contents here, these are mostly extracts from the book by Chris Smith, Programming F#: A comprehensive guide for writing simple code to solve complex problems. In fact, if you’re thinking of learning F# and like what you read here, you should buy the book yourself, it’s easy to read and the author has gone go great lengths to keep things simple and included a lot of code examples for you to try out yourself.

Tuple

A tuple (pronounced “two-pull”) is an ordered collection of data, and an easy way to group common pieces of data together.

A tuple type is described by a list of the tuple’s elements’ types, separated by asterisks:

clip_image001

You can even have tuples that contain other tuples:

clip_image002

There’s a number of ways to extract values from a tuple, there’s fst (first) and snd (second) functions if you have a two-elements tuple:

clip_image003

And then there’s the let binding:

clip_image004

But remember, you’ll get a compile error if you try to extract too many or too few values from a tuple.

It is possible to pass tuples as parameters to functions:

clip_image005

Lists

Whereas tuples group values into a single entity, lists allow you to link data together to form a chain. Doing so allows you to process list elements in bulk using aggregate operators.

You can declare a list like this:

clip_image006

Notice in the snippet above the empty list had type ‘a list because it could be of any type, therefore it’s generic.

Unlike other languages, F# lists are quite restrictive in how you access and manipulate them – there are only two operations you can perform with a list:

  1. The first is cons, represented by the :: or cons operator. This joins an element to the front or head of a list:

clip_image007

  1. The second is append, uses the @ operator. Append joins two lists together:

clip_image008

List ranges

Declaring list elements as a semicolon-delimited list quickly becomes tedious, especially for large lists. To declare a list of ordered numeric values, use the list range syntax:

clip_image009

If an optional step value is provided, then the result is a list of values in the range between two numbers separated by the stepping value:

clip_image010

List comprehensions

List comprehensions is a rich syntax that allows you to generate lists inline with F# code. The body of the list comprehension will be executed until it terminates, and the list will be made up of elements returned via the yield keyword:

clip_image011

Almost any F# code can exist inside of list comprehensions, including things like function declarations and for loops:

clip_image012

When using loops within list comprehensions, you can simply the code by using -> instead of do yield:

clip_image013

Here’s a more complex example showing how you can use list comprehension to easily find prime numbers:

clip_image014

List module functions

The F# library’s List module contains many methods to help you process lists:

image

The following example demonstrates the List.partition function, partitioning a list of numbers from 1 to 15 into two new lists: one comprised of multiples of five and the other list made up of everything else:

clip_image015

The trick is that List.partition returns a tuple.

Aggregate Operators

Although lists offer a way to chain together pieces of data, there really isn’t anything special about them. The true power of lists lies in aggregate operators, which are a set of power functions that are useful for any collection of values..

List.map

List.map is a projection operation that creates a new list based on a provided function. Each element in the new list is the result of evaluating the function, it has type (‘a -> ‘b) -> ‘a list -> ‘b list

The following example shows the result of mapping a square function to a list of integers:

clip_image016

List.map is one of the most useful functions in the F# language, it provides an elegant way for you to transform data.

List.fold

Folds represent the most powerful type of aggregate operator and not surprisingly the most complicated. When you have a list of values and you want to distil it down to a single piece of data, you use a fold.

There are two main types of folds you can use on lists, first is List.reduce which has type (‘a -> ‘a -> ‘a) -> ‘a list -> ‘a

List.reduce iterates through each element of a list, building up an accumulator value, which is the summary of the processing done on the list so far. Once every list item has been processed, the final accumulator value is returned, the accumulator’s initial value in List.reduce is the first element of the list.

This example demonstrates how to use List.reduce to comma-separate a list of strings:

clip_image001[7]

Whilst useful, reduce fold forces the type of the accumulator to have the same type as the list. If you want to use a custom accumulator type (e.g. reducing a list of items in a shopping cart to a cash value), you can use List.fold.

The fold function takes three parameters:

  1. A function that when provided an accumulator and list element returns a new accumulator.
  2. An initial accumulator value.
  3. The list to fold over.

The return value of the function is the final state of the accumulator. The type of the fold function is:

(‘acc -> ‘b -> ‘acc) -> ‘acc -> ‘b list -> ‘acc

Here’s an example of how you can use it to count the number of vowels in a string:

clip_image002[7]

Folding right-to-left

List.reduce and List.fold process the list in a left-to-right order. There are alternative functions List.reduceBack and List.foldBack for processing lists in right-to-left order.

Depends on what you are trying to do, processing a list in reverse order can have a substantial impact on performance.

List.iter

The final aggregate operator, List.iter, iterates through each element of the list and calls a function that you pass as a parameter, it has type (‘a -> unit) -> ‘a list -> unit

Because List.iter returns unit, it is predominately used for evaluating the side effect of the given method, meaning that executing the function has some side effect other than its return value (e.g. printfn has the side effect of printing to the console in addition to returning unit):

clip_image001[9]

Option

If you want to represent a value that may or may not exist, the best way to do so is to use the option type. The option type has only two possible values: Some(‘a’) and None.

A typical situation you’ll use an option type is when you want to parse a string as an int and if the string is properly formatted you’ll get an int, but if the string is not properly formatted you’ll get None:

clip_image002[9]

A common idiom in C# is to use null to mean the absence of a value. However, null is also used to indicate an uninitialized value, this duality can lead to confusion and bugs. If you use the option type, there is no question what the value represents, similar to how System.Nullable works in C#.

To retrieve the value of an option, you can use Option.get.

clip_image003[7]

One thing to watch out though, is that if you call Option.get on None, an exception will be thrown. To get around this, you can use Option.isSome or Option.isNone to check before the value of the option type before attempting to access it, similar to System.Nullable.HasValue in C#.

Printfn

printfn comes in three main flavours: printf, printfn, and sprintf.

printf takes the input and writes it to the screen, whereas printfn writes it to the screen and adds a line continuation.

pinrtf has formatting and checking built-in (e.g. printfn “%s is %d%c high” mountain height units), it’s also strong typed and uses F#’s type inference system so the compiler will give you an error if the data doesn’t match the given format specifier.

Here’s a table of printf format specifiers:

image

sprintf is used when you want the result of the printing as a string:

clip_image001[11]

Anatomy of an F# Program

Most other languages, like C#, require an explicit program entry point, often called a main method. In F#, for single-file applications, the contents of the code file are execute from top to bottom in order without the need for declaring a specific main method.

For multi-file projects, however, code needs to be divided into organization units called modules or namespaces.

Modules

By default, F# puts all your code into an anonymous module with the same name as the code file with the first letter capitalized. So if you have a value named value1, and your code is in file1.fs, you can refer to it by using the fully qualified path: File1.value1.

You can explicitly name your code’s module by using the module keyword at the top of a code file:

clip_image001[13]

Files can contain nested modules as well. To declare a nested module, use the module keyword followed by the name of your module and an equals sign =. Nested modules must be indented to be disambiguated from the “top-level” module:

clip_image002[11]

Namespaces

The alternative to modules is namespaces. Namespaces are a unit of organizing code just like modules with the only difference being that namespaces cannot obtain value, only type declarations.

Also, namespaces cannot be nested in the same way that modules can, instead, you can add multiple namespaces to the same file:

clip_image001[15]

It may seem strange to have both namespaces and modules in F#. Modules are optimized for rapid prototyping and quickly exploring a solution, as you have seen so far. Namespaces, on the other hand, are geared toward larger-scale projects with an object-oriented solution.

Program Startup

For single file projects, the code will be executed from top to bottom, however, when a you add a new file to the project, the newly added file will be run when the program starts up.

For more formal program-startup semantics, you can use the [<EntryPoint>] attribute to define a main method. To qualify, your method must:

  • Be the last function defined in the last compiled file in your project.
  • Take a single parameter of type string array, which are the arguments to your program.
  • Return an integer, which is your program’s exit code.