Garbage Spewers

I spent a bit of time on Ayende’s blog today, finally catching up with a series of performance-related blog posts he made whilst working on the .Net profiler (the man’s a living legend, the quality AND quantity of his posts is without equal!)

Whilst reading through the various posts, I came across an unfamiliar term – garbage spewers – in patterns for reducing memory usage. To quote from the blog:

Garbage spewers are pieces of code that allocate a lot of memory that will have to be freed soon afterward.

Ayende highlight a common case of garbage spewers where you continuously concatenate a string using +=:

public string Concat(string[] items)
{
   string result = "";
   foreach(var item in items)
      results += item;
   return result;
}

As you know, string is an immutable type in .Net, that is, a type that cannot be updated once it’s been created. Therefore every time results += item is run a new string variable has to be created to hold the concatenated value of results and item, and the reference pointer stored in the results variable is updated to point to the newly created string.

As Ayende pointed out, this loop can consume a lot of memory which will be cleaned up eventually at the expense of a performance hit as this puts more pressure on the GC.

Other common cases involve loading and converting Data Transfer Objects from the database into various different forms and consuming a lot of memory unnecessarily along the way.

Threading – Using ReaderWriterLockSlim

When dealing with concurrency/threading issues in .Net, the normal approach is to use lock() to lock a dedicated sync object like this:

private static readonly object padlock = new object();
…
lock(padlock)
{
     // enter critical section here
}

This is an efficient, simple and well proven way to get thread-safety in .Net and is probably all you’ll ever need in your project. However, as this approach ensures only one thread can enter the critical section at any one time it’s not ideal for managing access to a shared resource because all read and write operations must be made sequentially and this results in a performance hit on your application. Ideally all threads should be able to read the shared resource simultaneously but only one thread can update it and no other threads can read from the resource whilst the update is happening.

.Net’s answer to this used to be the ReaderWriterLock, which since .Net 3.5 has been superseded by the better, more lightweight and performing ReaderWriterLockSlim which Microsoft is recommending for all new development.

The ReaderWriterLockSlim has three modes: Read, Ungradeable and Write:

  • many threads can enter Read lock simultaneously
  • only one thread can enter Ungradeable lock but other threads can still enter Read lock
  • only one thread can enter Write lock and no other thread can enter any lock

In practice though, almost all threading problems can be solved using read and write locks alone. To enter read/write lock, you need something like this:

private static readonly ReaderWriteLockSlim rwlock = new ReaderWriterLockSlim();
…
rwlock.EnterReadLock()
try
{
     // do something here
}
finally
{
     // don't forget to release the lock afterwards!
     rwlock.ExitReadLock();
}

As you can see, this can seriously clutter your code if you require read/write locks in many places! I saw this post on StackOverflow quite some time back and really like the pattern that Marc has presented:

http://stackoverflow.com/questions/170028/how-would-you-simplfy-entering-and-exiting-a-readerwriterlock

In his answer he showed how you can simplify the code needed to enter a read lock, here’s the full implementation you can use in your project:

public static class ReaderWriterLockSlimExtensions
{
     private sealed class ReadLockToken : IDisposable
     {
          private ReaderWriterLockSlim _sync;
          public ReadLockToken(ReaderWriterLockSlim sync)
          {
               _sync = sync;
               sync.EnterReadLock();
          }
          public void Dispose()
          {
               if (_sync != null)
               {
                    _sync.ExitReadLock();
                    _sync = null;
               }
          }
     }
     private sealed class WriteLockToken : IDisposable
     {
          private ReaderWriterLockSlim _sync;
          public WriteLockToken(ReaderWriterLockSlim sync)
          {
               _sync = sync;
               sync.EnterWriteLock();
          }
          public void Dispose()
          {
               if (_sync != null)
               {
                    _sync.ExitWriteLock();
                    _sync = null;
               }
          }
     }

     public static IDisposable Read(this ReaderWriterLockSlim obj)
     {
          return new ReadLockToken(obj);
     }
     public static IDisposable Write(this ReaderWriterLockSlim obj)
     {
          return new WriteLockToken(obj);
     }
}

And to use it in your code, this is all you’ll need:

using (_sync.Read())
{
     // do reading here
}
using (_sync.Write())
{
     // do writing here
}

Further Readings:

Jon Skeet’s tutorial on Multi-Threading in .Net

Ayande has a post on using the UpgradeableReadLock

LINQ – choosing between Concat() and Union()

In Linq To Objects, there are two ways you can join two sequences together, using either Concat() or Union(), and as I was wondering how the two differs I came across this post:

http://weblogs.asp.net/fbouma/archive/2009/03/04/choose-concat-over-union-if-possible.aspx

The main thing to take away from this article is:

“If you care about the duplicates, Union() is necessary. However, in the case where you can’t have duplicates in the second sequence or you don’t care, Concat() is a better choice.”

Thinking in T-SQL terms:

Concat() = UNION ALL

Union() = UNION

Joining two sequences using Union()

If by some chance you’re looking to join two potentially duplicated lists together, and you don’t want duplicates in the resulting list, see this question on StackOverflow and see Jon Skeet’s answer for a nice and clean way to do this:

http://stackoverflow.com/questions/590991/merging-two-ienumerablets

Learning F# – Part 4

Disclaimer: I do not claim credit for the code examples and much of the contents here, these are mostly extracts from the book by Chris Smith, Programming F#: A comprehensive guide for writing simple code to solve complex problems. In fact, if you’re thinking of learning F# and like what you read here, you should buy the book yourself, it’s easy to read and the author has gone go great lengths to keep things simple and included a lot of code examples for you to try out yourself.

Tuple

A tuple (pronounced “two-pull”) is an ordered collection of data, and an easy way to group common pieces of data together.

A tuple type is described by a list of the tuple’s elements’ types, separated by asterisks:

clip_image001

You can even have tuples that contain other tuples:

clip_image002

There’s a number of ways to extract values from a tuple, there’s fst (first) and snd (second) functions if you have a two-elements tuple:

clip_image003

And then there’s the let binding:

clip_image004

But remember, you’ll get a compile error if you try to extract too many or too few values from a tuple.

It is possible to pass tuples as parameters to functions:

clip_image005

Lists

Whereas tuples group values into a single entity, lists allow you to link data together to form a chain. Doing so allows you to process list elements in bulk using aggregate operators.

You can declare a list like this:

clip_image006

Notice in the snippet above the empty list had type ‘a list because it could be of any type, therefore it’s generic.

Unlike other languages, F# lists are quite restrictive in how you access and manipulate them – there are only two operations you can perform with a list:

  1. The first is cons, represented by the :: or cons operator. This joins an element to the front or head of a list:

clip_image007

  1. The second is append, uses the @ operator. Append joins two lists together:

clip_image008

List ranges

Declaring list elements as a semicolon-delimited list quickly becomes tedious, especially for large lists. To declare a list of ordered numeric values, use the list range syntax:

clip_image009

If an optional step value is provided, then the result is a list of values in the range between two numbers separated by the stepping value:

clip_image010

List comprehensions

List comprehensions is a rich syntax that allows you to generate lists inline with F# code. The body of the list comprehension will be executed until it terminates, and the list will be made up of elements returned via the yield keyword:

clip_image011

Almost any F# code can exist inside of list comprehensions, including things like function declarations and for loops:

clip_image012

When using loops within list comprehensions, you can simply the code by using -> instead of do yield:

clip_image013

Here’s a more complex example showing how you can use list comprehension to easily find prime numbers:

clip_image014

List module functions

The F# library’s List module contains many methods to help you process lists:

image

The following example demonstrates the List.partition function, partitioning a list of numbers from 1 to 15 into two new lists: one comprised of multiples of five and the other list made up of everything else:

clip_image015

The trick is that List.partition returns a tuple.

Aggregate Operators

Although lists offer a way to chain together pieces of data, there really isn’t anything special about them. The true power of lists lies in aggregate operators, which are a set of power functions that are useful for any collection of values..

List.map

List.map is a projection operation that creates a new list based on a provided function. Each element in the new list is the result of evaluating the function, it has type (‘a -> ‘b) -> ‘a list -> ‘b list

The following example shows the result of mapping a square function to a list of integers:

clip_image016

List.map is one of the most useful functions in the F# language, it provides an elegant way for you to transform data.

List.fold

Folds represent the most powerful type of aggregate operator and not surprisingly the most complicated. When you have a list of values and you want to distil it down to a single piece of data, you use a fold.

There are two main types of folds you can use on lists, first is List.reduce which has type (‘a -> ‘a -> ‘a) -> ‘a list -> ‘a

List.reduce iterates through each element of a list, building up an accumulator value, which is the summary of the processing done on the list so far. Once every list item has been processed, the final accumulator value is returned, the accumulator’s initial value in List.reduce is the first element of the list.

This example demonstrates how to use List.reduce to comma-separate a list of strings:

clip_image001[7]

Whilst useful, reduce fold forces the type of the accumulator to have the same type as the list. If you want to use a custom accumulator type (e.g. reducing a list of items in a shopping cart to a cash value), you can use List.fold.

The fold function takes three parameters:

  1. A function that when provided an accumulator and list element returns a new accumulator.
  2. An initial accumulator value.
  3. The list to fold over.

The return value of the function is the final state of the accumulator. The type of the fold function is:

(‘acc -> ‘b -> ‘acc) -> ‘acc -> ‘b list -> ‘acc

Here’s an example of how you can use it to count the number of vowels in a string:

clip_image002[7]

Folding right-to-left

List.reduce and List.fold process the list in a left-to-right order. There are alternative functions List.reduceBack and List.foldBack for processing lists in right-to-left order.

Depends on what you are trying to do, processing a list in reverse order can have a substantial impact on performance.

List.iter

The final aggregate operator, List.iter, iterates through each element of the list and calls a function that you pass as a parameter, it has type (‘a -> unit) -> ‘a list -> unit

Because List.iter returns unit, it is predominately used for evaluating the side effect of the given method, meaning that executing the function has some side effect other than its return value (e.g. printfn has the side effect of printing to the console in addition to returning unit):

clip_image001[9]

Option

If you want to represent a value that may or may not exist, the best way to do so is to use the option type. The option type has only two possible values: Some(‘a’) and None.

A typical situation you’ll use an option type is when you want to parse a string as an int and if the string is properly formatted you’ll get an int, but if the string is not properly formatted you’ll get None:

clip_image002[9]

A common idiom in C# is to use null to mean the absence of a value. However, null is also used to indicate an uninitialized value, this duality can lead to confusion and bugs. If you use the option type, there is no question what the value represents, similar to how System.Nullable works in C#.

To retrieve the value of an option, you can use Option.get.

clip_image003[7]

One thing to watch out though, is that if you call Option.get on None, an exception will be thrown. To get around this, you can use Option.isSome or Option.isNone to check before the value of the option type before attempting to access it, similar to System.Nullable.HasValue in C#.

Printfn

printfn comes in three main flavours: printf, printfn, and sprintf.

printf takes the input and writes it to the screen, whereas printfn writes it to the screen and adds a line continuation.

pinrtf has formatting and checking built-in (e.g. printfn “%s is %d%c high” mountain height units), it’s also strong typed and uses F#’s type inference system so the compiler will give you an error if the data doesn’t match the given format specifier.

Here’s a table of printf format specifiers:

image

sprintf is used when you want the result of the printing as a string:

clip_image001[11]

Anatomy of an F# Program

Most other languages, like C#, require an explicit program entry point, often called a main method. In F#, for single-file applications, the contents of the code file are execute from top to bottom in order without the need for declaring a specific main method.

For multi-file projects, however, code needs to be divided into organization units called modules or namespaces.

Modules

By default, F# puts all your code into an anonymous module with the same name as the code file with the first letter capitalized. So if you have a value named value1, and your code is in file1.fs, you can refer to it by using the fully qualified path: File1.value1.

You can explicitly name your code’s module by using the module keyword at the top of a code file:

clip_image001[13]

Files can contain nested modules as well. To declare a nested module, use the module keyword followed by the name of your module and an equals sign =. Nested modules must be indented to be disambiguated from the “top-level” module:

clip_image002[11]

Namespaces

The alternative to modules is namespaces. Namespaces are a unit of organizing code just like modules with the only difference being that namespaces cannot obtain value, only type declarations.

Also, namespaces cannot be nested in the same way that modules can, instead, you can add multiple namespaces to the same file:

clip_image001[15]

It may seem strange to have both namespaces and modules in F#. Modules are optimized for rapid prototyping and quickly exploring a solution, as you have seen so far. Namespaces, on the other hand, are geared toward larger-scale projects with an object-oriented solution.

Program Startup

For single file projects, the code will be executed from top to bottom, however, when a you add a new file to the project, the newly added file will be run when the program starts up.

For more formal program-startup semantics, you can use the [<EntryPoint>] attribute to define a main method. To qualify, your method must:

  • Be the last function defined in the last compiled file in your project.
  • Take a single parameter of type string array, which are the arguments to your program.
  • Return an integer, which is your program’s exit code.

Learning F# – Part 3

Disclaimer: I do not claim credit for the code examples and much of the contents here, these are mostly extracts from the book by Chris Smith, Programming F#: A comprehensive guide for writing simple code to solve complex problems. In fact, if you’re thinking of learning F# and like what you read here, you should buy the book yourself, it’s easy to read and the author has gone go great lengths to keep things simple and included a lot of code examples for you to try out yourself.

Functions

You define functions the same way you define values, except everything after the name of the function servers as the function’s parameters. The following defines a function called square that takes an integer, x, and returns its square:

clip_image001

Unlike C#, F# has no return keyword. The last expression to be evaluated in the function determines the return type.

Also, from the FSI output above, it shows the function square has signature int -> int, which reads as “a function taking an integer and returning an integer”.

Type Inference

Take this add function for example:

clip_image002

Looking at this you might be wondering why does the compiler think that the add function only takes integers? The + operator also works on floats too!

The reason is type inference. Unlike C#, F# doesn’t require you to explicitly state the types of all the parameters to a function, it figures it out based on usage. Because the + operator works for many different types such as byte, int, and decimal, the compiler simply defaults to int if there is no additional information.

The following FSI snippet shows what type inference in action if we not only define the add function but also call it passing in floats, then the function’s signature will be inferred to be of type float -> float -> float instead:

clip_image003

However, you can provide a type annotation, or hint, to the F# compiler about what the types are. To do this, simply replace a function parameter with the following form ident -> (ident: type) like this:

clip_image004

This works because the only overload for + that takes a float as its first parameter is float -> float -> float, so the F# compiler infers y to be a float as well.

Type inference can reduce code clutter by having the compiler figure out what types to use, but the occasional type annotation is required and can sometimes improve code readability.

Generic Functions

You can write functions that work for any type of a parameter, such as an identity function below:

clip_image005

Because the type inference system could not determine a fixed type for value x in the ident function, it was generic. If a parameter is generic, then that parameter can be of any type.

The type of a generic parameter can have the name of any valid identifier prefixed with an apostrophe, but typically letters of the alphabet starting with ‘a’ as you can see from the FSI snippet for the ident function above.

Writing generic code is important for maximizing code reuse.

Scope

Every value declared in F# has a specific scope, more formally referred to as a declaration space.

The default scope is module scope, meaning variables can be used anywhere after their declaration. However, values defined within a function are scoped only to that function.

For example:

clip_image006

The scoping of a variable is important because F# supports nested functions – i.e. you can declare new function values within the body of a function. Nested functions have access to any value declared in a higher scope as well as any new values declared within itself. The following examples shows this in action:

clip_image007

In F#, having two values with the same name doesn’t lead to a compiler error; rather it simply leads to shadowing. When this happens, both values exists in memory, except there is no way to access the previously declared value. For example:

clip_image008

This technique of intentionally shadowing values is useful for giving the illusion of updating values without relying on mutation. Think strings in C#, which is an immutable type that allows reassignment using the same shadowing technique.

Control Flow

You can branch control flow using the if keyword which works exactly like an if statement in C#:

clip_image001[6]

F# supports if-then-else structure, but the thing the sets if statements in F# apart is that if expressions return a value:

clip_image002[7]

F# has some syntactic sugar to help you combat deeply nested if expression with the elif keyword:

clip_image003[6]

Because the result of the if expression is a value, every clause of an if expression must return the same type.

But if you only have a single if and no corresponding else, then the clause must return unit, which is a special type in F# that means essentially “no value”.

Core Types

Besides the primitive types, the F# library includes several core types that will allow you to organize, manipulate and process data:

image

Unit

The unit type is a value signifying nothing of consequence. unit can be thought of as a concrete representation of void and is represented in code via ():

clip_image001[8]

if expressions without a matching else must return unit because if they did return a value, what would happen if else was hit?

Also, in F#, every function must return a value, think method in C# and the void return type, so even if the function doesn’t conceptually return anything then it should return a unit value.

The ignore function can swallow a function’s return value if you want to return unit:

clip_image002[9]