LINQ – choosing between Concat() and Union()

In LINQ to Objects, there are two ways to join two sequences together: Concat() and Union(). While wondering how the two differ, I came across this post:

http://weblogs.asp.net/fbouma/archive/2009/03/04/choose-concat-over-union-if-possible.aspx

The main thing to take away from this article is:

“If you care about the duplicates, Union() is necessary. However, in the case where you can’t have duplicates in the second sequence or you don’t care, Concat() is a better choice.”

Thinking in T-SQL terms:

Concat() = UNION ALL

Union() = UNION
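As a quick illustration of the difference, here is a minimal sketch that calls the same two LINQ methods from F# (the language used in the rest of these notes). The sequence names first and second are made up for the example.

```fsharp
open System.Linq

// Two hypothetical sequences that share the value 3.
let first : seq<int> = seq [1; 2; 3]
let second : seq<int> = seq [3; 4; 5]

// Concat keeps every element, like UNION ALL.
let concatenated = first.Concat(second) |> List.ofSeq

// Union drops duplicates, like UNION.
let unioned = first.Union(second) |> List.ofSeq

printfn "%A" concatenated
printfn "%A" unioned
```

Concat simply streams both sequences, whereas Union has to remember every element it has already returned, which is why Concat is the cheaper choice when duplicates don’t matter.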

Joining two sequences using Union()

If you’re looking to join two lists that may contain duplicates, and you don’t want duplicates in the result, see Jon Skeet’s answer to this question on StackOverflow for a nice, clean way to do it:

http://stackoverflow.com/questions/590991/merging-two-ienumerablets

Learning F# – Part 4

Disclaimer: I do not claim credit for the code examples and much of the contents here; these are mostly extracts from the book by Chris Smith, Programming F#: A comprehensive guide for writing simple code to solve complex problems. In fact, if you’re thinking of learning F# and like what you read here, you should buy the book yourself. It’s easy to read, and the author has gone to great lengths to keep things simple and has included a lot of code examples for you to try out yourself.

Tuple

A tuple (pronounced “two-pull”) is an ordered collection of data, and an easy way to group common pieces of data together.

A tuple type is described by a list of the tuple’s elements’ types, separated by asterisks:

clip_image001

You can even have tuples that contain other tuples:

clip_image002

There are a number of ways to extract values from a tuple. If you have a two-element tuple, you can use the fst (first) and snd (second) functions:

clip_image003

And then there’s the let binding:

clip_image004

But remember, you’ll get a compile error if you try to extract too many or too few values from a tuple.

It is possible to pass tuples as parameters to functions:

clip_image005
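A minimal sketch of the tuple operations described in this section. All the names (dinner, nested, addPair and so on) are invented for illustration and are not taken from the book.

```fsharp
// A two-element tuple; its type is string * string.
let dinner = ("green eggs", "ham")

// Tuples can contain other tuples: int * (string * float).
let nested = (1, ("two", 3.0))

// fst and snd only work on two-element tuples.
let firstItem = fst dinner
let secondItem = snd dinner

// A let binding extracts every element at once.
let (n, (word, number)) = nested

// A function that takes a tuple as its single parameter.
let addPair (x, y) = x + y
let total = addPair (3, 4)
```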

Lists

Whereas tuples group values into a single entity, lists allow you to link data together to form a chain. Doing so allows you to process list elements in bulk using aggregate operators.

You can declare a list like this:

clip_image006

Notice in the snippet above that the empty list has type 'a list: it could be a list of any type, so it is generic.

Unlike other languages, F# lists are quite restrictive in how you access and manipulate them – there are only two operations you can perform with a list:

  1. The first is cons, which uses the :: operator. This joins an element to the front, or head, of a list:

clip_image007

  2. The second is append, which uses the @ operator. Append joins two lists together:

clip_image008
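Declaring lists, cons and append can be sketched as follows; the list names are just examples.

```fsharp
// Elements are separated by semicolons.
let vowels = ['a'; 'e'; 'i'; 'o'; 'u']

// cons (::) adds one element to the front of a list.
let sometimes = 'y' :: vowels

// append (@) joins two lists.
let odds = [1; 3; 5; 7; 9]
let evens = [2; 4; 6; 8; 10]
let both = odds @ evens
```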

List ranges

Declaring list elements as a semicolon-delimited list quickly becomes tedious, especially for large lists. To declare a list of ordered numeric values, use the list range syntax:

clip_image009

If an optional step value is provided, then the result is a list of values in the range between two numbers separated by the stepping value:

clip_image010
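A short sketch of the range syntax, with and without a step value (the value names are made up):

```fsharp
// From 1 to 10 inclusive.
let oneToTen = [1 .. 10]

// From 0 to 50 in steps of 10.
let tens = [0 .. 10 .. 50]

// A negative step counts down.
let countDown = [5 .. -1 .. 0]
```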

List comprehensions

List comprehensions are a rich syntax that allows you to generate lists inline with F# code. The body of the list comprehension is executed until it terminates, and the list is made up of the elements returned via the yield keyword:

clip_image011

Almost any F# code can exist inside of list comprehensions, including things like function declarations and for loops:

clip_image012

When using loops within list comprehensions, you can simplify the code by using -> instead of do yield:

clip_image013

Here’s a more complex example showing how you can use list comprehension to easily find prime numbers:

clip_image014
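The following is a rough sketch of the comprehension forms covered above. The function names (numbersNear, primesUpTo) and the trial-division approach to finding primes are my own choices for illustration, not necessarily what the book uses.

```fsharp
// Elements are produced with yield; ordinary F# code may appear in the body.
let numbersNear x =
    [ yield x - 1
      yield x
      yield x + 1 ]

// A for loop inside a comprehension, written with "do yield".
let squares = [ for i in 1 .. 5 do yield i * i ]

// The same loop, shortened with "->".
let squares' = [ for i in 1 .. 5 -> i * i ]

// Primes up to limit: keep n when no divisor d with d * d <= n divides it.
let primesUpTo limit =
    [ for n in 2 .. limit do
        let isPrime =
            seq { 2 .. n - 1 }
            |> Seq.takeWhile (fun d -> d * d <= n)
            |> Seq.forall (fun d -> n % d <> 0)
        if isPrime then yield n ]
```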

List module functions

The F# library’s List module contains many functions to help you process lists:

image

The following example demonstrates the List.partition function, partitioning a list of numbers from 1 to 15 into two new lists: one comprised of multiples of five and the other list made up of everything else:

clip_image015

The trick is that List.partition returns a tuple.
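A minimal sketch of that partitioning; the predicate name isMultipleOf5 is an assumption.

```fsharp
let isMultipleOf5 x = (x % 5 = 0)

// List.partition returns a tuple of two lists, unpacked here with a let binding.
let multiplesOf5, others = List.partition isMultipleOf5 [1 .. 15]
```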

Aggregate Operators

Although lists offer a way to chain together pieces of data, there really isn’t anything special about them. The true power of lists lies in aggregate operators, which are a set of powerful functions that are useful for any collection of values.

List.map

List.map is a projection operation that creates a new list based on a provided function. Each element in the new list is the result of evaluating the function on the corresponding element of the original list. It has type ('a -> 'b) -> 'a list -> 'b list.

The following example shows the result of mapping a square function to a list of integers:

clip_image016
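Something along these lines, as a small sketch:

```fsharp
let square x = x * x

// Apply square to every element, producing a new list.
let squared = List.map square [1 .. 10]
```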

List.map is one of the most useful functions in the F# language; it provides an elegant way to transform data.

List.fold

Folds represent the most powerful type of aggregate operator and not surprisingly the most complicated. When you have a list of values and you want to distil it down to a single piece of data, you use a fold.

There are two main types of folds you can use on lists. The first is List.reduce, which has type ('a -> 'a -> 'a) -> 'a list -> 'a.

List.reduce iterates through each element of a list, building up an accumulator value, which is the summary of the processing done on the list so far. Once every list item has been processed, the final accumulator value is returned. The accumulator’s initial value in List.reduce is the first element of the list.

This example demonstrates how to use List.reduce to comma-separate a list of strings:

clip_image001[7]
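A sketch of comma-separating with List.reduce; the function name insertCommas and the sample names are assumptions.

```fsharp
// The accumulator starts as the first element ("Jack"),
// then each remaining item is appended after a comma.
let insertCommas (acc : string) item = acc + ", " + item

let joined = List.reduce insertCommas ["Jack"; "Jill"; "Jim"; "Joe"; "Jane"]
```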

Whilst useful, reduce forces the accumulator to have the same type as the list’s elements. If you want to use a custom accumulator type (e.g. reducing a list of items in a shopping cart to a cash value), you can use List.fold.

The fold function takes three parameters:

  1. A function that when provided an accumulator and list element returns a new accumulator.
  2. An initial accumulator value.
  3. The list to fold over.

The return value of the function is the final state of the accumulator. The type of the fold function is:

('acc -> 'b -> 'acc) -> 'acc -> 'b list -> 'acc

Here’s an example of how you can use it to count the number of vowels in a string:

clip_image002[7]
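One way to write this, as a sketch: the accumulator is a plain int count, which may differ from the accumulator used in the book’s version. The names vowels and countVowels are made up.

```fsharp
let vowels = ['a'; 'e'; 'i'; 'o'; 'u']

// Fold over the characters, starting from 0 and adding 1 for each vowel.
let countVowels (text : string) =
    text.ToLower()
    |> List.ofSeq
    |> List.fold (fun count c -> if List.contains c vowels then count + 1 else count) 0

let vowelCount = countVowels "The quick brown fox"
```

Here the list elements are chars while the accumulator is an int, which is exactly what List.reduce cannot express.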

Folding right-to-left

List.reduce and List.fold process the list in a left-to-right order. There are alternative functions List.reduceBack and List.foldBack for processing lists in right-to-left order.

Depending on what you are trying to do, processing a list in reverse order can have a substantial impact on performance.
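To make the direction visible, here is a small sketch that rebuilds a list with cons in both directions. Note that List.foldBack takes the list before the initial accumulator, and its function takes the element before the accumulator.

```fsharp
let numbers = [1; 2; 3]

// Left to right: 1 is consed first, so the result comes out reversed.
let viaFold = List.fold (fun acc x -> x :: acc) [] numbers

// Right to left: 3 is consed first, so the original order is preserved.
let viaFoldBack = List.foldBack (fun x acc -> x :: acc) numbers []
```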

List.iter

The final aggregate operator, List.iter, iterates through each element of the list and calls a function that you pass as a parameter. It has type ('a -> unit) -> 'a list -> unit.

Because List.iter returns unit, it is predominantly used for the side effects of the given function, meaning that executing the function does something other than produce a return value (e.g. printfn has the side effect of printing to the console in addition to returning unit):

clip_image001[9]
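A small sketch with two side effects, printing and updating a mutable total, so the effect can be observed afterwards. The names total and printAndAdd are invented for the example.

```fsharp
let mutable total = 0

// Returns unit; everything it does is a side effect.
let printAndAdd x =
    printfn "Printing %d" x
    total <- total + x

List.iter printAndAdd [1 .. 5]
```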

Option

If you want to represent a value that may or may not exist, the best way to do so is to use the option type. The option type has only two possible values: Some('a) and None.

A typical use of the option type is parsing a string as an int: if the string is properly formatted you get Some with the parsed int, and if it is not you get None:

clip_image002[9]

A common idiom in C# is to use null to mean the absence of a value. However, null is also used to indicate an uninitialized value, and this duality can lead to confusion and bugs. If you use the option type, there is no question what the value represents, similar to how System.Nullable works in C#.

To retrieve the value of an option, you can use Option.get.

clip_image003[7]

One thing to watch out for, though, is that calling Option.get on None throws an exception. To get around this, you can use Option.isSome or Option.isNone to check the value of the option type before attempting to access it, similar to System.Nullable.HasValue in C#.
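Putting the option pieces together, here is a minimal sketch. The helper names tryParseInt and valueOrZero are hypothetical; Int32.TryParse is the standard .NET method, whose out parameter F# returns as part of a tuple.

```fsharp
open System

// Some n when the text parses as an int, otherwise None.
let tryParseInt (text : string) =
    match Int32.TryParse(text) with
    | true, value -> Some value
    | false, _ -> None

let parsed = tryParseInt "42"
let failed = tryParseInt "forty-two"

// Check with Option.isSome before calling Option.get.
let valueOrZero opt =
    if Option.isSome opt then Option.get opt else 0
```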

Printfn

printf comes in three main flavours: printf, printfn, and sprintf.

printf takes the input and writes it to the screen, whereas printfn writes it to the screen and appends a newline.

printf has formatting and checking built in (e.g. printfn "%s is %d%c high" mountain height units). It’s also strongly typed and uses F#’s type inference system, so the compiler will give you an error if the data doesn’t match the given format specifier.

Here’s a table of printf format specifiers:

image

sprintf is used when you want the result of the printing as a string:

clip_image001[11]
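A short sketch using the format string from earlier in this section; the values bound to mountain, height and units are example data.

```fsharp
let mountain = "K2"
let height = 8611
let units = 'm'

// printfn writes to the console and appends a newline.
printfn "%s is %d%c high" mountain height units

// sprintf returns the formatted text as a string instead of printing it.
let description = sprintf "%s is %d%c high" mountain height units
```

Swapping two of the arguments, say passing height where %s is expected, is rejected at compile time rather than failing at run time.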

Anatomy of an F# Program

Most other languages, like C#, require an explicit program entry point, often called a main method. In F#, for single-file applications, the contents of the code file are executed from top to bottom in order, without the need to declare a specific main method.

For multi-file projects, however, code needs to be divided into organizational units called modules or namespaces.

Modules

By default, F# puts all your code into an anonymous module with the same name as the code file with the first letter capitalized. So if you have a value named value1, and your code is in file1.fs, you can refer to it by using the fully qualified path: File1.value1.

You can explicitly name your code’s module by using the module keyword at the top of a code file:

clip_image001[13]

Files can contain nested modules as well. To declare a nested module, use the module keyword followed by the name of your module and an equals sign =. Nested modules must be indented to be disambiguated from the “top-level” module:

clip_image002[11]
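A sketch of nested modules as they might appear in a single-file program or script; the module and function names (Utilities, Text, shout) are made up.

```fsharp
module Utilities =
    let double x = x * 2

    // A module nested inside Utilities, indented one level further.
    module Text =
        let shout (s : string) = s.ToUpper() + "!"

// Values are reached through the fully qualified path.
let four = Utilities.double 2
let loud = Utilities.Text.shout "hello"
```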

Namespaces

The alternative to modules is namespaces. Namespaces are a unit of organizing code just like modules, with the only difference being that namespaces cannot contain values directly, only type declarations (and modules).

Also, namespaces cannot be nested in the same way that modules can. Instead, you can add multiple namespaces to the same file:

clip_image001[15]

It may seem strange to have both namespaces and modules in F#. Modules are optimized for rapid prototyping and quickly exploring a solution, as you have seen so far. Namespaces, on the other hand, are geared toward larger-scale projects with an object-oriented solution.

Program Startup

For single-file projects, the code is executed from top to bottom. However, when you add a new file to the project, the newly added file is the one that runs when the program starts up.

For more formal program-startup semantics, you can use the [<EntryPoint>] attribute to define a main method. To qualify, your method must:

  • Be the last function defined in the last compiled file in your project.
  • Take a single parameter of type string array, which are the arguments to your program.
  • Return an integer, which is your program’s exit code.
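A minimal sketch of a function that meets those three requirements. The name main is conventional rather than required, and the printed message is just an example.

```fsharp
[<EntryPoint>]
let main (args : string[]) =
    // args holds the command-line arguments passed to the program.
    printfn "Received %d argument(s)" args.Length
    0 // exit code: zero conventionally means success
```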

Learning F# – Part 3

Disclaimer: I do not claim credit for the code examples and much of the contents here; these are mostly extracts from the book by Chris Smith, Programming F#: A comprehensive guide for writing simple code to solve complex problems. In fact, if you’re thinking of learning F# and like what you read here, you should buy the book yourself. It’s easy to read, and the author has gone to great lengths to keep things simple and has included a lot of code examples for you to try out yourself.

Functions

You define functions the same way you define values, except everything after the name of the function serves as the function’s parameters. The following defines a function called square that takes an integer, x, and returns its square:

clip_image001

Unlike C#, F# has no return keyword. The last expression to be evaluated in the function determines the return value.

Also, the FSI output above shows that the function square has signature int -> int, which reads as “a function taking an integer and returning an integer”.
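For reference, a definition matching that description would look roughly like this, with the signature FSI reports shown as a comment:

```fsharp
// val square : int -> int
let square x = x * x

// Functions are called without parentheses or commas.
let nine = square 3
```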

Type Inference

Take this add function for example:

clip_image002

Looking at this, you might wonder why the compiler thinks that the add function only takes integers. The + operator works on floats too!

The reason is type inference. Unlike C#, F# doesn’t require you to explicitly state the types of all the parameters to a function; it figures them out based on usage. Because the + operator works for many different types such as byte, int, and decimal, the compiler simply defaults to int if there is no additional information.

The following FSI snippet shows type inference in action: if we not only define the add function but also call it with floats, the function’s signature is inferred to be float -> float -> float instead:

clip_image003

However, you can provide a type annotation, or hint, to the F# compiler about what the types are. To do this, simply replace a function parameter with the following form ident -> (ident: type) like this:

clip_image004

This works because the only overload for + that takes a float as its first parameter is float -> float -> float, so the F# compiler infers y to be a float as well.
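The default and the annotated version side by side, as a small sketch (addFloats is a made-up name to keep the two definitions apart):

```fsharp
// With no other information, + defaults to int: int -> int -> int.
let add x y = x + y

// Annotating one parameter is enough: float -> float -> float.
let addFloats (x : float) y = x + y

let three = add 1 2
let sum = addFloats 1.5 2.0
```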

Type inference can reduce code clutter by having the compiler figure out what types to use, but the occasional type annotation is required and can sometimes improve code readability.

Generic Functions

You can write functions that work for any type of parameter, such as the identity function below:

clip_image005

Because the type inference system could not determine a fixed type for value x in the ident function, it was generic. If a parameter is generic, then that parameter can be of any type.

The type of a generic parameter can have the name of any valid identifier prefixed with an apostrophe, but it is typically a letter of the alphabet starting with 'a, as you can see from the FSI snippet for the ident function above.

Writing generic code is important for maximizing code reuse.
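A sketch of the identity function, used at two different types:

```fsharp
// val ident : 'a -> 'a
let ident x = x

// The same function works for strings and ints alike.
let text = ident "a string"
let number = ident 1234
```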

Scope

Every value declared in F# has a specific scope, more formally referred to as a declaration space.

The default scope is module scope, meaning values can be used anywhere after their declaration. However, values defined within a function are scoped only to that function.

For example:

clip_image006

The scoping of a value is important because F# supports nested functions, i.e. you can declare new function values within the body of a function. Nested functions have access to any value declared in a higher scope as well as any new values declared within themselves. The following example shows this in action:

clip_image007

In F#, having two values with the same name doesn’t lead to a compiler error; rather it simply leads to shadowing. When this happens, both values exist in memory, but there is no way to access the previously declared value. For example:

clip_image008

This technique of intentionally shadowing values is useful for giving the illusion of updating values without relying on mutation. It is loosely similar to reassigning a string variable in C#: the string itself is immutable, and the name simply ends up referring to a new value.
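Scope, nested functions and shadowing in one small sketch; all the names here are invented for illustration.

```fsharp
// Module scope: visible to everything declared after it.
let moduleValue = 10

let outer x =
    // A nested function can see x and moduleValue from the enclosing scopes.
    let inner y = x + y + moduleValue
    inner 1

let shadowed =
    let x = 1
    let x = x + 1   // shadows the previous x; nothing is mutated
    let x = x * 10  // shadows it again
    x
```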

Control Flow

You can branch control flow using the if keyword, which works much like an if statement in C#:

clip_image001[6]

F# supports the if-then-else structure, but the thing that sets if in F# apart is that if expressions return a value:

clip_image002[7]

F# has some syntactic sugar to help you combat deeply nested if expressions with the elif keyword:

clip_image003[6]

Because the result of the if expression is a value, every clause of an if expression must return the same type.

But if you only have a single if and no corresponding else, then the clause must return unit, which is a special type in F# that means essentially “no value”.
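A brief sketch of both cases; the function names describe and warnIfNegative are assumptions.

```fsharp
// Every branch returns a string, so the whole if expression is a string.
let describe n =
    if n < 0 then "negative"
    elif n = 0 then "zero"
    else "positive"

// With no else, the body must have type unit (printfn returns unit).
let warnIfNegative n =
    if n < 0 then printfn "%d is negative" n
```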

Core Types

Besides the primitive types, the F# library includes several core types that will allow you to organize, manipulate and process data:

image

Unit

The unit type is a value signifying nothing of consequence. unit can be thought of as a concrete representation of void and is represented in code via ():

clip_image001[8]

if expressions without a matching else must return unit because, if they did return a value, what would the result be when the condition is false?

Also, in F#, every function must return a value (think of a C# method with the void return type), so even if the function doesn’t conceptually return anything, it should return a unit value.

The ignore function can swallow a function’s return value if you want to return unit:

clip_image002[9]
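For example, as a minimal sketch:

```fsharp
let square x = x * x

// square returns an int; ignore discards it, so the result is unit.
let nothing : unit = ignore (square 4)
```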

Learning F# – Part 2

Disclaimer: I do not claim credit for the code examples and much of the contents here; these are mostly extracts from the book by Chris Smith, Programming F#: A comprehensive guide for writing simple code to solve complex problems. In fact, if you’re thinking of learning F# and like what you read here, you should buy the book yourself. It’s easy to read, and the author has gone to great lengths to keep things simple and has included a lot of code examples for you to try out yourself.

Primitive Types

F# is statically typed, meaning that type checking is done at compile time.

F# supports the full set of primitive .Net types which are built into the F# language and separate from user-defined types.

Here’s a table of all the numeric types (both integer and floating-point) with their suffixes:

image

F# also allows you to specify values in hexadecimal (base 16), octal (base 8) or binary (base 2) using the prefix 0x, 0o, or 0b:

clip_image001

There are no implicit type conversions in F#, which eliminates the subtle bugs that implicit conversions can introduce in other languages.
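A few literals and an explicit conversion, as a sketch (the value names are made up):

```fsharp
// The same number written in three bases.
let hex = 0xFF
let octal = 0o377
let binary = 0b11111111

// Suffixes select the numeric type: uy is byte, L is int64.
let asByte = 255uy
let asLong = 255L

// Mixing int and float requires an explicit conversion function.
let mixed = float hex + 0.5
```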

Arithmetic Operators

You can use standard arithmetic operators on numeric primitives. As in other CLR-based languages, integer division truncates toward zero, discarding the remainder. Here’s a table of all supported operators:

image

A very important point to note here is that, by default, these arithmetic operators do not check for overflow! If a number becomes too big for its type it’ll overflow to be negative, and vice versa:

clip_image002
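A small sketch of integer division and silent overflow; the helper increment exists only to make the overflow happen at run time.

```fsharp
let quotient = 7 / 2         // 3
let negQuotient = -7 / 2     // -3: the fractional part is discarded
let remainder = 7 % 2        // 1

// Adding 1 to the largest int wraps around to the smallest one.
let increment (x : int) = x + 1
let wrapped = increment System.Int32.MaxValue
```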

F# also features all the standard math functions. Here’s a table of the common ones:

image

BigInt

If you are dealing with data larger than 2^64, F# has the BigInt type for representing arbitrarily large integers. While the BigInt type is simply an alias for the System.Numerics.BigInteger type, it’s worth noting that neither C# nor VB.Net has syntax to support arbitrarily large integers.

BigInt uses the I suffix for literals, see example below:

clip_image003
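As an illustration, with an arbitrary large value chosen for the example:

```fsharp
// The I suffix makes this a bigint (System.Numerics.BigInteger),
// well beyond the range of a 64-bit integer.
let big = 1000000000000000000000I

// The usual arithmetic operators work on bigint values.
let product = big * big
```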

You should remember that although BigInt is heavily optimized, it is still much slower than using the primitive integer types.

Bitwise Operations

Primitive integer types support bitwise operators for manipulating values at a binary level:

image

Characters

The .Net platform is based on Unicode, so characters are represented using 2-byte UTF-16 characters. To define a character value, you can put any Unicode character in single quotes, for example:

clip_image004

Like C#, to represent special control characters you need to use an escape sequence from the table below:

image

You can get the byte value of a character literal by adding a B suffix:

clip_image005
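A few character literals as a sketch:

```fsharp
let letter = 'a'

// An escape sequence for a control character.
let newline = '\n'

// The B suffix gives the byte value of the character.
let asByte = 'a'B
```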

Strings

String literals are defined by enclosing a series of characters, which can span multiple lines, in double quotes. To access a character from within a string, use the indexer syntax, .[ ], and pass in a zero-based character index. For example:

clip_image006

If you want to specify a long string, you can break it up across multiple lines using a single backslash, \, for example:

clip_image007

Like in C#, you can define a verbatim string using the @ symbol, which ignores any escape sequence characters:

clip_image008
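The three string features above in one sketch; the string contents and the file path are made-up examples.

```fsharp
let greeting = "Hello, World"

// Indexer syntax with a zero-based index.
let firstChar = greeting.[0]

// A trailing backslash continues the literal on the next line;
// the line break and the leading whitespace that follows are skipped.
let longText = "one \
                two"

// Verbatim string: backslashes are not treated as escape sequences.
let path = @"C:\temp\notes.txt"
```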

Boolean Values

F# has the bool type (System.Boolean) as well as standard Boolean operators listed below:

image

F# uses short-circuit evaluation when evaluating Boolean expressions, meaning that if a result can be determined after evaluating the first of the two expressions, the second value won’t be evaluated. For example:

true || f() – will evaluate to true without executing function f.

false && g() – will evaluate to false without executing function g.
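To see the short-circuiting for yourself, here is a sketch that counts how often the right-hand function is called; the counter calls is an assumption added for the demonstration.

```fsharp
let mutable calls = 0

// Records each call so we can tell whether it ran.
let f () =
    calls <- calls + 1
    true

let a = true || f ()     // f is not called
let b = false && f ()    // f is not called
```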

Comparison and Equality

You can compare numeric values using standard operators listed below:

image

All these operators evaluate to a Boolean value except the compare function which returns -1, 0, or 1 depending on whether the first parameter is less than, equal to, or greater than the second.

You should have noticed that these operators are similar to those found in SQL Server, and that F# uses = for equality comparison (unlike C#, where = is assignment and == is equality comparison).

When it comes to equality, as in other CLR-based languages, it can mean different things – value equality or referential equality. For value types, equality means the values are identical. For reference types, equality is determined by overriding the System.Object method Equals.
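A quick sketch of compare alongside the equality operators:

```fsharp
// compare returns -1, 0 or 1.
let less = compare 1 2
let same = compare 2 2
let greater = compare 3 2

// = tests equality and <> tests inequality; both give a bool.
let isEqual = (1 + 1 = 2)
let isDifferent = (1 <> 2)
```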

Learning F# – Part 1

I decided to take some time out from Silverlight and have a play around with Microsoft’s new language, F#. I’ll be making notes as I go along and posting them here, so maybe they’ll help satisfy some of your curiosity too!

Disclaimer: I do not claim credit for the code examples and much of the contents here; these are mostly extracts from the book by Chris Smith, Programming F#: A comprehensive guide for writing simple code to solve complex problems. In fact, if you’re thinking of learning F# and like what you read here, you should buy the book yourself. It’s easy to read, and the author has gone to great lengths to keep things simple and has included a lot of code examples for you to try out yourself.

Before you start, you need to download F# from:

http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/release.aspx

and install it for either VS2008 or VS2010 beta.

Hello, World

In VS2008, open a new project and choose F# Application. In Program.fs, type in:

printfn "Hello, World"

Hit Ctrl+F5, and voila:

You might notice that your program worked without an explicit Main method.

Here’s a slightly more complex Hello, World to show off the syntax of F#:

image
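Judging by the names used later in this post (greeting, thing and timeOfDay), the program was presumably along the lines of the sketch below. The exact strings are an assumption.

```fsharp
open System

// let binds a name to a value.
let greeting = "Hello"
let thing = "World"

// Calling into the .Net library for the current time.
let timeOfDay = DateTime.Now.ToString("hh:mm tt")

printfn "%s, %s at %s" greeting thing timeOfDay
```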

The let keyword binds a name to a value. Unlike most other programming languages, in F# values are immutable by default (meaning they cannot be changed once initialized).

image

F# is also case-sensitive. The name of a value can contain letters, numbers, underscores, or apostrophes (') and must begin with a letter or an underscore. However, you can enclose the value’s name in a pair of double backticks (``), in which case the name can contain any character except for tabs and newlines:

let ``this.Isn't %A% good value Name$!#`` = DateTime.Now.ToString("hh:mm tt")

printfn "%s, %s at %s" greeting thing ``this.Isn't %A% good value Name$!#``

Other languages like C# use semicolons and curly braces to indicate when statements and blocks of code are complete, but these clutter the code.

In F#, whitespace (spaces and newlines) is significant. The F# compiler allows you to use whitespace to delimit code blocks. For example, anything indented more than the if keyword is considered to be in the body of the if statement. Because tab characters can indicate an unknown number of space characters, they are prohibited in F# code. You can configure the Visual Studio editor to automatically convert tab characters into spaces in Tools -> Options -> Text Editor -> F#.

The earlier code also demonstrated how F# can interoperate with existing .Net libraries:

let timeOfDay = DateTime.Now.ToString("hh:mm tt")

F# can take advantage of any .Net library by calling directly into it. Conversely, any code written in F# can be consumed by other .Net languages.

Comments

Like any language, F# allows you to comment your code. To declare a single line comment, use two slashes (//).

For larger comments that span multiple lines, you can use multiline comments, delimited by (* and *).

For F# applications written in Visual Studio, there is a third type of comments: an XML documentation comment. If a comment starting with three slashes (///) is placed above an identifier, Visual Studio will display the comment’s text when you hover over it:
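All three comment styles in one sketch (the function double is just something to attach the documentation comment to):

```fsharp
// A single-line comment.

(* A comment that
   spans several lines. *)

/// Doubles the given number. This text appears in the editor tooltip.
let double x = x * 2
```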

 


F# Interactive

Visual Studio comes with a tool called F# Interactive, or FSI. F# Interactive is a REPL (read-evaluate-print loop): it accepts F# code, compiles and executes it, then prints the results. This allows you to quickly and easily experiment with F# code, similar to how a snippet editor allows you to do the same with C# code.

Once it’s open, it accepts F# code until you terminate the input with ;; and a newline, at which point the code entered is compiled and executed. After each code snippet is sent to FSI, for every name introduced you will see val <name> : type value, for example:

Notice that when you don’t give a value a name, FSI simply calls it 'it'.

FSI allows you to write code in Visual Studio editor which offers syntax highlighting and IntelliSense, but test your code in the FSI window. You can copy the entire code body from the example earlier and test the main method in FSI by calling it:

Managing F# Source Files

The F# language has some unique characteristics when it comes to managing projects with multiple source files. In F#, the order in which code files are compiled is significant.

You can only call into functions and classes defined earlier in the code file or in a separate code file compiled before the file where the function or class is used. If you rearrange the order of the source files, your program may no longer build!

The reason for this significance in compilation order is type inference, a topic to be covered later on.

F# source files are compiled in the order they are displayed in Visual Studio’s Solution Explorer, from top to bottom. You can rearrange the files by right-clicking and selecting Move Up or Move Down. The equivalent keyboard shortcuts are Alt+Up and Alt+Down, though these don’t seem to be working in VS2008 with Resharper, or maybe I just need to tweak my key mapping.