Exercises in Programming Style–Actors

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cises then please buy Crista’s book to sup­port her work.

exercises-prog-styles-cover

Following on from the last post, we will look at the Actors style today.

 

Style 28 – Actors

Constraints

  • The larger problem is decomposed into ‘things’ that make sense for the problem domain.
  • Each ‘thing’ has a queue meant for other things to place messages in it.
  • Each ‘thing’ is a capsule of data that exposes only its ability to receive messages via the queue.
  • Each ‘thing’ has its own thread of execution independent of the others.

 

I have been looking forward to doing this style every since the Letterbox style months ago  So without any delay, let’s get started!

F# has built-in support for the Actor Model via the MailboxProcessor type, and we’ll use it to represent each of the ‘things’ in our solution here.

First, we’ll create an actor to load and store the words that we need to process. Since MailboxProcessor is generic, we can define what messages it can receive with a DataStorageMessage union type that allows two kinds of messages:

  • LoadWords – which contains the path to the local file to load words from
  • NextWord – once loaded, this message fetches the next available word

In the snippet below, you can see that we have a recursive receive loop which asynchronously waits for message to arrive in its mailbox and handles them accordingly. One thing to note here is that, if a LoadWords message is received before all the loaded words are processed the current behaviour is to override the remaining words list. This can give you unexpected results, the current design leaves the responsibility of ensuring this doesn’t happen with higher level abstractions (see the controller actor below).

Style28-01

Next, we’ll add another actor to manage the stop words. Similar to the dataStorageManager, we’ll first define the messages that can be sent to our actor:

  • LoadStopWords – which contains the path to the local file to load stop words from
  • IsStopWord – which passes in a word and expects a boolean as reply

As we did in the dataStorageManager, whenever the stopWordsManager actor receives a LoadStopWords message it’ll replace the current list of stop words. All future words passed in via the IsStopWord message will be checked against the updated list of stop words.

Style28-02

Next, we’ll add an actor to track the word frequencies. This actor can accept three messages:

  • Add – adds another word to the current word frequencies
  • TopN – fetches the current top N words with their corresponding frequencies
  • Reset – resets the count

Again, the code snippet below should be pretty straight forward, although there is actually a frailty here due to the use of Seq.take. One thing that often annoys me about Seq.take is that, unlike Enumerable.Take, it throws when there is insufficient number of elements in the sequence! In this case, if the calling agent ask for TopN too early or with a large N it’s possible to kill our actor (which is also something that we’re not handling here). It’s fine given the context of this exercise, but these are things that you need to consider when writing production-ready code.

Style28-03

Lastly, we’ll add an actor to orchestrate the control flow of our program, that will accept a simple Run message. Here, I’ve added an AsyncReplyChannel<unit> to the Run message so that the caller has a deterministic way to know when the program has completed.

When the controller receives a Run message, it’ll initialize the other actors (which happens concurrently due to the asynchronous nature of messaging) and then process all the words from Pride and Prejudice by recursively fetch words from the dataStorageManager until there’s no more. One thing to note is that, because a MailboxProcessor process messages one-at-a-time, so even if the controller receives multiple Run messages at the same time it’ll still process them one at a time and we don’t even have to use locks!

Style28-04

 

To run our program, we’ll kick things off by sending a Run message to the controller. I opted to run this synchronously, but you could just easily ignore the reply and run the program asynchronously with Async.Ignore and Async.Start.

Style28-05

 

I’m a massive fan of Erlang and the Actor Model, they’re highly related since Erlang implements the Actor Model but shouldn’t be mixed up – e.g. code hot swapping and supervision trees are features of Erlang and Erlang OTP, and are not prescribed as part of the Actor Model (which is a theoretical model for describing computation).

If you haven’t already, please go ahead and watch this Channel9 recording of a conversation between Carl Hewitt and Erik Meijer on the Actor Model – what it is, and what it isn’t. I also did a write up to summarise the key points.

 

You can find the source code for this exercise here.

Exercises in Programming Style–Lazy Rivers

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cises then please buy Crista’s book to sup­port her work.

exercises-prog-styles-cover

Fol­low­ing on from the last post, we will look at the Lazy Rivers style today.

 

Style 27 – Lazy Rivers

Constraints

  • Data comes to functions in streams, rather than as a complete whole all at at once.
  • Functions are filters / transformers from one kind of data stream to another.

 

Given the constraint that data is to come in as streams, the easiest way to model that in F# is using sequences.

First, let’s add a function to read the text from an input file as a seq<char>.

Style27_01

Then, we’ll add another function to transform this sequence of characters into a sequence of words.

Style27_02

Next, we’ll filter this sequence to remove all the stop words and return the non-stop words as another seq<string>.

Style27_03

So far everything’s pretty straightforward, but things get a bit tricky from here on.

To count and sort the non-stop words, we can return the running count as a sequence after each word, but that’s terribly inefficient.

Instead, we can batch the input stream into groups of 5000 words. We’ll update the word frequencies with each batch and produce a new sorted array of word counts for the output sequence.

Style27_04

In the snippet above, focus on the Seq.scan section (ps. Seq.scan is similar to Seq.fold, except it returns all the intermediate values for the accumulator, not just the final accumulated value).

Here, given the current word frequencies map and a batch of words, we’ll return a new map (remember, a F# Map is immutable) whose counts have been updated by the latest batch.

Because Seq.scan returns results for all intermediate steps including the initial value (the empty map in this case), we have to follow up with Seq.skip 1 to exclude the empty map from our output.

Finally, to string everything together, we’ll print the top 25 words for each of the outputs from countAndSort. It takes quite a few iterations, but you’ll see the result slowly emerging in the process.

Style27_05

Unlike the other styles, we’ll get a few sets of outputs – one for each batch of words processed.

 

You can find the source code for this exercise here.

Exercises in Programming Style–Spreadsheet

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cises then please buy Crista’s book to sup­port her work.

exercises-prog-styles-cover

Fol­low­ing on from the last post, we will look at the Spreadsheet style today.

 

Style 26 – Spreadsheet

Constraints

  • The problem is modeled like a spreadsheet, with columns of data and formulas.
  • Some data depends on other data according to formulas. When data changes, the dependent data also changes automatically.

 

Since we’re modelling the problem like a spreadsheet, let’s first define a few type alias to help us establish our domain language.

  • a column is referenced by a string – e.g. column “A”, “B”, “C”, and so on
  • a column can have explicit values, or a formula
  • a formula references one or more other columns and uses their data to calculate the display value for the column

Style26_01

We can model a Column with a union type, where a column is:

a) given explicit values; or

b) given a formula, and its display value is calculated from that formula.

Style26_02

A Spreadsheet is a collection of columns, whose value can be retrieved or updated using a ColumnRef, e.g.

let spreadsheet = Spreadsheet()

spreadsheet.[“A”] <- Choice1Of2 [| “hello”; “world” |]

let columnA = spreadsheet.[“A”] // [| “hello”; “world” |]

Note that our custom indexer accepts a Choice<DisplayValue, Formula> in its setter instead of a Column. I made this decision because Column.Formula needs both the Formula and its calculated DisplayValue, and here it shouldn’t be the caller’s responsibility to exercise the formula and calculate its DisplayValue.

Style26_03

After a column’s value is updated, we’ll recalculate the DisplayValue of all the columns in the Spreadsheet. This is a naive approach, but sufficient for the task at hand.

During this step, we’ll need to recursively evaluate the DisplayValue of the columns, sometimes a column might be evaluated multiple times if it’s referenced by several formulae. Again, there’s room for optimization here by caching previously calculated values.

Style26_04

 

Next, we’ll start inputting data into our Spreadsheet.

To make it easier to work with the Choice<DisplayValue, Formula> type, let’s add two helper functions.

Style26_05

We’ll leave column A and B blank for now.

Style26_06

Column C will use the data from Column A (all words) and Column B (stop words) to calculate all the non-stop words.

Column D will contain all the distinct words from Column C (non-stop words).

Style26_07

Column E references both Column C (non-stop words) and Column D (distinct non-stop words), and counts the frequency for each of these words.

An important detail to note here is that, positionally, each count in Column E is aligned with the corresponding word in Column D.Style26_08

Column F is where we’ll store the output of the program.

Because Column D (unique non-stop words) and Column E (counts) are positionally aligned, so we can use Array.zip to combine the two columns to calculate a sorted array of word frequencies.

Style26_09

Now that all the columns are set up, we can go back and input the text from Pride and Prejudice and the list of stop words into Column A and B respectively.

Doing so will trigger the DisplayValue of all the other columns to be recalculated.

Style26_10

Finally, we’ll take the top 25 rows from Column F and print them to complete our program.

Style26_11

 

I hope you’ve enjoyed today’s style, apologies for the lack of updates the last two weeks, I have been busy working out some stuff on a personal front. I’ll resume the two posts per week routine, which means we should finish this series in just over a month!

 

You can find the source code for this exer­cise here.

Recording and Slides for F# DSLs talk at F# |> Bristol

Here’s the recorded live stream of the F# DSLs session I did at the F# |> Bristol user group last night.

(apologies for the first few mins where I forgot to share my screen to the Hangout…)

and here’s the slides to go with the talk:

Exercises in Programming Style–Quarantine

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cises then please buy Crista’s book to sup­port her work.

exercises-prog-styles-cover

Fol­low­ing on from the last post, we will look at the Quarantine style today.

 

Style 24 – Quarantine

Constraints

  • Core program functions have no side effects of any kind, including IO.
  • All IO actions must be contained in computation sequences that are clearly separated from the pure functions.
  • All sequences that have IO must be called from the main program.

 

This style is similar to The One (aka Monads) style we saw earlier on. The notable difference here is that only the side-effecting IO code need to be contained, so we can interpret this as the IO Monad style.

 

Taking inspiration from the computation expression we built for The One style, we can make some modifications here.

We’ll start by declaring a generic IO<‘a> type that will encapsulate an IO action. Then we can build a computation expression around this wrapper type.

Unfortunately do is already a reserved keyword in F#, so we’ll have to make do with do’ instead (see what I did there? )

Style24_01

Now we can add a couple of helper types and methods to wrap around IO operations – such as reading from a file or printing to console – in our IO<‘a> type.

Style24_02

We need to read from files in order to extract words from Pride and Prejudice and to remove stop words, so both functions use our do’ computation expression. Notice that we’re using the aforementioned File.ReadAllText helper method, which returns an IO<string> so our do’ computation expression can bind to.

Style24_03

Other pure functions can stay as they are.

Style24_04

But, our main program will need to be contained since it needs to perform IO operations via the extractWords and removeStopWords functions.

Style24_05

Note that no work has actually been done yet.

So far, we have merely composed together what actions (IO and otherwise) we will perform when this program is run. To complete the puzzle we need one more thing – a mechanism to execute these side-effecting programs.

Something like this IO module perhaps?

Style24_06

which we can use to execute our mainProgram.

Style24_07

and voila!

Now, the IO operations (including printing to the console) will be performed along with other pure functions.

 

You can find the source code for this exer­cise here