Exercises in Programming Style–Cookbook

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cises then please buy Crista’s book to sup­port her work.


Fol­low­ing on from the last post, we will look at the Cook­book style today.


Style 4 – Cookbook

Although Crista has called this the Cook­book style, you’d prob­a­bly know it as Pro­ce­dural Pro­gram­ming.


  • No long jumps
  • Com­plex­ity of con­trol flow tamed by divid­ing the large prob­lem into smaller units using pro­ce­dural abstraction
  • Pro­ce­dures may share state in the form of global variables
  • The large prob­lem is solved by apply­ing the pro­ce­dures, one after the other, that change, or add to, the shared state


As stated in the con­straints, we need to solve the term fre­quen­cies prob­lem by build­ing up a sequence of steps (i.e . pro­ce­dures) that mod­i­fies some shared states along the way.

So first, we’ll define the shared states we’ll use to:

  • hold the raw data that are read from the input file
  • hold the words that will be con­sid­ered for term frequencies
  • the asso­ci­ated fre­quency for each word




I fol­lowed the basic struc­ture that Crista laid out in her solution:

  1. read_file : reads entire con­tent of the file into the global vari­able data;
  2. filter_chars_and_normalize : replaces all non-alphanumeric char­ac­ters in data with white space’’;
  3. scan : scans the data for words, and adds them to the global vari­able words;
  4. remove_stop_words : load the list of stop words; appends the list with single-letter words; removes all stop words from the global vari­able words;
  5. fre­quen­cies : cre­ates a list of pairs asso­ci­at­ing words with frequencies;
  6. sort : sorts the con­tents of the global vari­able word­Freqs by decreas­ing order of frequency



As you can see, read­File is really straight for­ward. I cho­sen to store the con­tent of the file as a char[] rather than a string because it sim­pli­fies fil­ter­CharsAnd­Nor­mal­ize:

  • it allows me to use Char.IsLetterOrDigit
  • it gives me muta­bil­ity (I can replace the indi­vid­ual char­ac­ters in place)


The down­side of stor­ing data as a char[] is that I will then need to con­struct a string from it when it comes to split­ting it up by white space.


To remove the stop words, I loaded the stop words into a Set because it pro­vides a more effi­cient lookup.


For the fre­quen­cies pro­ce­dure,  I would have liked to sim­ply return the term fre­quen­cies as out­put, but as the con­straint says that

The large prob­lem is solved by apply­ing the pro­ce­dures, one after the other, that change, or add to, the shared state

so unfor­tu­nately we’ll have to update the word­Freqs global vari­able instead…


And finally, sort is straight for­ward thanks to Array.sortInPlaceBy:



To tie every­thing together and get the out­put, we sim­ply exe­cute the pro­ce­dures one after another and then print out the first 25 ele­ments of word­Freqs at the end.




Since each pro­ce­dure is mod­i­fy­ing shared states, this intro­duces tem­po­ral depen­dency between the pro­ce­dures. As a result:

  • they’re not idem­po­tent – run­ning a pro­ce­dure twice will cause different/invalid results
  • they’re not iso­lated in mind­scape – you can’t think about one pro­ce­dure with­out also think­ing about what the pre­vi­ous pro­ce­dure has done to shared state and what the next pro­ce­dure expects from the shared state

Also, since the expec­ta­tion and con­straints of the pro­ce­dures are only cap­tured implic­itly in its exe­cu­tion logic, you can­not lever­age the type sys­tem to help com­mu­ni­cate and enforce them.

That said, many sys­tems we build nowa­days can be described as pro­ce­dural in essence – most web ser­vices for instance – and rely on shar­ing and chang­ing global states that are prox­ied through data­bases or cache.


You can find all the source code for this exer­cise here.

Exercises in Programming Style–Monolith

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cises then please buy Crista’s book to sup­port her work.


Fol­low­ing on from the last post, we will look at the Mono­lith style today.


Style 3 – Monolith

As the name sug­gests, this style is typ­i­fied by the use of one big blob of code that con­tains all our appli­ca­tion logic!


  • No named abstrac­tions (i.e. no methods/functions)
  • No, or lit­tle, use of libraries


The first con­straint meant that we can’t break up the logic into smaller func­tions, hence why every­thing is inside one big for loop.

I took the sec­ond con­straint to mean that we should limit the num­ber of name­spaces we open (as is the case in Crista’s ver­sion). Which means a Dic­tio­nary is out of the ques­tion. I could still use a Map but that feels too much of a cheat so I decided to stick with an array of string * int tuple to keep track of the word frequencies.


First we need to load the list of stop words:


Then we read the text line by line and process it in one giant for loop…


One caveat I ran into whilst work­ing on this, is with the behav­iour of File.ReadAllLines. It removes the line end­ing char­ac­ter for each line, which means whilst look­ing for words in a line we have two cases:

  1. words in the mid­dle of a line, e.g. “I spoke to Eliz­a­beth today…”
  2. words at the end of a line, e.g. “…turned around to Eliz­a­beth

so to make the logic con­sis­tent, and remove the spe­cial case (case 2) I decide to just add the new­line char­ac­ter at the end of the string we need to process.


Other than the above caveat, this code is pretty sim­ple. I decided to devi­ate from Crista’s imple­men­ta­tion in another place – her solu­tion reorders the word_freqs array when­ever an exist­ing word count is updated. But I fig­ure it’s likely more effi­cient to just do it once after the main loop.



side­bar: I have used higher-order func­tions in a few places, which I’m not sure if they con­sti­tute as cheat­ing. How­ever, I don’t con­sider them vio­la­tions to the con­straints we have set our­selves for this par­tic­u­lar exercise.



Code writ­ten in this style is gen­er­ally hard to fol­low as you need to keep the entire appli­ca­tion logic (i.e. the big for loop) in your head whilst try­ing to under­stand its behav­iour, and we know our brains are not great at hold­ing many items in the active mem­ory (the famous 7 +/- 2 rule).


One thing to note is that, whilst this style was com­mon­place with low-level lan­guages (and goto was pop­u­lar back then too), it’s pos­si­ble to pro­gram in this style with high-level lan­guages too (as we’ve done with F# here).

I also find Node.js style call­backs tend to lend them­selves to this style of cod­ing too.


You can find all the source code for this exer­cise here.

Exercises in Programming Style–Go Forth

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cises then please buy Crista’s book to sup­port her work.


Fol­low­ing on from the last post, we will look at the Go Forth style today.


Style 2 – Go Forth

This style is named after the Forth pro­gram­ming lan­guage, which is a stack-oriented. Another pop­u­lar stack-oriented lan­guage is Fac­tor, which Bruce Tate cov­ered in Seven More Lan­guages in Seven Weeks.



  • Exis­tence of a data stack. All oper­a­tions are done over data on this stack.
  • Exis­tence of a heap for stor­ing data that’s needed for later oper­a­tions. The heap data can be asso­ci­ated with names (i.e. vari­ables). Since all oper­a­tions are done over data on the stack, heap data needs to be moved first to the stack and even­tu­ally back to the heap.
  • Abstrac­tion in the form of user-defined “pro­ce­dures” (i.e. names bound to a set of instruc­tions), which may be called some­thing else entirely.


Admit­tedly this is another dif­fi­cult style to pro­gram in, where even sim­ple things such as adding two num­bers together requires:

  1. push­ing the two num­bers onto the stack
  2. per­form addi­tion against the two items
  3. push­ing the result onto the stack


You might have also known this kind of arith­metic oper­a­tions as Reverse Pol­ish Nota­tion.

You may also be inter­ested to know that the CIL (you know, the inter­me­di­ate lan­guage your C#/F# code gets com­piled into) is also stack-based and works in this way.


I’m not sat­is­fied with Crista’s solu­tion in her book, where she went out of style many times (you can see by the num­ber of “out of style, left for exer­cise” com­ments in the code). In a way, it’s per­haps a reflec­tion of the dif­fi­culty in cod­ing in this style, espe­cially when that works against the idioms of the lan­guage you’re pro­gram­ming in.

Also, she has oper­ated against data on the heap directly on sev­eral occa­sions, which also vio­lated our constraints.

Now, it’s easy to crit­i­cise these vio­la­tions, but I get it, I wanted to cheat so many times through­out – just get some­thing from the heap, apply func­tion over it, and make life eas­ier for myself.

To help fight the temp­ta­tions to break style, I made some mod­i­fi­ca­tions to Crista’s imple­men­ta­tion to bet­ter enforce the con­straints we have placed upon ourselves.


Get­ting Started

Before we start solv­ing the term fre­quen­cies prob­lem, let’s first setup the envi­ron­ment we’re gonna be work­ing in which enforces the afore­men­tioned constraints.

We know we need a stack and a heap, as well as the abil­ity to get data to and from both.


Cou­ple of things to note from the above:

  • both stack and heap are private;
  • pop is also private;
  • load doesn’t return the asso­ci­ated value, but pushes it on the stack straight away.

These design deci­sions stem from my attempt to make it harder for me to break style later on.

  • stack and heap are pri­vate so I can’t cheat by manip­u­lat­ing them directly;
  • pop is pri­vate so that I can’t just pop a value off the stack and then per­form an oper­a­tion with another oper­ant that wasn’t on the stack, e.g.

let x = pop<int>()

let y = add x 42    // 42 is not on the stack!

// and don’t for­get to push the result back onto the stack!

  • load doesn’t return the value directly to enforce the con­straint that data on the heap has to be copied to the stack before you can per­form any oper­a­tion on them.


Since we can’t pop items off the stack out­side of this mod­ule, we’ll have to give con­sumers of this mod­ule some other way to per­form oper­a­tions against data on the stack.

Here we have a cou­ple of func­tions that takes in an oper­a­tion (i.e. a func­tion) and per­forms it against the stack.


Notice that there are vari­ants for unary and binary oper­a­tions, this is to cater for cases such as “pop item off the stack and move it back to the heap”.

Whilst this is not a cheat-proof* way to force you to per­form all oper­a­tions against the stack, it proved a suf­fi­cient deter­rent for me dur­ing this exercise.

* for instance, you can still pass in a par­tially applied func­tion such as ((+) 42)


Finally, to help with debug­ging, I added two func­tions to print the cur­rent con­tent of the stack and heap:



Defin­ing the Procedures

Next, let’s imple­ment the “pro­ce­dures” using the basic struc­ture Crista outlined:

  1. read_file – take data on the stack (file path) and read the con­tent of the file onto the stack in its place;
  2. filter_chars – take data on the stack (con­tent of the file) and replace all non-alphanumeric char­ac­ters with space; put the new string back onto the stack;
  3. scan – take data on the stack and split the long string with space and put the indi­vid­ual words back onto the stack;
  4. remove_stop_words – process data on the stack (the indi­vid­ual words) and fil­ter out the stop words; the rest are put onto the heap
  5. fre­quen­cies – move the fil­tered words from heap back onto the stack and process them; keep­ing track of the fre­quency for each word with a map in the heap
  6. sort – sort the word fre­quen­cies in heap

Since all these pro­ce­dures have to oper­ate against the stack, so they don’t need to take in any argu­ments nor do they need to return anything.

In addi­tion to the above, I also added a helper pro­ce­dure to split a string since it’s some­thing I needed on more than one occasion.

Of course, to stay in style we have to put the sep­a­ra­tor as well as the string we want to split on the stack and oper­ate on them via a binary oper­a­tion that takes both as arguments.


Even then,the use of the String.Split method might still be con­sid­ered too high-level for this style.



This pro­ce­dure is really straight for­ward. It assumes the head of the stack is the file path.


After the pro­ce­dure call, the stack would con­tain the con­tent of the file:




I dis­liked the name of this pro­ce­dure as it was specif­i­cally fil­ter­ing out non-alphanumeric char­ac­ters, which isn’t reflected in its name. So I renamed the pro­ce­dure to be more spe­cific, but the approach is the same:


This pro­ce­dure assumes the head of the stack con­tains the input that needs to be fil­tered. As Crista men­tioned in her imple­men­ta­tion, using a Regex is kinda cheat­ing too as it’s too high level for this style.




Now that the fil­tered string is at the head of the stack, we can push the space char­ac­ter to the stack and then use the afore­men­tioned split pro­ce­dure to split the fil­tered string with space.

The result­ing array of strings (sit­ting at the head of the stack at this point) would then need to be pushed onto the stack as indi­vid­ual items.






First, we need to get the list of stop words from file. To do that, we push the file path to the stack; read it; then split the con­tent of the file (a comma-separated list of lower-case strings).

Next, we need to move it to the heap for later use.

At this point, our stack con­tains just the indi­vid­ual words from the scan pro­ce­dure.


For each of the words on the stack, we need to:

  • check if it’s listed as a stop word, and that it’s at least 2 characters;
  • for words that pass our fil­ter, we’ll add it to the list of valid words in the heap;
  • once all the words have been processed, then we load the list of valid words from heap and add them back onto the stack.

Since we can’t use data on the heap directly, so we had to push the rel­e­vant data onto the stack before run­ning oper­a­tions against it.

Also, notice in here, we didn’t con­cate­nate the list and store the new list in the heap in one go:


This is again, because cons ( :: ) is itself an oper­a­tion that should only be per­formed against the stack.



At the end of the remove_stop_words the stack con­tains all the words that need to be counted.


For this task, we’ll store a dic­tio­nary for track­ing the count for each word in the heap. As we process each word from the stack, we’ll also need to move the dic­tio­nary onto the stack so we can oper­ate on it.

You may notice that I have cheated here:

  • I per­formed dic­tio­nary look up and addi­tion in one operation
  • using a dic­tio­nary is prob­a­bly too high-level for this style too

A more in-style approach might look like the below:



At the end of this pro­ce­dure the stack will be empty, and the heap would con­tain a “word_freqs” dic­tio­nary with the term fre­quency for all the words.



Here I’m mak­ing use of LINQ to sort the word fre­quen­cies, and each step along the way pops the head item off the stack and pushes the result in its place. Hence why Order, Take and Reverse are per­formed as sep­a­rate unary operations.


Also, note that we needed to reverse the word­Freqs sequence to ensure that words with higher fre­quency are popped off the stack first.


Once we have run all the above pro­ce­dures, we will end up with a sequence of key value pairs of word and asso­ci­ated count. All that’s left to do is to process them in turn and print them out:


and voila!



As with Good Old Times, cod­ing in this style makes things so much more com­plex com­pared to what one would do in idiomatic F#.

Whilst Good Old Times forces you to rely on doc­u­men­ta­tion to remem­ber what each cell in the pri­mary mem­ory refers to, the pres­ence of a heap in Go Forth eases that burden.

How­ever, the pres­ence of the stack and the require­ment that all oper­a­tions much be per­formed against the stack intro­duces other pain points. For instance:

  • even sim­ple oper­a­tions become convoluted;
  • hav­ing to per­form addi­tional work to move data to and from heap;
  • pro­ce­dures have to make assump­tions about the state of the stack;
  • expec­ta­tions of the pro­ce­dures can­not be expressed in their sig­na­ture since they have no dis­tin­guish­able signature.

As func­tional pro­gram­mers, we like to (where pos­si­ble, and to the extend pos­si­ble with our lan­guage of choice) use the type sys­tem to con­struct a kinda math­e­mat­i­cal proof that our appli­ca­tion will behave cor­rectly. This is what we refer to as Type Dri­ven Devel­op­ment, which of course is not pos­si­ble with this stack-based approach.


You can find all the source code for this exer­cise here.

F# – Monty Hall problem

Do you remem­ber that scene from the movie 21 where Kevin Spacey gave our pro­tag­o­nist Ben the game show host problem?

What Kevin Spacey described is known as the Monty Hall prob­lem. It became famous as a ques­tion from a reader’s let­ter to Mar­i­lyn vos Savant’s “Ask Mar­i­lyn” col­umn in Parade mag­a­zine in 1990.

Marilyn’s answer caused quite a bit of con­tro­versy at the time and whilst I don’t under­stand the for­mal math­e­mat­i­cal proof, it’s a prob­lem that can be eas­ily proved through simulations.


Sim­u­la­tion with F#

First, let’s set up a few types to rep­re­sent the prob­lem domain:


Pretty self-explanatory here:

  • behind every Door is a Prize;
  • the Prize is either a Car or a Goat;
  • you Win if you get the Car in the end, oth­er­wise you Lose

When you start a new game you always have three doors, arranged in some ran­dom order.


After you had given your ini­tial choice, the game show host will reveal one of the doors that has a Goat?.

Here we have a func­tion that takes in the doors and the player’s ini­tial choice; and return the index of a door that:

  1. the player didn’t choose; and
  2. has a goat


Putting it together, we have a func­tion that takes in the doors as well as the player’s strat­egy and return whether the player Win or Lose at the end of the game.


From the code above you can guess the type sig­na­ture of the strat­egy func­tion to be int –> int –> int. That is, it takes two inte­gers – the player’s ini­tial choice and the index of the door the host has revealed – and return the player’s final choice.

Now, given the two choices we have – to stay or to change, we can rep­re­sent it with two strategies:

  • strat­e­gyA would stay with the player’s orig­i­nal choice;
  • strat­e­gyB would change


We can now use these two func­tions to call play with.

p.s. notice we have essen­tially imple­mented the Strat­egy pat­tern, but with­out all the boil­er­plates and inter­faces and classes? With just func­tions! Isn’t that sweet?


To be kinder to our­selves, let’s add another helper func­tion that will take in our cho­sen strat­egy and run a sim­u­la­tion of 1000 games, and return the num­ber of games that we have won with this strategy:


Finally, let’s run our sim­u­la­tions and see how the two strate­gies perform.


Run­ning this sim­u­la­tion over and over, strat­e­gyB con­sis­tently out­per­forms strat­e­gyA by roughly 2-to-1, which is exactly what we expected.

You can also just run the sim­u­la­tion your­self on .Net Fid­dle here, just click the Run but­ton at the top.


Improv­ing our code

Whilst the above is suf­fi­cient for its pur­pose and it’s a pretty short solu­tion, there’re cou­ple of things that can be improved in hindsight:

  • player’s choice – to Stay, or Change – is not explic­itly rep­re­sented in the domain;
  • whilst a Change can only lead to one out­come – chang­ing to the door that isn’t revealed nor ini­tially cho­sen – it is expressed implic­itly by the logic in strat­e­gyB;
  • if we were to intro­duce another strat­egy (e.g. decide to stay or change ran­domly) then we’d have to dupli­cate the logic for change.

To make these improve­ments is really easy. We can start by adding a type to present the two choices avail­able to the player:


Then we can update the play func­tion to work with it:


So now, our strat­egy imple­men­ta­tions become much simpler:


Again, you can try out this updated ver­sion on .Net Fid­dle here. It also includes a ‘ran­dom’ strat­egy which chooses to Stay or Change using the rand func­tion we cre­ated earlier.


There are other things we can improve still. For instance, how do we encap­su­late the con­straint of hav­ing only 3 doors as part of our model?

This way, we can elim­i­nate another invalid state using the type sys­tem. I’ll leave that as a teaser to you  and feel free to share your solu­tion in the comments!

Exercises in Programming Style–Style 1

NOTE : read the rest of the series, or check out the source code.



I was at Joy of Cod­ing ear­lier this year and one of the high­light for me was Crista Lopes’ keynote  Exer­cises in Pro­gram­ming Style.

Crista demon­strated how a sim­ple prob­lem of cal­cu­lat­ing term fre­quency can be writ­ten in a plethora of ways, including:

The point of these exer­cises is to allow you to see that there are many solu­tions to the same prob­lem, and that each comes with a set of con­straints that needs to be com­mu­ni­cated.


I really enjoyed the talk and bought Crista’s book so I can go through all 33 approaches in my own time!


It’s taken me a lit­tle while to find the time to do this, but I was finally able to make a start last week­end. Over the next cou­ple of weeks I hope to share with you my ports of Crista’s Python solu­tions in F# and my takeaways.

If you like what you see then please buy the book and sup­port Crista’s work!


Style 1 – Good Old Times

This style was com­mon­place in early 50s when hard­ware lim­i­ta­tions had some hard constraints.


  • Very small amount of pri­mary mem­ory, typ­i­cally orders of mag­ni­tude smaller than the data that needs to be processed/generated.
  • No iden­ti­fiers – i.e. no vari­able names or tagged mem­ory addresses. All we have is mem­ory that is address­able with numbers.


Get the source code here.

As you can expect, not being able to use iden­ti­fiers really stops you in your tracks and made an oth­er­wise sim­ple task so much more com­plex (also makes you appre­ci­ate the work of lan­guage and hard­ware design­ers that came before us).

The imple­men­ta­tion feels deeply unnat­ural due to the con­straints we have imposed on our­selves, and that’s kinda the point of these exercises.

The absence of iden­ti­fiers for instance, forces us to lump a bunch of global vari­ables into an array that acts as our ‘pri­mary mem­ory’ for the pro­gram and forces you to remem­ber which index cor­re­sponds to what.

This style is really at odds with how one would code in F# (prob­a­bly even more so than Python!), as by design F# nudges you towards code in a func­tional style — using immutable val­ues, recur­sion, using types to drive your pro­gram, etc.