Exercises in Programming Style–Go Forth

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cis­es then please buy Crista’s book to sup­port her work.

exercises-prog-styles-cover

Fol­low­ing on from the last post, we will look at the Go Forth style today.

 

Style 2 – Go Forth

This style is named after the Forth pro­gram­ming lan­guage, which is a stack-ori­ent­ed. Anoth­er pop­u­lar stack-ori­ent­ed lan­guage is Fac­tor, which Bruce Tate cov­ered in Sev­en More Lan­guages in Sev­en Weeks.

image

Constraints

  • Exis­tence of a data stack. All oper­a­tions are done over data on this stack.
  • Exis­tence of a heap for stor­ing data that’s need­ed for lat­er oper­a­tions. The heap data can be asso­ci­at­ed with names (i.e. vari­ables). Since all oper­a­tions are done over data on the stack, heap data needs to be moved first to the stack and even­tu­al­ly back to the heap.
  • Abstrac­tion in the form of user-defined “pro­ce­dures” (i.e. names bound to a set of instruc­tions), which may be called some­thing else entire­ly.

 

Admit­ted­ly this is anoth­er dif­fi­cult style to pro­gram in, where even sim­ple things such as adding two num­bers togeth­er requires:

  1. push­ing the two num­bers onto the stack
  2. per­form addi­tion against the two items
  3. push­ing the result onto the stack

image

You might have also known this kind of arith­metic oper­a­tions as Reverse Pol­ish Nota­tion.

You may also be inter­est­ed to know that the CIL (you know, the inter­me­di­ate lan­guage your C#/F# code gets com­piled into) is also stack-based and works in this way.

 

I’m not sat­is­fied with Crista’s solu­tion in her book, where she went out of style many times (you can see by the num­ber of “out of style, left for exer­cise” com­ments in the code). In a way, it’s per­haps a reflec­tion of the dif­fi­cul­ty in cod­ing in this style, espe­cial­ly when that works against the idioms of the lan­guage you’re pro­gram­ming in.

Also, she has oper­at­ed against data on the heap direct­ly on sev­er­al occa­sions, which also vio­lat­ed our con­straints.

Now, it’s easy to crit­i­cise these vio­la­tions, but I get it, I want­ed to cheat so many times through­out – just get some­thing from the heap, apply func­tion over it, and make life eas­i­er for myself.

To help fight the temp­ta­tions to break style, I made some mod­i­fi­ca­tions to Crista’s imple­men­ta­tion to bet­ter enforce the con­straints we have placed upon our­selves.

 

Getting Started

Before we start solv­ing the term fre­quen­cies prob­lem, let’s first set­up the envi­ron­ment we’re gonna be work­ing in which enforces the afore­men­tioned con­straints.

We know we need a stack and a heap, as well as the abil­i­ty to get data to and from both.

image

Cou­ple of things to note from the above:

  • both stack and heap are pri­vate;
  • pop is also pri­vate;
  • load doesn’t return the asso­ci­at­ed val­ue, but push­es it on the stack straight away.

These design deci­sions stem from my attempt to make it hard­er for me to break style lat­er on.

  • stack and heap are pri­vate so I can’t cheat by manip­u­lat­ing them direct­ly;
  • pop is pri­vate so that I can’t just pop a val­ue off the stack and then per­form an oper­a­tion with anoth­er oper­ant that wasn’t on the stack, e.g.

let x = pop<int>()

let y = add x 42    // 42 is not on the stack!

// and don’t for­get to push the result back onto the stack!

  • load doesn’t return the val­ue direct­ly to enforce the con­straint that data on the heap has to be copied to the stack before you can per­form any oper­a­tion on them.

 

Since we can’t pop items off the stack out­side of this mod­ule, we’ll have to give con­sumers of this mod­ule some oth­er way to per­form oper­a­tions against data on the stack.

Here we have a cou­ple of func­tions that takes in an oper­a­tion (i.e. a func­tion) and per­forms it against the stack.

image

Notice that there are vari­ants for unary and bina­ry oper­a­tions, this is to cater for cas­es such as “pop item off the stack and move it back to the heap”.

Whilst this is not a cheat-proof* way to force you to per­form all oper­a­tions against the stack, it proved a suf­fi­cient deter­rent for me dur­ing this exer­cise.

* for instance, you can still pass in a par­tial­ly applied func­tion such as ((+) 42)

 

Final­ly, to help with debug­ging, I added two func­tions to print the cur­rent con­tent of the stack and heap:

image

 

Defining the Procedures

Next, let’s imple­ment the “pro­ce­dures” using the basic struc­ture Crista out­lined:

  1. read_file – take data on the stack (file path) and read the con­tent of the file onto the stack in its place;
  2. filter_chars – take data on the stack (con­tent of the file) and replace all non-alphanu­mer­ic char­ac­ters with space; put the new string back onto the stack;
  3. scan – take data on the stack and split the long string with space and put the indi­vid­ual words back onto the stack;
  4. remove_stop_words – process data on the stack (the indi­vid­ual words) and fil­ter out the stop words; the rest are put onto the heap
  5. fre­quen­cies – move the fil­tered words from heap back onto the stack and process them; keep­ing track of the fre­quen­cy for each word with a map in the heap
  6. sort – sort the word fre­quen­cies in heap

Since all these pro­ce­dures have to oper­ate against the stack, so they don’t need to take in any argu­ments nor do they need to return any­thing.

In addi­tion to the above, I also added a helper pro­ce­dure to split a string since it’s some­thing I need­ed on more than one occa­sion.

Of course, to stay in style we have to put the sep­a­ra­tor as well as the string we want to split on the stack and oper­ate on them via a bina­ry oper­a­tion that takes both as argu­ments.

image

Even then,the use of the String.Split method might still be con­sid­ered too high-lev­el for this style.

 

read_file

This pro­ce­dure is real­ly straight for­ward. It assumes the head of the stack is the file path.

image

After the pro­ce­dure call, the stack would con­tain the con­tent of the file:

image

 

filter_chars

I dis­liked the name of this pro­ce­dure as it was specif­i­cal­ly fil­ter­ing out non-alphanu­mer­ic char­ac­ters, which isn’t reflect­ed in its name. So I renamed the pro­ce­dure to be more spe­cif­ic, but the approach is the same:

image

This pro­ce­dure assumes the head of the stack con­tains the input that needs to be fil­tered. As Crista men­tioned in her imple­men­ta­tion, using a Regex is kin­da cheat­ing too as it’s too high lev­el for this style.

image

 

scan

Now that the fil­tered string is at the head of the stack, we can push the space char­ac­ter to the stack and then use the afore­men­tioned split pro­ce­dure to split the fil­tered string with space.

The result­ing array of strings (sit­ting at the head of the stack at this point) would then need to be pushed onto the stack as indi­vid­ual items.

image

image

 

remove_stop_words

image

First, we need to get the list of stop words from file. To do that, we push the file path to the stack; read it; then split the con­tent of the file (a com­ma-sep­a­rat­ed list of low­er-case strings).

Next, we need to move it to the heap for lat­er use.

At this point, our stack con­tains just the indi­vid­ual words from the scan pro­ce­dure.

image

For each of the words on the stack, we need to:

  • check if it’s list­ed as a stop word, and that it’s at least 2 char­ac­ters;
  • for words that pass our fil­ter, we’ll add it to the list of valid words in the heap;
  • once all the words have been processed, then we load the list of valid words from heap and add them back onto the stack.

Since we can’t use data on the heap direct­ly, so we had to push the rel­e­vant data onto the stack before run­ning oper­a­tions against it.

Also, notice in here, we didn’t con­cate­nate the list and store the new list in the heap in one go:

image

This is again, because cons ( :: ) is itself an oper­a­tion that should only be per­formed against the stack.

 

frequencies

At the end of the remove_stop_words the stack con­tains all the words that need to be count­ed.

image

For this task, we’ll store a dic­tio­nary for track­ing the count for each word in the heap. As we process each word from the stack, we’ll also need to move the dic­tio­nary onto the stack so we can oper­ate on it.

You may notice that I have cheat­ed here:

  • I per­formed dic­tio­nary look up and addi­tion in one oper­a­tion
  • using a dic­tio­nary is prob­a­bly too high-lev­el for this style too

A more in-style approach might look like the below:

image

 

At the end of this pro­ce­dure the stack will be emp­ty, and the heap would con­tain a “word_freqs” dic­tio­nary with the term fre­quen­cy for all the words.

 

sort

Here I’m mak­ing use of LINQ to sort the word fre­quen­cies, and each step along the way pops the head item off the stack and push­es the result in its place. Hence why Order, Take and Reverse are per­formed as sep­a­rate unary oper­a­tions.

image

Also, note that we need­ed to reverse the word­Fre­qs sequence to ensure that words with high­er fre­quen­cy are popped off the stack first.

 

Once we have run all the above pro­ce­dures, we will end up with a sequence of key val­ue pairs of word and asso­ci­at­ed count. All that’s left to do is to process them in turn and print them out:

image

and voila!

 

Conclusions

As with Good Old Times, cod­ing in this style makes things so much more com­plex com­pared to what one would do in idiomat­ic F#.

Whilst Good Old Times forces you to rely on doc­u­men­ta­tion to remem­ber what each cell in the pri­ma­ry mem­o­ry refers to, the pres­ence of a heap in Go Forth eas­es that bur­den.

How­ev­er, the pres­ence of the stack and the require­ment that all oper­a­tions much be per­formed against the stack intro­duces oth­er pain points. For instance:

  • even sim­ple oper­a­tions become con­vo­lut­ed;
  • hav­ing to per­form addi­tion­al work to move data to and from heap;
  • pro­ce­dures have to make assump­tions about the state of the stack;
  • expec­ta­tions of the pro­ce­dures can­not be expressed in their sig­na­ture since they have no dis­tin­guish­able sig­na­ture.

As func­tion­al pro­gram­mers, we like to (where pos­si­ble, and to the extend pos­si­ble with our lan­guage of choice) use the type sys­tem to con­struct a kin­da math­e­mat­i­cal proof that our appli­ca­tion will behave cor­rect­ly. This is what we refer to as Type Dri­ven Devel­op­ment, which of course is not pos­si­ble with this stack-based approach.

 

You can find all the source code for this exer­cise here.