If you enjoy reading these exercises then please buy Crista’s book to support her work.
Style 2 – Go Forth
This style is named after the Forth programming language, which is a stack-oriented. Another popular stack-oriented language is Factor, which Bruce Tate covered in Seven More Languages in Seven Weeks.
- Existence of a data stack. All operations are done over data on this stack.
- Existence of a heap for storing data that’s needed for later operations. The heap data can be associated with names (i.e. variables). Since all operations are done over data on the stack, heap data needs to be moved first to the stack and eventually back to the heap.
- Abstraction in the form of user-defined “procedures” (i.e. names bound to a set of instructions), which may be called something else entirely.
Admittedly this is another difficult style to program in, where even simple things such as adding two numbers together requires:
- pushing the two numbers onto the stack
- perform addition against the two items
- pushing the result onto the stack
You might have also known this kind of arithmetic operations as Reverse Polish Notation.
You may also be interested to know that the CIL (you know, the intermediate language your C#/F# code gets compiled into) is also stack-based and works in this way.
I’m not satisfied with Crista’s solution in her book, where she went out of style many times (you can see by the number of “out of style, left for exercise” comments in the code). In a way, it’s perhaps a reflection of the difficulty in coding in this style, especially when that works against the idioms of the language you’re programming in.
Now, it’s easy to criticise these violations, but I get it, I wanted to cheat so many times throughout – just get something from the heap, apply function over it, and make life easier for myself.
To help fight the temptations to break style, I made some modifications to Crista’s implementation to better enforce the constraints we have placed upon ourselves.
Before we start solving the term frequencies problem, let’s first setup the environment we’re gonna be working in which enforces the aforementioned constraints.
We know we need a stack and a heap, as well as the ability to get data to and from both.
Couple of things to note from the above:
- both stack and heap are private;
- pop is also private;
- load doesn’t return the associated value, but pushes it on the stack straight away.
These design decisions stem from my attempt to make it harder for me to break style later on.
- stack and heap are private so I can’t cheat by manipulating them directly;
- pop is private so that I can’t just pop a value off the stack and then perform an operation with another operant that wasn’t on the stack, e.g.
let x = pop<int>()
let y = add x 42 // 42 is not on the stack!
// and don’t forget to push the result back onto the stack!
- load doesn’t return the value directly to enforce the constraint that data on the heap has to be copied to the stack before you can perform any operation on them.
Since we can’t pop items off the stack outside of this module, we’ll have to give consumers of this module some other way to perform operations against data on the stack.
Here we have a couple of functions that takes in an operation (i.e. a function) and performs it against the stack.
Notice that there are variants for unary and binary operations, this is to cater for cases such as “pop item off the stack and move it back to the heap”.
Whilst this is not a cheat-proof* way to force you to perform all operations against the stack, it proved a sufficient deterrent for me during this exercise.
* for instance, you can still pass in a partially applied function such as ((+) 42)
Finally, to help with debugging, I added two functions to print the current content of the stack and heap:
Defining the Procedures
Next, let’s implement the “procedures” using the basic structure Crista outlined:
- read_file – take data on the stack (file path) and read the content of the file onto the stack in its place;
- filter_chars – take data on the stack (content of the file) and replace all non-alphanumeric characters with space; put the new string back onto the stack;
- scan – take data on the stack and split the long string with space and put the individual words back onto the stack;
- remove_stop_words – process data on the stack (the individual words) and filter out the stop words; the rest are put onto the heap
- frequencies – move the filtered words from heap back onto the stack and process them; keeping track of the frequency for each word with a map in the heap
- sort – sort the word frequencies in heap
Since all these procedures have to operate against the stack, so they don’t need to take in any arguments nor do they need to return anything.
In addition to the above, I also added a helper procedure to split a string since it’s something I needed on more than one occasion.
Of course, to stay in style we have to put the separator as well as the string we want to split on the stack and operate on them via a binary operation that takes both as arguments.
Even then,the use of the String.Split method might still be considered too high-level for this style.
This procedure is really straight forward. It assumes the head of the stack is the file path.
After the procedure call, the stack would contain the content of the file:
I disliked the name of this procedure as it was specifically filtering out non-alphanumeric characters, which isn’t reflected in its name. So I renamed the procedure to be more specific, but the approach is the same:
This procedure assumes the head of the stack contains the input that needs to be filtered. As Crista mentioned in her implementation, using a Regex is kinda cheating too as it’s too high level for this style.
Now that the filtered string is at the head of the stack, we can push the space character to the stack and then use the aforementioned split procedure to split the filtered string with space.
The resulting array of strings (sitting at the head of the stack at this point) would then need to be pushed onto the stack as individual items.
First, we need to get the list of stop words from file. To do that, we push the file path to the stack; read it; then split the content of the file (a comma-separated list of lower-case strings).
Next, we need to move it to the heap for later use.
At this point, our stack contains just the individual words from the scan procedure.
For each of the words on the stack, we need to:
- check if it’s listed as a stop word, and that it’s at least 2 characters;
- for words that pass our filter, we’ll add it to the list of valid words in the heap;
- once all the words have been processed, then we load the list of valid words from heap and add them back onto the stack.
Since we can’t use data on the heap directly, so we had to push the relevant data onto the stack before running operations against it.
Also, notice in here, we didn’t concatenate the list and store the new list in the heap in one go:
This is again, because cons ( :: ) is itself an operation that should only be performed against the stack.
At the end of the remove_stop_words the stack contains all the words that need to be counted.
For this task, we’ll store a dictionary for tracking the count for each word in the heap. As we process each word from the stack, we’ll also need to move the dictionary onto the stack so we can operate on it.
You may notice that I have cheated here:
- I performed dictionary look up and addition in one operation
- using a dictionary is probably too high-level for this style too
A more in-style approach might look like the below:
At the end of this procedure the stack will be empty, and the heap would contain a “word_freqs” dictionary with the term frequency for all the words.
Here I’m making use of LINQ to sort the word frequencies, and each step along the way pops the head item off the stack and pushes the result in its place. Hence why Order, Take and Reverse are performed as separate unary operations.
Also, note that we needed to reverse the wordFreqs sequence to ensure that words with higher frequency are popped off the stack first.
Once we have run all the above procedures, we will end up with a sequence of key value pairs of word and associated count. All that’s left to do is to process them in turn and print them out:
As with Good Old Times, coding in this style makes things so much more complex compared to what one would do in idiomatic F#.
Whilst Good Old Times forces you to rely on documentation to remember what each cell in the primary memory refers to, the presence of a heap in Go Forth eases that burden.
However, the presence of the stack and the requirement that all operations much be performed against the stack introduces other pain points. For instance:
- even simple operations become convoluted;
- having to perform additional work to move data to and from heap;
- procedures have to make assumptions about the state of the stack;
- expectations of the procedures cannot be expressed in their signature since they have no distinguishable signature.
As functional programmers, we like to (where possible, and to the extend possible with our language of choice) use the type system to construct a kinda mathematical proof that our application will behave correctly. This is what we refer to as Type Driven Development, which of course is not possible with this stack-based approach.
You can find all the source code for this exercise here.