Exercises in Programming Style–Style 1

Yan Cui

I help clients go faster for less using serverless technologies.

NOTE : read the rest of the series, or check out the source code.

 

Prologue

I was at Joy of Coding earlier this year and one of the highlight for me was Crista Lopes’ keynote  Exercises in Programming Style.

Crista demonstrated how a simple problem of calculating term frequency can be written in a plethora of ways, including:

The point of these exercises is to allow you to see that there are many solutions to the same problem, and that each comes with a set of constraints that needs to be communicated.

image

I really enjoyed the talk and bought Crista’s book so I can go through all 33 approaches in my own time!

exercises-prog-styles-cover

It’s taken me a little while to find the time to do this, but I was finally able to make a start last weekend. Over the next couple of weeks I hope to share with you my ports of Crista’s Python solutions in F# and my takeaways.

If you like what you see then please buy the book and support Crista’s work!

 

Style 1 – Good Old Times

This style was commonplace in early 50s when hardware limitations had some hard constraints.

Constraints

  • Very small amount of primary memory, typically orders of magnitude smaller than the data that needs to be processed/generated.
  • No identifiers – i.e. no variable names or tagged memory addresses. All we have is memory that is addressable with numbers.

 

open System
open System.Collections.Generic
open System.IO
open System.Text
#time
let ``pride and prejudice`` = __SOURCE_DIRECTORY__ + "../pride-and-prejudice.txt"
let test = __SOURCE_DIRECTORY__ + "../test.txt"
(*
Possible Names
--------------
* Good old times
* Early 50s style
See original Python implementation:
https://github.com/crista/exercises-in-programming-style/blob/master/01-good-old-times/tf-01.py
CONSTRAINTS
-----------
* very small amount of primary memory, typically order of magnitude smaller than
the data that needs to be processed/generated
* no identifiers - i.e. no variable names or tagged memory addresses.
All we have is memory that is addressable with numbers.
STRATEGY
--------
(Part 1)
1. read the input file one line at a time
2. filter the characters, normalize to lower case
3. identify words, increment corresponding counts in secondary memory (a file)
(Part 2)
4. find the 25 most frequent words in secondary memory (file)
*)
// simulate the absense of identifiers, so everything goes into this array which
// acts as the 'main memory'
// 0 = stop words
// 1 = input buffer (1024 bytes)
// 2 = start index (used in findWords)
// 3 = length of word (used in findWords)
// 4 = the word
// 5 = write buffer (word + count)
// 6 = current position in word_freqs
// 7 = current word_freqs line
// 8 = current min word freq of the top 25
let data = Array.zeroCreate<obj> 9
// PART 1
data.[0] <- (File.ReadAllText(__SOURCE_DIRECTORY__ + "../stop_words.txt")
.Split(',')
|> Set.ofArray) :> obj
let wordFreqs = File.Open(__SOURCE_DIRECTORY__ + "../word_freqs.txt",
FileMode.Create,
FileAccess.ReadWrite)
let input = new StreamReader(``pride and prejudice``)
//let input = new StreamReader(test)
let (|IsAlnum|_|) c = if System.Char.IsLetterOrDigit c then Some () else None
// note: each line in word_freqs is "word,count", so these two funcs just make
// the code easier to read
let getWord (x : obj) = (x :?> string).Split(',').[0]
let getFreq (x : obj) = (x :?> string).Split(',').[1] |> Int32.Parse
let findWords (line : string) =
seq {
data.[2] <- -1 :> obj
data.[3] <- 0 :> obj
for i = 0 to line.Length-1 do
match line.[i], data.[2] :?> int with
| IsAlnum, -1 -> // start of word
data.[2] <- i :> obj
data.[3] <- 1 :> obj
| IsAlnum, _ -> // another char for word
data.[3] <- (data.[3] :?> int + 1) :> obj
| _, -1 -> () // no word, not started, ignore
| _ -> // end of word
yield line.Substring(data.[2] :?> int, data.[3] :?> int)
data.[2] <- -1 :> obj
data.[3] <- 0 :> obj
if (data.[3] :?> int) > 0 then
yield line.Substring(data.[2] :?> int, data.[3] :?> int)
}
|> Seq.map (fun x -> x.ToLower())
|> Seq.filter (fun x ->
x.Length >= 2 &&
(data.[0] :?> Set<string>).Contains x |> not)
let write (stream : Stream) word count =
data.[5] <- sprintf "%s,%6d\n" word count :> obj
stream.Write(Encoding.ASCII.GetBytes(data.[5] :?> string), 0, (data.[5] :?> string).Length)
stream.Flush()
let incrFreq (word : string) =
// since the reader reads 128 chars at a time, so use this to track where we are
data.[6] <- 0L :> obj
wordFreqs.Seek(0L, SeekOrigin.Begin) |> ignore
use reader = new StreamReader(wordFreqs, Encoding.ASCII, false, 128, true)
data.[7] <- reader.ReadLine() :> obj
while data.[7] <> null && data.[6] <> null do
if word = getWord data.[7] then
wordFreqs.Seek(data.[6] :?> int64, SeekOrigin.Begin) |> ignore
write wordFreqs word (getFreq data.[7] + 1)
// set read pos to null to indicate we've found and incremented count
data.[6] <- null
else
// note: +1 here to account for the newline character
data.[6] <- (data.[6] :?> int64 + int64 (data.[7] :?> string).Length + 1L) :> obj
data.[7] <- reader.ReadLine() :> obj
if data.[6] <> null then
write wordFreqs word 1
data.[1] <- input.ReadLine() :> obj
while (data.[1] <> null) do
data.[1] :?> string |> findWords |> Seq.iter incrFreq
data.[1] <- input.ReadLine() :> obj
// PART 2
wordFreqs.Seek(0L, SeekOrigin.Begin)
let reader = new StreamReader(wordFreqs)
// make the sorted list 1 bigger to prevent resizing
let list = new System.Collections.Generic.SortedList<string, int>(25 + 1)
data.[7] <- reader.ReadLine() :> obj
while data.[7] <> null do
if list.Count < 25 then
list.Add(getWord data.[7], getFreq data.[7])
else
data.[8] <- (list |> Seq.minBy (fun kvp -> kvp.Value)) :> obj
if (data.[8] :?> KeyValuePair<string, int>).Value < getFreq data.[7] then
(data.[8] :?> KeyValuePair<string, int>).Key
|> list.Remove
|> ignore
list.Add(getWord data.[7], getFreq data.[7])
data.[7] <- reader.ReadLine() :> obj
list
|> Seq.sortByDescending (fun kvp -> kvp.Value)
|> Seq.iter (fun (KeyValue(word, count)) -> printfn "%s - %d" word count)
Get the source code here.

As you can expect, not being able to use identifiers really stops you in your tracks and made an otherwise simple task so much more complex (also makes you appreciate the work of language and hardware designers that came before us).

The implementation feels deeply unnatural due to the constraints we have imposed on ourselves, and that’s kinda the point of these exercises.

The absence of identifiers for instance, forces us to lump a bunch of global variables into an array that acts as our ‘primary memory’ for the program and forces you to remember which index corresponds to what.

This style is really at odds with how one would code in F# (probably even more so than Python!), as by design F# nudges you towards code in a functional style – using immutable values, recursion, using types to drive your program, etc.

Whenever you’re ready, here are 3 ways I can help you:

  1. Production-Ready Serverless: Join 20+ AWS Heroes & Community Builders and 1000+ other students in levelling up your serverless game. This is your one-stop shop for quickly levelling up your serverless skills.
  2. I help clients launch product ideas, improve their development processes and upskill their teams. If you’d like to work together, then let’s get in touch.
  3. Join my community on Discord, ask questions, and join the discussion on all things AWS and Serverless.

Leave a Comment

Your email address will not be published. Required fields are marked *