Exercises in Programming Style–Spreadsheet

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cises then please buy Crista’s book to sup­port her work.

exercises-prog-styles-cover

Fol­low­ing on from the last post, we will look at the Spread­sheet style today.

 

Style 26 – Spreadsheet

Constraints

  • The prob­lem is mod­eled like a spread­sheet, with columns of data and for­mu­las.
  • Some data depends on oth­er data accord­ing to for­mu­las. When data changes, the depen­dent data also changes auto­mat­i­cal­ly.

 

Since we’re mod­el­ling the prob­lem like a spread­sheet, let’s first define a few type alias to help us estab­lish our domain lan­guage.

  • a col­umn is ref­er­enced by a string — e.g. col­umn “A”, “B”, “C”, and so on
  • a col­umn can have explic­it val­ues, or a for­mu­la
  • a for­mu­la ref­er­ences one or more oth­er columns and uses their data to cal­cu­late the dis­play val­ue for the col­umn

Style26_01

We can mod­el a Col­umn with a union type, where a col­umn is:

a) giv­en explic­it val­ues; or

b) giv­en a for­mu­la, and its dis­play val­ue is cal­cu­lat­ed from that for­mu­la.

Style26_02

A Spread­sheet is a col­lec­tion of columns, whose val­ue can be retrieved or updat­ed using a Colum­n­Ref, e.g.

let spread­sheet = Spread­sheet()

spreadsheet.[“A”] <- Choice1Of2 [| “hel­lo”; “world” |]

let colum­nA = spreadsheet.[“A”] // [| “hel­lo”; “world” |]

Note that our cus­tom index­er accepts a Choice<DisplayValue, For­mu­la> in its set­ter instead of a Col­umn. I made this deci­sion because Column.Formula needs both the For­mu­la and its cal­cu­lat­ed Dis­play­Val­ue, and here it shouldn’t be the caller’s respon­si­bil­i­ty to exer­cise the for­mu­la and cal­cu­late its Dis­play­Val­ue.

Style26_03

After a column’s val­ue is updat­ed, we’ll recal­cu­late the Dis­play­Val­ue of all the columns in the Spread­sheet. This is a naive approach, but suf­fi­cient for the task at hand.

Dur­ing this step, we’ll need to recur­sive­ly eval­u­ate the Dis­play­Val­ue of the columns, some­times a col­umn might be eval­u­at­ed mul­ti­ple times if it’s ref­er­enced by sev­er­al for­mu­lae. Again, there’s room for opti­miza­tion here by caching pre­vi­ous­ly cal­cu­lat­ed val­ues.

Style26_04

 

Next, we’ll start inputting data into our Spread­sheet.

To make it eas­i­er to work with the Choice<DisplayValue, For­mu­la> type, let’s add two helper func­tions.

Style26_05

We’ll leave col­umn A and B blank for now.

Style26_06

Col­umn C will use the data from Col­umn A (all words) and Col­umn B (stop words) to cal­cu­late all the non-stop words.

Col­umn D will con­tain all the dis­tinct words from Col­umn C (non-stop words).

Style26_07

Col­umn E ref­er­ences both Col­umn C (non-stop words) and Col­umn D (dis­tinct non-stop words), and counts the fre­quen­cy for each of these words.

An impor­tant detail to note here is that, posi­tion­al­ly, each count in Col­umn E is aligned with the cor­re­spond­ing word in Col­umn D.Style26_08

Col­umn F is where we’ll store the out­put of the pro­gram.

Because Col­umn D (unique non-stop words) and Col­umn E (counts) are posi­tion­al­ly aligned, so we can use Array.zip to com­bine the two columns to cal­cu­late a sort­ed array of word fre­quen­cies.

Style26_09

Now that all the columns are set up, we can go back and input the text from Pride and Prej­u­dice and the list of stop words into Col­umn A and B respec­tive­ly.

Doing so will trig­ger the Dis­play­Val­ue of all the oth­er columns to be recal­cu­lat­ed.

Style26_10

Final­ly, we’ll take the top 25 rows from Col­umn F and print them to com­plete our pro­gram.

Style26_11

 

I hope you’ve enjoyed today’s style, apolo­gies for the lack of updates the last two weeks, I have been busy work­ing out some stuff on a per­son­al front. I’ll resume the two posts per week rou­tine, which means we should fin­ish this series in just over a month!

 

You can find the source code for this exer­cise here.