Exercises in Programming Style–Cookbook

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cises then please buy Crista’s book to sup­port her work.

exercises-prog-styles-cover

Fol­low­ing on from the last post, we will look at the Cook­book style today.

 

Style 4 – Cookbook

Although Crista has called this the Cook­book style, you’d prob­a­bly know it as Pro­ce­dur­al Pro­gram­ming.

Constraints

  • No long jumps
  • Com­plex­i­ty of con­trol flow tamed by divid­ing the large prob­lem into small­er units using pro­ce­dur­al abstrac­tion
  • Pro­ce­dures may share state in the form of glob­al vari­ables
  • The large prob­lem is solved by apply­ing the pro­ce­dures, one after the oth­er, that change, or add to, the shared state

 

As stat­ed in the con­straints, we need to solve the term fre­quen­cies prob­lem by build­ing up a sequence of steps (i.e . pro­ce­dures) that mod­i­fies some shared states along the way.

So first, we’ll define the shared states we’ll use to:

  • hold the raw data that are read from the input file
  • hold the words that will be con­sid­ered for term fre­quen­cies
  • the asso­ci­at­ed fre­quen­cy for each word

image

 

Procedures

I fol­lowed the basic struc­ture that Crista laid out in her solu­tion:

  1. read_file : reads entire con­tent of the file into the glob­al vari­able data;
  2. filter_chars_and_normalize : replaces all non-alphanu­mer­ic char­ac­ters in data with white space’’;
  3. scan : scans the data for words, and adds them to the glob­al vari­able words;
  4. remove_stop_words : load the list of stop words; appends the list with sin­gle-let­ter words; removes all stop words from the glob­al vari­able words;
  5. fre­quen­cies : cre­ates a list of pairs asso­ci­at­ing words with fre­quen­cies;
  6. sort : sorts the con­tents of the glob­al vari­able word­Fre­qs by decreas­ing order of fre­quen­cy

 

image

As you can see, read­File is real­ly straight for­ward. I cho­sen to store the con­tent of the file as a char[] rather than a string because it sim­pli­fies fil­ter­CharsAnd­Nor­mal­ize:

  • it allows me to use Char.IsLetterOrDigit
  • it gives me muta­bil­i­ty (I can replace the indi­vid­ual char­ac­ters in place)

image

The down­side of stor­ing data as a char[] is that I will then need to con­struct a string from it when it comes to split­ting it up by white space.

image

To remove the stop words, I loaded the stop words into a Set because it pro­vides a more effi­cient lookup.

image

For the fre­quen­cies pro­ce­dure,  I would have liked to sim­ply return the term fre­quen­cies as out­put, but as the con­straint says that

The large prob­lem is solved by apply­ing the pro­ce­dures, one after the oth­er, that change, or add to, the shared state

so unfor­tu­nate­ly we’ll have to update the word­Fre­qs glob­al vari­able instead…

image

And final­ly, sort is straight for­ward thanks to Array.sortInPlaceBy:

image

 

To tie every­thing togeth­er and get the out­put, we sim­ply exe­cute the pro­ce­dures one after anoth­er and then print out the first 25 ele­ments of word­Fre­qs at the end.

image

 

Conclusions

Since each pro­ce­dure is mod­i­fy­ing shared states, this intro­duces tem­po­ral depen­den­cy between the pro­ce­dures. As a result:

  • they’re not idem­po­tent – run­ning a pro­ce­dure twice will cause different/invalid results
  • they’re not iso­lat­ed in mind­scape – you can’t think about one pro­ce­dure with­out also think­ing about what the pre­vi­ous pro­ce­dure has done to shared state and what the next pro­ce­dure expects from the shared state

Also, since the expec­ta­tion and con­straints of the pro­ce­dures are only cap­tured implic­it­ly in its exe­cu­tion log­ic, you can­not lever­age the type sys­tem to help com­mu­ni­cate and enforce them.

That said, many sys­tems we build nowa­days can be described as pro­ce­dur­al in essence – most web ser­vices for instance – and rely on shar­ing and chang­ing glob­al states that are prox­ied through data­bas­es or cache.

 

You can find all the source code for this exer­cise here.