Check out my new course Learn you some Lambda best practice for great good! and learn the best practices for performance, cost, security, resilience, observability and scalability.
If you enjoy reading these exercises then please buy Crista’s book to support her work.
Style 27 – Lazy Rivers
- Data comes to functions in streams, rather than as a complete whole all at once.
- Functions are filters / transformers from one kind of data stream to another.
Given the constraint that data is to come in as streams, the easiest way to model that in F# is with sequences, which are evaluated lazily as they are consumed.
First, let’s add a function to read the text from an input file as a seq<char>.
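As a rough sketch of what such a reader could look like (the name `readChars` and the use of `StreamReader` are assumptions for illustration, not necessarily what the original code does):

```fsharp
open System.IO

// Lazily yields the characters of a file, one at a time.
// NOTE: `readChars` is a hypothetical name chosen for this sketch.
let readChars (path : string) : seq<char> =
    seq {
        // the reader is opened when enumeration starts and disposed when it ends
        use reader = new StreamReader(path)
        while not reader.EndOfStream do
            yield char (reader.Read())
    }
```

Because this is a sequence expression, nothing is read from disk until a consumer starts pulling characters from it.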
Then, we’ll add another function to transform this sequence of characters into a sequence of words.
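One possible way to write this transformer is sketched below. It assumes a word is any run of letters or digits and that words are lower-cased; the name `toWords` and both of those rules are assumptions made for this example.

```fsharp
open System
open System.Text

// Turns a stream of characters into a stream of lower-cased words.
// ASSUMPTION: a word is a run of letters/digits; everything else is a separator.
let toWords (chars : seq<char>) : seq<string> =
    seq {
        let buffer = StringBuilder()
        for c in chars do
            if Char.IsLetterOrDigit(c) then
                buffer.Append(Char.ToLower(c)) |> ignore
            elif buffer.Length > 0 then
                // hit a separator: emit the word collected so far
                yield buffer.ToString()
                buffer.Clear() |> ignore
        // flush the last word if the input does not end with a separator
        if buffer.Length > 0 then
            yield buffer.ToString()
    }
```

Words are emitted as soon as their trailing separator is seen, so the function never needs to hold the whole text in memory.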
Next, we’ll filter this sequence to remove all the stop words and return the non-stop words as another seq<string>.
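A minimal sketch of this filter, assuming the stop words have already been loaded into a `Set<string>` (the name `removeStopWords` and passing the set in as a parameter are choices made for this example):

```fsharp
// Keeps only the words that are not in the supplied set of stop words.
// ASSUMPTION: `stopWords` is supplied by the caller; how it gets loaded is not shown here.
let removeStopWords (stopWords : Set<string>) (words : seq<string>) : seq<string> =
    words |> Seq.filter (fun word -> not (Set.contains word stopWords))
```

`Seq.filter` is itself lazy, so this stage simply passes words through one at a time as the downstream consumer asks for them.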
So far everything’s pretty straightforward, but things get a bit tricky from here on.
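To count and sort the non-stop words, we could emit the running counts as a new element of the output sequence after every single word, but that’s terribly inefficient because we’d have to re-sort all the counts for every word we read.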
Instead, we can batch the input stream into groups of 5000 words. We’ll update the word frequencies with each batch and produce a new sorted array of word counts for the output sequence.
The key step here is Seq.scan. It is similar to Seq.fold, except it returns all the intermediate values of the accumulator, not just the final accumulated value.
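To make the difference between the two concrete, here is a small standalone comparison on a list of numbers (the values are made up purely for illustration):

```fsharp
let numbers = [1; 2; 3; 4]

// Seq.fold returns only the final accumulated value
let total = numbers |> Seq.fold (+) 0
// total = 10

// Seq.scan returns the initial state followed by every intermediate state
let runningTotals = numbers |> Seq.scan (+) 0 |> Seq.toList
// runningTotals = [0; 1; 3; 6; 10]
```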
Here, given the current word frequencies map and a batch of words, we’ll return a new map (remember, an F# Map is immutable) whose counts have been updated by the latest batch.
Because Seq.scan returns results for all intermediate steps including the initial value (the empty map in this case), we have to follow up with Seq.skip 1 to exclude the empty map from our output.
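The batching, scanning and skipping described above might be put together roughly as follows. This is a sketch rather than the original code: it uses `Seq.chunkBySize` for batching, takes the batch size as a parameter (the post uses 5000), and the helper names `addWord` and `addBatch` are invented for this example.

```fsharp
// Returns a new map with the count for `word` incremented by one.
let addWord (freqs : Map<string, int>) (word : string) : Map<string, int> =
    match Map.tryFind word freqs with
    | Some n -> Map.add word (n + 1) freqs
    | None   -> Map.add word 1 freqs

// Folds a whole batch of words into the running frequencies.
let addBatch (freqs : Map<string, int>) (batch : string[]) : Map<string, int> =
    batch |> Array.fold addWord freqs

// Yields one sorted array of (word, count) pairs per batch of input words.
let countAndSort (batchSize : int) (words : seq<string>) : seq<(string * int)[]> =
    words
    |> Seq.chunkBySize batchSize
    |> Seq.scan addBatch Map.empty
    |> Seq.skip 1 // drop the initial empty map
    |> Seq.map (fun freqs -> freqs |> Map.toArray |> Array.sortByDescending snd)
```

Each element of the output reflects all the words seen so far, not just the latest batch, because the map is carried forward as the scan state.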
Finally, to string everything together, we’ll print the top 25 words for each of the outputs from countAndSort. It takes quite a few iterations, but you’ll see the result slowly emerging in the process.
Unlike the other styles, we’ll get a few sets of outputs – one for each batch of words processed.
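The final printing stage could look something like the sketch below. The names `formatTop` and `printAll`, the separator line and the `word - count` output format are all assumptions for this example; the input is any sequence of sorted word-count arrays, one per batch.

```fsharp
// Formats the first `n` (word, count) pairs of one intermediate result.
let formatTop (n : int) (wordCounts : (string * int)[]) : string list =
    wordCounts
    |> Seq.truncate n
    |> Seq.map (fun (word, count) -> sprintf "%s - %d" word count)
    |> Seq.toList

// Prints the top `n` words of every intermediate result as it arrives.
let printAll (n : int) (results : seq<(string * int)[]>) : unit =
    for wordCounts in results do
        printfn "-----------------------------"
        formatTop n wordCounts |> List.iter (printfn "%s")

// Usage with two hand-written results standing in for real batches:
let sampleResults =
    Seq.ofList [ [| ("mr", 2); ("elizabeth", 1) |]
                 [| ("mr", 5); ("elizabeth", 3); ("darcy", 1) |] ]

printAll 25 sampleResults
```

Since `printAll` consumes the sequence with a plain `for` loop, each batch is printed as soon as the upstream stages produce it, which is why the output appears gradually.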
You can find the source code for this exercise here.