Building a random arts bot in F#

Since joining JUST EAT I have been much more active in attending meetups because our office is walking distance to the Code Node. In particular, I have been a regular at Phil Trelford’s F#unctional Londoners group which meets twice a month.

In the last month alone, we’ve been spoiled with Phil’s Random Arts hands-on session and Mathias’s experience report on building the World Bank Facts twitter bot. You see where this is going…

So, combining the inspiration (and code ) from both, I decided to create a twitter bot for generating random arts.

Design Goals

I had a few goals in mind when I sat down to write the bot:

  1. periodically publish randomly generated images (like what the LSystem Bot does for L-Systems)
  2. support a simple DSL for expressing the maths equations used to generate the images
  3. allow others to tweet to it in the above DSL and reply with the generated image

DSL

Let’s start with the AST, the syntax of the DSL is just how we get to the AST. This is an extended set of the expressions that Phil shared with us in his session:

rand_art_01

One approach would be to use a natural maths syntax, e.g.

  • x + y
  • tan y
  • cos x * const
  • y + sqr x * sin y

The challenge with this approach is to clearly define and correctly support the precedence rules.

Another approach I considered is to use a LISP syntax, e.g.

  • (+ x y)
  • (tan y)
  • (* (cos x) const)
  • (+ y (* (sqr x) (sin y)))

This approach is slightly more verbose, but has the benefit of simplicity and precedence is very obvious. It’s also very easy to write a S-Expression parser, which also factored into my decision to go with this approach.

To go from the AST to the S-Expression is super easy, just override the ToString() method like this:

rand_art_02_v2

Now that the easy parts are done, let’s write a S-Expression parser for our DSL.

Broadly speaking, the two common approaches for writing parsers in F# is to use FParsec or vanilla active patterns. Having imagined how I’d write the parser with both approaches I decided the FParsec route is simpler.

The following 25 lines of code would build the foundation for the whole DSL, spend a moment or two to see if you can work out what it’s doing.

rand_art_03

OK, going back to the AST, we can break things down into 4 categories:

  1. primitives : x, y, and cost
  2. unary functions : sin, cos, tan, well, tent,  sqr and sqrt
  3. binary functions : add, subtract, product, divide, max, min, average and mod
  4. ternary functions : level and mix

In the code snippet above, we first created parsers for ( and ), then parsers for x, y and const. But this line is interesting…

rand_art_04

This allows for a recursive parser for our recursive AST (expression is defined in terms of functions, which are defined in terms of expressions, and so on).

For all the unary functions we have right now and in the future, their structure is the same:

( func_name some_expr )

now see how we parse them:

rand_art_05

up until the |>> operator we have captured the Expr object to invoke op with, whatever op is.

rand_art_06

Interesting, so op is any function that takes Expr as argument. That’s convenient, because every clause of a union type is a function in its own right. For example, in our AST, Sin is a function that takes an Expr and returns an Expr..

rand_art_07

Aha, so we can stay DRY and use the unaryOp, binaryOp and ternaryOp functions above to create corresponding parser for the unary, binary and ternary functions in our AST 

rand_art_08

To wrap things up on the DSL, we have to define the implementation for our Expr parser:

rand_art_09

The use of the attempt function here is significant here, otherwise, each failed attempt to parse an expression would have consumed the leading parenthesis ‘(‘ and cause all subsequent parsers to also fail.

What else?

Since I have “borrowed” heavily from both Phil and Mathias’s work, much of the building blocks I needed to write the bot was already there. But I also encountered a few new challenges along the way though.

Random is not threadsafe

When you use an instance of System.Random from multiple threads you can mess up its internal state, it’ll then always return 0 which is hardly random at all…

Since I decided to go async from the start I was able to reproduce this problem regularly. The solution was to simply instantiate a new instance of Random before I need to generate a new formula or draw another image.

Formulae are usually too long to fit inside a tweet

The randomly generated formulae are almost always too long and couldn’t fit into a tweet. To increase the chance of generating formulae that are tweet-sized, I limited the max depth of the expression generation logic to 6. For example:

Now, roughly around half the generated formulae can fit into the 140 characters limit.

Many boring images

Another thing that quickly became obvious was that, many of the generated formulae would yield a similar and kinda boring images (mostly a black screen). So, to improve the general quality of the bot’s output, I added another step in the pipeline to inspect the generated bitmap.

For now, this step would filter out images that are mostly a black screen.

Loopy conversation with another bot

It’s easy to get into a loop between two bots, and so far I have some rudimentary measures in place to stop the conversation. However it is just as likely to prematurely stop the conversation with an innocent user who has made an error in his/her tweet.

There’s still work to be done to find the right balance here, if you’ve got some ideas/suggestions, I’d love to hear about them in the comments below.

Auto-follow

If someone tweets at the bot then they’re probably showing an interest in it. Let’s auto-follow them, and maybe they’ll follow us back 

Supporting direct messages

Unsurprisingly, after a few people became aware of the bot’s existence I found someone sent direct messages to the bot to see if they can get an image back via DM as well as tweets. Sadly it’s not yet supported, but it’s a trivial effort to add so expect to see this in the near future.

 

So that’s it folks, a quick update on what I did last weekend, if you’d like to give the bot a try then feel free to send tweets to @RandomArtsBot. You can find the documentation for the DSL syntax here, or just tweet “help” at the bot.

There are also a number of other similar bots that you might like to play around with, for instance, @fsibot, @worldbankfacts, @SpirographBot and @LSystemBot.

Exercises in Programming Style–Bulletin Board

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cises then please buy Crista’s book to sup­port her work.

exercises-prog-styles-cover

Fol­low­ing on from the last post, we will look at the Bulletin Board style today.

 

Style 15 – Bulletin Board

Constraints

  • Larger problem is decomposed into entities using some form of abstraction
  • The entities are never called on directly for actions
  • Existence of an infrastructure for publishing and subscribing to events, aka the bulletin board
  • Entities post events subscriptions to the bulletin board, and publish events to the bulletin board. The bulletin board infrastructure does all the event management and distribution

 

If you’ve done any WPF work in the past (or in the present) then you might recognize this style as the Weak Event pattern.

Whilst similar in design to the Hollywood style, this style further decouples the entities with an EventManager that sits between them all.

So let’s start by defining what a handler and event manager looks like:

Style15_01

Unfortunately we have to forfeit some type safety here as we unify different events under the same umbrella. Here’s the implementation for the IEventManager interface:

Style15_02

The 3 entities – DataStorage, StopWordsFilter and WordFrequencyCounter – are similar to those in the Hollywood style, except they now subscribe and publish events through the aforementioned IEventManager abstraction.

Oh, and we’ll stick with the RunArgs type we created last time:

Style15_03

Style15_04

Style15_05

Finally, to put everything together:

Style15_06

 

You can find the source code for this exer­cise here.

Exercises in Programming Style–Hollywood

NOTE : read the rest of the series, or check out the source code.

If you enjoy read­ing these exer­cises then please buy Crista’s book to sup­port her work.

exercises-prog-styles-cover

Fol­low­ing on from the last post, we will look at the Hollywood style today.

 

Style 14 – Hollywood

Constraints

  • Larger problem is decomposed into entities using some form of abstraction
  • The entities are never called on directly for actions
  • The entities provide interfaces for other entities to be able to register callbacks
  • At certain points of the computation, the entities call on the other entities that have registered for callbacks

 

I’m not overly fond of Crista’s Python interpretation of this style, so I decided to do something slightly different using F#’s built-in support for Observables. You could also use Rx via FSharp.Control.Reactive, though it seemed to be overkill for this particular problem.

Sticking with the same entities Crista defined in her example:

  • DataStorage
  • StopWordsFilter
  • WordFrequencyCounter

in our F# version, each will subscribe to an upstream IObservable, does whatever it needs to do and provide the output also as an IObservable for downstream entities.

Style14_designv2

And upstream of all the entities is an attempt to run a term frequency analysis against a data file and a corresponding stop words file:

Style14_01

Our version of DataStorage would depend on an IObservable<RunArgs>. It’ll in turn expose an IObservable<RunArgs * string[]> as member so downstream entities can subscribe to and be notified when DataStorage is able to load the words from the specified data file.

Style14_02

Next, we’ll implement a StopWordsFilter type that will:

  1. subscribe to an IObservable<RunArgs * string[]>;
  2. on a new value, load the stop words and use them to filter the words from the data file;
  3. make the filtered words available to downstream entities via an IObservable<string[]>

Style14_03

Finally we have the WordFrequencyCounter, which takes an IObservable<string[]> and print the top 25 most frequent words:

Style14_04

To string everything together, we’ll create an instance of IObservable<RunArgs> via an F# Event (Event.Publish gives us an instance of IEvent<‘T> which inherits from IObservable<T>).

This IOservable will act as the upstream to DataStorage and to kick things off we just have to trigger a new event with an instance of RunArgs:

Style14_05

 

You can find the source code for this exer­cise here.

Project Euler – Problem 102 Solution

The problem description is here, and click here to see all my other Euler solutions in F#.

 

After reading the question, a quick search on how to test if a point is in a triangle turned up this useful SO answer. Translating the algorithm to F# is pretty trivial:

Prob102_01

I saved the triangle.txt file alongside the script file, so to make it easier to load and parse. The content of the file looks like this:

-340,495,-153,-910,835,-947

-175,41,-421,-714,574,-645

-547,712,-352,579,951,-786

419,-864,-83,650,-399,171

so for each line, we have the 3 (x,y) coordinates separated by comma.

F# 4 also added a chunkBySize combinator to the Array module, which comes in handy here:

Prob102_02

This solution runs for 20ms on my machine.

 

The source code for this solution is here.

Project Euler – Problem 75 Solution

The problem description is here, and click here to see all my other Euler solutions in F#.

 

I based my solution on Euclid’s formula for generating Pythagorean triples.

Prob75_01_wiki

And given that max L is 1,500,000, the maximum value for m we need to consider is \sqrt{\frac{L}{2}} . Because L = a + b + c and a^2 + b^2 = c^2 , we can deduce that c < \frac{L}{2} ; and since c = m^2 + n^2 we also have m < \sqrt{c} and therefore m < \sqrt{\frac{L}{2}} .

Prob75_01

The above makes use of a recursive function to calculate the GCD (based on Euclidean Algorithm):

Prob75_04

For efficiency, we’ll create a cache to store the number of ways L can be used to create integer sided right-angle triangle. As we iterate through the m and n pairs we generated above, we’ll take advantage of the fact that if a^2 + b^2 = c^2 then ka^2 + kb^2 = kc^2 must also be true and increment multiples of L by one.

Prob75_02

Finally, to work out the answer:

Prob75_03

This solution runs for about 350ms on my machine.

 

The source code for this solution is here.