Understanding homoiconicity through Clojure macros


Hav­ing been pri­mar­ily involved with .Net lan­guages in my career so far, homoiconic­ity was a new idea to me when I first encoun­tered it in Clo­jure (and also later in Elixir).

If you look it up on Wikipedia, you’ll find the usual wordy definition that vaguely makes sense.

“…homoiconicity is a property of some programming languages in which the program structure is similar to its syntax, and therefore the program’s internal representation can be inferred by reading the text’s layout…”

so, in short: code is data, data is code.

 

quote & eval

Take a sim­ple example:

image

this line of code per­forms a tem­po­rary bind­ing (binds x to the value 1) and then incre­ments x to give the return value of 2.

So it’s code that can be exe­cuted and yields some data.

But equally, it can also be thought of as a list with three elements:

  • a sym­bol named let;
  • a vec­tor with two ele­ments – a sym­bol named x, and an integer;
  • a list with two ele­ments – a sym­bol named inc, and a sym­bol named x.

quote

You can use quote (strictly speaking a special form rather than an ordinary function) to take some Clojure code and, instead of evaluating it, return it as data.

image

side­bar: any F# devel­oper read­ing this will notice the sim­i­lar­ity to code quo­ta­tions in F#, although the rep­re­sen­ta­tion you get back is not nearly as easy to manip­u­late nor is there a built-in way to eval­u­ate it. That said, you do have some options, including:

 

eval

On the flip side, you have the eval func­tion. It takes data and exe­cutes it as code.

image

After you have cap­tured some exe­cutable code as data, you can also manip­u­late it before exe­cut­ing the trans­formed code. This is where macros come in.

 

Macros

clojure.test for instance, is a unit test frame­work writ­ten with macros. You can do a sim­ple asser­tion using the ‘is’ macro:

image

and con­trast this with the error mes­sage we get from, say, NUnit.

image

Isn’t it great that the fail­ing expres­sions are printed out so you can straight away see what was wrong? It’s much more infor­ma­tive than the generic mes­sage we get from NUnit, which forces us to dig around and fig­ure out which line of the test failed.

 

Build­ing an assert-equals macro

As a process of dis­cov­ery, let’s see how this can be done via macros.

 

Ver­sion 1

To start off, we will define the sim­plest macro that might work:

image

oops, so that last case didn’t work.

That’s because the actual and expected val­ues passed into the macro are code, not the inte­ger value 2.

image

 

Ver­sion 2

So what if we just throw an eval in there?

image

that works, right? right?

Well, not quite.

Instead of manipulating the data representing our code, we have evaluated it at compile time (macros run at compile time).

You can ver­ify this by using macroex­pand:

image

so you can see that our macro has trans­formed the input code into the boolean value true and returned it as code.

 

Ver­sion 3

What we ought to do is return the code we want to execute as data, which we know how to do already – using quote. In the returned code, we also need to throw an error when the assertion fails.

So start­ing with the code we want to exe­cute given that:

image

well, we’d want to:

  • com­pare the eval­u­ated val­ues of actual and expected and throw an Asser­tion­Error if they are not equal
  • dis­play the actual expres­sion (inc 1)  and expected expres­sion (+ 0 1) in the error message
  • dis­play the eval­u­ated value for actual — 2

so some­thing along the lines of the fol­low­ing, perhaps?

image

Now that we know our endgame we can work back­wards to define our macro:

image

See the resemblance? The important thing to note here is that we have quoted the whole let block (via the ` syntax-quote shorthand). But in order to reference the actual and expected expressions and return them as they are, i.e. (inc 1), (+ 0 1), we had to selectively unquote certain things using the ~ operator.

You can expand the macro and see that it’s seman­ti­cally iden­ti­cal to the code that we wanted to output:

image

Before we move on, you might be wondering about some of the quote-unquote, unquote-quote actions going on here, so let’s spend a few moments delving into them.

Out­putting the actual expres­sion to be evaluated

Remem­ber, the actual and expected argu­ments in our def­macro block are the quoted ver­sions of (inc 1) and (+ 0 1).

We want to evaluate actual only once, both for efficiency and in case it causes side effects, which is why we need to evaluate it and bind the result to a symbol.

In order to gen­er­ate the out­put code (let [actual-value (inc 1)] …) which will eval­u­ate (inc 1) at run­time, we need to ref­er­ence the actual expres­sion in its quoted form, hence ~actual.

Note the dif­fer­ence in the expanded code if we don’t unquote actual.

image

with­out the ~, the gen­er­ated code would look for a local vari­able called actual which will fail because it doesn’t exist.

Out­putting the actual-value symbol

In order to out­put the actual-value sym­bol in the let bind­ing we had to write ~’actual-value, that is, (unquote (quote actual-value)).

image

I know, right!? Took me a while to get my head around it too.

Q. Can we not just write ‘(let [actual-value ~actual] …) ?

A. No, because it’ll trans­late to (let [user/actual-value (inc 1)]…) which is not a valid let binding.

Q. Ok, how about ~actual-value?

A. No, because the macro won’t com­pile as we’ll be look­ing for a non-existent local vari­able actual-value inside the scope of def­macro.

Q. Ok.. or ‘actual-value?

A. No, because it’ll trans­late to (let [(quote actual-value)  (inc 1)]…) which fails at run­time because that’s not a valid syn­tax for binding.

Q. So how does ~’actual-value  work exactly?

A.  The following:

  1. (quote actual-value) to cap­ture the sym­bol actual-value
  2. unquote the sym­bol so that it appears as it is in the out­put code

Out­putting the actual and expected expres­sions

Finally, when for­mu­lat­ing the error mes­sage, we also saw ‘~actual and ‘~expected.

Here is the expanded code with and without the quote.

image

See the difference?

With­out the quote, the gen­er­ated code will have eval­u­ated (inc 1) and printed FAIL in 2.

With the quote, it’d have printed FAIL in (inc 1) instead, which is what we want.

Rule of thumb

  • to cap­ture a sym­bol, use ~’symbol-name
  • to ref­er­ence an argu­ment to the macro and gen­er­ate code that will be eval­u­ated at run­time, use ~arg-name
  • to ref­er­ence an argu­ment to the macro and gen­er­ate code that quotes it at run­time, use ‘~arg-name

 

Finally, let’s test out our new macro.

image

Sweet! So that’s it?

Almost.

There’s a minor prob­lem with our macro here – it’s not safe from name col­li­sions on actual-value.

image

 

Ver­sion 4

If you see # at the end of a symbol inside a syntax-quoted form, it tells Clojure to automatically generate a new symbol with a unique name (an auto-gensym). This is useful in macros as it keeps the symbols declared in macros from leaking out.

So instead of using ~’actual-value in the let bind­ing we might do the fol­low­ing instead:

image

When expanded, you can see the let binding is using a uniquely generated symbol actual-value__16087__auto__:

image

Not only is this ver­sion safer, it’s also more read­able with­out the mind-bending (unquote (quote actual-value)) business!

 

So there, a quick(-ish) introduction to homoiconicity and Clojure macros. Macros are a powerful tool to have in one’s toolbox, and allow you to extend the language in a very natural way, as clojure.test does. I hope you find the idea interesting, and that I have done the topic justice and explained it clearly enough.

Feel free to let me know in the com­ments if anything’s not clear.

 


Rust – memory safety without garbage collector


I’ve spent time with Rust at various points in the past, and with it being a language in development it was no surprise that every time I looked there were breaking changes, and even the documentation looked very different at every turn!

Fast for­ward to May 2015 and it has now hit the 1.0 mile­stone so things are sta­ble and it’s now a good time to start look­ing into the lan­guage in earnest.

The web site is look­ing good, and there is an inter­ac­tive play­ground where you can try it out with­out installing Rust. Doc­u­men­ta­tion is beefed up and read­ily acces­si­ble through the web site. I per­son­ally find the Rust by Exam­ples use­ful to quickly get started.

 

Own­er­ship

The big idea that came out of Rust was the notion of “borrowed pointers”, though the documentation doesn’t refer to that particular term anymore. Instead, it talks more broadly about an ownership system and having “zero-cost abstractions”.

Zero-cost what?

The abstractions we’re talking about here are much lower level than what I’m used to: pointers, polymorphic functions, traits, type inference, etc.

Its pointer system, for example, gives you memory safety without needing a garbage collector, and Rust pointers compile to standard C pointers without additional tagging or runtime checks.

It guarantees memory safety for your application through the ownership system, which we’ll be diving into shortly. All of the analysis is performed at compile time, hence incurring “zero-cost” at runtime.

Basics

Let’s get a cou­ple of basics out of the way first.

image

Note that in Rust, println is imple­mented as a macro, hence the bang (!).
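The original screenshot isn’t reproduced here, so as a rough stand-in, here is a minimal sketch of the kind of basics involved: immutable and mutable bindings plus the formatting syntax that println! shares with format!. The function name describe is made up for illustration.

```rust
// Bindings are immutable by default; `mut` opts in to mutation.
fn describe() -> String {
    let x = 5;      // immutable binding, type inferred as i32
    let mut y = 10; // mutable binding
    y += x;         // allowed because `y` was declared with `mut`
    // x += 1;      // would not compile: `x` is immutable

    // format! uses the same `{}` placeholders as println!,
    // but returns a String instead of printing it.
    format!("x = {}, y = {}", x, y)
}
```

Calling println!("{}", describe()) from main would print the resulting string.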

Own­er­ship

When you bind a vari­able to some­thing in Rust, the bind­ing claims own­er­ship of the thing it’s bound to. E.g.

image

When v goes out of scope at the end of foo(), Rust will reclaim the mem­ory allo­cated for the vec­tor. This hap­pens deter­min­is­ti­cally, at the end of the scope.
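As a minimal sketch of the above (the body of foo here is an assumption, since the original screenshot isn’t available), the binding v owns the vector for as long as it is in scope:

```rust
fn foo() -> usize {
    let v = vec![1, 2, 3]; // `v` owns the heap-allocated vector
    v.len()                // the vector is usable while `v` is in scope
} // `v` goes out of scope here, and the vector's memory is freed
```

There is no explicit free and no garbage collector involved; the cleanup is tied to the closing brace.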

When you pass v to a func­tion or assign it to another bind­ing then you have effec­tively moved the own­er­ship of the vec­tor to the new bind­ing. If you try to use v again after this point then you’ll get a com­pile time error.

image

image

This ensures there’s only one active binding to any heap-allocated memory at a time and eliminates data races.

There is a ‘data race’ when two or more point­ers access the same mem­ory loca­tion at the same time, where at least one of them is writ­ing, and the oper­a­tions are not synchronized.
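The move rule described above can be sketched roughly as follows. The helper names take and move_demo are illustrative, and the commented-out lines are the ones the compiler would reject with a “use of moved value” error.

```rust
// Takes ownership of the vector passed in.
fn take(v: Vec<i32>) -> usize {
    v.len()
} // `v` is dropped here, freeing the vector

fn move_demo() -> usize {
    let v = vec![1, 2, 3];
    let n = take(v);        // ownership of the vector moves into `take`
    // println!("{:?}", v); // would not compile: `v` was moved

    let a = vec![4, 5];
    let b = a;              // ownership moves from `a` to `b`
    // a.len();             // would not compile: `a` was moved

    n + b.len()
}
```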

Copy trait

Primitive types such as i32 (i.e. int32) are stack allocated and exempt from this restriction. They’re passed by value, so a copy is made when you pass one to a function or assign it to another binding.

image

The compiler knows to make a copy of n because i32 implements the Copy trait (a trait is roughly the equivalent of an interface in .Net/Java).
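A small sketch of that behaviour, with made-up function names: because i32 is Copy, n is still usable after being passed to another function.

```rust
fn double(n: i32) -> i32 {
    n * 2
}

fn copy_demo() -> (i32, i32) {
    let n = 21;
    let m = double(n); // `n` is copied into `double`, not moved
    (n, m)             // so `n` is still valid here
}
```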

You can extend this behav­iour to your own types by imple­ment­ing the Copy trait:

image

Don’t worry about the syn­tax for now, the point here is to illus­trate the dif­fer­ence in behav­iour when deal­ing with a type that imple­ments the Copy trait.
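One way this might look, assuming a hypothetical Point type (not necessarily the one in the original screenshot): deriving Clone and Copy gives the type the same pass-by-copy behaviour as i32.

```rust
// Copy requires Clone; both can be derived when every field is itself Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn shift(p: Point) -> Point {
    Point { x: p.x + 1, y: p.y + 1 }
}

fn point_demo() -> (Point, Point) {
    let p = Point { x: 1, y: 2 };
    let q = shift(p); // `p` is copied, so it remains usable afterwards
    (p, q)
}
```

Without the Copy derive, returning p after passing it to shift would be a compile-time error.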

The general rule of thumb is: if your type can implement the Copy trait then it should.

But cloning is expen­sive and not always pos­si­ble.

Bor­row­ing

In the ear­lier example:

image

  • own­er­ship of the vec­tor has been moved to the bind­ing v in the scope of take();
  • at the end of take() Rust will reclaim the mem­ory allo­cated for the vector;
  • but it can’t, because we tried to use v in the outer scope after­wards, hence the error.

What if, we bor­row the resource instead of mov­ing its ownership?

A real world analogy would be if I bought a book from you then it’s mine to shred or burn after I’m done with it; but if I borrowed it from you then I have to make sure I return it to you in pristine condition.


In Rust, we do this by passing a reference as an argument.

image
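A minimal sketch of borrowing, using illustrative names: sum only borrows the vector through a reference, so the caller keeps ownership and can carry on using it.

```rust
// Borrows the vector immutably; ownership stays with the caller.
fn sum(v: &Vec<i32>) -> i32 {
    v.iter().sum()
}

fn borrow_demo() -> (i32, usize) {
    let v = vec![1, 2, 3];
    let total = sum(&v); // lend `v` out for the duration of the call
    (total, v.len())     // `v` is still owned and usable here
}
```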

Ref­er­ences are also immutable by default.

image

But just as you can cre­ate muta­ble bind­ings, you can cre­ate muta­ble ref­er­ences with &mut.

image
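As a sketch of a mutable borrow (function names are hypothetical), both the binding and the reference have to opt in to mutation:

```rust
// Borrows the vector mutably so it can be modified in place.
fn push_one(v: &mut Vec<i32>) {
    v.push(1);
}

fn mut_borrow_demo() -> Vec<i32> {
    let mut v = Vec::new(); // the binding itself must be `mut`
    push_one(&mut v);       // each mutable borrow ends when the call returns
    push_one(&mut v);
    v
}
```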

There are a cou­ple of rules for borrowing:

1. the borrower’s scope must not out­last the owner

2. you can have one of the fol­low­ing, but not both:

2.1. zero or more ref­er­ences to a resource; or

2.2. exactly one muta­ble reference

Rule 1 makes sense since the owner needs to clean up the resource when it goes out of scope.

For a data race to exist we need to have:

a. two or more point­ers to the same resource

b. at least one is writing

c. the oper­a­tions are not synchronized

Since the own­er­ship sys­tem aims to elim­i­nate data races at com­pile time, there’s no need for run­time syn­chro­niza­tion, so con­di­tion c always holds.

When you have only read­ers (immutable ref­er­ences) then you can have as many as you want (rule 2.1) since con­di­tion b does not hold.

If you have writ­ers then you need to ensure that con­di­tion a does not hold – i.e. there is only one muta­ble ref­er­ence (rule 2.2).

Therefore, rule 2 ensures data races cannot exist.
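The two halves of rule 2 can be sketched like this, with an illustrative function name. The explicit inner scopes are only there to make the lifetime of each borrow obvious, and the commented-out line is the kind of thing the compiler rejects.

```rust
fn rules_demo() -> Vec<i32> {
    let mut v = vec![1, 2, 3];

    {
        // Rule 2.1: any number of immutable references may coexist.
        let a = &v;
        let b = &v;
        assert_eq!(a.len() + b.len(), 6);
    }

    {
        // Rule 2.2: exactly one mutable reference, and nothing else.
        let w = &mut v;
        // let r = &v; // would not compile: `v` is mutably borrowed by `w`,
        //             // which is still used on the next line
        w.push(4);
    }

    v
}
```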


Here are some issues that bor­row­ing pre­vents.

Beyond Own­er­ship

There are lots of other things to like about Rust: immutability by default, pattern matching, macros, etc.

Pat­tern Matching

image

Structs

image

Enums

image
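The screenshots for these three features aren’t reproduced here, so the following is only a rough sketch of how structs, enums and pattern matching fit together. The Circle, Shape, area and classify names are all made up for illustration.

```rust
// A plain struct with a named field.
struct Circle {
    radius: f64,
}

// An enum whose variants can carry different kinds of data.
enum Shape {
    Circle(Circle),
    Rect { w: f64, h: f64 },
    Empty,
}

// `match` must cover every variant, otherwise the code does not compile.
fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle(c) => std::f64::consts::PI * c.radius * c.radius,
        Shape::Rect { w, h } => w * h,
        Shape::Empty => 0.0,
    }
}

// Patterns also work on plain values: literals, ranges and guards.
fn classify(n: i32) -> &'static str {
    match n {
        0 => "zero",
        1..=9 => "small",
        _ if n < 0 => "negative",
        _ => "large",
    }
}
```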

Even from these basic examples, you can see the influence of functional programming, especially in immutability by default, which fits well with Rust’s goal of combining safety with speed.

Rust also has a good concurrency story (pretty much mandatory for any modern language), which has been discussed in detail in this post.

Overall I enjoy coding in Rust, and the ownership system is pretty mind-opening too. With both Go and Rust coming of age and targeting a similar space around systems programming, it’ll be very interesting to watch this space develop.

 


Erlang on Xen

Stum­bled across this slid­edeck today, it’s very infor­ma­tive and so I feel obliged to share!

InfoQ interview at BuildStuff 14

The video and interactive transcript are also available on InfoQ’s page here.

Why I like Go’s interfaces

When I hear peo­ple talk about Go, a lot of the dis­cus­sions focus on its con­cur­rency fea­tures. Whilst it has a good con­cur­rency story, the lan­guage land­scape is cur­rently filled with lan­guages that have an equally good or bet­ter con­cur­rency story — F#, Erlang, Elixir, Clo­jure, etc…

Per­son­ally, what I found really inter­est­ing from my time with Go was how its inter­faces work. In short, inter­faces do not need to be explic­itly imple­mented — i.e. no imple­ment key­word. Instead, inter­faces are sat­is­fied implic­itly.

 

Duck Typ­ing

In dynamic lan­guages such as Python, you have the con­cept of Duck Typ­ing.

“if it looks like a duck and quacks like a duck, it’s a duck”

Suppose you have a say_quack function in Python which expects its argument to have a quack method. You can invoke the function with any object so long as it has the quack method.

image

Duck typ­ing is con­ve­nient, but with­out a com­piler to catch your mis­takes you are trad­ing a lot of safety for convenience.


 

What if there’s a way to get the best of both worlds?

In F#, this can be achieved through sta­t­i­cally resolved type para­me­ters:

image

But syn­tac­ti­cally, sta­t­i­cally resolved TP is kinda clunky and not the eas­i­est to read. Go’s inter­faces rep­re­sent a more ele­gant solu­tion in my view.

 

Implic­itly Imple­mented Interface

In Go, sup­pose you have an inter­face for a Duck:

image

Any struct that has a Quack  method will imple­ment the Duck  inter­face implic­itly and can be used as a Duck.

image

(try it your­self here)

If you have another struct, Dog, which doesn’t have a Quack  method and you tried to use it as a Duck  then you’ll get a com­pile time error:

image

(try it your­self here)

so there, the con­ve­nience of duck typ­ing with the safety of sta­tic checking!


 

Beyond Con­ve­nience

The design for Go’s interface stems from the observation that patterns and abstractions only become apparent after we’ve seen them a few times.

So rather than lock­ing us in with abstrac­tions at the start of a project when we’re at the point of our great­est igno­rance, we can define these abstrac­tions as and when they become appar­ent to us.

When you cre­ate a new inter­face, you don’t have to go back and tag every imple­men­ta­tion, which some­times might not be pos­si­ble if the imple­men­ta­tion is owned by a 3rd party.

This makes Go inter­faces incred­i­bly cheap, and encour­ages you to cre­ate very gran­u­lar, pre­cise inter­face definitions.

 

All in all, even though I don’t enjoy writing code in Go (as you tend to write an imperative style of code), I think there are some very interesting ideas and lessons to take from the language.

It’s also a very rel­e­vant lan­guage of our time, with some impor­tant prod­ucts (ahem, Docker) hav­ing been writ­ten in Go.

It’s a very small lan­guage still, and its web­site does a good job in help­ing you get started. Take a tour of Go if you’re inter­ested in learn­ing more about the language.

 
