I spent the last couple of nights putting together a small BERT serializer for .Net called Filbert.

 

What’s BERT?

BERT (Binary ERlang Term) is a binary format based on Erlang’s binary serialization format (as used by erlang:term_to_binary/1) that also supports a couple of complex types such as boolean, dictionary and time, in addition to the primitive types.

BERT and BERT-RPC were specified by GitHub’s cofounder Tom Preston-Werner and have been in production use at GitHub as part of their infrastructure, allowing them to integrate Ruby and Erlang through the Ernie BERT-RPC server.

The encoding format for BERT is the same as Erlang’s external term format, which you can read about in great detail here. It is not as highly optimized as something like protocol buffers, but it is very easy to understand and implement, and it does not require a separate contract definition file (the .proto file for protocol buffers) in order to serialize a type.
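To give a feel for how simple the format is, here is a minimal sketch of an encoder for a handful of terms. It is written in Python purely for illustration and is not how Filbert is implemented; the `Atom` marker type and the `encode`/`encode_term` names are hypothetical, and only small integers, 32-bit integers, atoms, binaries, small tuples and BERT booleans are covered.

```python
import struct

VERSION = 131           # leading version byte of the external term format
SMALL_INTEGER_EXT = 97  # unsigned 8-bit integer
INTEGER_EXT = 98        # signed 32-bit big-endian integer
ATOM_EXT = 100          # 2-byte length followed by the atom's characters
SMALL_TUPLE_EXT = 104   # 1-byte arity followed by the encoded elements
BINARY_EXT = 109        # 4-byte length followed by the raw bytes


class Atom(str):
    """Hypothetical marker type so atoms can be told apart from other values."""


def encode_term(term):
    """Encode a single term, without the leading version byte."""
    if isinstance(term, bool):
        # BERT represents booleans as the tuples {bert, true} and {bert, false}.
        return encode_term((Atom("bert"), Atom("true" if term else "false")))
    if isinstance(term, Atom):
        name = term.encode("latin-1")
        return struct.pack(">BH", ATOM_EXT, len(name)) + name
    if isinstance(term, int):
        if 0 <= term <= 255:
            return struct.pack(">BB", SMALL_INTEGER_EXT, term)
        if -2**31 <= term < 2**31:
            return struct.pack(">Bi", INTEGER_EXT, term)
        raise ValueError("this sketch does not handle big integers")
    if isinstance(term, bytes):
        return struct.pack(">BI", BINARY_EXT, len(term)) + term
    if isinstance(term, tuple) and len(term) <= 255:
        header = struct.pack(">BB", SMALL_TUPLE_EXT, len(term))
        return header + b"".join(encode_term(item) for item in term)
    raise TypeError("unsupported term: %r" % (term,))


def encode(term):
    """Encode a term with the version byte in front."""
    return bytes([VERSION]) + encode_term(term)


print(list(encode(1)))     # [131, 97, 1]
print(list(encode(1000)))  # [131, 98, 0, 0, 3, 232]
```

Every term is a one-byte tag followed by a fixed-size header and the data, which is why no separate contract file is needed: the payload describes itself.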

 

How can I try it?

I have yet to put together a NuGet package for Filbert but will do so in the near future. In the meantime, why not fork the project on GitHub and try it out for yourself?

I have included a simple F# and C# example project as part of the solution, and if it’s not clear then have a read of the tutorial page to help you get started.

 

I have ensured reasonable test coverage for both the encoder and the decoder, but there are no doubt many edge cases which I haven’t considered, and I would really appreciate any feedback you have on how best I can improve the solution.


Update 2012/08/23: Thanks to the suggestion from Jizugu in the comments, I’ve updated the post to show his approach to calling the explicit operator in a clean and elegant way.

 

In C#, you can define an explicit operator for your type using the explicit keyword:

image

You can define an explicit operator like the one below and use a custom operator to invoke it in an elegant way, rather than having to call the static Person.op_Explicit method:


You can specify a function which takes in a numeric value with a generic unit of measure easily enough:

image

Similarly, you can also specify a discriminated union whose clauses hold a numeric value with a generic unit of measure, like this:


Peculiarly, I couldn’t find any documented way to create a type extension for a generic array, 'a [ ]. It turns out you need to use backtick marks ( ` ) around the square brackets in order to do that:


Note: Don’t forget to check out the Benchmarks page to see the latest roundup of binary and JSON serializers.

Following on from my previous test, I have now included JsonFx as well as the Json.Net BSON serializer in the mix to see how they match up.

The results (in milliseconds) as well as the average payload size (for each of the 100K objects serialized) are as follows.

image

Graphically this is how they look:

image

I have included protobuf-net in this test to provide a more meaningful comparison for the Json.Net BSON serializer, since it generates a binary payload and as such has a different use case to the other JSON serializers.

In general, I consider JSON to be appropriate when the serialized data needs to be human readable. A binary payload, on the other hand, is more appropriate for communication between applications/services.

Observations

You can see from the results above that the Json.Net BSON serializer actually generates a bigger payload than its JSON counterpart. This is because the simple POCO being serialized contains an array of 10 integers in the range of 1 to 100. When the integer ‘1’ is serialized as JSON, it takes 1 byte to represent as a single character, but a 32-bit integer always takes 4 bytes to represent in binary!
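As a rough illustration of where the extra bytes come from, the sketch below hand-builds the BSON layout for an array of 32-bit integers and compares it with compact JSON. The values 1 to 10 are made up rather than taken from the benchmark data, `bson_int32_array` is a hypothetical helper and not the Json.Net implementation, and only the array itself is encoded (no enclosing document or property name).

```python
import json
import struct


def bson_int32_array(values):
    """Lay out a list of int32 values as a BSON array.

    A BSON array is stored as a document whose keys are the element
    indices written as strings ("0", "1", ...).
    """
    body = b""
    for index, value in enumerate(values):
        key = str(index).encode("ascii") + b"\x00"         # null-terminated key
        body += b"\x10" + key + struct.pack("<i", value)   # 0x10 marks an int32
    # Total length prefix (counts itself and the terminator), elements, 0x00.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


values = list(range(1, 11))
json_payload = json.dumps(values, separators=(",", ":")).encode("utf-8")
bson_payload = bson_int32_array(values)

print(len(json_payload), len(bson_payload))  # 22 75
```

Each BSON element here costs 7 bytes (a type byte, a two-byte key and the 4-byte value), whereas most of the JSON elements cost 2 bytes (a digit and a comma).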

In comparison, the protocol buffer format uses varint encoding so that smaller numbers take fewer bytes to represent, and it is not self-describing (the property names are not part of the payload), so it’s able to generate a much smaller payload compared to JSON and BSON.
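For reference, varint encoding can be sketched in a few lines. This is a minimal illustration for non-negative integers only, and `encode_varint` is a hypothetical name rather than part of protobuf-net.

```python
def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint.

    Each byte carries 7 bits of the value, least significant group first;
    the high bit is set on every byte except the last.
    """
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = n & 0x7F
        n >>= 7
        if n:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)         # final byte
            return bytes(out)


for number in (1, 100, 300, 100000):
    print(number, len(encode_varint(number)))
```

Every integer in the range of 1 to 100 fits in a single byte, against the fixed 4 bytes of a 32-bit binary integer.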

Lastly, whilst the Json.Net BSON serializer offers a slightly faster deserialization time compared to the Json.Net JSON serializer, it has a much slower serialization speed.

Disclaimers

Benchmarks do not tell the whole story, and the numbers will naturally vary depending on a number of factors such as the type of data being tested on. In the real world, you will also need to take into account how you’re likely to interact with the data, e.g. if you know you’ll be deserializing data a lot more often than serializing it, then deserialization speed will of course become more important than serialization speed!

In the case of BSON and integers, whilst it’s less efficient (than JSON) when serializing small numbers, it’s more efficient when the numbers have more than 4 digits.
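The crossover is easy to check with a quick sketch. It assumes non-negative integers, counts only the digits on the JSON side (separators and property names are ignored), and uses a hypothetical `json_int_size` helper.

```python
INT32_SIZE = 4  # bytes taken by a 32-bit integer in binary


def json_int_size(n):
    """Bytes needed to write a non-negative integer as JSON text."""
    return len(str(n))


for number in (1, 100, 1234, 12345, 1234567890):
    print(number, json_int_size(number), INT32_SIZE)
```

At 4 digits the two break even, and from 5 digits onwards the fixed-size binary value is the smaller of the two.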
