I recently co-authored an arti­cle with Gael Frai­teur (cre­ator of Post­Sharp) on how AOP frame­works such as Post­Sharp can be used to auto­mate the imple­men­ta­tion of com­mon design pat­terns in .Net.

The article is now available to view on the InfoQ website here. A more offline-friendly PDF version is also available here and from PostSharp's brand new site.

 

Hope you enjoy what we have to say on the topic, and please don't hesitate to give us your feedback!


Whilst searching for an elegant solution to apply string interning across a large number of classes (we're talking about hundreds of classes here), it dawned on me that I could achieve this with ease using PostSharp's LocationInterceptionAspect. All I needed was something along the lines of:

You can apply this attribute to a class or even a whole assembly, and it'll ensure every string assigned to a property or field is interned, including string properties and fields defined by its subclasses, which is exactly what I was after.

For exam­ple, take this triv­ial piece of code:

image

If you inspect the com­piled code for the Base class in ILSpy you will see some­thing along the lines of:

image

Notice how the setter for BaseStringProperty has been modified to invoke the OnSetValue method defined in our aspect above instead of setting the value directly. In this case, it'll call the String.Intern method to retrieve a reference to an interned instance of the string and set the property to that reference.
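To make the effect concrete without PostSharp in the picture, here is a minimal hand-written sketch of a setter that behaves the way the woven one does. The Person class and the InternDemo helper are hypothetical names I made up for illustration; only String.Intern comes from the framework.

```csharp
using System;

// Hypothetical stand-in for a class whose setter has been woven by the aspect.
public class Person
{
    private string _name;

    public string Name
    {
        get { return _name; }
        // Intern on assignment so that equal strings share one instance.
        // String.Intern throws on null, so nulls are passed through as-is.
        set { _name = value == null ? null : string.Intern(value); }
    }
}

public static class InternDemo
{
    // Concatenate at runtime so the result is not a compile-time literal
    // (literals are interned automatically).
    public static string Build(int id)
    {
        return "user-" + id.ToString();
    }

    // True when two separately built but equal strings end up as one instance.
    public static bool SharesInstance()
    {
        var a = new Person { Name = Build(42) };
        var b = new Person { Name = Build(42) };
        return ReferenceEquals(a.Name, b.Name);
    }
}
```

Without the interning setter, the two calls to Build(42) would leave two separate string objects on the heap with identical contents.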

For more details on PostSharp’s inter­cep­tion aspects, I rec­om­mend read­ing Dustin Davis’s excel­lent posts on the topic:

Post­Sharp Prin­ci­ples: Day 7 Inter­cep­tion Aspects – Part 1

Post­Sharp Prin­ci­ples: Day 8 Inter­cep­tion Aspects – Part 2

 

Because we've set the multicast inheritance behaviour so that the attribute is multicast to members of the children of the original element, the string properties defined in both the A and B classes are subject to the same string interning treatment without us having to explicitly apply the InternAttribute to them:

image

 

F# Com­pat­i­ble

What's more, this attribute also works with F# types, including record and discriminated union types. Take for instance:

image

If you look at the gen­er­ated C# code for the dis­crim­i­nated union type, the inter­nal MyDuType.CaseB type would look some­thing like the following:

image

Notice how the setter methods of the two internal item1 and item2 properties have been modified in much the same way as in the C# examples above. The public Item1 and Item2 properties are read-only and get their values from the internal properties instead.

Indeed, when a new instance of the CaseB type is con­structed, it is the inter­nal prop­er­ties whose val­ues are initialized:

image

 

Finally, let’s look at the record type, which inter­est­ingly also defines a non-string field:

image

Because we have specified that the InternAttribute should only be applied to properties or fields of type string (via the CompileTimeValidate method, which is executed as part of the post-compilation weaving process rather than at runtime), the internal representation of the Age field is left unaltered.

The Name field, how­ever, being of string type, was sub­ject to the same trans­for­ma­tion as all our other examples.

 

I hope this little attribute proves useful to you too. It has certainly saved me from an unbearable amount of grunt work!


NOTE: if you’re unfa­mil­iar with how Post­Sharp works under the hood, I highly rec­om­mend that you check out Dustin Davis’ excel­lent Post­Sharp Prin­ci­ples series of blog posts here.

The Prob­lem

The new async/await keywords in C# are pretty awesome, and make life an awful lot easier when writing asynchronous and non-blocking IO code. However, those of us who use frameworks such as PostSharp to deal with cross-cutting concerns now face a new challenge: the aspects we have come to rely upon no longer work the way we expect them to when applied to async methods (which return void, Task or Task<T>), as can be seen from the examples below:

So what’s wrong here?

If you take a look at the code from the above example using a decompiler such as JetBrains' dotPeek, you'll see that the normal synchronous version of Foo looks something along the lines of:

image

As you can see, the woven code includes calls to the OnEntry, OnSuccess and OnException methods provided by the OnMethodBoundaryAspect class, so everything is as expected here.

For FooAsync, however, the picture is a little more complicated:

image

It turns out the C# compiler rewrites async methods into a state machine. This means that although the OnSuccess and OnException hooks are still in place, they no longer tell us when the body of the method succeeds or fails. Instead, they tell us when the creation of the state machine has succeeded or failed!

image

image

Pretty big bum­mer, eh?
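You can reproduce the ordering problem without PostSharp. The sketch below hand-writes a try/catch wrapper in the spirit of the woven code (the AsyncBoundaryDemo class, its Wrapped method and the gate parameter are all hypothetical, not what PostSharp generates) and records the order in which things happen.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncBoundaryDemo
{
    // The body cannot finish until the caller completes the gate task.
    public static async Task FooAsync(Task gate, List<string> log)
    {
        await gate;
        log.Add("body finished");
    }

    // Hand-written stand-in for a synchronous method boundary wrapper.
    public static Task Wrapped(Task gate, List<string> log)
    {
        log.Add("OnEntry");
        try
        {
            // Control comes back here at the first incomplete await,
            // long before the body of FooAsync has finished.
            Task task = FooAsync(gate, log);
            log.Add("OnSuccess");
            return task;
        }
        catch (Exception)
        {
            log.Add("OnException");
            throw;
        }
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var gate = new TaskCompletionSource<bool>();
        Task task = Wrapped(gate.Task, log);
        gate.SetResult(true);   // only now can the body run to the end
        task.Wait();
        return log;             // OnEntry, OnSuccess, body finished
    }
}
```

"OnSuccess" is logged before the method body has done any real work, which is the same surprise the aspect gives us.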

Pro­posed Solution

One way (and the best way I can think of for now) to get around this is to have a special aspect which works with methods that return Task or Task<T> and hooks up continuations to be executed after the returned task has finished. Something similar to the below will do for the on-method-boundary aspect:

And then you can cre­ate a TraceA­sync attribute that works for async methods:

As you can see from the output above, our new OnTaskFinished, OnTaskFaulted and OnTaskCompletion hooks are correctly executed after the task returned by the async method has finished, faulted due to an exception or run to completion!

The same approach can also be applied to other built-in aspects such as the Method­In­ter­cep­tionAspect class.
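Stripped of the aspect plumbing, the core of the idea is just a handful of ContinueWith calls. Here is a rough sketch using only the Task Parallel Library. TaskHooks.Attach is a hypothetical helper, not a PostSharp API, and the mapping of each hook to a continuation option is my assumption based on the hook names used in this post.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskHooks
{
    // Hypothetical helper: attaches the three hooks to an already-started task
    // and hands the same task back so the caller can still await it.
    public static Task<T> Attach<T>(
        Task<T> task,
        Action onTaskFinished,
        Action<Exception> onTaskFaulted,
        Action<T> onTaskCompletion,
        TaskContinuationOptions extra = TaskContinuationOptions.None)
    {
        // Runs whatever the outcome of the task.
        task.ContinueWith(t => onTaskFinished(), extra);

        // Runs only if the task threw; reading t.Exception marks it as observed.
        task.ContinueWith(
            t => onTaskFaulted(t.Exception.GetBaseException()),
            TaskContinuationOptions.OnlyOnFaulted | extra);

        // Runs only if the task ran to completion and produced a result.
        task.ContinueWith(
            t => onTaskCompletion(t.Result),
            TaskContinuationOptions.OnlyOnRanToCompletion | extra);

        return task;
    }
}
```

An aspect would call something like this on the task returned by the intercepted method. The extra parameter is there so that TaskContinuationOptions.ExecuteSynchronously can be passed in when the hooks need to run before anything that awaits the task.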

Before you go…

However, there are two things you should consider before jumping into the workaround proposed above.

1. If you look at the output from the previous example carefully, you'll see that the line "FooAsync finished" came AFTER "Entering Boo", even though the test code awaits the completion of FooAsync before calling Boo. This is because the continuations are executed asynchronously.

If this behav­iour is not desir­able to you, there is a very sim­ple fix. Back in the OnA­syncMethod­Bound­Aspect class we defined above, sim­ply add TaskContinuationOptions.ExecuteSynchronously to each of the continuations:

image

2. The proposed solution still won't work with async methods that return void, simply because there is no returned Task or Task<T> object to hook continuations up to. In general though, you should avoid async void methods as much as possible because they introduce some pitfalls which you really wouldn't want to find yourself in! I've discussed the problem with async void (and some potential workarounds) in a previous post here.

 

I hope this post proves useful to you, and happy PostSharp'ing! I hear some big things are coming in this space.


Hello!

Just a quick note to mention that I will be speaking about Aspect Oriented Programming at next Saturday's DDD10 in Reading. There are some great sessions in the line-up this year, hope to see you there!


After watching Gael's recent SkillsMatter talk on multithreading, I've put together some notes from what was a very educational talk:

 

Hard­ware Cache Hierarchy

image

Four levels in the memory hierarchy

  • L1 (per core) – typically split into an instruction cache and a data cache
  • L2 (per core)
  • L3 (per die)
  • DRAM (all processors) – main memory rather than a cache

Data can be cached in mul­ti­ple caches, and syn­chro­niza­tion hap­pens through an asyn­chro­nous mes­sage bus.

The latency increases as you go down the dif­fer­ent lev­els of cache:

image 

 

Mem­ory Reordering

Cache operations are in general optimized for performance rather than logical behaviour, so depending on the architecture (x86, AMD, ARM7, etc.) cache load and store operations can be reordered and executed out of order:

image

To add to this mem­ory reorder­ing behav­iour at a hard­ware level, the CLR can also:

  • cache data in registers
  • reorder
  • coa­lesce writes

The volatile keyword stops the compiler optimizations, but that's all. It does not stop the hardware-level optimizations.

This is where memory barriers come in. They ensure serial access to memory and force data to be flushed and synchronized across all the local caches. In .Net this is done via the Thread.MemoryBarrier method.
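As a small illustration, here is a sketch of the classic "write the data, then raise a flag" hand-off between two threads, with a barrier on each side. The Mailbox class is a hypothetical example of mine and not something from the talk; Thread.MemoryBarrier is the only framework call involved.

```csharp
using System.Threading;

// Hypothetical one-slot mailbox: one thread publishes, another polls.
public class Mailbox
{
    private int _payload;
    private bool _ready;

    public void Publish(int value)
    {
        _payload = value;
        // The payload write cannot be moved below this fence, so a reader
        // that sees _ready == true is guaranteed to see the payload as well.
        Thread.MemoryBarrier();
        _ready = true;
    }

    public bool TryRead(out int value)
    {
        bool ready = _ready;
        // The payload read cannot be moved above this fence. Having a fence
        // on every call also stops the flag read from being hoisted out of
        // a polling loop.
        Thread.MemoryBarrier();
        value = ready ? _payload : 0;
        return ready;
    }
}
```

A consumer would typically poll TryRead in a loop until it returns true.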

 

Atom­ic­ity

Operations on longs cannot be performed atomically on a 32-bit architecture, so it's possible to read a partially modified value.
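A minimal sketch of how to stay safe here, assuming you route every access through the Interlocked class: Interlocked.Read and Interlocked.Exchange treat the 64-bit value as a single unit even in a 32-bit process. The AtomicLong and TornReadCheck names are hypothetical.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper that only touches the long via Interlocked.
public class AtomicLong
{
    private long _value;

    public void Write(long v) { Interlocked.Exchange(ref _value, v); }
    public long Read() { return Interlocked.Read(ref _value); }
}

public static class TornReadCheck
{
    // The writer flips between 0 and -1 (all 64 bits set). Any other value
    // seen by the reader would be a torn, half-written long.
    public static int CountTornReads(int iterations)
    {
        var cell = new AtomicLong();
        Task writer = Task.Run(() =>
        {
            for (int i = 0; i < iterations; i++)
            {
                cell.Write(i % 2 == 0 ? 0L : -1L);
            }
        });

        int torn = 0;
        while (!writer.IsCompleted)
        {
            long seen = cell.Read();
            if (seen != 0L && seen != -1L) torn++;
        }
        writer.Wait();
        return torn;
    }
}
```

With plain reads and writes of the field in a 32-bit process, the same check is allowed to observe values such as 0x00000000FFFFFFFF.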

 

Inter­locked

Interlocks provide the only locking mechanism at hardware level, and the .Net framework provides access to these instructions via the Interlocked class.

On the Intel archi­tec­ture, inter­locks are typ­i­cally imple­mented on the L3 cache, a fact that’s reflected by the latency asso­ci­ated with using Inter­locked incre­ments com­pared with non-interlocked:

image

CompareExchange is the most important tool when it comes to implementing lock-free algorithms, but since it's implemented on the L3 cache, in a multi-processor environment it requires one of the processors to take out a global lock, which is why the contended case above takes much longer.

You can analyse the performance of your application at a CPU level using Intel's VTune Amplifier XE tool.
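For anyone who hasn't written one before, the usual shape of a CompareExchange-based algorithm is a retry loop: read the current value, work out the new one, and publish it only if nobody else got there first. Here is a minimal sketch; the LockFree.Max helper is a hypothetical example of mine, not something shown in the talk.

```csharp
using System.Threading;

public static class LockFree
{
    // Atomically raises target to candidate if candidate is larger,
    // and returns the value that ends up stored.
    public static int Max(ref int target, int candidate)
    {
        while (true)
        {
            int seen = Volatile.Read(ref target);
            if (candidate <= seen)
            {
                return seen;        // nothing to do, someone stored a bigger value
            }
            // Publish only if target still holds the value we based our
            // decision on; otherwise another thread won the race, so retry.
            if (Interlocked.CompareExchange(ref target, candidate, seen) == seen)
            {
                return candidate;
            }
        }
    }
}
```

Under heavy contention the loop simply goes round more often, which is where the extra latency in the contended case shows up.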

 

Mul­ti­task­ing

Threads do not exist at a hardware level. The CPU only understands tasks and has no concept of 'wait'. Synchronization constructs such as semaphores and mutexes are built on top of interlocked operations.

One core can never do more than 1 ‘thing’ at the same time, unless it’s hyper-threaded in which case the core can do some­thing else whilst wait­ing on some resource to con­tinue exe­cut­ing the orig­i­nal task.

A task runs until inter­rupted by hard­ware (I/O inter­rupt) or OS.

 

Win­dows Kernel

A process has:

  • pri­vate vir­tual address space
  • resources
  • at least 1 thread

A thread is:

  • a pro­gram (sequence of instructions)
  • CPU state
  • wait depen­den­cies

Threads can wait for dispatcher objects (WaitHandle) such as Mutex, Semaphore, Event, Timer or another thread. When they're not waiting for anything, they're placed in the ready queue by the thread scheduler until it is their turn to be executed on the CPU.

After a thread has been executing for some time, it is moved back to the ready queue (via a kernel interrupt) to give some other thread a slice of the available CPU time. Alternatively, if the thread needs to wait for a dispatcher object then it goes back to the waiting state.

image

Dispatcher objects reside in the kernel and can be shared among different processes. They're very expensive!

image

This is why you don't want to use kernel objects for waits that are typically very short. Instead, they're best used when waiting for something that takes longer to return, e.g. I/O.

Compared to other wait methods (e.g. Thread.Sleep, Thread.Yield, WaitHandle.WaitOne, etc.), Thread.SpinWait is an oddball because it's not a kernel method. It resembles a continuous loop (it keeps 'spinning') but it tells a hyper-threaded CPU that it's OK to do something else. It's generally useful when you know the interrupt will happen very quickly, as it saves you from an unnecessary context switch. If the interrupt does not happen as quickly as expected, the SpinWait will be transformed into a normal thread wait (Thread.Sleep) to avoid wasting CPU cycles.
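In the framework, the spin-then-back-off behaviour is what the System.Threading.SpinWait struct gives you through SpinOnce, whereas Thread.SpinWait(n) only ever spins. Here is a minimal sketch of waiting on a flag that is expected to flip very soon; SpinGate is a hypothetical class of mine.

```csharp
using System.Threading;

// Hypothetical gate that another thread is expected to open almost immediately.
public class SpinGate
{
    private int _open;   // 0 = closed, 1 = open

    public void Open()
    {
        Interlocked.Exchange(ref _open, 1);
    }

    // Spins until the gate opens and returns how many times it had to spin.
    public int WaitUntilOpen()
    {
        var spinner = new SpinWait();
        while (Volatile.Read(ref _open) == 0)
        {
            // Busy-spins for the first few calls, then starts yielding and
            // sleeping so a long wait doesn't burn a whole core.
            spinner.SpinOnce();
        }
        return spinner.Count;
    }
}
```

If the wait is likely to be long, a kernel object such as an event is still the better choice.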

 

.Net Frame­work Thread Synchronization

image

 

The lock Keyword

  1. start with inter­locked oper­a­tions (no contention)
  2. con­tinue with ‘spin wait’
  3. cre­ate ker­nel event and wait

Performance is good when contention is low.
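To show how the three steps fit together, here is a toy lock that follows the same progression. This is only a sketch of the idea and not how the CLR implements Monitor: HybridLock and its SpinLimit are hypothetical, the lock is not re-entrant, and the kernel event is created up front rather than on demand.

```csharp
using System;
using System.Threading;

public sealed class HybridLock : IDisposable
{
    private const int SpinLimit = 20;   // arbitrary value, for illustration only
    private int _held;                  // 0 = free, 1 = taken
    private int _waiters;               // threads currently in the slow path
    private readonly AutoResetEvent _wake = new AutoResetEvent(false);

    public void Enter()
    {
        // Steps 1 and 2: try an interlocked operation, then spin and retry.
        var spinner = new SpinWait();
        for (int i = 0; i < SpinLimit; i++)
        {
            if (Interlocked.CompareExchange(ref _held, 1, 0) == 0) return;
            spinner.SpinOnce();
        }

        // Step 3: register as a waiter and block on the kernel event.
        Interlocked.Increment(ref _waiters);
        while (Interlocked.CompareExchange(ref _held, 1, 0) != 0)
        {
            _wake.WaitOne();
        }
        Interlocked.Decrement(ref _waiters);
    }

    public void Exit()
    {
        // Release first, then wake one waiter if anybody is parked.
        Interlocked.Exchange(ref _held, 0);
        if (Volatile.Read(ref _waiters) > 0) _wake.Set();
    }

    public void Dispose()
    {
        _wake.Dispose();
    }
}
```

When there is no contention, Enter costs a single CompareExchange and the kernel is never involved, which is where the good low-contention performance comes from.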

 

Design Pat­terns

  • Thread unsafe
  • Actor
  • Reader-Writer Syn­chro­nized

This is where the PostSharp multithreading toolkit comes to the rescue! It can help you implement each of these patterns automatically. Gael has talked more about the toolkit in this blog post.
