A consistent approach to track correlation IDs through microservices

One of my key take­aways from Tam­mer Saleh’s microser­vices anti-pat­terns talk at Craft­Conf is that we need to record cor­re­la­tion IDs to help us debug microser­vices.


Why do we need Correlation IDs?

Suppose a user request comes in and, after various aspects of the request have been handled by several services, something goes wrong.

Even though every service would have been logging along the way, it’s no easy task to find all the relevant log messages for this request amongst millions of log messages!


This is the prob­lem that cor­re­la­tion IDs help us solve.

The idea is sim­ple. When a user-fac­ing ser­vice receives a request it’ll cre­ate a cor­re­la­tion ID, and:

  • pass it along in the HTTP head­er to every oth­er ser­vice
  • include it in every log mes­sage

The cor­re­la­tion ID can then be used to quick­ly find all the rel­e­vant log mes­sages for this request (very easy to do in Elas­tic­search).
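As a rough illustration of that flow, here is a minimal sketch. The header name x-correlation-id and the helper names are assumptions made for the example; the post does not name the header it uses.

```csharp
using System;
using System.Collections.Generic;

public static class CorrelationIdSketch
{
    // Hypothetical header name, chosen only for this example.
    public const string HeaderName = "x-correlation-id";

    // Reuse the caller's ID if one was passed in; otherwise create a new one,
    // which is what a user-facing service does for a fresh request.
    public static string GetOrCreate(IDictionary<string, string> requestHeaders)
    {
        string id;
        if (requestHeaders.TryGetValue(HeaderName, out id) && !string.IsNullOrWhiteSpace(id))
        {
            return id;
        }
        return Guid.NewGuid().ToString();
    }

    // Put the ID in every log message so the messages can be found later.
    public static string FormatLog(string correlationId, string message)
    {
        return string.Format("[{0}={1}] {2}", HeaderName, correlationId, message);
    }
}
```

The same ID would then be copied onto every outgoing HTTP call under the same header name.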


Additionally, you might wish to pass along other useful contextual information (such as payment ID, tournament ID, etc.) about the request to downstream services. Otherwise, it would only appear in log messages from the services where it originated.

It’s helpful to have this contextual information in the error log message so that you don’t have to scan through all the relevant messages (by correlation ID) to piece it together.


Whilst it’s a very sim­ple pat­tern, it’s still one that has to be applied in every log mes­sage and every HTTP request and HTTP han­dler. That’s a lot of unwant­ed cog­ni­tive and devel­op­ment over­head.

In his talk, Tammer suggested that it’s just a fact of life that you have to live with, and that every team has to remember to implement this pattern in every service they create.

To me that would be set­ting my fel­low devel­op­ers up to fail – some­one is bound to for­get at one time or anoth­er and the whole thing falls down like a house of cards.

I want­ed a more sys­tem­at­ic approach for our team.


The Approach

When it comes to implementation patterns like this, my natural tendency is to automate them with PostSharp. However, in this case it doesn’t seem to be a suitable solution because we need to control too many different components:

  • HTTP han­dler to parse and cap­ture cor­re­la­tion ID passed in via HTTP head­ers
  • log­ger to inject the cap­tured cor­re­la­tion IDs
  • HTTP client to include cap­tured cor­re­la­tion IDs as HTTP head­ers

Instead, this looks like a bat­tle that needs to be fought on mul­ti­ple fronts.


For­tu­nate­ly we already have abstrac­tions in the right places!


Logger

Our custom log4net appender can be augmented to look for any captured correlation IDs whenever it logs a message to Elasticsearch, so that they each appear as a separate, searchable field.
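The appender itself is not shown here, so the following is only a sketch of how such an augmentation might look. It assumes the log4net package; AppenderSkeleton and LoggingEvent are real log4net types, while the class name, the GetCapturedIds lookup and the Ship method are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using log4net.Appender;
using log4net.Core;

public abstract class CorrelatingAppenderSketch : AppenderSkeleton
{
    // Assumption: some ambient lookup returns the IDs captured for the
    // current request. Defaults to "nothing captured".
    public static Func<IDictionary<string, string>> GetCapturedIds =
        () => new Dictionary<string, string>();

    protected override void Append(LoggingEvent loggingEvent)
    {
        // Copy each captured ID onto the event as its own property,
        // so it can be indexed as a separate field.
        foreach (var pair in GetCapturedIds())
        {
            loggingEvent.Properties[pair.Key] = pair.Value;
        }

        Ship(loggingEvent);
    }

    // Placeholder for the part that actually sends the event to Elasticsearch.
    protected abstract void Ship(LoggingEvent loggingEvent);
}
```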

HTTP Client

We also have a cus­tom Http­Client that wraps the BCL Http­Client so that:

  • on time­outs, the client throws a Time­ou­tEx­cep­tion instead of the con­fus­ing TaskCanceledEx­cep­tion
  • by default, there is a short time­out of 3 sec­onds
  • by default, the client allows 10 con­sec­u­tive time­outs before trip­ping the cir­cuit break­er for 30 sec­onds (via Pol­ly)
  • HTTP caching is enabled (it is disabled by default, for some reason)
  • there is built-in support for HEAD, OPTIONS and PATCH

This HttpClient is also augmented to look for any captured correlation IDs and send them along as headers in every HTTP call.
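One way to get this behaviour with only BCL types is a DelegatingHandler. That is a different technique from the wrapper class described above, and the class name and getCapturedIds delegate are hypothetical, so treat this as a sketch of the idea rather than the wrapper itself.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class CorrelationHeadersHandler : DelegatingHandler
{
    // Assumption: returns the IDs captured for the current request,
    // keyed by the header name they should be sent under.
    private readonly Func<IDictionary<string, string>> getCapturedIds;

    public CorrelationHeadersHandler(
        Func<IDictionary<string, string>> getCapturedIds,
        HttpMessageHandler innerHandler) : base(innerHandler)
    {
        this.getCapturedIds = getCapturedIds;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        // Copy every captured ID onto the outgoing request as a header.
        foreach (var pair in getCapturedIds())
        {
            request.Headers.TryAddWithoutValidation(pair.Key, pair.Value);
        }

        return base.SendAsync(request, cancellationToken);
    }
}
```

A client built as new HttpClient(new CorrelationHeadersHandler(getIds, new HttpClientHandler())) would then add the headers to every call without the calling code having to remember to do so.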

Providing the Correlation IDs

The logger and the HTTP client need a consistent way to get the captured correlation IDs for the current request. That’s the job of the CorrelationIds class.

The tricky thing is that depend­ing on the host­ing frame­work for your web appli­ca­tion – we use both Nan­cy and Asp.Net WebApi – you might need a dif­fer­ent way to track the cor­re­la­tion IDs, hence the ICor­re­la­tion­Id­sProvider abstrac­tion.

Rather than forc­ing the con­sumer to call CorrelationIds.ProvidedBy, I could have tied in with an IoC con­tain­er instead. I chose not to because I don’t want to be tied down by our cur­rent choice of IoC con­tain­er.
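To make the shape of this concrete, here is a minimal sketch. The names CorrelationIds, ProvidedBy and ICorrelationIdsProvider come from the text above; the members, signatures and bodies are guesses made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape for the abstraction over the hosting framework.
public interface ICorrelationIdsProvider
{
    // IDs captured for the current request.
    IDictionary<string, string> Get();

    // Replace the IDs captured for the current request.
    void Set(IDictionary<string, string> ids);
}

public static class CorrelationIds
{
    private static ICorrelationIdsProvider provider;

    // Called once at startup by the hosting application.
    public static void ProvidedBy(ICorrelationIdsProvider newProvider)
    {
        if (newProvider == null) throw new ArgumentNullException("newProvider");
        provider = newProvider;
    }

    // Used by the logger and the HTTP client. Returns an empty set
    // if no provider has been wired up or nothing was captured.
    public static IDictionary<string, string> Get()
    {
        var current = provider;
        var ids = current == null ? null : current.Get();
        return ids ?? new Dictionary<string, string>();
    }
}
```

With a static entry point like this, the logger and HTTP client only ever call CorrelationIds.Get(), and the choice of provider stays a one-line decision in each host.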

HTTP Handler (Nancy)

Both Nan­cy and Asp.Net have mech­a­nisms for inject­ing behav­iour at var­i­ous key points in the HTTP request pro­cess­ing pipeline.

In Nancy’s case, you can have a cus­tom boot­strap­per:
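A custom bootstrapper along these lines might look like the sketch below. It assumes the Nancy package; DefaultNancyBootstrapper, ApplicationStartup and the BeforeRequest pipeline are Nancy’s own, while the x-correlation- header prefix and the Capture hook are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Nancy;
using Nancy.Bootstrapper;
using Nancy.TinyIoc;

public class CorrelatingBootstrapper : DefaultNancyBootstrapper
{
    // Hypothetical prefix marking headers that carry correlation IDs.
    private const string Prefix = "x-correlation-";

    // Hypothetical hook: where the captured IDs are handed over so that the
    // logger and HTTP client can find them for the rest of the request.
    public static Action<IDictionary<string, string>> Capture = ids => { };

    protected override void ApplicationStartup(TinyIoCContainer container, IPipelines pipelines)
    {
        base.ApplicationStartup(container, pipelines);

        pipelines.BeforeRequest += ctx =>
        {
            // Pick out the correlation headers, keeping the first value of each.
            var ids = ctx.Request.Headers
                .Where(h => h.Key.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
                .ToDictionary(h => h.Key, h => h.Value.FirstOrDefault());

            Capture(ids);

            // Returning null lets Nancy carry on processing the request.
            return null;
        };
    }
}
```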

But we still need to imple­ment the afore­men­tioned ICor­re­la­tion­Id­sProvider inter­face.

Unfor­tu­nate­ly, I can’t use HttpContext.Current in Nan­cy, so we need to find an alter­na­tive way to ensure the cap­tured cor­re­la­tion IDs flow through async-await oper­a­tions. For this I used a mod­i­fied ver­sion of Stephen Cleary’s Asyn­cLo­cal to make sure they flow through the Exe­cu­tion­Con­text.

A sim­pli­fied ver­sion of the Nan­cy­Cor­re­la­tion­Id­sProvider imple­men­ta­tion looks like the fol­low­ing:
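The sketch below is not that implementation. It swaps in the BCL’s System.Threading.AsyncLocal<T> (available from .NET 4.6) for the modified AsyncLocal mentioned above, and both the interface members and the class name are assumptions.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical shape for the interface named in the post.
public interface ICorrelationIdsProvider
{
    IDictionary<string, string> Get();
    void Set(IDictionary<string, string> ids);
}

public class AsyncLocalCorrelationIdsProvider : ICorrelationIdsProvider
{
    // AsyncLocal<T> values travel with the ExecutionContext, so a value set
    // before an await is still visible in the continuation after it.
    private static readonly AsyncLocal<IDictionary<string, string>> Captured =
        new AsyncLocal<IDictionary<string, string>>();

    public IDictionary<string, string> Get()
    {
        return Captured.Value ?? new Dictionary<string, string>();
    }

    public void Set(IDictionary<string, string> ids)
    {
        Captured.Value = ids;
    }
}
```

One caveat with AsyncLocal<T>: a value set inside an awaited async method does not flow back out to its caller, so the IDs need to be captured upstream of the code that reads them.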

HTTP Handler (Asp.Net)

For Asp.Net WebApi projects, we can rely on HttpContext.Current to do the dirty work, so the implementation of ICorrelationIdsProvider becomes a trivial exercise.

What wasn’t immediately obvious to me was how to tap into the request processing pipeline in a way that can be reused. After some research, custom HTTP modules seem to be the way to go.
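A minimal sketch of both pieces, assuming System.Web on the .NET Framework: a provider backed by HttpContext.Current.Items and a module that captures the headers at the start of each request. IHttpModule, HttpApplication and HttpContext are real System.Web types; the interface members, class names, items key and x-correlation- prefix are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;

// Hypothetical shape for the interface named in the post.
public interface ICorrelationIdsProvider
{
    IDictionary<string, string> Get();
    void Set(IDictionary<string, string> ids);
}

// Keeps the captured IDs on the current request's HttpContext.
public class HttpContextCorrelationIdsProvider : ICorrelationIdsProvider
{
    private const string ItemsKey = "correlation-ids"; // hypothetical key

    public IDictionary<string, string> Get()
    {
        var context = HttpContext.Current;
        var ids = context == null
            ? null
            : context.Items[ItemsKey] as IDictionary<string, string>;
        return ids ?? new Dictionary<string, string>();
    }

    public void Set(IDictionary<string, string> ids)
    {
        var context = HttpContext.Current;
        if (context != null) context.Items[ItemsKey] = ids;
    }
}

// Captures correlation headers at the start of every request.
public class CorrelationIdsModule : IHttpModule
{
    private const string Prefix = "x-correlation-"; // hypothetical prefix

    private readonly ICorrelationIdsProvider provider =
        new HttpContextCorrelationIdsProvider();

    public void Init(HttpApplication application)
    {
        application.BeginRequest += (sender, args) =>
        {
            var headers = ((HttpApplication)sender).Request.Headers;
            var ids = headers.AllKeys
                .Where(k => k != null && k.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
                .ToDictionary(k => k, k => headers[k]);
            provider.Set(ids);
        };
    }

    public void Dispose() { }
}
```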

However, custom HTTP modules still need to be registered in the web application’s web.config.
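For IIS in integrated pipeline mode, the registration typically looks like the fragment below. The module name, type and assembly shown are hypothetical placeholders, not the names used in our code.

```xml
<configuration>
  <system.webServer>
    <modules>
      <!-- Hypothetical type and assembly names; replace with your own. -->
      <add name="CorrelationIdsModule"
           type="MyCompany.Web.CorrelationIdsModule, MyCompany.Web" />
    </modules>
  </system.webServer>
</configuration>
```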


So, that’s it. What do you think of our approach?

This is the best I have come up with so far, and it has a num­ber of obvi­ous lim­i­ta­tions:

  • usable only from .Net; our Erlang-based services would also need something similar
  • only works with web ser­vices, and doesn’t extend to queue work­ers (for Ama­zon SQS) and stream proces­sors (for Ama­zon Kine­sis)
  • still requires initial wiring up (can be mitigated with scaffolding)

If you have some ideas for improvement, I would very much like to hear them!