A consistent approach to track correlation IDs through microservices

You can become a serverless blackbelt. Enrol to my 4-week online workshop Production-Ready Serverless and gain hands-on experience building something from scratch using serverless technologies. At the end of the workshop, you should have a broader view of the challenges you will face as your serverless architecture matures and expands. You should also have a firm grasp on when serverless is a good fit for your system as well as common pitfalls you need to avoid. Sign up now and get 15% discount with the code yanprs15!

One of my key takeaways from Tammer Saleh’s microservices anti-patterns talk at CraftConf is that we need to record correlation IDs to help us debug microservices.


Why do we need Correlation IDs?

Suppose a user request comes in and after various aspects of the request has been handled by several services something goes wrong.

Even though every service would have been logging along the way it’s no easy task to find all the relevant log messages for this request amongst millions of log messages!


This is the problem that correlation IDs help us solve.

The idea is simple. When a user-facing service receives a request it’ll create a correlation ID, and:

  • pass it along in the HTTP header to every other service
  • include it in every log message

The correlation ID can then be used to quickly find all the relevant log messages for this request (very easy to do in Elasticsearch).


Additionally, you might wish to pass along other useful contextual information (such as payment ID, tournament ID, etc.) about the request to downstream services. Otherwise they would only appear in log messages from services where they originated from.

It’s helpful to have these contextual information in the error log message so that you don’t have to scan through all relevant messages (by correlation ID) to piece them together.


Whilst it’s a very simple pattern, it’s still one that has to be applied in every log message and every HTTP request and HTTP handler. That’s a lot of unwanted cognitive and development overhead.

In his talk,Tamer suggested that it’s just a fact of life that you have to live with, and that every team has to remember to implement this pattern in every service they create.

To me that would be setting my fellow developers up to fail – someone is bound to forget at one time or another and the whole thing falls down like a house of cards.

I wanted a more systematic approach for our team.


The Approach

When it comes to implementation patterns like this, my natural tendency is to automate them with PostSharp. However in this case, it doesn’t seem to be a suitable solution because we need to control too many different components:

  • HTTP handler to parse and capture correlation ID passed in via HTTP headers
  • logger to inject the captured correlation IDs
  • HTTP client to include captured correlation IDs as HTTP headers

Instead, this looks like a battle that needs to be fought on multiple fronts.


Fortunately we already have abstractions in the right places!


Our custom log4net appender can be augmented to look for any captured correlation IDs whenever it logs a message to Elasticsearch so they each appear as a separate, searchable field.

HTTP Client

We also have a custom HttpClient that wraps the BCL HttpClient so that:

  • on timeouts, the client throws a TimeoutException instead of the confusing TaskCanceledException
  • by default, there is a short timeout of 3 seconds
  • by default, the client allows 10 consecutive timeouts before tripping the circuit breaker for 30 seconds (via Polly)
  • enables HTTP caching (which by default, is disabled for some reason)
  • has built-in support for HEAD, OPTIONS and PATCH

This HttpClient is also augmented to look for any captured correlation IDs and send them along as  headers in every HTTP call.

Providing the Correlation IDs

For the logger and HTTP client, they need to have a consistent way to get the captured correlation IDs for the current request. That’s the job of the CorrelationIds class below.

The tricky thing is that depending on the hosting framework for your web application – we use both Nancy and Asp.Net WebApi – you might need a different way to track the correlation IDs, hence the ICorrelationIdsProvider abstraction.

Rather than forcing the consumer to call CorrelationIds.ProvidedBy, I could have tied in with an IoC container instead. I chose not to because I don’t want to be tied down by our current choice of IoC container.

HTTP Handler (Nancy)

Both Nancy and Asp.Net have mechanisms for injecting behaviour at various key points in the HTTP request processing pipeline.

In Nancy’s case, you can have a custom bootstrapper:

But we still need to implement the aforementioned ICorrelationIdsProvider interface.

Unfortunately, I can’t use HttpContext.Current in Nancy, so we need to find an alternative way to ensure the captured correlation IDs flow through async-await operations. For this I used a modified version of Stephen Cleary’s AsyncLocal to make sure they flow through the ExecutionContext.

A simplified version of the NancyCorrelationIdsProvider implementation looks like the following:

HTTP Handler (Asp.Net)

For Asp.Net WebApi projects, since we can rely on HttpContext.Current to do the dirty work, so the implementation for ICorrelationIdsProvider becomes a trivial exercise.

What wasn’t immediately obvious to me, was how to tap into the request processing pipeline in a way that can be reused. After some research, custom HTTP modules seem to be the way to go.

However, with custom HTTP modules, they still need to be registered in the web application in the web.config.


So, that’s it, what do you think of our approach?

This is the best I have come up with so far, and it has a number of obvious limitations:

  • usable only from .Net, our Erlang-based services would also need something similar
  • only works with web services, and doesn’t extend to queue workers (for Amazon SQS) and stream processors (for Amazon Kinesis)
  • still require initial wiring up (can be mitigated with scaffolding)

If you have some ideas for improvement I would very much like to hear it!



Liked this article? Support me on Patreon and get direct help from me via a private Slack channel or 1-2-1 mentoring.
Subscribe to my newsletter

Hi, I’m Yan. I’m an AWS Serverless Hero and I help companies go faster for less by adopting serverless technologies successfully.

Are you struggling with serverless or need guidance on best practices? Do you want someone to review your architecture and help you avoid costly mistakes down the line? Whatever the case, I’m here to help.

Hire me.

Skill up your serverless game with this hands-on workshop.

My 4-week Production-Ready Serverless online workshop is back!

This course takes you through building a production-ready serverless web application from testing, deployment, security, all the way through to observability. The motivation for this course is to give you hands-on experience building something with serverless technologies while giving you a broader view of the challenges you will face as the architecture matures and expands.

We will start at the basics and give you a firm introduction to Lambda and all the relevant concepts and service features (including the latest announcements in 2020). And then gradually ramping up and cover a wide array of topics such as API security, testing strategies, CI/CD, secret management, and operational best practices for monitoring and troubleshooting.

If you enrol now you can also get 15% OFF with the promo code “yanprs15”.

Enrol now and SAVE 15%.

Check out my new podcast Real-World Serverless where I talk with engineers who are building amazing things with serverless technologies and discuss the real-world use cases and challenges they face. If you’re interested in what people are actually doing with serverless and what it’s really like to be working with serverless day-to-day, then this is the podcast for you.

Check out my new course, Learn you some Lambda best practice for great good! In this course, you will learn best practices for working with AWS Lambda in terms of performance, cost, security, scalability, resilience and observability. We will also cover latest features from re:Invent 2019 such as Provisioned Concurrency and Lambda Destinations. Enrol now and start learning!

Check out my video course, Complete Guide to AWS Step Functions. In this course, we’ll cover everything you need to know to use AWS Step Functions service effectively. There is something for everyone from beginners to more advanced users looking for design patterns and best practices. Enrol now and start learning!