Bye bye schema coupling, hello semantic coupling

Yan Cui

I help clients go faster for less using serverless technologies.

I recently shared six event versioning strategies for event-driven architectures [1]. In response to the article, Marty Pitt reached out and showed me how Orbital [2] and Taxi [3] use semantic tags to eliminate schema coupling in event-driven architectures and simplify the schema management.

It’s a novel way to manage schema evolution, and I want to share what I learnt with you.

Problems with Schema Coupling

In an event-driven architecture, event consumers are typically coupled to the schema of the event payload, as it serves as the contract between the event publisher, the event bus, and the event consumers.

This is a form of schema coupling.

When you change the schema of the event (including the data type of its fields), the consumer must change accordingly.
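To make the coupling concrete, here is a minimal Python sketch. It is not taken from Orbital or Taxi; the payload shapes and the `handle_order_placed` function are hypothetical. The consumer reads “customerId” straight from the payload, so moving the field under a nested “customer” object breaks it.

```python
def handle_order_placed(event: dict) -> str:
    # The consumer depends on the exact name and location of the field.
    return event["customerId"]


# Two hypothetical versions of the same event.
event_v1 = {"orderId": "o-1", "customerId": "c-42"}
event_v2 = {"orderId": "o-1", "customer": {"id": "c-42"}}

v1_result = handle_order_placed(event_v1)

try:
    handle_order_placed(event_v2)
    v2_broke = False
except KeyError:
    # The new schema no longer has a top-level "customerId" field.
    v2_broke = True
```

Nothing about the meaning of the data changed between the two versions, yet the consumer has to be updated.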

Therefore, you need to carefully manage the evolution of the event schema to avoid breaking existing consumers. Hence the need for versioning, or for preventing breaking changes altogether [4].

Semantic Coupling

What if consumers are coupled to the meaning of the data (i.e. the semantics) rather than its representation?

Consider two versions of an event with different schemas: one carries a top-level “customerId” field, the other nests it as “customer.id”. Both fields refer to the same concept, a customer ID, so both can be mapped to the same semantic tag.

In Orbital, consumers subscribe to and query these semantic tags, rather than the event payload. When Orbital (the event gateway) delivers data to the consumers, it delivers the data as semantic tags, not as raw events.

As you evolve the event schema, you update the mapping from payload fields to semantic tags accordingly. Consumers are unaware of the schema change because it is hidden from them.
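The mapping idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Orbital or Taxi are implemented: the tag name `CustomerId`, the `MAPPINGS` table and the `resolve` function are all hypothetical.

```python
from typing import Any

# Hypothetical semantic tag name; Taxi declares its types in its own language.
CUSTOMER_ID = "CustomerId"

# One mapping per schema version: semantic tag -> path into the raw payload.
MAPPINGS: dict[str, dict[str, tuple[str, ...]]] = {
    "v1": {CUSTOMER_ID: ("customerId",)},
    "v2": {CUSTOMER_ID: ("customer", "id")},
}


def resolve(event: dict, version: str, tags: list[str]) -> dict[str, Any]:
    """Return only the requested semantic tags, hiding the payload shape."""
    result: dict[str, Any] = {}
    for tag in tags:
        value: Any = event
        for key in MAPPINGS[version][tag]:
            value = value[key]
        result[tag] = value
    return result


def consumer(data: dict[str, Any]) -> str:
    # The consumer only knows the semantic tag, never the field path.
    return f"customer {data[CUSTOMER_ID]}"


v1 = resolve({"customerId": "c-42"}, "v1", [CUSTOMER_ID])
v2 = resolve({"customer": {"id": "c-42"}}, "v2", [CUSTOMER_ID])
```

Both versions resolve to the same tagged data, so the schema change is absorbed by adding one entry to the mapping table while `consumer` stays untouched.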

But what if you remove a field or change its data type?

That’s where “semantic functions” come in. They are a way to transform or enrich the raw event data.

If the event no longer carries the customer’s name, then a semantic function can call out to an HTTP API and fetch the data from it.

However, it would be a waste to do this for every event if no consumers need the customer’s name. So, semantic functions are only run on fields when a consumer has requested the field. This works similarly to GraphQL resolvers, which are lazily evaluated based on the user query.
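The lazy evaluation described above can be sketched as follows. This is a simplified illustration, not Orbital's actual mechanism: `fetch_customer_name` is a hypothetical stand-in for the HTTP call, and the `RESOLVERS` table and `deliver` function are assumptions made for the example.

```python
from typing import Any, Callable

calls: list[str] = []  # records lookups so we can see when the function runs


def fetch_customer_name(event: dict) -> str:
    # Stand-in for an HTTP call to a customer API (hypothetical).
    calls.append(event["customerId"])
    return {"c-42": "Jane Doe"}[event["customerId"]]


# Semantic tag -> function that produces its value from the raw event.
RESOLVERS: dict[str, Callable[[dict], Any]] = {
    "CustomerId": lambda e: e["customerId"],
    "CustomerName": fetch_customer_name,
}


def deliver(event: dict, requested: list[str]) -> dict[str, Any]:
    # Only the resolvers for the requested tags are invoked.
    return {tag: RESOLVERS[tag](event) for tag in requested}


event = {"customerId": "c-42"}

ids_only = deliver(event, ["CustomerId"])
calls_after_ids_only = len(calls)  # the lookup has not run yet

with_name = deliver(event, ["CustomerId", "CustomerName"])
```

A consumer that only asks for the customer ID never triggers the lookup; the cost is paid only by consumers that request the name.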

Similarly, a semantic function can also be used to perform data transformation. For example, to split “full-name” into “first-name” and “last-name”, or to convert “created-at” from DateTime to a string.
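As a rough sketch of these two transformations in plain Python (the function names are hypothetical, and a real semantic function would be registered with the gateway rather than called directly):

```python
from datetime import datetime, timezone


def split_full_name(full_name: str) -> dict[str, str]:
    # Naive split on the first space; real names need more careful handling.
    first, _, last = full_name.partition(" ")
    return {"first-name": first, "last-name": last}


def created_at_to_string(created_at: datetime) -> str:
    # ISO 8601 is one reasonable string format; the choice is an assumption.
    return created_at.isoformat()


names = split_full_name("Jane Doe")
created = created_at_to_string(
    datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
)
```

Here `names` holds the two separate name fields and `created` holds the timestamp as a string, so consumers that expect the old shape keep working after the publisher changes its payload.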

Finally, Orbital sits between data sources and consumers, and can work in both request-response and event-driven contexts.

The closest comparison in AWS is perhaps a combination of EventBridge and EventBridge Pipes.

Conclusion

Putting Orbital aside, I really like this approach of using semantic tags to decouple event consumers from the data representation.

Subscribing by meaning instead of by schema turns every change into a local mapping update. It eliminates the need for event versioning and prevents breaking changes at the same time.

I love the approach and its simplicity, and I hope other tools take note so that it becomes more widely adopted!

Links

[1] Event versioning strategies for event-driven architectures

[2] Orbital, a data integration platform

[3] Taxi, a language for describing how your data and services should connect together

[4] How to detect and prevent breaking changes in event schemas
