Storing subscriptions

TL;DR: With Rebus2, subscription storages can be configured to be “centralized”, which means that all endpoints are allowed to register themselves as subscribers directly in the subscription storage. This way, since the central subscription storage is shared between all endpoints, no additional routing information is needed in order to implement pub/sub.

Long version:

Rebus has always had a subscription storage, which is the persistence abstraction that a publisher uses to remember who subscribed to what.

Basic workings of the subscription mechanism

In the most basic scenario, it works like this:

1: Subscriber locates publisher

A subscriber, some_subscriber, wants to subscribe to all string events.

In order to do this, the subscriber asks The Router (I’ll talk some more about The Router in another blog post, I promise 🙂): “Router – who owns System.String, mscorlib?”, to which The Router replies: “some_publisher” (The Router is just concise like that…).

2: Subscriber subscribes

The subscriber now sends a SubscribeRequest to some_publisher, which is basically saying something like “hi, I’m some_subscriber – I’d like to subscribe to System.String, mscorlib”.

3: Publisher remembers

Having received the SubscribeRequest, the publisher then saves this information to its subscription storage (e.g. a table in SQL Server) in the form of this simple tuple: ("System.String, mscorlib", "some_subscriber").

4: Publisher publishes

For all eternity (or until the subscriber unsubscribes again), the publisher will then publish its string events by checking its subscription storage for subscribers of System.String, mscorlib.

Let’s pretend that it gets these two bad boys:

  • ("System.String, mscorlib", "some_subscriber")
  • ("System.String, mscorlib", "another_subscriber")

With these two in hand, the publisher will then just go on and send the event to each subscriber directly, i.e. to some_subscriber and another_subscriber in this case.
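The four steps above can be played through in a few lines of code. What follows is a toy model in Python, not Rebus code: the class and function names (Router, Publisher, subscribe, run_basic_flow) are made up for illustration, the "transport" is just a dictionary of inboxes, and the subscription storage is a dictionary held in memory by the publisher.

```python
from collections import defaultdict

STRING_EVENT = "System.String, mscorlib"


class Router:
    """Answers 'who owns this event type?' with exactly one publisher address."""

    def __init__(self, owners):
        self._owners = dict(owners)

    def get_owner(self, event_type):
        return self._owners[event_type]


class Publisher:
    """Keeps its own subscription storage and sends events to each subscriber."""

    def __init__(self, address, queues):
        self.address = address
        self._queues = queues  # stand-in for the transport: address -> inbox
        self._subscriptions = defaultdict(list)  # event type -> subscriber addresses

    def handle_subscribe_request(self, event_type, subscriber_address):
        # Step 3: remember the (event type, subscriber) tuple.
        if subscriber_address not in self._subscriptions[event_type]:
            self._subscriptions[event_type].append(subscriber_address)

    def publish(self, event_type, event):
        # Step 4: look up the subscribers and send to each one directly.
        for subscriber_address in self._subscriptions[event_type]:
            self._queues[subscriber_address].append(event)


def subscribe(subscriber_address, event_type, router, publishers):
    owner = router.get_owner(event_type)  # step 1: locate the publisher
    # Step 2: "send" the subscribe request (a direct call in this sketch).
    publishers[owner].handle_subscribe_request(event_type, subscriber_address)


def run_basic_flow():
    queues = defaultdict(list)
    router = Router({STRING_EVENT: "some_publisher"})
    publisher = Publisher("some_publisher", queues)
    publishers = {publisher.address: publisher}

    subscribe("some_subscriber", STRING_EVENT, router, publishers)
    subscribe("another_subscriber", STRING_EVENT, router, publishers)

    publisher.publish(STRING_EVENT, "hello")
    return dict(queues)
```

Calling run_basic_flow() leaves one copy of the event in each subscriber's inbox. Note that the subscriber never touches the storage itself; it only needs the router entry and a way to reach the publisher.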

To sum it up

The basic pub/sub sequence, which could more accurately be called “sub/pub” because that’s the order of events, can be sketched like this: first, steps 1, 2, and 3 (router lookup omitted for clarity)

[Figure: subscribe sequence, steps 1–3]

and then step 4

[Figure: publish sequence, step 4]

A nifty trick with new Rebus

Check out the current layout of the ISubscriptionStorage interface:

In Rebus2 (i.e. versions 0.90.* and on), the subscription storage has had the ability to be “centralized” – i.e. an implementation of Rebus’ ISubscriptionStorage could return true from the IsCentralized property, which will cause Rebus to bypass the publisher lookup in the router and the sending of the SubscribeRequest, thereby shortcutting the subscription process a great deal.
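To make the effect of that flag concrete, here is a small sketch of the decision a subscriber makes. It is written in Python as a language-neutral model and is not Rebus's actual implementation: InMemorySubscriptionStorage, subscribe, get_owner and send are hypothetical names, and only the idea of an is_centralized flag on the storage is taken from the text above.

```python
class InMemorySubscriptionStorage:
    """Toy storage holding (topic, subscriber address) tuples."""

    def __init__(self, is_centralized):
        self.is_centralized = is_centralized
        self._rows = set()

    def register_subscriber(self, topic, address):
        self._rows.add((topic, address))

    def get_subscriber_addresses(self, topic):
        return sorted(a for t, a in self._rows if t == topic)


def subscribe(topic, own_address, storage, get_owner, send):
    """Subscribe own_address to topic; returns a label for the path taken."""
    if storage.is_centralized:
        # Shortcut: no router lookup, no subscribe request.
        storage.register_subscriber(topic, own_address)
        return "registered directly"
    # Classic path: ask the router for the owner, then send it a request.
    owner = get_owner(topic)
    send(owner, ("SubscribeRequest", topic, own_address))
    return "sent request to " + owner


TOPIC = "System.String, mscorlib"
sent = []  # records every message handed to the fake transport


def send(destination, message):
    sent.append((destination, message))


def get_owner(topic):
    return "some_publisher"  # the router's answer in this sketch


central = InMemorySubscriptionStorage(is_centralized=True)
central_path = subscribe(TOPIC, "some_subscriber", central, get_owner, send)

local = InMemorySubscriptionStorage(is_centralized=False)
local_path = subscribe(TOPIC, "some_subscriber", local, get_owner, send)
```

With the centralized storage the tuple lands in the storage right away and nothing is sent; with the non-centralized one, the only visible effect on the subscriber side is a single request on its way to some_publisher.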

This shortcut makes perfect sense, because it doesn’t matter who makes the ("System.String, mscorlib", "some_subscriber") registration in the subscription storage, as long as the publisher finds it when it’s time to publish some messages. Therefore – if a subscription storage is configured to be centralized, the subscription process is as short as this:

1: Subscriber subscribes

The subscriber saves its subscription to the subscription storage (e.g. a table in SQL Server) in the form of this simple tuple: ("System.String, mscorlib", "some_subscriber").

and then we would already be ready for

4: Publisher publishes

(which I will not repeat here)

To sum it up

When the subscription storage is centralized, the subscription process can be sketched like this:

[Figure: centralized subscribe sequence]
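The same flow can be tried out against a real table. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the shared SQL Server table; the table layout, the class name SharedSubscriptionStorage and the function run_centralized_flow are assumptions made for this example, not what Rebus creates.

```python
import sqlite3

TOPIC = "System.String, mscorlib"


class SharedSubscriptionStorage:
    """One table of (topic, address) rows that every endpoint can reach."""

    is_centralized = True

    def __init__(self, connection):
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS subscriptions ("
            "topic TEXT NOT NULL, address TEXT NOT NULL, "
            "PRIMARY KEY (topic, address))"
        )

    def register_subscriber(self, topic, address):
        # INSERT OR IGNORE makes subscribing twice a harmless no-op.
        self._conn.execute(
            "INSERT OR IGNORE INTO subscriptions (topic, address) VALUES (?, ?)",
            (topic, address),
        )
        self._conn.commit()

    def get_subscriber_addresses(self, topic):
        rows = self._conn.execute(
            "SELECT address FROM subscriptions WHERE topic = ? ORDER BY address",
            (topic,),
        ).fetchall()
        return [row[0] for row in rows]


def run_centralized_flow():
    connection = sqlite3.connect(":memory:")  # stand-in for the central database
    storage = SharedSubscriptionStorage(connection)

    # Step 1: each subscriber writes its own row, no publisher involved.
    storage.register_subscriber(TOPIC, "some_subscriber")
    storage.register_subscriber(TOPIC, "another_subscriber")

    # Step 4: the publisher reads the same table and sends to each address.
    inboxes = {}
    for address in storage.get_subscriber_addresses(TOPIC):
        inboxes.setdefault(address, []).append("hello")

    connection.close()
    return inboxes
```

There is no router and no subscribe request anywhere in this version, but both sides now depend on being able to open a connection to the same database.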

When to use what?

The distributed way

Pros: The basic IsCentralized = false way is the most distributed form of publish/subscribe, since subscribers only need to know the addresses of the publishers they depend on, and need not worry about having a connection to a subscription database somewhere.

With the non-centralized solution, each publisher can have its own subscription storage, which can e.g. be a local SQL Server Express, a JSON file, or whatever each publisher feels like. This makes the non-centralized way more resilient to failures, because there’s no central database required for publishers to work.

Cons: With the non-centralized solution, all subscribers must have an endpoint mapping in their routers, mapping one single publisher as the owner of each event type. More discipline is required during development, and there are simply more moving parts at runtime.

The centralized way

Pros: The new IsCentralized = true way is easier to get started with, and it may be easier to grasp for newcomers. It also relieves all subscribers of the burden of having to map publishers as owners of all the different types of events.

Cons: All publishers and subscribers are required to have the centralized subscription storage configured, and thus everyone needs to be able to connect to the same central database.
