TL;DR: With Rebus2, subscription storages can be configured to be “centralized”, which means that all endpoints are allowed to register themselves as subscribers directly in the subscription storage. This way, since the central subscription storage is shared between all endpoints, no additional routing information is needed in order to implement pub/sub.
Long version:
Rebus has always had a subscription storage, which is the persistence abstraction that a publisher uses to remember who subscribed to what.
Basic workings of the subscription mechanism
In the most basic scenario, it works like this:
1: Subscriber locates publisher
A subscriber, some_subscriber wants to subscribe to all string events.
In order to do this, the subscriber asks The Router (I’ll talk some more about The Router in another blog post, I promise 🙂 ): “Router – who owns System.String, mscorlib?” to which The Router replies: “some_publisher” (The Router is just concise like that…)
2: Subscriber subscribes
The subscriber now sends a SubscribeRequest to some_publisher, which is basically saying something like “hi, I’m some_subscriber – I’d like to subscribe to System.String, mscorlib”.
3: Publisher remembers
Having received the SubscribeRequest, the publisher then saves this information to its subscription storage (e.g. a table in SQL Server) in the form of this simple tuple: ("System.String, mscorlib", "some_subscriber").
4: Publisher publishes
For all eternity (or until the subscriber unsubscribes again), the publisher will then publish its string events by checking its subscription storage for subscribers of System.String, mscorlib.
Let’s pretend that it gets these two bad boys:
- ("System.String, mscorlib", "some_subscriber")
- ("System.String, mscorlib", "another_subscriber")
With these two in hand, the publisher will then just go on and send the event to each subscriber directly, i.e. to some_subscriber and another_subscriber in this case.
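The four steps above can be mimicked in a few lines of C#. This is only a sketch: Router, Publisher and SubPubDemo are hypothetical stand-ins made up for illustration (they are not Rebus types), the SubscribeRequest is reduced to a direct method call, and the publisher's subscription storage is just an in-memory set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; these are not Rebus types.
public class Router
{
    readonly Dictionary<string, string> _owners = new Dictionary<string, string>();

    public void Map(string topic, string publisherAddress) => _owners[topic] = publisherAddress;

    // Step 1: "who owns this topic?"
    public string GetOwner(string topic) => _owners[topic];
}

public class Publisher
{
    // The publisher's own subscription storage: a set of (topic, subscriber) tuples.
    readonly HashSet<(string Topic, string Subscriber)> _subscriptions =
        new HashSet<(string Topic, string Subscriber)>();

    public Publisher(string address) => Address = address;

    public string Address { get; }

    // Step 3: remember who subscribed to what when a subscribe request arrives.
    public void HandleSubscribeRequest(string topic, string subscriberAddress) =>
        _subscriptions.Add((topic, subscriberAddress));

    // Step 4: look up the subscribers of a topic; the event would be sent to each of them.
    public string[] GetRecipients(string topic) =>
        _subscriptions.Where(s => s.Topic == topic)
                      .Select(s => s.Subscriber)
                      .OrderBy(address => address, StringComparer.Ordinal)
                      .ToArray();
}

public static class SubPubDemo
{
    public static string[] Run()
    {
        const string topic = "System.String, mscorlib";

        var publisher = new Publisher("some_publisher");
        var router = new Router();
        router.Map(topic, publisher.Address);

        foreach (var subscriber in new[] { "some_subscriber", "another_subscriber" })
        {
            // Step 1: ask the router who owns the topic.
            var owner = router.GetOwner(topic);

            // Step 2: send the subscribe request to that owner (a direct call here,
            // a message on the wire in a real system).
            if (owner == publisher.Address)
                publisher.HandleSubscribeRequest(topic, subscriber);
        }

        // Step 4: the publisher resolves the recipients from its own storage.
        return publisher.GetRecipients(topic);
    }
}
```

Calling SubPubDemo.Run() gives back the two subscriber addresses (sorted alphabetically, only to keep the result predictable), which are the endpoints the publisher would send each string event to.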
To sum it up
The basic pub/sub sequence, which could more accurately be called “sub/pub” because that’s the order of events, can be sketched like this: first, steps 1, 2, and 3 (router lookup omitted for clarity)
and then step 4
A nifty trick with new Rebus
Check out the current layout of the ISubscriptionStorage interface:
```csharp
public interface ISubscriptionStorage
{
    Task<string[]> GetSubscriberAddresses(string topic);

    Task RegisterSubscriber(string topic, string subscriberAddress);

    Task UnregisterSubscriber(string topic, string subscriberAddress);

    bool IsCentralized { get; }
}
```
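To get a feel for how little it takes to satisfy this interface, here is a minimal in-memory sketch. InMemoryStorageSketch is a name I made up for illustration, and this is not how any of the real Rebus storages are implemented. The interface is repeated inside the block only so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Repeated from above so the sketch compiles on its own;
// in a real project the interface comes from Rebus.
public interface ISubscriptionStorage
{
    Task<string[]> GetSubscriberAddresses(string topic);
    Task RegisterSubscriber(string topic, string subscriberAddress);
    Task UnregisterSubscriber(string topic, string subscriberAddress);
    bool IsCentralized { get; }
}

// Hypothetical in-memory storage, for illustration only.
public class InMemoryStorageSketch : ISubscriptionStorage
{
    // topic -> set of subscriber addresses (the byte value is unused)
    readonly ConcurrentDictionary<string, ConcurrentDictionary<string, byte>> _subscribers =
        new ConcurrentDictionary<string, ConcurrentDictionary<string, byte>>();

    public InMemoryStorageSketch(bool isCentralized) => IsCentralized = isCentralized;

    // The flag discussed in this post: true means that subscribers may register themselves directly.
    public bool IsCentralized { get; }

    public Task<string[]> GetSubscriberAddresses(string topic)
    {
        var addresses = _subscribers.TryGetValue(topic, out var set)
            ? set.Keys.OrderBy(address => address, StringComparer.Ordinal).ToArray()
            : new string[0];

        return Task.FromResult(addresses);
    }

    public Task RegisterSubscriber(string topic, string subscriberAddress)
    {
        // Registering the same (topic, subscriber) pair twice is harmless.
        var set = _subscribers.GetOrAdd(topic, _ => new ConcurrentDictionary<string, byte>());
        set[subscriberAddress] = 0;
        return Task.CompletedTask;
    }

    public Task UnregisterSubscriber(string topic, string subscriberAddress)
    {
        if (_subscribers.TryGetValue(topic, out var set))
            set.TryRemove(subscriberAddress, out _);

        return Task.CompletedTask;
    }
}
```

An in-memory storage like this only makes sense when everything lives in one process, of course. The point is just that the storage is a bag of (topic, subscriber address) pairs plus one flag.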
In Rebus2 (i.e. versions 0.90.* and on), the subscription storage has had the ability to be “centralized” – i.e. an implementation of Rebus’ ISubscriptionStorage could return true from the IsCentralized property, which will cause Rebus to bypass the publisher lookup in the router and the sending of the SubscribeRequest, thereby shortcutting the subscription process a great deal.
That makes perfect sense, because it doesn’t matter who puts the ("System.String, mscorlib", "some_subscriber") registration in the subscription storage, as long as the publisher finds it when it’s time to publish some messages. Therefore, if a subscription storage is configured to be centralized, the subscription process is as short as this:
1: Subscriber subscribes
The subscriber saves its subscription to the subscription storage (e.g. a table in SQL Server) in the form of this simple tuple: ("System.String, mscorlib", "some_subscriber").
and then we would already be ready for
4: Publisher publishes
(which I will not repeat here)
To sum it up
When the subscription storage is centralized, the subscription process can be sketched like this:
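In code, the decision on the subscriber side might look roughly like the sketch below. SharedStorage and CentralizedDemo are hypothetical names invented for this example, not Rebus types, and the non-centralized branch only returns a description of what would happen instead of actually sending anything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a subscription storage (e.g. one SQL Server table);
// not a Rebus type.
public class SharedStorage
{
    readonly HashSet<(string Topic, string Subscriber)> _rows =
        new HashSet<(string Topic, string Subscriber)>();

    public SharedStorage(bool isCentralized) => IsCentralized = isCentralized;

    public bool IsCentralized { get; }

    public void Register(string topic, string subscriberAddress) =>
        _rows.Add((topic, subscriberAddress));

    public string[] GetSubscribers(string topic) =>
        _rows.Where(row => row.Topic == topic)
             .Select(row => row.Subscriber)
             .OrderBy(address => address, StringComparer.Ordinal)
             .ToArray();
}

public static class CentralizedDemo
{
    // Returns a short description of what the subscriber did.
    public static string Subscribe(SharedStorage storage, string topic, string subscriberAddress)
    {
        if (storage.IsCentralized)
        {
            // Centralized: the subscriber writes its own registration, so there is
            // no router lookup and no SubscribeRequest.
            storage.Register(topic, subscriberAddress);
            return "registered directly";
        }

        // Non-centralized: steps 1 and 2 from the basic scenario would happen here.
        return "router lookup + SubscribeRequest";
    }
}
```

With a centralized storage, the publisher then sees the registration the next time it calls GetSubscribers for the topic, without ever having received a message from the subscriber.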
When to use what?
The distributed way
Pros: The basic IsCentralized = false way is the most distributed form of publish/subscribe, since subscribers only need to know the addresses of the publishers they depend on, and need not worry about having a connection to a subscription database somewhere.
With the non-centralized solution, each publisher can have its own subscription storage, which can e.g. be a local SQL Server Express, a JSON file, or whatever each publisher feels like. This makes the non-centralized way more resilient to failures, because there’s no central database required for publishers to work.
Cons: With the non-centralized solution, all subscribers must have an endpoint mapping in their routers, mapping one single publisher as the owner of each event type. More discipline is required during development, and there are simply more moving parts at runtime.
The centralized way
Pros: The new IsCentralized = true way is easier to get started with, and it may be easier to grasp for newcomers. It also relieves all subscribers of the burden of having to map publishers as owners of all the different types of events.
Cons: All publishers and subscribers are required to have the centralized subscription storage configured, and thus everyone needs to be able to connect to the same central database.