Can your system withstand the process chaos monkey?

As part of my efforts to constantly improve and harden Rebus, I have recently put it through some testing that should reveal any weaknesses around how it handles transactions in the face of failure.

My most recent test involved a client, that would send a number of sequentially numbered messages to a FragileProcessor, that would process these messages and save the contained number in an SQL Server – and in doing this, FragileProcessor would constantly crash and burn!

To help me do this, I devised a simple application called ProcessChaosMonkey, containing this evil beast (whose name is inspired – a.k.a. shamelessly ripped off – of Netflix’s chaos monkey):

which I would then run like this: ProcessChaosMonkey.exe FragileProcessor, causing the FragileProcessor process to be constantly shot down at random times.

As FragileProcessor is a Windows Service, I configured it to recover immediately upon crashing. This way, I could leave “my system” running, processing thousands and thousands of messages, and then I could come back and see if messages had been dropped. This way of planning for failure is in my opinion the only way to go if you want to build systems, that can’t fail. Do you think your system can withstand ProcessChaosMonkey?

Oh, and if you’re interested in how the most recent testing went, stay tuned for the next post where I’ll talk a little bit about delivery guarantees…

Developing .NET applications very rapidly

In my current position as an agile .NET dude at d60, I have already found myself in need of rapidly building several small applications and services from scratch.

This way of working reminds me a lot about some of the things that Dan North talked about in his “From months to minutes – upping the stakes” talk at GOTO Aarhus 2010.

In the talk, Dan described how he was part of a team that would rapidly prototype and deliver working systems to some stock exchange or something, and in doing this they would often start from scratch and replace the existing applications, instead of adapting old ones. The advantage is that you get to take some shortcuts when you’re making fairly short-lived applications, you don’t have to carry the burden of old legacy crap, you get to re-do, possibly undo(?) almost every decision some time, and you become really good at building stuff from scratch!

So, this is cool, and I think he said a lot of things that really make sense. But how can you build stuff very rapidly when you’re a .NET developer in a Microsoft shop, and your customer has based everything on Microsoft technology? You would think that a .NET developer would be slow compared to the hip coders working with Node.js, Clojure and [insert hip new technology here]…?

Well, it just so happens that .NET developers are actually pretty lucky in that regard. In this post, I’d just like to go through a few of the things that enable us to deliver stuff really quickly, both .NET and not-necessarily-.NET-things.

Git and GitHub

Like all sensible coding teams, we have moved to distributed source control. We have some pretty skilled Git-people among our developers, so Git was the natural choice. And since we’re not really interested in maintaining our own servers, it makes the most sense to have some hosted Git repositories somewhere – and for that, we got a company account at GitHub, because it’s a great place to host Git repositories.

Git’s branching model is just so extremely fast and lightweight that I’m wondering how I ever managed to get anything done in Subversion. Well, I guess I just didn’t update & commit that often then.

Is anyone still using Subversion? Why?

TeamCity

We have our own locally hosted TeamCity server to check out our code from GitHub, build it, run tests, create NuGet packages etc. Configuring and maintaining TeamCity is really easy compared to CruiseControl.NET, which I used to use – and with the extremely cool feature branch support of TeamCity 7.1, our build server will automatically check out, build, and test my branch if I name it feature-whatever_I_am_working_on when I push it to GitHub!

NuGet

When you want to build stuff rapidly in .NET, there’s really no way you can ignore NuGet – this is how we get to quickly pull down all of the open source libraries and frameworks that we’re using to build cool stuff.

In addition to the free open source stuff, we also NuGet-pack all of our purchased libraries if they’re not already in a .nupkg. And finally, we pack our own shared libraries as NuGet packages as well (actually, TeamCity does this for us automatically on every successful build…), which is how we distribute shared assemblies like e.g. assemblies with Rebus messages.

NuGet has been really easy and smooth in this regard. I’m sorry that I never tried OpenWrap, though – I’m wondering if anyone is using that, or if everybody has moved to NuGet?

Our own NuGet package repository

In order to reduce the friction with NuGet, we have our own NuGet package repository. NuGet.org is just damn slow, and it doesn’t host all of our 3rd party and home-rolled packages.

At first, we used TeamCity to do this, but the NuGet extension in Visual Studio doesn’t remember your credentials, so you have to constantly punch in your username and password if you’ve enabled authentication. This was just too tedious.

Then we hosted our own stand-alone NuGet Server, which would have been OK if it weren’t for the fact that we’re often working outside of our office, and often on other companies’ VPNs. This doesn’t go well with a privately hosted NuGet server.

But then someone had the crazy idea that we should use DropBox as our NuGet repository! This is as crazy as it is ingenious, because we’re already using DropBox to share documents and stuff, so it didn’t really require anything to just throw all of our NuGet packages into a shared folder… which is what TeamCity also does whenever it has packed up something for us, immediately synchronizing the new package to everyone who’s connected to a network.

This way, we can always get to include our NuGet packages, even when we’re building new stuff on a train with a flaky internet connection. As I said: Crazy, yet ingenious!

Free choice of frameworks and tools

Last, but not least, I think a key to rapidly building stuff is to free developers’ hands so that they can use the tools they like to use best and feel are best for the given task.

It seems many companies are afraid that developers might pollute their applications with questionable libraries and bad decisions, but my feeling is that most developers are totally capable of handling this freedom and will respond with a sense of responsibility and care.

Conclusion

These were a couple of things that I could think of that help us build cool stuff and deliver often. As I may have already insinuated, I’m not sure this way of working would work on large monolithic applications that must live for very long.

But then again, in my experience many systems live for too long because they’ve become too huge, and thus too big and expensive to replace – it is often very healthy for systems to be divided into smaller applications that are well integrated, but each application in itself is fairly small and decoupled and thus doesn’t constitute a huge risk and equally huge endavour to replace.

Unit testing sagas with Rebus

Ok, so now we have taken a look at how we could unit test a simple Rebus message handler, and that looked pretty simple. When unit testing sagas, however, there’s a bit more to it because of the correlation “magic” going on behind the scenes. Therefore, to help unit testing sagas, Rebus comes with a SagaFixture<TSagaData> which you can use to wrap your saga handler while testing.

I guess an example is in order 🙂

Consider this simple saga that is meant to be put in front of a call to an external web service where the actual web service call is being made somewhere else in a service that allows for using simple request/reply with any correlation ID:

Let’s say I want to punish this saga in a unit test. A way to do this is to attack the problem like when unit testing an ordinary message handler – but then a good portion of the logic under test is not tested, namely all of the correlation logic going on behind the scenes. Which is why Rebus has the SagaFixture<TSagaData> thing… check this out:

As you can see, the saga fixture can be used to “host” your saga handler during testing, and it will take care of properly dispatching messages to it, respecting the saga’s correlation setup. Therefore, after handing over the saga to the fixture, you should not touch the saga handler again in that test.

If you want to start out in your test with some premade home-baked saga data, you can pass it to the saga fixture as a second constructor argument like so:

and as you can see, you can access all “live” saga data via fixture.AvailableSagaData, and you can access “dead” saga data as well via fixture.DeletedSagaData.

If you’re interested in logging stuff, failing in case of uncorrelated messages, etc., there’s a couple of events on the saga fixture that you can use: CorrelatedWithExistingSagaData, CreatedNewSagaData, CouldNotCorrelate – so in order to fail in all cases where a message would have been ignored because it could not be correlated, and the message is not allowed to initiate a new saga data instance, you could do something like this:

Nifty, huh?

Unit testing handlers with Rebus

…is pretty easy, because message handlers are ordinary classes free of dependencies – i.e. you implement the appropriate Handle methods, but you don’t have to derive your class off of some base class, and you can have your own dependencies injected, so you’re free to mock everything if you feel like it.

There’s one thing, though: The IBus interface, which you’ll most likely need some time, is kind of clunky to mock with mocking libraries like Moq, Rhino Mocks, NSubstitute, FakeItEasy [insert today’s hip mocking library here] – especially if you’re testing the AAA way of writing your unit tests.

Let’s take a look at a simple handler, whose responsibility is to subscribe to the PurchaseRecorded event and, for each debtor involved in the purchased mortgage deed, ensure that a process is kicked off that subscribes that debtor to an SSN-based address update service provided by the Danish Central Person Registry:

Right, so now I want my test to capture the fact that the incoming event gives rise to multiple messages that the service sends to itself, one for each debtor. My test might look somewhat like this (shown with the classic Rhino Mocks syntax):

It doesn’t require that much imagination to see how the asserts can become completely unreadable if the sent messages contain more than a few fields… which is why Rebus has a FakeBus in the Rebus.Testing namespace!

Check this out:

The example above is quite simple, so it might not be that apparent – but the real force of FakeBus is that just stores all messages that are sent, sent to self, published, replied, etc., also it also stores any headers that may have been added. This way, during testing, you can easily get access to the actually sent messages and inspect whether their information is as expected.