Scheduling recurring tasks in NServiceBus

2010-08-10 by mookid8000 6 Comments

A while ago, on an NServiceBus-based project I am currently involved with, we needed to publish certain pieces of information at fixed intervals. I was not sure how this could be implemented in an NServiceBus service, so I asked for help on Twitter, which resulted in a nifty piece of advice from Andreas Öhlund: Set up a timer to do a bus.SendLocal at the specified interval.

That’s exactly what we did, and I think we ended up with a pretty nifty piece of code that I want to show off 🙂

PS: bus.SendLocal(message) effectively does a bus.Send(((UnicastBus)bus).Address, message) – i.e. it puts a message, MSMQ and all, in the service’s own input queue.

First, we have an API that looks like this (looking a little funny, I know – wait and see…):

public interface ISchedule
{
    void Every(TimeSpan interval, Func<IMessage> messageFactoryMethod);
}

– which is implemented like this (registered as a singleton in the container):

public class ServerBasedTimerSchedule : ISchedule, IDisposable
{
    readonly IBus bus;
    readonly List<System.Timers.Timer> timers = new List<System.Timers.Timer>();

    public ServerBasedTimerSchedule(IBus bus)
    {
        this.bus = bus;
    }

    public void Every(TimeSpan interval, Func<IMessage> messageFactoryMethod)
    {
        var timer = new System.Timers.Timer();
        timer.Elapsed += (_, __) => bus.SendLocal(messageFactoryMethod());
        timer.Interval = interval.TotalMilliseconds;
        timer.Start();
        timers.Add(timer);
    }

    public void Dispose()
    {
        timers.ForEach(timer => timer.Dispose());
    }
}

The System.Timers.Timer is a timer which uses the thread pool to schedule callbacks at the specified interval. It’s pretty easy to use, and it fits nicely with this scenario.

Now, in combination with this nifty class of extension goodness:

public static class TimeSpanExtensions
{
    public static TimeSpan Seconds(this int seconds)
    {
        return TimeSpan.FromSeconds(seconds);
    }

    public static TimeSpan Minutes(this int minutes)
    {
        return TimeSpan.FromMinutes(minutes);
    }

    // ... etc + for doubles as well
}

– we can schedule our tasks like so:

public class ScheduleRealTimeDataPublishing : IWantToRunAtStartup
{
    readonly ISchedule schedule;

    public ScheduleRealTimeDataPublishing(ISchedule schedule)
    {
        this.schedule = schedule;
    }

    public void Run()
    {
        // every 5 seconds, put a fresh message in the service's own input queue
        schedule.Every(5.Seconds(), () => new PublishRealTimeDataMessage());
    }

    public void Stop()
    {
    }
}

Now, why is this good? It’s good because the actual task will then be carried out by whoever implements IHandleMessages<PublishRealTimeDataMessage> in the service, processing the tasks with all the benefits of the usual NServiceBus message processing pipeline.

Nifty, huh?

Looking over the simplicity and elegance of this solution, I’m kind of embarrassed to admit that my first take on this was to implement the timer almost exactly like above, except that instead of calling bus.SendLocal in the Elapsed callback, we had a huge event handler that simulated most of our message processing pipeline – including NHibernateMessageModule, transactions, and whatnot….

Please note that ServerBasedTimerSchedule is not thread-safe, since Every adds to an unsynchronized list – in this form, Every should only be called from within the Run and Stop methods of implementors of IWantToRunAtStartup, as these are run sequentially.

Code golf

2010-07-30 by mookid8000

The results of Kodehoved’s code golf competition are in, and I got a shared 5th place.

The task was to add two large integers together by representing the integers with arrays of their digits, thus allowing them to become extremely large.
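
For reference, here is the plain schoolbook algorithm that the golfed entries below all compress: walk both arrays from the least significant digit, add the two digits plus the carry, and keep going until both arrays and the carry are used up. This is only a readable sketch, written in JavaScript rather than C# for brevity; addDigits is a made-up name and not one of the competition entries.

```javascript
// Ungolfed digit-array addition (addDigits is a hypothetical name, for illustration only)
function addDigits(a, b) {
  const result = [];
  let i = a.length; // index just past the next unread digit of a
  let j = b.length; // same for b
  let carry = 0;

  // Keep going while either number has digits left or a carry is pending
  while (i > 0 || j > 0 || carry > 0) {
    const sum = (i > 0 ? a[--i] : 0) + (j > 0 ? b[--j] : 0) + carry;
    result.unshift(sum % 10); // prepend, since we work from the right
    carry = Math.floor(sum / 10);
  }
  return result;
}

console.log(addDigits([9, 9], [1])); // [ 1, 0, 0 ]
```

The golfed versions do the same thing, but build the result as a string and fold the carry handling into the loop header.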

My contribution looks like this (weighing in at 138 characters):

public static int[] Mookid8000_Add(int[] a,int[] b){
   var r="";
   for(int n=a.Length,m=b.Length,s,c=0;n+m+c>0;c=s/10)
      r=(s=(n>0?a[--n]:0)+(m>0?b[--m]:0)+c)%10+r;
   return r.Select(t=>t-48).ToArray();
}

Asger’s and Lars’ contribution looks like this (135 characters):

public static int[] AsgerHallasOgLarsUdengaard_Add(int[] a,int[] b){
   var r="";
   for(int c=0,x=a.Length,y=b.Length;x+y>0|c>9;)
      r=(c=(x>0?a[--x]:0)+(y>0?b[--y]:0)+c/10)%10+r;
   return r.Select(s=>s-48).ToArray();
}

And the winners’ (Mads and Peter Sandberg Brun) contribution looks like this (weighing in at an incredibly compact, but almost unreadable, 132 characters :)):

public static int[] MadsOgPeterSandbergBrun_Add(int[] a,int[] b){  
   var c="";  
   for(int o=a.Length,p=b.Length,s=0;-o-p<(s=s/10+(0<o?a[--o]:0)+(0<p?b[--p]:0));)  
      c=s%10+c;  
   return c.Select(i=>i-48).ToArray();  
}

I usually don’t enter competitions like this, because I’ve always thought of myself as pretty lame when it comes to solving coding puzzles, but this time it was great fun – especially since I was motivated by my desire to beat Asger (my little sister’s boyfriend), who pushed me several iterations further than I would have gone on my own. He ended up beating me anyway, but I am still pretty satisfied with my fairly compact and almost readable solution.

Fun with NoRM 3

2010-07-22 by mookid8000

Third post in “Fun With NoRM” will be about how “the dynamism” of JavaScript and JSON is bridged into the rigid and statically typed world of C#. The thing is, in principle there’s no way to be certain that a JSON object returned from MongoDB will actually fit into our static object model.

Consider a situation where, for some reason, some of our orders have a field, PlacedBy, containing the name of the person who placed the order. Let’s see how things will go when adding the field and then querying all orders:

> use dbname
> var order = db.Order.findOne();
> order.PlacedBy = "El Duderino";
> db.Order.save(order);

var orders = mongo.GetCollection<Order>();

foreach(var order in orders.Find())
{
    Console.WriteLine("Order #{0}", order.Number);
}

– and BOOM! – Cannot deserialize!: Norm.MongoException: Deserialization failed: type MongoTest.Order does not have a property named PlacedBy

This is actually pretty good, because this way we will never accidentally load a document with un-deserializable properties and save it back, thus truncating the document. But how can we handle this?

Well, NoRM makes it pretty easy: Make your model class inherit Expando, thus effectively becoming a dictionary. E.g. like so:

public class Order : Expando
{
   // ...
}

Now we can do this:

var orders = mongo.GetCollection<Order>();

foreach(var order in orders.Find())
{
    Console.WriteLine("Order #{0}", order.Number);

    if (order.AllProperties().Any())
    {
        var props = order.AllProperties().Select(p => string.Format("{0}: {1}", p.PropertyName, p.Value));
        Console.WriteLine("\t{0}", string.Join(", ", props.ToArray()));
    }
}

– which yields:

Order #1
    PlacedBy: El Duderino
Order #2
Order #3

when run with a small DB containing three orders. Nifty, huh?

If you’re sad that you’ve given up your single opportunity to inherit something by deriving from Expando, just go ahead and implement IExpando instead. Then you need to supply a few members, but you can just redirect them to an internal Expando in your class.

Next up, a post on how to model inheritance hierarchies… one of my favorites! 🙂

Fun with NoRM 1

2010-07-16 by mookid8000

My previous posts on MongoDB have been pretty un-.NETty, in that I have focused almost entirely on how to work the DB through its JavaScript API. To remedy that, I shall write a few short posts on how to get rolling with MongoDB using NoRM, the coolest C# driver for MongoDB at the moment.

First post will be on how to connect and shove data into MongoDB.

Short introduction to NoRM

NoRM is “No Object-Relational Mapping”. It’s a .NET driver that allows you to map objects, their fields, and their aggregated objects into documents. I like NoRM because it has successfully preserved that low-friction MongoDB feeling, bridging C#’s capabilities nicely to those of JavaScript and providing some extra C# goodies along the way. Please read on, you’ll see…

Connect to MongoDB

Easy – can be done like so:

using(var mongo = Mongo.Create("mongodb://hostname/dbname"))
{
   // go crazy in here!!1
}

Inserting a few documents

Inserting documents with NoRM is easy – just create a class with fields and aggregated objects, and make sure the class either has a property named something like “Id” or has a property decorated with [MongoIdentifier], e.g. like so:

public class Order
{
    public Order()
    {
        Id = ObjectId.NewObjectID();
        Items = new List<Item>();
    }

    public ObjectId Id { get; set; }
    public int Number { get; set; }
    public List<Item> Items { get; set; }
}

public class Item
{
    public string Name { get; set; }
    public int Amount { get; set; }
}

– and then go ahead and pull a strongly typed collection and insert documents into it:

var orders = mongo.GetCollection<Order>();

orders.Insert(new Order {
    Number = 1,
    Items = {
        new Item { Name = "beer" },
        new Item { Name = "nuts" },
    }
});

Now, to make this work I need to create five 200-line XML-files with mapping info etc. </kidding> no, seriously – that’s all it takes to persist an entire aggregate root!!

Pretty cool, eh? That’s what I meant when I said low friction. Stay tuned for more posts, e.g. on how to query a collection…

IoC component registration

2010-05-07 by mookid8000 4 Comments

An interesting topic that I always love to discuss is how to find a balance between building a pure domain model and being pragmatic and getting the job done. It just so happens that getting the job done, and being able to add features to a system quickly, usually conflicts with my inner purist, who really wishes to keep my domain model and services oblivious of everything infrastructure-related.

Usually, I go for pragmatism. Not only because I’m lazy, but also because I think it’s fun to come up with solutions that accelerate development, and generally make the sun shine brighter.

Actually, this post should have been #2 in a series called “Polluting My Domain”, and an earlier post should have been #1, because in that one I showed how I usually use attributes to give hints to the automapper in Fluent NHibernate – e.g. [Cascade] on a relation to configure NHibernate to cascade operations across that relation, or [Indexed("ix__something")] to make that column be indexed in the database, or [Encrypted] to make that particular property be backed by an encrypting IUserType.

This post, however, will show a pragmatic, elegant and flexible way to make component registration easy in an IoC container.

Most IoC containers that I know of can be configured with XML and through some kind of more or less fluent API in code. I’ll spare you the XML and just show a small example of Castle Windsor’s fluent API:

container.Register(Component.For<ISomeService>().ImplementedBy<SomeServiceImplementation>())

or you can perform multiple registrations like this:

container.Register(AllTypes.FromAssemblyContaining<SomeService>()
                                .BasedOn<IService>()
                                .WithService.FromInterface())

This is all fine and dandy, but I think the fluent API becomes pretty complicated when you throw customized registrations per customer and/or environment into the mix – mostly because it will become kind of obscure which services get registered where.

Therefore, one of the first things I have put in my recent projects has been a registration routine based on attributes. Pretty simple, and yes, it does pollute my domain services with infrastructure-related stuff, but this is a great example where I prefer pragmatism and simplicity over purity.

My most recent project has two attributes, ServiceAttribute and RegisterInAttribute that look something like this:

[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public class ServiceAttribute : Attribute
{
    readonly Type serviceType;

    public ServiceAttribute()
    {
    }

    public ServiceAttribute(Type serviceType)
    {
        if (serviceType == null) throw new ArgumentNullException("serviceType");
        this.serviceType = serviceType;
    }

    public Type ServiceType
    {
        get { return serviceType; }
    }
}

public enum Environment
{
    Development,
    Test,
    Staging,
    Production,
}

[AttributeUsage(AttributeTargets.Class, AllowMultiple = true)]
public class RegisterInAttribute : Attribute
{
    readonly Environment environment;

    public RegisterInAttribute(Environment environment)
    {
        this.environment = environment;
    }

    public Environment Environment
    {
        get { return environment; }
    }
}

and then the registration code looks like this:

public class ComponentRegistrar
{
    public static void RegisterComponentsFromAssemblyOf<TSomeType>(IWindsorContainer container,
                                                                   Environment environment)
    {
        var components = typeof(TSomeType).Assembly.GetTypes()
            .Where(t => ShouldBeRegistered(t, environment))
            .SelectMany(t => ToComponentRegistrations(t));

        // Register takes a params array of registrations
        container.Register(components.ToArray());
    }

    static IEnumerable<TAttribute> GetAttributes<TAttribute>(ICustomAttributeProvider provider)
    {
        return provider.GetCustomAttributes(typeof(TAttribute), false).Cast<TAttribute>();
    }

    static bool ShouldBeRegistered(ICustomAttributeProvider provider, Environment environment)
    {
        var attributes = GetAttributes<RegisterInAttribute>(provider);

        // no [RegisterIn] at all means "register in every environment"
        return !attributes.Any() || attributes.Any(a => a.Environment == environment);
    }

    static IEnumerable<IRegistration> ToComponentRegistrations(Type type)
    {
        // a parameterless [Service] registers the type as its own service
        return GetAttributes<ServiceAttribute>(type)
            .Select(a => Component.For(a.ServiceType ?? type)
                             .ImplementedBy(type)
                             .LifeStyle.Transient)
            .Cast<IRegistration>();
    }
}

So, having established the current value of environment, my component registration will look like this:

var environment = DetermineEnvironmentFromAppSettingsOrSomethingLikeThat();

var container = new WindsorContainer();
ComponentRegistrar.RegisterComponentsFromAssemblyOf<HomeController>(container, environment);
ComponentRegistrar.RegisterComponentsFromAssemblyOf<SomeDomainService>(container, environment);
ComponentRegistrar.RegisterComponentsFromAssemblyOf<SomeInfrastructureService>(container, environment);

// yay!

– and that’s it! But the best part is that adding services to the system now becomes a breeze – check this out – registering a concrete type, offering itself as a service:

[Service]
public class HomeController : Controller
{
    // (....)
}

– and here’s registering different stuff depending on the environment:

[Service]
[RegisterIn(Environment.Development)]
public class DebugController : Controller
{
    // (....)
}

[Service(typeof(IMailSender))]
[RegisterIn(Environment.Production)]
public class SmtpMailSender : IMailSender
{
    // (...)
}

[Service(typeof(IMailSender))]
[RegisterIn(Environment.Staging)]
public class FakeSmtpMailSender : IMailSender
{
    // (...)
}

[Service(typeof(IMailSender))]
[RegisterIn(Environment.Development)]
[RegisterIn(Environment.Test)]
public class LoggingMailSender : IMailSender
{
    // (...)
}

This way of registering components has proven to me several times to be a simple and nifty way of managing the differences between environments, and even differences between customers (which would require a few extensions to the example above though), while still being able to add services to the system quickly.

Another benefit is that it’s pretty clear what happens, even to developers who might not be that experienced in using IoC containers. If I were the only developer on a project, I would probably prefer component registration based on conventions, but when you have a team, you sometimes need to make some things more explicit.

Even more checking out MongoDB: The coolness continues

2010-03-31 by mookid8000

One thing I started to think about after having looked at MongoDB was how to model things that are somehow connected – without the use of foreign keys and the ability to join stuff.

When working with documents, you generally just embed the data where it belongs. But what if I have the following documents:

"songs" collection:
{
    "name": "Factory",
    "artist": "Martha Wainwright"
}

{
    "name": "Winter At The Hamptons",
    "artist": "Josh Rouse"
}

"users" collection:
{
    "username": "mookid",
    "name": "Mogens Heller Grabe"
}

{
    "username": "duderino",
    "name": "The Dude"
}

– and I want to constrain access to the songs, allowing me to see both songs, and The Dude to see only Factory?

My first take was to simply add username in an array inside each song, like so:

"songs" collection:
{
    "name": "Factory",
    "artist": "Martha Wainwright",
    "allowed": ["mookid", "duderino"]
}

{
    "name": "Winter At The Hamptons",
    "artist": "Josh Rouse",
    "allowed": ["mookid"]
}

– and this will work well with indexing, which can be done like this:

db.songs.ensureIndex({"allowed": 1})
db.songs.find({"allowed": "mookid"})  // will use the index :)

But here comes the problem: What if each song should be displayed along with the name of who can access that particular song? I need to embed more stuff in the array, e.g. like so:

"songs" collection:
{
    "name": "Factory",
    "artist": "Martha Wainwright",
    "allowed": [{
        "username": "mookid", 
        "name": "Mogens Heller Grabe"
    }, {
        "username": "duderino", 
        "name": "The Dude"
    }]
}

{
    "name": "Winter At The Hamptons",
    "artist": "Josh Rouse",
    "allowed": [{
        "username": "mookid", 
        "name": "Mogens Heller Grabe"
    }]
}

There’s a challenge now in keeping this extra “denormalized” piece of information up-to-date in case a user changes his name, etc. – but let’s just assume we’ve handled that.

Now here comes the cool thing: It’s cool that MongoDB can index “into” an array, but it can actually index “into” anything! Just tell it where to go, using the Dot Notation.

That means I can perform the same search as I did above like so:

db.songs.ensureIndex({"allowed.username": 1})
db.songs.find({"allowed.username": "mookid"})  // will use the index :)

How cool is that?? (pretty cool, actually!)
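
To make the matching rule a bit more concrete, here is a small in-memory sketch in plain JavaScript of how I understand a dotted path to be matched: each path segment steps into a field, and arrays are “transparent”, so any element may satisfy the rest of the path. matchesPath is a hypothetical helper for illustration only, not something MongoDB exposes, and it says nothing about how the index itself is stored.

```javascript
// Hypothetical helper, not MongoDB code: does `doc` hold `value` at the dotted `path`?
function matchesPath(doc, path, value) {
  const walk = (node, keys) => {
    if (Array.isArray(node)) {
      // Arrays are transparent: any element may satisfy the rest of the path
      return node.some((item) => walk(item, keys));
    }
    if (keys.length === 0) return node === value;
    if (node === null || typeof node !== "object") return false;
    return walk(node[keys[0]], keys.slice(1));
  };
  return walk(doc, path.split("."));
}

// The two songs from above, in their "denormalized" form
const songs = [
  {
    name: "Factory",
    allowed: [
      { username: "mookid", name: "Mogens Heller Grabe" },
      { username: "duderino", name: "The Dude" },
    ],
  },
  {
    name: "Winter At The Hamptons",
    allowed: [{ username: "mookid", name: "Mogens Heller Grabe" }],
  },
];

const dudesSongs = songs
  .filter((song) => matchesPath(song, "allowed.username", "duderino"))
  .map((song) => song.name);

console.log(dudesSongs); // [ 'Factory' ]
```

The same helper also covers the first version of the documents, where "allowed" was a plain array of usernames and the path had only one segment.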

More checking out MongoDB: Querying

2010-03-07 by mookid8000

In my first post about MongoDB, I touched on querying only lightly. Querying is of course pretty important to most systems, so it’s fair to dedicate a separate post to the subject.

Querying in MongoDB works by sending a query document to the server. In the following snippet, for example, I create a query document containing a post ID:

var somePost = db.posts.findOne({"_id": ObjectId("00112233445566778899aabb")})

– which can actually be even shorter, as the find and findOne functions accept an ObjectId directly as their argument, like so:

var somePost = db.posts.findOne(ObjectId("00112233445566778899aabb"))

But how can I find a post with a specific slug? Easy, like so:

var somePost = db.posts.findOne({slug: "this-slug-probably-comes-from-a-url"})

But how does this perform? It’s easy to examine how queries are executed with the explain() function, like so:

> db.posts.find({slug: "this-slug-probably-comes-from-a-url"}).explain()
{
        "cursor" : "BasicCursor",
        "startKey" : {

        },
        "endKey" : {

        },
        "nscanned" : 10000,
        "n" : 1,
        "millis" : 11,
        "allPlans" : [
                {
                        "cursor" : "BasicCursor",
                        "startKey" : {

                        },
                        "endKey" : {

                        }
                }
        ]
}

– yielding some info about the execution of the query. I don’t know exactly how to interpret all of it, but "cursor": "BasicCursor" means that no index was used, and "nscanned": 10000 means that 10000 documents were scanned – in a collection with 10000 documents, that’s not optimal, as it implies a full collection scan to find a single match ("n": 1). Now, let’s make sure that our query will execute as fast as possible by creating an index on the field (_id is always automatically indexed):

db.posts.ensureIndex({slug:1})

Now let’s explain() again:

> db.posts.find({slug: "post-no-454"}).explain()
{
        "cursor" : "BtreeCursor slug_1",
        "startKey" : {
                "slug" : "post-no-454"
        },
        "endKey" : {
                "slug" : "post-no-454"
        },
        "nscanned" : 1,
        "n" : 1,
        "millis" : 0,
        "allPlans" : [
                {
                        "cursor" : "BtreeCursor slug_1",
                        "startKey" : {
                                "slug" : "post-no-454"
                        },
                        "endKey" : {
                                "slug" : "post-no-454"
                        }
                }
        ]
}

Wow! That’s what I call an improvement!
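
To get a feel for why nscanned drops from 10000 to 1, here is a rough in-memory sketch in plain JavaScript. It is an analogy under my own assumptions, not how MongoDB works internally: a Map from slug to documents stands in for the B-tree index, and all function names and the generated posts are made up.

```javascript
// 10000 made-up posts, mirroring the collection size in the explain() output
const allPosts = Array.from({ length: 10000 }, (_, i) => ({ slug: `post-no-${i}` }));

// Without an index: every document has to be looked at
function findBySlugScanning(docs, slug) {
  let scanned = 0;
  const found = docs.filter((doc) => {
    scanned++;
    return doc.slug === slug;
  });
  return { found, scanned };
}

// With an index: a Map from slug to documents stands in for the B-tree
function buildSlugIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc.slug)) index.set(doc.slug, []);
    index.get(doc.slug).push(doc);
  }
  return index;
}

function findBySlugIndexed(index, slug) {
  const found = index.get(slug) || [];
  // Only the documents the index points at are ever touched
  return { found, scanned: found.length };
}

const scan = findBySlugScanning(allPosts, "post-no-454");
const lookup = findBySlugIndexed(buildSlugIndex(allPosts), "post-no-454");

console.log(scan.scanned, lookup.scanned); // 10000 1
```

Building the index costs a pass over the data up front (and extra work on every write), which is the price paid for the cheap lookups.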

What about posts with a specific tag? First I tried the following snippet, because I learned that the special $where field could be put in a query document to evaluate a predicate server-side:

var niftyPosts = db.posts.find({$where: function() { return this.tags != null && this.tags.indexOf("nifty") != -1; }})

– and this actually works. This syntax is sort of clunky though. Luckily, MongoDB provides a nifty mechanism for arrays that automagically checks if something is contained in it. So my query can be rewritten to this:

var niftyPosts = db.posts.find({tags: 'nifty'})

Nifty!

Now, to take advantage of indexes, the query operators should be used instead of $where, which has to evaluate its function against each candidate document. To name a few: $gt (greater than), $gte (greater than or equal), $lt (less than), $lte (less than or equal), $ne (not equal), and many more. For example, to count the number of non-nifty posts in February 2010:

> db.posts.find({
    date: {$gte: 20100201, $lt: 20100301},
    tags: {$ne: 'nifty'}
}).count()
0

Check out the manual for some great documentation on available operators.
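
As a way of thinking about what such a query document means, here is a small in-memory matcher in plain JavaScript. It is a deliberately simplified sketch of the semantics as I understand them, covering only the operators named above; matches, matchesField and the sample posts are all made up for illustration and are not part of MongoDB.

```javascript
// Hypothetical in-memory matcher: a simplification for illustration, not MongoDB's implementation
const comparisons = {
  $gt: (actual, expected) => actual > expected,
  $gte: (actual, expected) => actual >= expected,
  $lt: (actual, expected) => actual < expected,
  $lte: (actual, expected) => actual <= expected,
};

function isOperatorDocument(condition) {
  return condition !== null && typeof condition === "object" && !Array.isArray(condition);
}

function matchesField(actual, condition) {
  if (!isOperatorDocument(condition)) {
    // Plain value: equality, or containment when the field holds an array
    return Array.isArray(actual) ? actual.includes(condition) : actual === condition;
  }
  // Operator document: every operator in it must hold
  return Object.entries(condition).every(([operator, expected]) => {
    if (operator === "$ne") return !matchesField(actual, expected);
    const compare = comparisons[operator];
    if (!compare) throw new Error("Unsupported operator: " + operator);
    const candidates = Array.isArray(actual) ? actual : [actual];
    return candidates.some((value) => compare(value, expected));
  });
}

// A document matches when every field condition in the query holds
function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => matchesField(doc[field], condition));
}

// Made-up sample data
const samplePosts = [
  { date: 20100210, tags: ["nifty", "mongodb"] },
  { date: 20100215, tags: ["boring"] },
  { date: 20100305, tags: ["boring"] },
];

const query = { date: { $gte: 20100201, $lt: 20100301 }, tags: { $ne: "nifty" } };
const nonNiftyInFebruary = samplePosts.filter((post) => matches(post, query));

console.log(nonNiftyInFebruary.length); // 1
```

Note how several operators on the same field are simply combined with "and", which is what makes the date range work.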

Conclusion

Querying with MongoDB actually seems pretty cool and flexible. I like the idea that it’s possible to execute ad-hoc queries, and for most uses I think the supplied operators are adequate. The ability to supply a predicate function via $where seems really cool, but it should probably only be used in conjunction with one or more of the other operators, so that an index can narrow down the candidates and a full collection scan is avoided.

More checking out MongoDB: References

2010-03-05 by mookid8000 1 Comment

This post will touch a little bit on the mechanism used for references, and then a few thoughts on how document-orientation relates to OO.

Now – if you, like me, are into OO and normalized object models – the weirdness begins….. or maybe not?! (actually, I am not sure yet :))

In an OO world (and in a normalized RDB world as well), you reference stuff, thus reducing the amount of redundant information as much as possible. E.g. the names of countries should not be put in a column in your address table; instead, each country should have a row in the countries table and be referenced by a countryId in the address table.

In a document-oriented world, you generally embed objects instead of referencing them. This is done for performance reasons, and because there’s no way to join stuff – a stored ID/foreign key is just a value that the client is free to use manually in additional queries.

When you do need to actually reference another document, use DBRef to create a reference, supplying the collection and the ID as arguments. E.g. like so:

> var author = {name: 'joe'}
> db.authors.save(author)
> author
{ "name" : "joe", "_id" : ObjectId("4b8ed200384e0000000065d8") }
> var post = {headline: 'hello there', author: new DBRef('authors', author._id)}
> db.posts.save(post)
> post
{
        "headline" : "hello there",
        "author" : {
                "$ref" : "authors",
                "$id" : ObjectId("4b8ed200384e0000000065d8")
        },
        "_id" : ObjectId("4b8ed4c8384e0000000065d9")
}

This way, the reference is represented in a consistent manner which may – or may not – be picked up by the driver you are using. The C# driver can create DBRefs and follow them, but you don’t get to join stuff – you still need an extra query.
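
To show what “an extra query” means in practice, here is a minimal sketch in plain JavaScript. Plain objects stand in for the collections, string IDs stand in for ObjectIds, and findById is a helper name I made up (it is not a driver or shell function). The point is only that following the $ref/$id pair costs a second lookup:

```javascript
// Illustration only: plain objects stand in for collections, and string IDs
// stand in for ObjectIds. `findById` is a hypothetical helper, not a driver API.
const db = {
  authors: { a1: { _id: 'a1', name: 'joe' } },
  posts: {
    p1: {
      _id: 'p1',
      headline: 'hello there',
      author: { $ref: 'authors', $id: 'a1' },
    },
  },
};

let lookups = 0;

// Looks up one document by ID; every call models one round trip to the server.
function findById(collection, id) {
  lookups += 1;
  return db[collection][id];
}

const post = findById('posts', 'p1');                        // first query
const author = findById(post.author.$ref, post.author.$id);  // second query

console.log(author.name, lookups);
```

Had the author been embedded in the post, the first lookup would have been enough.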

Embedding objects may seem a little clunky at first, but actually this plays nicely with some common OO concepts – take aggregation, for example: a blog post has an array of comments, each of which makes no sense without an aggregating post – i.e. comments live and die with their post. That’s an obvious sign that comments should be embedded in the post. So, instead of:

posts:
{'_id': ObjectId('11223344556677889900aabb'), headline: 'A post'}

comments:
{'_id': ObjectId('bbaa00112233554477669988'), text: 'hello there', post: { "$ref": "posts", "$id": ObjectId('11223344556677889900aabb') }}
{'_id': ObjectId('881122335544776699aa00bb'), text: 'hello again', post: { "$ref": "posts", "$id": ObjectId('11223344556677889900aabb') }}

– you should do this:

{
    '_id': ObjectId('11223344556677889900aabb'), 
    headline: 'A post',
    comments: [
        {text: 'hello there'},
        {text: 'hello again'}
    ]
}
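
If the data already exists in the referenced form, folding it into the embedded form is a small transformation. The following is only a sketch in plain JavaScript: string IDs stand in for ObjectIds, and embedComments is a name I made up for illustration:

```javascript
// Illustration only: string IDs stand in for ObjectIds, and
// `embedComments` is a hypothetical helper name.
function embedComments(posts, comments) {
  return posts.map(post => ({
    ...post,
    // Keep only the comments that reference this post, and drop their
    // own _id and back-reference, since they now live inside the post.
    comments: comments
      .filter(comment => comment.post.$id === post._id)
      .map(comment => ({ text: comment.text })),
  }));
}

const posts = [{ _id: 'p1', headline: 'A post' }];
const comments = [
  { _id: 'c1', text: 'hello there', post: { $ref: 'posts', $id: 'p1' } },
  { _id: 'c2', text: 'hello again', post: { $ref: 'posts', $id: 'p1' } },
];

const embedded = embedComments(posts, comments);
console.log(JSON.stringify(embedded[0]));
```

Note that the embedded comments lose their own _id, so they can only be reached through their post.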

Actually this makes me think about the concept of an aggregate root in DDD: an aggregate root “owns” the data beneath it, and is responsible for maintaining its own integrity. If you were to delete an aggregate root, all the data beneath it would disappear.

This also fits kind of nicely with the fact that there are no database transactions in MongoDB – i.e. there’s no way to issue multiple statements and have them rolled back in case of an error – there are only documents, and either a document gets inserted/updated/deleted, or it doesn’t. So obviously, the document is the unit of atomicity, which fits (sort of nicely) with the aggregate root and its responsibility of keeping itself internally consistent.

Conclusion

The observations stated here pretty much make a document an aggregate root in the DDD sense – especially since only documents get an _id. There’s no obvious way to reference a particular comment inside the second post shown above.

If MongoDB’s performance is up to the task, data should probably be aggregated as much as possible into large documents. MongoDB’s limit is 4 MB per document, but I am unsure of how large documents should be before you should consider splitting them.

Maybe I am thinking too much about these things? Maybe I should just try and build something and see where my document modeling goes? Suggestions and comments are welcome 🙂

Checking out MongoDB

2010-03-03 by mookid8000 4 Comments

Having experienced a lot of pain using RDBMSs [1] as a default choice of persistence, having read a couple of blog posts about MongoDB, and being generally interested in widening my horizon, I decided to check out MongoDB.

[1] Usually because of abusing RDBMSs, actually. Storing an object model in a RDBMS is not painful as long as the tooling is right – e.g. by leveraging the amazing NHibernate. The pain comes when developers suddenly start implementing overly complex queries and doing reporting on top of a pretty entity model, modeling stuff OO style… ouch!

This post is a write-as-I-go summary of the information I have gathered from the following places:

Getting MongoDB

Piece of cake! Download MongoDB from the download center and shove the binaries away somewhere on your machine. Default is for MongoDB to store its data in /data/db which translates to c:\data\db if you are using Windows – go ahead and create this directory. The MongoDB daemon can be started by running mongod.exe, which will accept connections on localhost:27017.

It will probably look something like the screenshot shown below.

An alternative data path can be specified on the command line, e.g. like so: mongod --dbpath c:\somewhere\else.

Accessing it with JavaScript

Run mongo.exe to start the Mongo Shell. It will probably look something like this:

In the MongoDB prompt, you can use JavaScript to access the db. Here’s a sample session of some commands I have found useful:

// lists the dbs in this mongo
> show dbs   
admin
local
>
> // change database context to some db ("myblog" - will automagically be created)
> use myblog   
switched to db myblog
>
> show collections
system.indexes
>
> // save a couple of documents in a collection named "posts"
> // (collection will be automagically created as well...)
> db.posts.save({
    headline: 'Notes to self about MongoDB', 
    slug: 'notes-to-self-about-mongodb', 
    tags: ['mongodb', 'nosql', 'nifty', 'c#']
 })
> db.posts.save({
    headline: 'Someday I want to check out CouchDB as well', 
    slug: 'someday-i-want-to-check-out-couchdb-as-well', 
    tags: ['couchdb', 'nosql', 'nifty', 'c#']
 })
>
> // show documents in a collection (returns a cursor, which will be iterated for the first 10 or 20 results - next pages can be retrieved with the 'it' command)
> db.posts.find()
{ "_id" : ObjectId("4b8e4281781b000000005cfc"), "headline" : "Notes to self about MongoDB", "slug" : "notes-to-self-about-mongodb", "tags" : [ "mongodb", "nosql", "nifty", "c#" ] }
{ "_id" : ObjectId("4b8e42cc781b000000005cfd"), "headline" : "Someday I want to check out CouchDB as well", "slug" : "someday-i-want-to-check-out-couchdb-as-well", "tags" : [ "couchdb", "nosql", "nifty", "c#" ] }

Now, I have successfully added two documents representing blog posts in a collection named posts. As you can see, MongoDB assigns some funky IDs to the documents.

> // let's get the first post ('find' and 'findOne' accept a query document as their first parameter)
> db.posts.findOne({'_id': ObjectId('4b8e4281781b000000005cfc')})
{
        "_id" : ObjectId("4b8e4281781b000000005cfc"),
        "headline" : "Notes to self about MongoDB",
        "slug" : "notes-to-self-about-mongodb",
        "tags" : [
                "mongodb",
                "nosql",
                "nifty",
                "c#"
        ]
}
>
> // now let's find all IDs ('find' and 'findOne' accept as their second parameter a document
> // specifying which fields to return)
> db.posts.find({}, {'_id': true})
{ "_id" : ObjectId("4b8e4281781b000000005cfc") }
{ "_id" : ObjectId("4b8e42cc781b000000005cfd") }
{ "_id" : ObjectId("4b8e4595781b000000005cfe") }
>

That was a brief demonstration of the JavaScript API in the Mongo Shell. Now, let’s do this from C#.

Getting started with mongodb-csharp

Now, go to the mongodb-csharp download section at GitHub and get a debug build of the driver. Create a C# project and reference the MongoDB.Driver assembly.

On my machine, punching in the following actually works:

[Test]
public void CanAddPost()
{
  using(var mongo = new Mongo())
  {
    mongo.Connect();

    var db = mongo["myblog"];
    var posts = db["posts"];

    posts.Insert(new Document
                   {
                     {"headline", "Post added from C#"},
                     {"slug", "post-added-from-csharp"},
                     {
                       "tags", new[]
                                 {
                                   "c#",
                                   "nifty"
                                 }
                     }
                   });
  }
}

Now, I can verify that the document is actually in there by going back to the console and doing this:

> db.posts.find()
{ "_id" : ObjectId("4b8e4281781b000000005cfc"), "headline" : "Notes to self about MongoDB", "slug" : "notes-to-self-about-mongodb", "tags" : [ "mongodb", "nosql", "nifty", "c#" ] }
{ "_id" : ObjectId("4b8e42cc781b000000005cfd"), "headline" : "Someday I want to check out CouchDB as well", "slug" : "someday-i-want-to-check-out-couchdb-as-well", "tags" : [ "couchdb", "nosql", "nifty", "c#" ] }
{ "_id" : ObjectId("4b8e4b72091abb14e4000001"), "headline" : "Post added from C#", "slug" : "post-added-from-csharp", "tags" : [ "c#", "nifty" ] }
>

Nifty! Now let’s show the posts from C#. On my machine the following snippet displays the headlines of all posts:

[Test]
public void CanShowPosts()
{
  using(var mongo = new Mongo())
  {
    mongo.Connect();

    var db = mongo["myblog"];
    var posts = db["posts"];

    Console.WriteLine("Posts");
    foreach(var post in posts.FindAll().Documents)
    {
      Console.WriteLine("    {0}", post["headline"]);
    }
  }
}

– which is documented in the following screenshot:

Random nuggets of information

Document IDs

All MongoDB documents must have an ID in the _id field, either assigned by you (any object can be used), or automatically by MongoDB. IDs generated by MongoDB are virtually globally unique, as they consist of the following: 4 bytes of timestamp, 3 bytes of machine identification, 2 bytes of process identification, 3 bytes of something that gets incremented.

As a nifty consequence, the time of creation can be extracted from auto-generated IDs.
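
As a quick sketch of how that extraction could work in plain JavaScript: the first 4 bytes are the first 8 hex characters of the ID, and I am assuming here that they hold the creation time as seconds since the Unix epoch. The function name creationTimeOf is my own, not something the shell provides:

```javascript
// The first 4 bytes (8 hex characters) of an auto-generated ID are assumed
// to hold the creation time as seconds since the Unix epoch.
// `creationTimeOf` is a hypothetical helper name, not part of the shell.
function creationTimeOf(objectIdHex) {
  const seconds = parseInt(objectIdHex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// The ID of the first post saved in the shell session above.
const created = creationTimeOf('4b8e4281781b000000005cfc');
console.log(created.toISOString());
```

For the ID of the first post saved above, this decodes to 2010-03-03T11:05:37.000Z, which matches the day this post was written.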

The ID type used by MongoDB can be created with ObjectId('00112233445566778899aabb') (where the input must be a string representing 12 bytes in HEX).

How are documents stored?

If you have not yet figured it out, documents are serialized to JSON – with the minor modification that it’s a BINARY version of JSON, hence it’s called BSON.

String encoding

UTF-8. No worries.

What about references?

I will research this and do a separate post on the subject. As MongoDB is non-relational, a “join” is – in principle – an unknown concept. There’s a mechanism, however, that allows for consistent representation of foreign keys that may/may not give you some extra functionality (depending on the driver you are using).

What about querying?

I will research this as well, posting as I go.

OR/M? (or OD/M?)

It is not yet clear to me how to handle Object-Document Mapping. Will require some research as well. As an OO dude, I am especially interested in finding out what a schema-less persistence mechanism will do to my design.

What else?

More topics include applying indices, deleting/updating, atomicity, and more. Implies additional blog posts.

Conclusion

My first impression of MongoDB is really good. It’s extremely easy to get going, and the few error messages I have received were easy to understand.

I am especially in awe of how little friction I encountered – mostly because of the schema-less nature, but also because everything just worked right away.

NHibernate is very flexible

2010-01-29 by mookid8000 2 Comments

…but it does impose limitations on your domain model.

Most of these limitations, however, like the need for public/internal/protected members to be virtual, and the requirement for a default constructor to exist with at least protected accessibility, are not that hard to adhere to and usually don’t interfere with what you would do if there were no rules at all.

One of the limitations, however, can be pretty significant – Ayende describes the problem here, using the term “ghost objects”.

But, as I am about to show, this significance only arises if you follow a certain style of coding, which you should usually avoid!

Short explanation of the problem

When NHibernate lazy-loads an entity from the db (i.e. when you call session.Load<TEntity>(id) or when an entity in your session references something through a lazy-loaded association), it does so by providing an instance of a runtime-generated type, which acts as a proxy.

The first time you access something on the proxy, it gets “hydrated”, which is just a fancy way of saying that the data will be loaded from the database.

This would be fine and dandy, if it weren’t for the fact that the proxy is a runtime-generated subclass of your entity, which – in cases where inheritance is involved – will be a sibling to the other derived classes. Consider the simple inheritance hierarchy on the sketch to the right which in code could be something like so:

public abstract class LegalEntity
{
    public virtual Guid Id { get; set; }
}

public class Person : LegalEntity
{
    public virtual string FirstNames { get; set; }
    public virtual string LastName { get; set; }
}

public class Company : LegalEntity
{
    public virtual string CompanyName { get; set; }
}

– and then NHibernate will generate something along the lines of this (fake :)) class signature:

public class LegalEntityProxy1234AndSomeMoreStuff : LegalEntity
{
   // ... secret stuff to access db in here
}

See the problem? Here’s the problem:

var legalEntity = session.Load<LegalEntity>(someKnownId);
Assert.IsTrue(legalEntity is Person
              || legalEntity is Company);  //< AssertionException! will never be Person or Company

This means that this kind of runtime type checking will fail in those circumstances where the entity is a lazy-loaded reference of the supertype, and the following will FAIL:

var legalEntity = session.Load<LegalEntity>(someKnownId);
if (legalEntity is Person)
{
    var person = (Person) legalEntity;
    return person.FirstNames + " " + person.LastName;
}
else
{
    // we know it's a company then, right? WRONG!
    var company = (Company) legalEntity;  //< InvalidCastException!
    return company.CompanyName;
}

One possible solution

The other day, Ayende blogged about a recent addition to NHibernate, namely lazy-loaded properties. This allows an entity to be partially hydrated, intercepting calls to certain properties to lazy-load the relevant fields on demand.

This feature is great when storing LOBs alongside the other fields on an entity, but it also laid the ground for his most recent addition, which is the ability to lazy-load an association by setting lazy="no-proxy" on it.

This way, NHibernate will not build a proxy, but instead it will intercept the property getter and load the entity at that point in time, thus being able to return the exact (sub)type of the loaded entity.

Now this seems to solve our problems, but let’s zoom out a bit … why did we have a problem in the first place? Our problem was actually that we failed to write object-oriented code, but instead we wrote a brittle piece of code that would fail at runtime whenever someone added a new subtype, thus violating the Liskov substitution principle. Moreover it just feels wrong to implement business logic that reflects on types!

What to do then?

Well, how about making your code polymorphic? The logic above could be easily rewritten as:

public abstract class LegalEntity
{
    public virtual Guid Id { get; set; }

    public abstract string Name { get; }
}

public class Person : LegalEntity
{
    public virtual string FirstNames { get; set; }
    public virtual string LastName { get; set; }

    public override string Name
    {
        get { return FirstNames + " " + LastName; }
    }
}

public class Company : LegalEntity
{
    public virtual string CompanyName { get; set; }

    public override string Name
    {
        get { return CompanyName; }
    }
}

– which moves the logic of producing a name into the class hierarchy as a one-liner per subclass, allowing us to always get a name from a LegalEntity.

What if I really really need a concrete instance?

Then you should use the nifty visitor pattern to extract what you need. In the example above, I would need to make the following additions:

public interface ILegalEntityVisitor
{
    void Visit(Person person);
    void Visit(Company company);
}

public abstract class LegalEntity
{
    // ...

    public abstract void Accept(ILegalEntityVisitor visitor);
}

public class Person : LegalEntity
{
    // ...

    public override void Accept(ILegalEntityVisitor visitor)
    {
        visitor.Visit(this);
    }
}

public class Company : LegalEntity
{
    // ...

    public override void Accept(ILegalEntityVisitor visitor)
    {
        visitor.Visit(this);
    }
}

This way, we’re taking advantage of the fact that each subclass knows its own concrete instance, thus allowing it to pass itself to the visitor we passed in.

This is the preferred solution when the logic you’re writing doesn’t belong inside the actual entity class, like e.g. when you want to convert the entity to an editable view object, because this will make your code break at compile time if someone adds a new specialization, thus requiring each piece of logic to handle that specialization as well.

Oh, and if you’re a Java guy, you might be missing the ability to create an inline anonymous visitor within the scope of the current method, but that can be easily emulated by a generic visitor, like so:

public class LegalEntityVisitor : ILegalEntityVisitor
{
    Action<Person> handlePerson;
    Action<Company> handleCompany;

    public LegalEntityVisitor(Action<Person> handlePerson, Action<Company> handleCompany)
    {
        this.handlePerson = handlePerson;
        this.handleCompany = handleCompany;
    }

    public void Visit(Person person)
    {
        handlePerson(person);
    }

    public void Visit(Company company)
    {
        handleCompany(company);
    }
}

– which would allow you to write inline typesafe code like this:

double risk = CalculateInitialRisk();

// multiply some number depending on type of legal entity
legalEntity.Accept(new LegalEntityVisitor(person => risk *= GetRiskExperienceForPeople(),
                                          company => risk *= GetRiskExperienceForCompany(company)));

Only thing missing now is the ability to return a value in one line depending on the subclass. Well, the generic visitor can be used for that as well by adding the following:

public class LegalEntityVisitor : ILegalEntityVisitor
{
    // ...

    public static TResult Func<TResult>(LegalEntity legalEntity, 
                                        Func<Person, TResult> handlePerson, 
                                        Func<Company, TResult> handleCompany)
    {
        TResult result = default(TResult);

        legalEntity.Accept(new LegalEntityVisitor(p => result = handlePerson(p),
                                                  c => result = handleCompany(c)));

        return result;
    }
}

– allowing you to write code like this:

public string GetReportTypeCodeFor(LegalEntity legalEntity)
{
    return LegalEntityVisitor.Func(legalEntity, p => "P00000", c => "C" + GetReportingCode(c));
}

– and still have the benefit of compile-time safety that all specializations have been handled.

When to reflect on types?

IMO you should only reflect on types in business logic when it’s a shortcut that doesn’t break the semantics of your code. What do I mean by that? Well, e.g. the implementation of the extension method System.Linq.Enumerable.Count<T>() looks something like this:

public static int Count<T>(this IEnumerable<T> items)
{
    if (items is ICollection<T>)
    {
        return ((ICollection<T>)items).Count;
    }

    var count = 0;
    // iterate and count manually

    return count;
}

This way, providing the number of items is accelerated for certain implementations of IEnumerable<T> because the information is already there, and for other types there’s no way to avoid manually counting.

Conclusion

I don’t think I will be using the new lazy="no-proxy" feature, because if I need it, I think it is a sign that my design has a bad smell to it, and I should either go for polymorphism or use a visitor.