Category Archives: snippet

NHibernate is very flexible

…but it does impose limitations on your domain model.

Most of these limitations, however, like the need for public/internal/protected members to be virtual, and the requirement for a default constructor to exist with at least protected accessibility, are not that hard to adhere to and usually don’t interfere with what you would do if there were no rules at all.

One of the limitations, however, can be pretty significant – Ayende describes the problem here, using the term “ghost objects”.

But, as I am about to show, this significance only arises if you follow a certain style of coding, which you should usually avoid!

Short explanation of the problem

When NHibernate lazy-loads an entity from the db (i.e. when you call session.Load<TEntity>(id) or when an entity in your session references something through a lazy-loaded association), it does so by providing an instance of a runtime-generated type, which acts as a proxy.

The first time you access something on the proxy, it gets “hydrated”, which is just a fancy way of saying that the data will be loaded from the database.

This would be fine and dandy, if it weren’t for the fact that the proxy is a runtime-generated subclass of your entity, which – in cases where inheritance is involved – will be a sibling to the other derived classes. Consider a simple inheritance hierarchy, which in code could look something like this:

public abstract class LegalEntity
{
    public virtual Guid Id { get; set; }
}
 
public class Person : LegalEntity
{
    public virtual string FirstNames { get; set; }
    public virtual string LastName { get; set; }
}
 
public class Company : LegalEntity
{
    public virtual string CompanyName { get; set; }
}

- and then NHibernate will generate something along the lines of this (fake :) ) class signature:

public class LegalEntityProxy1234AndSomeMoreStuff : LegalEntity
{
   // ... secret stuff to access db in here
}

See the problem? Here’s the problem:

var legalEntity = session.Load<LegalEntity>(someKnownId);
Assert.IsTrue(legalEntity is Person
              || legalEntity is Company);  //< AssertionException! will never be Person or Company

This means that this kind of runtime type checking will fail in those circumstances where the entity is a lazy-loaded reference of the supertype, and the following will FAIL:

var legalEntity = session.Load<LegalEntity>(someKnownId);
if (legalEntity is Person)
{
    var person = (Person) legalEntity;
    return person.FirstNames + " " + person.LastName;
}
else
{
    // we know it's a company then, right? WRONG!
    var company = (Company) legalEntity;  //< InvalidCastException!
    return company.CompanyName;
}

One possible solution

The other day, Ayende blogged about a recent addition to NHibernate, namely lazy-loaded properties. This allows an entity to be partially hydrated, intercepting calls to certain properties to lazy-load the relevant fields on demand.

This feature is great when storing LOBs alongside the other fields on an entity, but it also laid the ground for his most recent addition, which is the ability to lazy-load an association by setting lazy="no-proxy" on it.

This way, NHibernate will not build a proxy, but instead it will intercept the property getter and load the entity at that point in time, thus being able to return the exact (sub)type of the loaded entity.

Now this seems to solve our problems, but let’s zoom out a bit … why did we have a problem in the first place? Our problem was actually that we failed to write object-oriented code, but instead we wrote a brittle piece of code that would fail at runtime whenever someone added a new subtype, thus violating the Liskov substitution principle. Moreover it just feels wrong to implement business logic that reflects on types!

What to do then?

Well, how about making your code polymorphic? The logic above could be easily rewritten as:

public abstract class LegalEntity
{
    public virtual Guid Id { get; set; }
 
    public abstract string Name { get; }
}
 
public class Person : LegalEntity
{
    public virtual string FirstNames { get; set; }
    public virtual string LastName { get; set; }
 
    public override string Name
    {
        get { return FirstNames + " " + LastName; }
    }
}
 
public class Company : LegalEntity
{
    public virtual string CompanyName { get; set; }
 
    public override string Name
    {
        get { return CompanyName; }
    }
}

- which moves the logic of producing the name, as a one-liner, into the class hierarchy, allowing us to always get a name from a LegalEntity.
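To see why the polymorphic version is not bothered by the proxy, here is a minimal sketch with a hand-written stand-in for it. FakeLegalEntityProxy and its loader delegate are made up for this illustration; NHibernate's real proxy is a different, runtime-generated type, and the sketch only assumes that it, too, forwards virtual calls to the loaded instance.

```csharp
using System;

public abstract class LegalEntity
{
    public virtual Guid Id { get; set; }
    public abstract string Name { get; }
}

public class Person : LegalEntity
{
    public virtual string FirstNames { get; set; }
    public virtual string LastName { get; set; }

    public override string Name
    {
        get { return FirstNames + " " + LastName; }
    }
}

public class Company : LegalEntity
{
    public virtual string CompanyName { get; set; }

    public override string Name
    {
        get { return CompanyName; }
    }
}

// Hypothetical stand-in for the generated proxy: a sibling of Person and
// Company that loads the real entity the first time a member is accessed.
public class FakeLegalEntityProxy : LegalEntity
{
    private readonly Func<LegalEntity> load;
    private LegalEntity target;

    public FakeLegalEntityProxy(Func<LegalEntity> load)
    {
        this.load = load;
    }

    private LegalEntity Target
    {
        get { return target ?? (target = load()); }
    }

    public override Guid Id
    {
        get { return Target.Id; }
        set { Target.Id = value; }
    }

    public override string Name
    {
        get { return Target.Name; }
    }
}

public static class ProxyDemo
{
    public static LegalEntity LoadCompany()
    {
        return new FakeLegalEntityProxy(() => new Company { CompanyName = "Acme" });
    }

    public static void Main()
    {
        LegalEntity legalEntity = LoadCompany();

        // The type check sees the proxy, not the Company behind it...
        Console.WriteLine(legalEntity is Company);

        // ...but the virtual call is forwarded to the real Company.
        Console.WriteLine(legalEntity.Name);
    }
}
```

The calling code never needs to know which concrete type sits behind the reference, so the sibling relationship stops mattering.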

What if I really really need a concrete instance?

Then you should use the nifty visitor pattern to extract what you need. In the example above, I would need to make the following additions:

public interface ILegalEntityVisitor
{
    void Visit(Person person);
    void Visit(Company company);
}
 
public abstract class LegalEntity
{
    // ...
 
    public abstract void Accept(ILegalEntityVisitor visitor);
}
 
public class Person : LegalEntity
{
    // ...
 
    public override void Accept(ILegalEntityVisitor visitor)
    {
        visitor.Visit(this);
    }
}
 
public class Company : LegalEntity
{
    // ...
 
    public override void Accept(ILegalEntityVisitor visitor)
    {
        visitor.Visit(this);
    }
}

This way, we’re taking advantage of the fact that each subclass knows its own concrete instance, thus allowing it to pass itself to the visitor we passed in.

This is the preferred solution when the logic you’re writing doesn’t belong inside the actual entity class, e.g. when you want to convert the entity to an editable view object, because this will make your code break at compile time if someone adds a new specialization, thus requiring each piece of logic to handle that specialization as well.
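As a sketch of the view-object case, the visitor below builds one editable view object per concrete type. PersonEditView, CompanyEditView and EditViewBuildingVisitor are hypothetical names invented for this example, and the entity classes are trimmed down to what the example needs.

```csharp
using System;

public interface ILegalEntityVisitor
{
    void Visit(Person person);
    void Visit(Company company);
}

public abstract class LegalEntity
{
    public abstract void Accept(ILegalEntityVisitor visitor);
}

public class Person : LegalEntity
{
    public virtual string FirstNames { get; set; }
    public virtual string LastName { get; set; }

    public override void Accept(ILegalEntityVisitor visitor)
    {
        visitor.Visit(this);
    }
}

public class Company : LegalEntity
{
    public virtual string CompanyName { get; set; }

    public override void Accept(ILegalEntityVisitor visitor)
    {
        visitor.Visit(this);
    }
}

// Hypothetical view objects, one per concrete entity type.
public class PersonEditView
{
    public string FirstNames { get; set; }
    public string LastName { get; set; }
}

public class CompanyEditView
{
    public string CompanyName { get; set; }
}

// Lives outside the entity classes. If a new Visit overload is added to
// the interface, this class stops compiling until it handles the new type.
public class EditViewBuildingVisitor : ILegalEntityVisitor
{
    public object View { get; private set; }

    public void Visit(Person person)
    {
        View = new PersonEditView { FirstNames = person.FirstNames, LastName = person.LastName };
    }

    public void Visit(Company company)
    {
        View = new CompanyEditView { CompanyName = company.CompanyName };
    }

    // Convenience wrapper: run the visitor and hand back what it built.
    public static object BuildFor(LegalEntity legalEntity)
    {
        var visitor = new EditViewBuildingVisitor();
        legalEntity.Accept(visitor);
        return visitor.View;
    }
}
```

A caller would then write something like object view = EditViewBuildingVisitor.BuildFor(legalEntity); without ever checking the type of legalEntity itself.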

Oh, and if you’re a Java guy, you might be missing the ability to create an inline anonymous visitor within the scope of the current method, but that can be easily emulated by a generic visitor, like so:

public class LegalEntityVisitor : ILegalEntityVisitor
{
    Action<Person> handlePerson;
    Action<Company> handleCompany;
 
    public LegalEntityVisitor(Action<Person> handlePerson, Action<Company> handleCompany)
    {
        this.handlePerson = handlePerson;
        this.handleCompany = handleCompany;
    }
 
    public void Visit(Person person)
    {
        handlePerson(person);
    }
 
    public void Visit(Company company)
    {
        handleCompany(company);
    }
}

- which would allow you to write inline typesafe code like this:

double risk = CalculateInitialRisk();
 
// multiply some number depending on type of legal entity
legalEntity.Accept(new LegalEntityVisitor(person => risk *= GetRiskExperienceForPeople(),
                                          company => risk *= GetRiskExperienceForCompany(company)));

Only thing missing now is the ability to return a value in one line depending on the subclass. Well, the generic visitor can be used for that as well by adding the following:

 
public class LegalEntityVisitor : ILegalEntityVisitor
{
    // ...
 
    public static TResult Func<TResult>(LegalEntity legalEntity, 
                                        Func<Person, TResult> handlePerson, 
                                        Func<Company, TResult> handleCompany)
    {
        TResult result = default(TResult);
 
        legalEntity.Accept(new LegalEntityVisitor(p => result = handlePerson(p),
                                                 c => result = handleCompany(c)));
 
        return result;
    }
}

- allowing you to write code like this:

 
public string GetReportTypeCodeFor(LegalEntity legalEntity)
{
    return LegalEntityVisitor.Func(legalEntity, p => "P00000", c => "C" + GetReportingCode(c));
}

- and still have the benefit of compile-time safety that all specializations have been handled.

When to reflect on types?

IMO you should only reflect on types in business logic when it’s a shortcut that doesn’t break the semantics of your code. What do I mean by that? Well, e.g. the implementation of the extension method System.Linq.Enumerable.Count() looks something like this:

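public static int Count<T>(this IEnumerable<T> items)
{
    // shortcut: a collection already knows its own size
    if (items is ICollection<T>)
    {
        return ((ICollection<T>)items).Count;
    }
 
    // otherwise iterate and count manually
    var count = 0;
    foreach (var item in items)
    {
        count++;
    }
 
    return count;
}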

This way, providing the number of items is accelerated for certain implementations of IEnumerable<T> because the information is already there, and for other types there’s no way to avoid manually counting.

Conclusion

I don’t think I will be using the new lazy="no-proxy" feature, because if I need it, I think it is a sign that my design has a bad smell to it, and I should either go for polymorphism or use a visitor.

C# vs. Clojure vs. Ruby & Scala

Short preface: at a job interview, Zach Cox was told to aggregate words and word counts from a bunch of files into two files, sorted alphabetically and by word count respectively, which he did in Ruby and Scala. This led Lau Bjørn Jensen to do the same thing in Clojure, which apparently sparked other people to do it in Java, Python etc.

Inspired by the aforementioned problem, and an extended train ride home (thank you, Danish National Railways!!), I decided to see what a C# (v. 3) version could look like:

namespace NewsReader
{
  using System;
  using System.IO;
  using System.Linq;
  using System.Text.RegularExpressions;
  using System.Diagnostics;
 
  class Program
  {
    static void Main()
    {
      const string dir = @"c:\temp\20_newsgroups";
      var stopwatch = Stopwatch.StartNew();
      var regex = new Regex(@"\w+", RegexOptions.Compiled);
 
      var list = (from filename in Directory.GetFiles(dir, "*.*", SearchOption.AllDirectories)
                  from match in regex.Matches(File.ReadAllText(filename).ToLower()).Cast<Match>()
                  let word = match.Value
                  group word by word
                  into aggregate
                    select new
                             {
                               Word = aggregate.Key,
                               Count = aggregate.Count(),
                               Text = string.Format("{0}\t{1}", aggregate.Key, aggregate.Count())
                             })
        .ToList();
 
      File.WriteAllLines(@"words-by-count.txt", list.OrderBy(c => c.Count).Select(c => c.Text).ToArray());
      File.WriteAllLines(@"words-by-word.txt", list.OrderBy(c => c.Word).Select(c => c.Text).ToArray());
 
      Console.WriteLine("Elapsed: {0:0.0} seconds", stopwatch.Elapsed.TotalSeconds);
    }
  }
}

Weighing in at 36 lines and executing in 10.2 seconds (on my Intel Core 2 laptop with 4 GB RAM), I think this is a pretty clear and performant alternative to the other languages mentioned.

Tailoring a custom matcher for NMock

You’ll often hear proponents of test-driven development claim that unit testing is hard and that it forces them to open up their classes’ hidden logic for them to be testable. That might be true to some degree, but more often in my opinion you’ll find that designing your system to be testable also has the nifty side-effect of separating your logic into .. umm logical chunks… and what I really mean here is that your chunks have a tendency to become orthogonal, which is by far one of the best quality attributes of a system.

BUT that was not what I was going to say, actually – I just wanted to comment on a nifty thing I recently found out: implementing my own Matcher to use with NMock.

An NMock Matcher is an abstract class, which requires you to implement the following members:

void DescribeTo(TextWriter writer);
bool Matches(object o);

A matcher can be used in the call to With(...) when stubbing or expecting… an example could be setting an expectation that a search function on our user repository will return an empty result… like so:

Expect.Once.On(userRepository)
    .Method("SearchByArbitraryString")
    .With(Is.StringContaining("what I just wrote"))
    .Will(Return.Value(new List<IUser>()));

In the example above, Is is a class containing a static method StringContaining, which returns a matcher of a certain type. Now, when the test runs, and NMock needs to decide if an intercepted function call matches the expectation above, it will iterate through the given matchers and call their Matches function, passing to it the actual argument as object o.

The matcher returned by StringContaining probably contains an implementation of Matches which looks something like this:

public override bool Matches(object o)
{
    return o is string && ((string)o).Contains(_substring);
}

where _substring was probably set in the ctor of the matcher when it was constructed by the StringContaining function.
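Put together, such a matcher could look roughly like the sketch below. This is a guess in the same spirit as the snippet above, not NMock's actual source: the Matcher base class here is a stand-in reduced to the two members listed earlier, and StringContainsMatcher is a made-up name.

```csharp
using System.IO;

// Stand-in for NMock's Matcher, reduced to the two members listed earlier.
public abstract class Matcher
{
    public abstract void DescribeTo(TextWriter writer);
    public abstract bool Matches(object o);
}

// Hypothetical "string containing" matcher.
public class StringContainsMatcher : Matcher
{
    private readonly string _substring;

    // The substring to look for is captured when the matcher is constructed.
    public StringContainsMatcher(string substring)
    {
        _substring = substring;
    }

    // Used for describing the expectation in failure messages.
    public override void DescribeTo(TextWriter writer)
    {
        writer.Write("string containing \"" + _substring + "\"");
    }

    // Anything that is not a string (including null) simply does not match.
    public override bool Matches(object o)
    {
        return o is string && ((string)o).Contains(_substring);
    }
}
```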

Now that leads me to my recent problem: I needed to pull out the actual argument given when the expected function call was performed. In my case the argument was a delegate, which I wanted to pull out, and then invoke it a few times with different arguments.

What I did was this:

public class SnatchArgMatcher<T> : Matcher where T : class
{
  T obj;
 
  public T Object
  {
    get { return obj; }
  }
 
  public override void DescribeTo(TextWriter writer)
  {
    writer.WriteLine(obj == null ? "(null)" : obj.ToString());
  }
 
  public override bool Matches(object o)
  {
    if (!(o is T))
      throw new ArgumentException(
        string.Format("{0} is not of type {1}",
                      o == null ? "(null)" : o.ToString(),
                      typeof (T).Name));
    obj = (T)o;
 
    return true;
  }
}

This allows me to set up an expectation like this:

var snatcher = new SnatchArgMatcher<ForEachFileHandler>();
Expect.Once.On(fileService)
    .Method("ForEachFile")
    .With(snatcher);

Now, after the code has been run, I have access to the delegate passed to the file service through snatcher.Object.
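The sketch below walks through that last step without NMock, so it can run on its own. Several things in it are assumptions: the Matcher base class is a stand-in with only the two members listed earlier, the ForEachFileHandler signature is invented (the real delegate is not shown here), and the call to Matches is made by hand where NMock would normally make it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Stand-in for NMock's Matcher, reduced to the two members listed earlier.
public abstract class Matcher
{
    public abstract void DescribeTo(TextWriter writer);
    public abstract bool Matches(object o);
}

// Condensed version of the matcher shown above.
public class SnatchArgMatcher<T> : Matcher where T : class
{
    T obj;

    public T Object
    {
        get { return obj; }
    }

    public override void DescribeTo(TextWriter writer)
    {
        writer.Write(obj == null ? "(null)" : obj.ToString());
    }

    public override bool Matches(object o)
    {
        if (!(o is T))
            throw new ArgumentException("argument is not of type " + typeof(T).Name);
        obj = (T)o;
        return true;
    }
}

// Assumed signature, for illustration only.
public delegate void ForEachFileHandler(string fileName);

public static class SnatchDemo
{
    public static List<string> Run()
    {
        var seen = new List<string>();
        var snatcher = new SnatchArgMatcher<ForEachFileHandler>();

        // This plays the part of the delegate that the code under test
        // would pass to fileService.ForEachFile(...).
        ForEachFileHandler passedByCodeUnderTest = fileName => seen.Add(fileName);

        // NMock would call this when the mocked method is invoked.
        snatcher.Matches(passedByCodeUnderTest);

        // Pull the delegate back out and invoke it with arguments of our choosing.
        ForEachFileHandler handler = snatcher.Object;
        handler("a.txt");
        handler("b.txt");

        return seen;
    }
}
```

In a real test, the assertions would then be made on whatever the snatched delegate did when it was invoked.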

If someone knows a cooler way to do this, please do post a comment below. Until then, I will continue to think that it was actually pretty nifty.

Using compression at field level when persisting

One day at work, someone needed to store a lot of text in a field of a domain object – and since we are using NHibernate for persistence, that field would get stored to the database, taking up a huge amount of space.

To circumvent this, we just made the actual type of the member variable description be a byte[], and then we let the accessor property Description zip and unzip when accessing the value.

Like so:

    readonly Encoding Enc = Encoding.UTF8;
 
    public string Description
    {
      get
      {
        using (MemoryStream unzippedStream = new MemoryStream(description))
        using (Stream zip = new GZipStream(unzippedStream, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(zip, Enc))
        {
          return reader.ReadToEnd();
        }
      }
      private set
      {
        using (MemoryStream zippedStream = new MemoryStream())
        {
          using (Stream zip = new GZipStream(zippedStream, CompressionMode.Compress))
          {
            byte[] toWrite = Enc.GetBytes(value);
            zip.Write(toWrite, 0, toWrite.Length);
          }
          description = zippedStream.ToArray();
        }
      }
    }

It should be noted that it is crucial that the ToArray() method of the compressed MemoryStream be called after the GZipStream has been disposed! That’s because the GZipStream will not write the gzip footer bytes until it is disposed. So if ToArray() is called before disposing, you will get an incomplete stream of bytes.
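The difference is easy to see in isolation. In the following sketch, GZipDisposeDemo and its method names are made up for the illustration; the compress/decompress logic is the same as in the property above, and CompressTooEarly deliberately makes the mistake of grabbing the bytes while the GZipStream is still open.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipDisposeDemo
{
    // Correct: read the bytes after the GZipStream has been disposed.
    public static byte[] Compress(string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var zip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(text);
                zip.Write(bytes, 0, bytes.Length);
            }
            // ToArray() still works here, even though disposing the
            // GZipStream has closed the underlying MemoryStream.
            return buffer.ToArray();
        }
    }

    // Wrong on purpose: the bytes are read while the GZipStream is still open,
    // so at least the gzip footer is missing from the result.
    public static byte[] CompressTooEarly(string text)
    {
        using (var buffer = new MemoryStream())
        using (var zip = new GZipStream(buffer, CompressionMode.Compress))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            zip.Write(bytes, 0, bytes.Length);
            return buffer.ToArray();
        }
    }

    public static string Decompress(byte[] data)
    {
        using (var buffer = new MemoryStream(data))
        using (var zip = new GZipStream(buffer, CompressionMode.Decompress))
        using (var reader = new StreamReader(zip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string text = new string('x', 1000);
        byte[] complete = Compress(text);
        byte[] tooEarly = CompressTooEarly(text);

        Console.WriteLine("complete: {0} bytes, too early: {1} bytes",
                          complete.Length, tooEarly.Length);
        Console.WriteLine("round trip ok: {0}", Decompress(complete) == text);
    }
}
```

What happens when the truncated array is decompressed is best not relied upon, which is exactly why the order of the two operations matters.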

Moreover it should be noted that zipping strings in the database is not always cool because:

  1. Compression only pays off for strings longer than roughly 200-300 characters. For shorter strings, the compressed data is larger than the original string.
  2. Querying is impossible/hard/weird. :-)
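The first point is easy to check for your own data. A gzip stream always carries a fixed 10-byte header and an 8-byte trailer on top of the compressed payload, so very short strings cannot come out smaller. The sketch below (GZipSizeDemo and the sample strings are made up for the illustration) prints the original and compressed sizes so you can find the break-even point for the kind of text you store.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipSizeDemo
{
    // Number of bytes the text occupies after gzip compression.
    public static int CompressedSize(string text)
    {
        using (var buffer = new MemoryStream())
        {
            using (var zip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] bytes = Encoding.UTF8.GetBytes(text);
                zip.Write(bytes, 0, bytes.Length);
            }
            return buffer.ToArray().Length;
        }
    }

    public static void Main()
    {
        // Made-up samples; real descriptions will compress differently.
        string sentence = "One day at work, someone needed to store a lot of text in a field. ";
        string[] samples =
        {
            "short text",
            sentence,
            string.Concat(sentence, sentence, sentence, sentence, sentence)
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("original: {0} bytes, compressed: {1} bytes",
                              Encoding.UTF8.GetByteCount(sample),
                              CompressedSize(sample));
        }
    }
}
```

Note that repeating a sentence, as the last sample does, flatters the compressor; ordinary prose will need more characters before it breaks even.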

But it is a nifty little trick to put in one’s backpack, and I had all sorts of trouble figuring out exactly how to make the GZipStream behave, so I figured it would be nice to put a working example on the net.