Way more nifty: Using compression at field level when persisting #2

In a previous post (here) I showed how the contents of a string property could be internally stored as a GZipped byte array for persistence. The code demonstrated how to achieve this, but not in a particularly pretty manner, because I did some nasty hacks inside the property’s setter and getter – i.e. I crammed all the zipping/unzipping stuff directly inside my domain class.

When using NHibernate for persistence you have another option, which I have recently learned about. So let’s rewrite the example from my previous post.

The example looked like this:

    readonly Encoding Enc = Encoding.UTF8;
 
    public string Description
    {
      get
      {
        using (MemoryStream zippedStream = new MemoryStream(description))
        using (Stream zip = new GZipStream(zippedStream, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(zip, Enc))
        {
          return reader.ReadToEnd();
        }
      }
      private set
      {
        using (MemoryStream zippedStream = new MemoryStream())
        {
          using (Stream zip = new GZipStream(zippedStream, CompressionMode.Compress))
          {
            byte[] toWrite = Enc.GetBytes(value);
            zip.Write(toWrite, 0, toWrite.Length);
          }
          description = zippedStream.ToArray();
        }
      }
    }

As you can see there’s way too much going on in there – we want a persistence-ignorant domain model, so we want the property to go back to looking simple, like this:

    public string Description
    {
      get { return description; }
      set { description = value; }
    }

- and we can actually achieve that, because NHibernate is so freaking cool that I almost cannot believe it. We do it by implementing our own type of database mapping – an IUserType. An implementation of IUserType tells NHibernate the following useful stuff:

  • What is the SQL type of the database column
  • What is the .NET type we want to be able to store/retrieve
  • Plus some housekeeping: how to compare, hash and copy values, and whether the type is mutable

- and in addition to this, the implementor must implement the actual object -> db/db -> object mapping (NullSafeSet and NullSafeGet, respectively).

I implemented the example like this:

  public class ZippedString : IUserType
  {
    // UTF8 to match the original example; UTF7 is obsolete and best avoided.
    readonly Encoding Enc = Encoding.UTF8;
 
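    // "new" because this intentionally hides the static object.Equals(object, object).
    public new bool Equals(object x, object y)
    {
      return object.Equals(x, y);
    }
 
    public int GetHashCode(object x)
    {
      // Guard against null so that an unset value can be hashed safely.
      return x == null ? 0 : x.GetHashCode();
    }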
 
    public object NullSafeGet(IDataReader rs, string[] names, object owner)
    {
      object obj = NHibernateUtil.BinaryBlob.NullSafeGet(rs, names[0]);
      if (obj == null) return null;
      byte[] bytes = (byte[]) obj;
 
      return UnzipString(bytes);
    }
 
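    public void NullSafeSet(IDbCommand cmd, object value, int index)
    {
      // A null string must be stored as a database NULL; zipping it would throw.
      object zipped = value == null ? null : ZipString((string)value);
      NHibernateUtil.BinaryBlob.NullSafeSet(cmd, zipped, index);
    }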
 
    public object DeepCopy(object value)
    {
      return value;
    }
 
    public SqlType[] SqlTypes
    {
      get { return new SqlType[]{NHibernateUtil.BinaryBlob.SqlType}; }
    }
 
    public Type ReturnedType
    {
      get { return typeof(string); }
    }
 
    public bool IsMutable
    {
      get { return false; }
    }
 
    // Remaining IUserType members (part of the interface in NHibernate 1.2 and later, as far as I know).
    // Strings are immutable, so all three can simply pass the value through.
    public object Replace(object original, object target, object owner)
    {
      return original;
    }
 
    public object Assemble(object cached, object owner)
    {
      return cached;
    }
 
    public object Disassemble(object value)
    {
      return value;
    }
 
    byte[] ZipString(string str)
    {
      using (MemoryStream zippedStream = new MemoryStream())
      {
        using (Stream zip = new GZipStream(zippedStream, CompressionMode.Compress))
        {
          byte[] toWrite = Enc.GetBytes(str);
          zip.Write(toWrite, 0, toWrite.Length);
        }
        return zippedStream.ToArray();
      }
    }
 
    string UnzipString(byte[] bytes)
    {
      using (MemoryStream zippedStream = new MemoryStream(bytes))
      using (Stream zip = new GZipStream(zippedStream, CompressionMode.Decompress))
      using (StreamReader reader = new StreamReader(zip, Enc))
      {
        return reader.ReadToEnd();
      }
    }
  }

- and now my mapping file contains the following mapping:

<property name="Description" 
          column="description"
          length="6400"
          type="SomeProject.Repos.UserTypes.ZippedString, SomeProject.Repos" />

Note how the type is set to our newly implemented user type.
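
Before wiring a type like this into a mapping, it can be worth checking the two zip helpers on their own, without NHibernate in the picture. The following is just a minimal sketch of that idea: the class name ZipRoundTrip, the method names Zip/Unzip and the sample text are made up for illustration, and UTF8 is assumed as the encoding. It round-trips a string and prints how many bytes the plain and the zipped versions take.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical stand-alone helper, not part of the user type itself.
public static class ZipRoundTrip
{
    static readonly Encoding Enc = Encoding.UTF8; // assumed encoding

    public static byte[] Zip(string str)
    {
        using (MemoryStream zipped = new MemoryStream())
        {
            // The GZipStream must be disposed before the buffer is read,
            // otherwise the last compressed block is never flushed.
            using (Stream zip = new GZipStream(zipped, CompressionMode.Compress))
            {
                byte[] toWrite = Enc.GetBytes(str);
                zip.Write(toWrite, 0, toWrite.Length);
            }
            return zipped.ToArray();
        }
    }

    public static string Unzip(byte[] bytes)
    {
        using (MemoryStream zipped = new MemoryStream(bytes))
        using (Stream zip = new GZipStream(zipped, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(zip, Enc))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Build a deliberately repetitive sample text (100 copies of one sentence).
        string original = new StringBuilder()
            .Insert(0, "Some fairly repetitive description. ", 100)
            .ToString();

        byte[] zipped = Zip(original);
        string restored = Unzip(zipped);

        Console.WriteLine("Round trip ok: " + (original == restored));
        Console.WriteLine("Plain bytes:   " + Enc.GetByteCount(original));
        Console.WriteLine("Zipped bytes:  " + zipped.Length);
    }
}
```

How much you actually save depends entirely on the text, so it makes sense to try this with data resembling your own before deciding on the column length in the mapping.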

The advantages over my previous solution include (but are not necessarily limited to):

  1. Your domain model stays persistence ignorant (PI). Annoying details, like how to actually get away with storing stuff in the database, can be put in a place where they do not disturb your eyes and colleagues forever.
  2. You avoid implementing this kind of logic multiple times.
  3. And the corollary to 2: If you need to change the implementation some time, you need only change it in one place.
  4. Debugging is way more fun because the field is the same type as the property.
  5. It is more efficient because the actual mapping is only performed when storing/retrieving values (which is way less frequent than you’d think because of NHibernate’s built-in caching).

As you can see, this implementation was fairly simple. One disadvantage though, compared to storing the actual string, is that querying on the contents of the field is still impossible, because the database only ever sees the compressed bytes.