Using compression at field level when persisting

One day at work, someone needed to store a lot of text in a field of a domain object – and since we are using NHibernate for persistence, that field would get stored to the database, taking up a huge amount of space.

To circumvent this, we just made the actual type of the member variable description be a byte[], and then we let the accessor property Description zip and unzip when accessing the value.

Like so:

It should be noted that it is crucial that the ToArray() method of the compressed MemoryStream be called after the GZipStream stream has been disposed! That’s because the GZipStream will not write the gzip stream footer bytes until the stream is disposed. So if ToArray() is called before disposing, you will get an incomplete stream of bytes.

Moreover it should be noted that zipping strings in the database is not always cool because:

  1. Compression kicks in for strings at around 2-300 chars. The size of the compressed data is greater than that of the original string for shorter strings.
  2. Querying is impossible/hard/weird. 🙂

But it is a nifty little trick to put in one’s backpack, and I had all sorts of trouble figuring out exactly how to make the GZipStream behave, so I figured it would be nice to put a working example on the net.

One thought on “Using compression at field level when persisting

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.