One day at work, someone needed to store a lot of text in a field of a domain object – and since we are using NHibernate for persistence, that field would get stored to the database, taking up a huge amount of space.
To circumvent this, we just made the backing member variable description a byte[], and then we let the accessor property Description zip on set and unzip on get.
Like so:
```csharp
// Requires: using System.IO; using System.IO.Compression; using System.Text;

readonly Encoding Enc = Encoding.UTF8;

// Backing field holding the gzip-compressed bytes; this is what gets persisted.
byte[] description;

public string Description
{
    get
    {
        // Wrap the compressed bytes and decompress them while reading.
        using (MemoryStream zippedStream = new MemoryStream(description))
        using (Stream zip = new GZipStream(zippedStream, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(zip, Enc))
        {
            return reader.ReadToEnd();
        }
    }
    private set
    {
        using (MemoryStream zippedStream = new MemoryStream())
        {
            using (Stream zip = new GZipStream(zippedStream, CompressionMode.Compress))
            {
                byte[] toWrite = Enc.GetBytes(value);
                zip.Write(toWrite, 0, toWrite.Length);
            }
            // The GZipStream is disposed at this point, so the data is complete.
            description = zippedStream.ToArray();
        }
    }
}
```
Note that it is crucial to call the ToArray() method of the compressed MemoryStream only after the GZipStream has been disposed! That’s because the GZipStream will not write its remaining buffered data and the gzip stream footer bytes until it is disposed. So if ToArray() is called before disposing, you will get an incomplete stream of bytes.
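To make the ordering issue concrete, here is a minimal sketch that compresses the same text twice, once reading the buffer too early and once after disposing. The class and method names (GZipOrderDemo, CompressTooEarly, CompressCorrectly) are made up for this illustration and are not part of the code above.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical demo class, not part of the original example.
public static class GZipOrderDemo
{
    // The mistake: ToArray() is called while the GZipStream is still open.
    public static byte[] CompressTooEarly(string text)
    {
        using (MemoryStream buffer = new MemoryStream())
        using (GZipStream zip = new GZipStream(buffer, CompressionMode.Compress))
        {
            byte[] raw = Encoding.UTF8.GetBytes(text);
            zip.Write(raw, 0, raw.Length);
            return buffer.ToArray(); // footer not written yet: incomplete data
        }
    }

    // The fix: dispose the GZipStream first, then read the buffer.
    public static byte[] CompressCorrectly(string text)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            using (GZipStream zip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                zip.Write(raw, 0, raw.Length);
            }
            // ToArray() still works although disposing the GZipStream
            // has also closed the underlying MemoryStream.
            return buffer.ToArray();
        }
    }

    public static void Main()
    {
        string text = "some text that should survive a round trip";
        Console.WriteLine("too early: " + CompressTooEarly(text).Length + " bytes");
        Console.WriteLine("correct:   " + CompressCorrectly(text).Length + " bytes");
    }
}
```

The early result is always shorter than the correct one, since at the very least the footer is missing, and it cannot be reliably decompressed again.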
Moreover it should be noted that zipping strings in the database is not always cool because:
- Compression only pays off for strings longer than around 200-300 chars. For shorter strings, the size of the compressed data is greater than that of the original string.
- Querying is impossible/hard/weird. 🙂
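The size point is easy to check for yourself. The sketch below compares raw and compressed sizes for a very short string and a long, repetitive one. GZipSizeDemo and CompressedSize are hypothetical names used only here, and the exact break-even length depends on the content, so treat the 200-300 figure as a rule of thumb. A gzip stream always carries a 10-byte header and an 8-byte footer, which is why tiny strings grow.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical demo class, not part of the original example.
public static class GZipSizeDemo
{
    // Returns the number of bytes the text occupies after gzip compression.
    public static int CompressedSize(string text)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            using (GZipStream zip = new GZipStream(buffer, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(text);
                zip.Write(raw, 0, raw.Length);
            }
            return buffer.ToArray().Length;
        }
    }

    public static void Main()
    {
        string shortText = "hi";
        string longText = new string('a', 10000); // highly repetitive, compresses well

        Console.WriteLine("short: " + Encoding.UTF8.GetByteCount(shortText)
            + " raw bytes, " + CompressedSize(shortText) + " compressed");
        Console.WriteLine("long:  " + Encoding.UTF8.GetByteCount(longText)
            + " raw bytes, " + CompressedSize(longText) + " compressed");
    }
}
```

Real descriptions will compress less dramatically than ten thousand identical characters, so it is worth running this against representative data before deciding to zip a column.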
But it is a nifty little trick to put in one’s backpack, and I had all sorts of trouble figuring out exactly how to make the GZipStream behave, so I figured it would be nice to put a working example on the net.