Book review: NHibernate 3.0 Cookbook

NHibernate 3.0 Cookbook coverPackt Publishing has before asked if I was interested in reviewing some of their books, which I was – this time, however, I asked them if they were interested in me reviewing their upcoming NHibernate 3.0 Cookbook by Jason Dentler.

I did that, because I like NHibernate very much, and I would like to help promote good fresh litterature about the subject. And this book really stands out as fresh, because it covers NHibernate 3.0 which has not even been officially released yet!

My first impressions are good – it starts out with creating a model and the usual XML-mapping stuff, and then it dives directly into modeling an inheritance hierarchy. I think this is pretty cool, because it is a sign that the book has a fairly high level of ambition: It is not just about stuffing away rows in the db, it’s about persisting an actual model!

It covers Fluent NHibernate and Fabio’s ConfORM as well, so it provides a really good foundation to anyone interested in learning the intricacies of configuring NHibernate. And it is pretty true to the model-first approach, which is how I like it.

Then it goes on with a chapter on how to manage sessions and transactions including – among other things – an example on how to manage the session from an ASP.NET MVC action filter (which is not “best practice” from an ASP.NET MVC perspective IMO, as it relies on static gateways, but I digress… the book is not about ASP.NET MVC :))

The query chapter is great, because it covers everything I can think of: Criteria, QueryOver, HQL, both in their normal and multi forms, futures, LINQ to NHibernate, detached criteria, and the new HQL bulk operations, insert and update. If I must put my finger on something, I think that the different areas are covered a little too lightly, but hey – there’s plenty of information on this stuff on the internet, and you could probably write an entire book entirely about how to put HQL to use.

The testing chapter is great as well, as it touches on nice-to-know stuff and some of “the new developments” in the area: NHibernate Profiler, Fluent NHibernate automatic persistence testing, using in-memory SQLite for persistence testing, + more.

The chapter on implementing a data access layer shows a typical data access object and a repository implementation which will probably look familiar to a lot of people, implemented with NHibernate. They both have the ability to automatically perform their operations withing transactions, if one is not already active. This makes the implementations pretty flexible, as they can be used either “by themselves”, or they can implicitly enlist in an ongoing unit of work. Moreover, a pretty nifty named query implementation is shown, complete with automated test that checks whether all implemented named query classes have corresponding named queries in an HBM XML file.

The rest of the book shows how various common tasks can be achieved using NHibernate or some of the many NHContrib projects, like e.g. creating an audit trail by listening to events, creating an IUserType to encrypt strings, using Burrows to manage session, putting NHibernate Search to use, etc… As you can probably imagine, this stuff is covered pretty lightly, but it’s sufficient to give an impression on the huge ecosystem that surrounds NHibernate, which is great.

Conclusion

It strikes me that the book is definitely a “no BS-book” – there’s plenty of code, which is mostly high quality and sufficiently best practice-compliant, and recommendations throughout when there are decisions to be made. If I should criticize something, I think the sheer amount of code makes for an exhausting casual read πŸ™‚ it does, however, claim to be a “cookbook”, so I guess that’s just the way it is.

The book is probably great for developers, who are either new to or semi-experienced in using NHibernate, but have a general high level of experience and skills.

All-in-all a good read, and it’s great that it touches on so many things in and around this huge framework!

Title: NHibernate 3.0 Cookbook
Author: Jason Dentler
ISBN 10/13: 184951304X / 978-1-84951-304-3
Publisher: Packt Publishing

More checking out MongoDB: Updating

In MongoDB, there’s no way to lock a database, collection, or document. The ability to work without locking is a requirement for any db that wishes to be horizontally scalable, and obviously this imposes some limitations and/or possibilities (depending on your point of view :)).

If you want all the goodness that document-orientation brings, it seems we need to cope with this non-locking database.

So how DO you update stuff in MongoDB? And, more importantly: how do you update stuff without race conditions?

In one of my previous posts on MongoDB, I mentioned that the unit of atomicity is a document – i.e., either a document gets saved/updated/deleted or it doesn’t. That must mean that we can count on updating one document only (or not), so we should build our applications so they can work without requiring multiple documents to be updated to be consistent ([1. Which is good practice anyway! In my experience, long and wide db transactions are often used, not to enforce a strict consistency as much as to allow scenarios like: “when this happens, this should also happen”. But that kind of logic can often be handled by something else, e.g. by a publishing events reliably to other processes (logically and/or physically), that handles the side-effects.]).

First, let’s take a look at how to actually update a document.

NaΓ―ve attempt to update a document

Well, we could do this:

That is, if you go and save a document that already has an ID, any existing document with that ID will be updated.

This would work if we were the only client on the db. But what if someone was editing the post in that same moment, adding another tag as well? Well, if he was unfortunate enough to save his edits when we were out for coffee, his changes would be lost.

One way to actually do it

By using the update function!

update accepts the following four arguments:

  1. criteria – document selector that specifies which document to be updated
  2. objNew – document to save
  3. upsert – bool to specify auto-insert if document does not exist (“update if present, insert if missing”)
  4. multi – bool to allow updating multiple documents that match the criteria (default is only first document)

Actually, as you can now probably see, save(doc) is just a shorthand for update({}, doc, true, false) – an upsert with the document we’re saving.

This way, we could easily add an incrementing version field to our documents to make sure that the version we’re saving is the version we retrieved.

Let’s try it out:

– good thing he didn’t get through with that one! πŸ™‚

As you can see, we can easily implement optimistic concurrency on each document by constraining updates to the version we checked out. But, as I will show next, you can actually do a lot of things on the server.

Server-side document updates

Instead of retrieving an entire document, modifying it, and saving it back again with the risk of overwriting someone else’s edits, we can ask the server to make edits as smaller operations. E.g. our attempt to add a missing ‘test’ tag to the document could have been done like this:

See how the code $push modifier was used to push a value into the array… this stuff is great. But here, we have a race condition again – what if someone added the ‘test’ tag almost the same time as we did? Then two ‘test’ tags would be present in the array.

One way is to constrain the update by id and the absence of ‘test’ in the tags array:

– another is to use the $addToSet function, which makes MongoDB treat the array as a set:

Nifty!

My conclusion (so far) is that an application can get a huge benefit from using the various modifier operations – performance-wise (obviously), but probably also UX-wise as well… It’s a step in another direction from the usual CRUD scenarios that I usually compulsively associate with the word “update”, and I imagine it could be made to reflect the user’s interactions with the system.

I am thinking that the majority of the user’s interactions with the system could (and probably should) be put on the form

– and then one could implement a more-or-less generic mechanism on the client side to handle unsuccessful updates (e.g. by asking the user what to do then, reloading some data to allow [assumed state before] to be something else, or by handling certain situations with some kind of merging function, etc.).

How to perform updates across multiple documents

My first thought is that this situation should be avoided when working with a document-oriented db. I think most people will agree with this one.

I am pretty unsure of this, actually… The rest of this post is just a few thoughts on my first take on this, should I need to do this. Comments are greatly appreciated!

The problem in updating multiple documents is that we can perform an update on one document at a time, each time checking if the update went well or not. But there’s no way to (consistently) roll back update #1 if update #2 fails. So this means that there’s only one way: Forward! But how to proceed then, when an update failed?

How do we usually do stuff reliably across boundaries of multiple things that may or may not succeed, allowing us to handle errors as gracefully as possible and proceed thereafter?

I’m thinking that asynchronous reliable one-way messaging is the answer to this.

So if an application ever needs to update multiple documents in the most reliable way possible, it should probably perform one document update per “transaction” – in NServiceBus terminology that would be updating one document per message handler. And then the handler should throw an exception if an update unexpectedly fails.

But again: I’m thinking that this situation should be avoided at all costs with a document-oriented db. If ACID is required, the application should probably have a RDBMS on the side, or implement some kind of transactional mechanism in the document store.

Conclusion

That concludes my little learning series of MongoDB posts.

I must say that I am intrigued by all the NoSQL discussions currently going on in the communities, and I think it is always a sign of health that we question the technologies we use.

I am entirely convinced that document dbs could and should have been used for some parts of systems that I have experienced, and I am blown away by the lack of friction when starting up a project on top of a schemaless db.

As a .NET dude, I am convinced that the future will see more .NET systems built with more than one db underneath – e.g. with MongoDB for all the “soft parts” of the system, NHibernate on SQL Server for the few things that by nature require ACID, and then some NHibernate Search/Lucene/Solr.NET for full-text indexing and searching capabilities etc.

NHibernate is very flexible

…but it does impose limitations on your domain model.

Most of these limitations, however, like the need for public/internal/protected members to be virtual, and the requirement for a default constructor to exist with at least protected accessibility, are not that hard to adhere to and usually don’t interfere with what you would do if there were no rules at all.

One of the limitations, however, can be pretty significant – Ayende describes the problem here, using the term “ghost objects”.

But, as I am about to show, this significance only arises if you follow a certain style of coding, which you should usually avoid!

Short explanation of the problem

When NHibernate lazy-loads an entity from the db (i.e. when you call session.Load<TEntity>(id) or when an entity in your session references something through a lazy-loaded association), it does so by providing an instance of a runtime-generated type, which acts as a proxy.

The first time you access something on the proxy, it gets “hydrated”, which is just a fancy way of saying that the data will be loaded from the database.

This would be fine and dandy, if it weren’t for the fact that the proxy is a runtime-generated subclass of your entity, which – in cases where inheritance is involved – will be a sibling to the other derived classes. Consider the simple inheritance hierarchy on the sketch to the right which in code could be something like so:

– and then NHibernate will generate something along the lines of this (fake :)) class signature:

See the problem? Here’s the problem:

This means that this kind of runtime type checking will fail in those circumstances where the entity is a lazy-loaded reference of the supertype, and the following will FAIL:

One possible solution

The other day, Ayende blogged about a recent addition to NHibernate, namely lazy-loaded properties. This allows an entity to be partially hydrated, intercepting calls to certain properties to lazy-load the relevant fields on demand.

This feature is great when storing LOBs alongside the other fields on an entity, but it also laid the ground for his most recent addition, which is the ability to lazy-load an association by setting lazy="no-proxy" on it.

This way, NHibernate will not build a proxy, but instead it will intercept the property getter and load the entity at that point in time, thus being able to return the exact (sub)type of the loaded entity.

Now this seems to solve our problems, but let’s zoom out a bit … why did we have a problem in the first place? Our problem was actually that we failed to write object-oriented code, but instead we wrote a brittle piece of code that would fail at runtime whenever someone added a new subtype, thus violating the Liskov substitution principle. Moreover it just feels wrong to implement business logic that reflects on types!

What to do then?

Well, how about making your code polymorphic? The logic above could be easily rewritten as:

– which moves the logic of yielding name as a oneliner into the class hierarchy, allowing us to always get a name from a LegalEntity.

What if I really really need a concrete instance?

Then you should use the nifty visitor pattern to extract what you need. In the example above, I would need to add the following additions:

This way, we’re taking advantage of the fact that each subclass knows its own concrete instance, thus allowing it to pass itself to the visitor we passed in.

This is the preferred solution when the logic you’re writing doesn’t belong inside the actual entitiy class, like e.g. when you want to convert the entity to an editable view object, because this will make your code break at compile time if someone adds a new specialization, thus requiring each piece of logic to handle that specialization as well.

Oh, and if you’re a Java guy, you might be missing the ability to create an inline anonymous visitor within the scope of the current method, but that can be easily emulated by a generic visitor, like so:

– which would allow you to write inline typesafe code like this:

Only thing missing now is the ability to return a value in one line depending on the subclass. Well, the generic visitor can be used for that as well by adding the following:

– allowing you to write code like this:

– and still have the benefit of compile-time safety that all specializations have been handled.

When to reflect on types?

IMO you should only reflect on types in business logic when it’s a shortcut that doesn’t break the semantics of your code. What do I mean by that? Well, e.g. the implementation of the extension method System.Linq.Enumerable.Count() looks something like this:

This way, providing the number of items is accelerated for certain implementations of IEnumerable<T> because the information is already there, and for other types there’s no way to avoid manually counting.

Conclusion

I don’t think I will be using the new lazy="no-proxy" feature, because if I need it, I think it is a sign that my design has a bad smell to it, and I should either go for polymorphism or using a visitor.

Fluent NHibernate automapping and cascading

One of the things you usually end up wanting to configure on a case-to-case basis when using NHibernate is cascading – i.e. which relations should NHibernate take responsiblity for saving.

An example could be in DDD terms where you have an aggregate root, that contains some stuff. Let’s take an example with a Band which is the subject of the BandMember role, which in turn references another aggregate root, User, that is the object of the band member role.

It is OK for us that we need to save a User and a Band in a repository, that’s the idea when dealing with aggregate roots. But everything beneath an aggregate root should be automatically persisted, and the root should be capable of creating/deleting stuff beneath itself without any references to repositories and stuff like that.

So how do we do that with NHibernate? Well, it just so happens that NHibernate can be configured to cascade calls to save, delete, etc. through relations, thus allowing the aggregate root to logically “contain” its entities.

This is all fine and dandy, but when Fluent NHibernate is doing its automapping, we need to be able to give some hints when we want cascading to happen. I usually want to be pretty persistence ignorant, BUT sometimes I just want to be able to do stuff quickly and just get stuff done, so I usually end up “polluting” my domain model with a few attributes that give hints to a convention I use.

Consider this:

Notice that little [Cascade]-thingie in there? It’s implemented like this:

Trivial – but I want that to be spotted by Fluent NHibernate to make the BandMembers collection into a cascade="all-delete-orphan", which in turn will cause the methods AddBandMember and RemoveBandMember to be able to update the DB.

I do this with a CascadeConvention, which is my implementation of the IHasManyConvention and IReferenceConvention interfaces. It looks like this:

What is left now is to make sure Fluent NHibernate picks up my convention and uses it. I usually do this by throwing them into the same assembly as my entities and do something like this when configuring FNH:

– which will cause all conventions residing in the same assembly as EntityBase to be used.

This works really good for me, because it makes it really easy and quick to configure cascading where it’s needed – I don’t have to look deep into an .hbm.xml file somewhere or try to figure out how cascading might be configured somewhere else – the configuration is right where it’s relevant and needed.

Extremist PI-kinds-of-guys might not want to pollute their domain models with attributes, so they might want to use another way of specifying where cascading should be applied. Another approach I have tinkered with, is to let all my entities be subclasses of an appropariate base class – like e.g. AggregateRoot (implies no cascading), Aggregate (implies cascading), and Component (implies that the class is embedded in the entity).

The great thing about Fluent NHibernate is that it’s entirely up to you to decide what kind of intrusion offends you the least πŸ™‚

Getting started with Fluent NHibernate

Why

I have loved NHibernate right from when I first learned how to use it, and I have loved NHibernate beneath Castle ActiveRecord since I found out that ActiveRecord could reduce the pain I felt because of NHibernate’s .hbm.xmhell.

However, using Castle ActiveRecord requires you to “pollute” your otherwise beautiful and pure domain model with database mapping metadata in the form of .NET attributes. If you are a PI kind of kind, this might be too much to ask. But if you (like me) are willing to relax your ideals a bit to gain the extra productivity, you might be happy to do so – even though it hurts your feelings a little bit every time you have to punch in that [Property(Access == PropertyAccess.FieldCamelCase)]-attribute for the 100th time…

But there is another option: Fluent NHibernate – an option that embraces the philosophy of “convention over configuration”, which seems to be getting a lot of attention these days in the .NET world.

EDIT: Oh yeah, and now it’s here.

How

If you know NHibernate, you know that you use it like this:

  1. Create a Configuration.
  2. Use the configuration to build an ISessionFactory which you store somewhere.
  3. And then use that session factory repeatedly to open ISessions which in turn are used to do stuff with the DB.

So, what can Fluent NHibernate do for you? Well, FNH is actually pretty simple – it helps with the Configuration part of NHibernate. As soon as the Configuration is finished, Fluent NHibernate is out, and from then on it’s pure NHibernate again.

First, I will show a short example – and then I will explain how much is actually accomplished by issuing these ridiculously simple statements. Behold:

If you look at the statements above, I can tell you that it configures FNH to assume the following:

  • All of my entities can be found in the assembly containing Base inside any namespace containing the string “Entities”.
  • All of my entities derive from Base, which is just a very simple entity layer supertype containing nothing more than an Id property of type Guid.
  • Any conventions found in the same assembly as Base will be used by FNH to alter the mappings.

Is that it? Well almost… The code above will make FNH map everything as good as it can, but I still want to alter the mapping in certain places – and that is done through the conventions found in my entity asseembly. At the moment I have 2 conventions:

  1. CascadeAttributeConvention, which is an IHasManyConvention and an IReferenceConvention that looks over any one-to-many and many-to-one relations in my domain, setting cascade to all or all-delete-orphan if it finds a CascadeAttribute on the property.

    This allows me to do stuff like

    and expect the collection of band members to live and die and be updated with its aggregate, Band.

  2. PluralizeTableNamesConvention, which is an implementation of IClassConvention I use to pluralize table names. It contains code like this:

    where the Pluralize method does whichever magic is required to pluralize names from my domain (usually by appending an “s”: Band => Bands, BandMember => BandMembers etc.).

And that is it!

I expect a few more conventions to come along as development progresses – e.g. in other projects I have run into the need to be able to specify that one string should be nvarchar(255) and another string nvarchar(8192) in the DB, which I have solved with an IPropertyConvention that looks for the presence of a LengthAttribute containing the desired length.

And that is why I really like Fluent NHibernate – it allows me to leverage the most powerful ORM alive in the easiest possible way, making NHibernate feel lean and lightweight, encouraging me to follow conventions in my codebase. What’s not to like?

Customizing the NHibernate mapping when using the AutoPersistenceModel of Fluent NHibernate

EDIT: Please note that this way of configuring Fluent NHibernate is obsolete since the RCs. Look here for an example on how to customize the automapping on Fluent NHibernate 1.0 RTM.

Fluent NHibernate is cool because there is too much XML in the world – so I appreciate any initiative to bring down the amount of XML. Configuring NHibernate using simple code and strongly types lambdas is actually pretty cool.

However, there is also too much code in the world – therefore, the AutoPersistenceModel is my preferred way of configuring Fluent NHibernate. It allows me to focus on what really matters: my domain model.

But don’t you lose control when all your database mapping is done automatically adhering only to a few conventions? No! You can always customize your mappings and complement/override the auto mappings. But then we can easily end up customizing too much, and then we are back to writing too much code.

Therefore, I would like to show how I use attributes to spice up my NHibernate mapping and add that tiny bit of extra customization that I need. I would like to be able to make the following entity:

That is, I want to be able to describe some extra mapping related stuff by applying a few non-instrusive attributes in my domain model.

How can that be accomplished with Fluent NHibernate? Easy! Like so:

and then I have made the following attributes:

IndexedAttribute.cs:

UniqueAttribute.cs (works with ordinary as well as composite unique constraints):

LengthAttribute.cs:

EncryptedAttribute.cs:

If you are an extremist PI kinda guy you might think this is wrong – and I might agree with you a little, but I am also a pragmatic kinda guy, and I think this is a great compromise between being fairly non-intrusive but still expressive, effective and clear.

Why I use Guids for primary keys all the time

Fabio Maulo explains and shows the difference between using Guid, Hilo and native primary keys with NHibernate and the batcher: NH2.1.0: generators behavior explained

Reallly insightful post (as always), and this one is mandatory reading if you

  1. are using NHibernate
  2. are not using NHibernate because you think you think it performs badly compared to your own hand-rolled ADO.NET data layer

I claim (without any empirical knowledge besides my own experience) that almost all db access in reality performs much better with NHibernate because of batching and futures. You CAN outperform NHibernate in synthetic benchmarks and stress tests, but for real-world usage, optimizing and doing your hand-rolled ADO.NET thing is far too cumbersome. Not using an ORM is like insisting on using assembly language instead of C# “because it’s faster”…

Way more nifty: Using compression at field level when persisting #2

In a previous post (here) I showed how the contents of a string property could be internally stored as a GZipped byte array for persistence. The code demonstrated how to achieve this, but not in a particularly pretty manner, because I did some nasty hacks inside the property’s setter and getter – i.e. I crammed all the zipping/unzipping stuff directly inside my domain class.

When using NHibernate for persistence you have another option, which I have recently learned about. So let’s rewrite the example from my previous post.
Read more

Using compression at field level when persisting

One day at work, someone needed to store a lot of text in a field of a domain object – and since we are using NHibernate for persistence, that field would get stored to the database, taking up a huge amount of space.

To circumvent this, we just made the actual type of the member variable description be a byte[], and then we let the accessor property Description zip and unzip when accessing the value.

Like so:

It should be noted that it is crucial that the ToArray() method of the compressed MemoryStream be called after the GZipStream stream has been disposed! That’s because the GZipStream will not write the gzip stream footer bytes until the stream is disposed. So if ToArray() is called before disposing, you will get an incomplete stream of bytes.

Moreover it should be noted that zipping strings in the database is not always cool because:

  1. Compression kicks in for strings at around 2-300 chars. The size of the compressed data is greater than that of the original string for shorter strings.
  2. Querying is impossible/hard/weird. πŸ™‚

But it is a nifty little trick to put in one’s backpack, and I had all sorts of trouble figuring out exactly how to make the GZipStream behave, so I figured it would be nice to put a working example on the net.