nhibernate search – mookid on code

Book review: NHibernate 3.0 Cookbook

Packt Publishing has asked me before whether I was interested in reviewing some of their books, which I was – this time, however, I asked them whether they were interested in me reviewing their upcoming NHibernate 3.0 Cookbook by Jason Dentler.

I did that because I like NHibernate very much, and I would like to help promote good, fresh literature on the subject. And this book really stands out as fresh, because it covers NHibernate 3.0, which has not even been officially released yet!

My first impressions are good – it starts out with creating a model and the usual XML-mapping stuff, and then it dives directly into modeling an inheritance hierarchy. I think this is pretty cool, because it is a sign that the book has a fairly high level of ambition: It is not just about stuffing away rows in the db, it’s about persisting an actual model!

It covers Fluent NHibernate and Fabio’s ConfORM as well, so it provides a really good foundation to anyone interested in learning the intricacies of configuring NHibernate. And it is pretty true to the model-first approach, which is how I like it.

Then it goes on with a chapter on how to manage sessions and transactions including – among other things – an example on how to manage the session from an ASP.NET MVC action filter (which is not “best practice” from an ASP.NET MVC perspective IMO, as it relies on static gateways, but I digress… the book is not about ASP.NET MVC :))

The query chapter is great, because it covers everything I can think of: Criteria, QueryOver, HQL, both in their normal and multi forms, futures, LINQ to NHibernate, detached criteria, and the new HQL bulk operations, insert and update. If I must put my finger on something, I think that the different areas are covered a little too lightly, but hey – there’s plenty of information on this stuff on the internet, and you could probably write an entire book about how to put HQL to use.

The testing chapter is great as well, as it touches on nice-to-know stuff and some of “the new developments” in the area: NHibernate Profiler, Fluent NHibernate automatic persistence testing, using in-memory SQLite for persistence testing, + more.

The chapter on implementing a data access layer shows a typical data access object and a repository implementation which will probably look familiar to a lot of people, implemented with NHibernate. They both have the ability to automatically perform their operations within transactions, if one is not already active. This makes the implementations pretty flexible, as they can be used either “by themselves”, or they can implicitly enlist in an ongoing unit of work. Moreover, a pretty nifty named query implementation is shown, complete with an automated test that checks whether all implemented named query classes have corresponding named queries in an HBM XML file.

The rest of the book shows how various common tasks can be achieved using NHibernate or some of the many NHContrib projects, e.g. creating an audit trail by listening to events, creating an IUserType to encrypt strings, using Burrow to manage sessions, putting NHibernate Search to use, etc… As you can probably imagine, this stuff is covered pretty lightly, but it’s sufficient to give an impression of the huge ecosystem that surrounds NHibernate, which is great.

Conclusion

It strikes me that the book is definitely a “no BS-book” – there’s plenty of code, which is mostly high quality and sufficiently best practice-compliant, and recommendations throughout when there are decisions to be made. If I should criticize something, I think the sheer amount of code makes for an exhausting casual read 🙂 it does, however, claim to be a “cookbook”, so I guess that’s just the way it is.

The book is probably great for developers who are either new to or semi-experienced in using NHibernate, but who have a generally high level of experience and skills.

All-in-all a good read, and it’s great that it touches on so many things in and around this huge framework!

Title: NHibernate 3.0 Cookbook
Author: Jason Dentler
ISBN 10/13: 184951304X / 978-1-84951-304-3
Publisher: Packt Publishing

More checking out MongoDB: Updating

In MongoDB, there’s no way to lock a database, collection, or document. The ability to work without locking is a requirement for any db that wishes to be horizontally scalable, and obviously this imposes some limitations and/or possibilities (depending on your point of view :)).

If you want all the goodness that document-orientation brings, it seems we need to cope with this non-locking database.

So how DO you update stuff in MongoDB? And, more importantly: how do you update stuff without race conditions?

In one of my previous posts on MongoDB, I mentioned that the unit of atomicity is a document – i.e., either a document gets saved/updated/deleted or it doesn’t. That means we can count on a single-document update either happening or not, so we should build our applications so that they do not require multiple documents to be updated together in order to be consistent. (Which is good practice anyway! In my experience, long and wide db transactions are often used not so much to enforce strict consistency as to allow scenarios like “when this happens, this should also happen”. But that kind of logic can often be handled by something else, e.g. by reliably publishing events to other processes (logically and/or physically) that handle the side effects.)

First, let’s take a look at how to actually update a document.

Naïve attempt to update a document

Well, we could do this:

> use myblog
switched to db myblog
> var doc = {'headline': 'Just checking', 'tags': ['nifty']}
> db.posts.save(doc)
> // omgwtfbbq, we forgot to tag with 'test' as well... let's correct it:
> doc.tags[1] = 'test'
test
> // let's go get a cup of coffee...
> // ....
> // - aand now we're back - let's hit save
> db.posts.save(doc)
> db.posts.find()
{ "_id" : ObjectId("4b965229bf4a0000000043bc"), "headline" : "Just checking", "tags" : [ "nifty", "test" ] }


That is, if you go and save a document that already has an ID, any existing document with that ID will be updated.

This would work if we were the only client on the db. But what if someone was editing the post in that same moment, adding another tag as well? Well, if he was unfortunate enough to save his edits when we were out for coffee, his changes would be lost.
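
To make the lost update concrete, here is a minimal sketch that plays the scenario out in plain JavaScript. The `collection` map and the `save`/`load` helpers are stand-ins I made up for the illustration; they only mimic the "replace the whole document with the same ID" behaviour described above and are not the MongoDB API.

```javascript
// In-memory stand-in for a collection; NOT the MongoDB API.
const collection = new Map();

// Mimics save(): replaces the whole stored document that has the same _id.
function save(doc) {
  collection.set(doc._id, JSON.parse(JSON.stringify(doc)));
}

// Returns a private copy, like a client that has fetched the document.
function load(id) {
  return JSON.parse(JSON.stringify(collection.get(id)));
}

save({ _id: 1, headline: 'Just checking', tags: ['nifty'] });

const ours = load(1);     // we fetch the post...
const theirs = load(1);   // ...and so does someone else

theirs.tags.push('json'); // they add a tag and save right away
save(theirs);

ours.tags.push('test');   // we come back from coffee and save our stale copy
save(ours);

console.log(load(1).tags); // [ 'nifty', 'test' ] ('json' is gone)
```

The second save silently wins, because nothing in a whole-document replace checks what the document looked like when we read it.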

One way to actually do it

By using the update function!

update accepts the following four arguments:

criteria – document selector that specifies which document to be updated
objNew – document to save
upsert – bool to specify auto-insert if document does not exist (“update if present, insert if missing”)
multi – bool to allow updating multiple documents that match the criteria (default is only first document)

Actually, as you can now probably see, save(doc) on a document that already has an ID is essentially a shorthand for update({'_id': doc._id}, doc, true, false) – an upsert with the document we’re saving.

This way, we could easily add an incrementing version field to our documents to make sure that the version we’re saving is the version we retrieved.

Let’s try it out:

> post = {'headline': 'has a version field', 'tags': ['nifty'], 'version': 1}
{
        "headline" : "has a version field",
        "tags" : [
                "nifty"
        ],
        "version" : 1
}
> db.posts.save(post)
> // now we're editing it
> post['tags'][1] = 'test'
test
> post['version']++
1
> post
{
        "headline" : "has a version field",
        "tags" : [
                "nifty",
                "test"
        ],
        "version" : 2,
        "_id" : ObjectId("4b9884d4b54e000000006c69")
}
> // now someone else retrieves the post
> someoneElsesPost = db.posts.findOne()
{
        "_id" : ObjectId("4b9884d4b54e000000006c69"),
        "headline" : "has a version field",
        "tags" : [
                "nifty"
        ],
        "version" : 1
}
> // and we save it, setting criteria to the id and the version we retrieved
> db.posts.update({'_id': post._id, 'version': 1}, post); db.$cmd.findOne({getlasterror: 1})
{ "err" : null, "updatedExisting" : true, "n" : 1, "ok" : 1 }
> // as you can probably tell, n:1 means that 1 document was updated...
> //
> // now that other guy makes an edit and tries to save it
> someoneElsesPost['tags'][1] = 'json'
json
> db.posts.update({'_id': someoneElsesPost._id, 'version': 1}, someoneElsesPost); db.$cmd.findOne({getlasterror: 1})
{ "err" : null, "updatedExisting" : false, "n" : 0, "ok" : 1 }
> // 0 documents were updated! Good!


– good thing he didn’t get through with that one! 🙂

As you can see, we can easily implement optimistic concurrency on each document by constraining updates to the version we checked out. But, as I will show next, you can actually do a lot of things on the server.
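
The shell session stops at "0 documents were updated", but an application would normally want to do something about that, typically re-read and try again. Here is a rough sketch of such a retry loop, using an in-memory map instead of a real database. The names `updateIfVersion`, `fetchCopy` and `editWithRetry` are hypothetical helpers invented for this sketch, not driver functions.

```javascript
// In-memory stand-in for one collection; NOT the MongoDB driver.
const docs = new Map();

// Replaces the document only if the stored version still matches.
// Returns the number of documents updated (0 or 1), like "n" in getlasterror.
function updateIfVersion(id, expectedVersion, newDoc) {
  const current = docs.get(id);
  if (!current || current.version !== expectedVersion) return 0;
  docs.set(id, JSON.parse(JSON.stringify(newDoc)));
  return 1;
}

// Returns a private copy, like a client that has fetched the document.
function fetchCopy(id) {
  return JSON.parse(JSON.stringify(docs.get(id)));
}

// Fetch, apply the edit, bump the version, and retry if someone else won.
function editWithRetry(id, edit, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = fetchCopy(id);
    const expected = doc.version;
    edit(doc);
    doc.version = expected + 1;
    if (updateIfVersion(id, expected, doc) === 1) return doc;
  }
  throw new Error('gave up after ' + maxAttempts + ' conflicting updates');
}

docs.set(1, { _id: 1, tags: ['nifty'], version: 1 });
const stale = fetchCopy(1);                    // someone else's copy, version 1
editWithRetry(1, d => d.tags.push('test'));    // we win: version becomes 2
stale.tags.push('json');
const n = updateIfVersion(1, 1, stale);        // their stale write is rejected
console.log(n, fetchCopy(1));
```

Whether retrying is the right reaction depends on the edit: re-applying "add a tag" is harmless, whereas re-applying an edit to free text probably calls for showing the user the conflict instead.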

Server-side document updates

Instead of retrieving an entire document, modifying it, and saving it back again with the risk of overwriting someone else’s edits, we can ask the server to make edits as smaller operations. E.g. our attempt to add a missing ‘test’ tag to the document could have been done like this:

> db.posts.update({'_id': ObjectId("4b9884d4b54e000000006c69")}, { $push: { 'tags': 'test' } });
> db.posts.find()
{
        "_id" : ObjectId("4b9884d4b54e000000006c69"),
        "headline" : "has a version field",
        "version" : 2,
        "tags" : [ "nifty", "json", "test" ]
}


See how the $push modifier was used to push a value into the array… this stuff is great. But here, we have a race condition again – what if someone added the ‘test’ tag at almost the same time as we did? Then two ‘test’ tags would be present in the array.

One way is to constrain the update by id and the absence of ‘test’ in the tags array:

> db.posts.update({
    '_id': ObjectId("4b9884d4b54e000000006c69"),
    'tags': { $ne: 'test' }
}, {
    $push: { 'tags': 'test' }
});


– another is to use the $addToSet modifier, which makes MongoDB treat the array as a set:

> db.posts.update({'_id': ObjectId("4b9884d4b54e000000006c69") }, { $addToSet: { 'tags': 'test' } });

Nifty!
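
To see the difference between the two modifiers without a database at hand, here is a tiny sketch on plain arrays. The `push` and `addToSet` functions below are my own stand-ins that imitate what the text above says the modifiers do; they are not MongoDB code.

```javascript
// Plain-array stand-ins for the two modifiers; NOT MongoDB itself.
function push(doc, field, value) {
  doc[field].push(value);            // always appends
}

function addToSet(doc, field, value) {
  if (!doc[field].includes(value)) { // appends only if the value is absent
    doc[field].push(value);
  }
}

const pushed = { tags: ['nifty'] };
push(pushed, 'tags', 'test');
push(pushed, 'tags', 'test');        // a second client doing the same thing

const asSet = { tags: ['nifty'] };
addToSet(asSet, 'tags', 'test');
addToSet(asSet, 'tags', 'test');

console.log(pushed.tags); // [ 'nifty', 'test', 'test' ]
console.log(asSet.tags);  // [ 'nifty', 'test' ]
```

The important part in the real thing is that the "is it already there?" check and the append happen on the server as one operation on one document, so two clients cannot interleave between them.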

My conclusion (so far) is that an application can get a huge benefit from using the various modifier operations – performance-wise (obviously), but probably UX-wise as well… It’s a step in another direction from the CRUD scenarios that I compulsively associate with the word “update”, and I imagine it could be made to reflect the user’s interactions with the system.

I am thinking that the majority of the user’s interactions with the system could (and probably should) be put in the form

db.collection.update({
        [assumed state before]
}, {
        [modifier operations to "migrate" one or
         more documents to the new state]
}, upsert, multi); db.$cmd.findOne({getlasterror: 1})
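
As one possible instance of that form, take a made-up interaction, "publish a draft post". The `state` field and its values are assumptions of mine for the sake of the example, and `publishPost` is a hypothetical helper that only builds the two documents; `$set` and `$inc` are regular modifier operations.

```javascript
// Hypothetical interaction: "publish a draft post".
// The 'state' field and its values are assumptions made for this sketch.
function publishPost(postId) {
  return {
    // assumed state before: the post exists and is still a draft
    criteria: { _id: postId, state: 'draft' },
    // modifier operations that migrate it to the new state
    modifiers: { $set: { state: 'published' }, $inc: { version: 1 } },
  };
}

// A plain string is used as the id here to keep the sketch self-contained;
// in the shell it would be an ObjectId.
const cmd = publishPost('4b9884d4b54e000000006c69');
console.log(JSON.stringify(cmd, null, 2));

// In the shell, the two parts would be passed along as:
// db.posts.update(cmd.criteria, cmd.modifiers, false, false)
```

If the post was already published (or deleted) in the meantime, the criteria simply match nothing, and the "n" value from getlasterror tells us that the assumed state no longer held.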

How to perform updates across multiple documents

My first thought is that this situation should be avoided when working with a document-oriented db. I think most people will agree with this one.

I am pretty unsure of this, actually… The rest of this post is just a few thoughts on my first take on this, should I need to do this. Comments are greatly appreciated!

The problem in updating multiple documents is that we can perform an update on one document at a time, each time checking if the update went well or not. But there’s no way to (consistently) roll back update #1 if update #2 fails. So this means that there’s only one way: Forward! But how to proceed then, when an update failed?

How do we usually do stuff reliably across boundaries of multiple things that may or may not succeed, allowing us to handle errors as gracefully as possible and proceed thereafter?

I’m thinking that asynchronous reliable one-way messaging is the answer to this.

So if an application ever needs to update multiple documents in the most reliable way possible, it should probably perform one document update per “transaction” – in NServiceBus terminology that would be updating one document per message handler. And then the handler should throw an exception if an update unexpectedly fails.
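
Here is a very rough sketch of that idea, with an array standing in for the reliable queue and a map standing in for a second collection. Everything in it (the `TagAdded` message, `handleTagAdded`, `drain`, the simulated failure) is hypothetical and invented for illustration; it is not how NServiceBus is actually used, only the shape of "one document update per handler, throw on failure, redeliver".

```javascript
// In-memory stand-ins; queue, handler and document shapes are hypothetical.
const tagCounts = new Map();   // a second "collection", one document per tag
const queue = [];              // stand-in for a reliable one-way queue

let failuresLeft = 1;          // simulate one transient failure

// Handler: updates exactly one document and throws if that update fails.
function handleTagAdded(message) {
  if (failuresLeft > 0) {
    failuresLeft--;
    throw new Error('update failed unexpectedly');
  }
  tagCounts.set(message.tag, (tagCounts.get(message.tag) || 0) + 1);
}

// Dispatcher: a message leaves the queue only after its handler succeeds.
function drain(handler, maxDeliveries = 5) {
  while (queue.length > 0) {
    const message = queue[0];
    message.deliveries = (message.deliveries || 0) + 1;
    try {
      handler(message);
      queue.shift();           // success: remove the message
    } catch (err) {
      // failure: leave the message in place so it is delivered again,
      // unless it has already been tried too many times
      if (message.deliveries >= maxDeliveries) throw err;
    }
  }
}

queue.push({ type: 'TagAdded', tag: 'test' });
drain(handleTagAdded);
console.log(tagCounts.get('test'), queue.length); // 1 0
```

One thing the sketch glosses over: since a message may be delivered more than once, the single update in each handler should be safe to repeat, which is exactly where criteria on the "assumed state before" and modifiers like $addToSet come in handy.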

But again: I’m thinking that this situation should be avoided at all costs with a document-oriented db. If ACID is required, the application should probably have an RDBMS on the side, or implement some kind of transactional mechanism in the document store.

Conclusion

That concludes my little learning series of MongoDB posts.

I must say that I am intrigued by all the NoSQL discussions currently going on in the communities, and I think it is always a sign of health that we question the technologies we use.

I am entirely convinced that document dbs could and should have been used for some parts of systems that I have experienced, and I am blown away by the lack of friction when starting up a project on top of a schemaless db.

As a .NET dude, I am convinced that the future will see more .NET systems built with more than one db underneath – e.g. with MongoDB for all the “soft parts” of the system, NHibernate on SQL Server for the few things that by nature require ACID, and then some NHibernate Search/Lucene/Solr.NET for full-text indexing and searching capabilities etc.
