In MongoDB, there’s no way to lock a database, collection, or document. The ability to work without locking is a requirement for any db that wishes to be horizontally scalable, and obviously this imposes some limitations and/or possibilities (depending on your point of view :)).
If you want all the goodness that document-orientation brings, it seems we need to cope with this non-locking database.
So how DO you update stuff in MongoDB? And, more importantly: how do you update stuff without race conditions?
In one of my previous posts on MongoDB, I mentioned that the unit of atomicity is a document – i.e., either a document gets saved/updated/deleted or it doesn’t. That means we can count on an update to a single document either happening or not, so we should build our applications so they can stay consistent without requiring multiple documents to be updated together. (Which is good practice anyway! In my experience, long and wide db transactions are often used not so much to enforce strict consistency as to allow scenarios like: “when this happens, this should also happen”. But that kind of logic can often be handled by something else, e.g. by reliably publishing events to other processes (logically and/or physically) that handle the side effects.)
First, let’s take a look at how to actually update a document.
Naïve attempt to update a document
Well, we could do this:
    > use myblog
    switched to db myblog
    > var doc = {'headline': 'Just checking', 'tags': ['nifty']}
    > db.posts.save(doc)
    > // omgwtfbbq, we forgot to tag with 'test' as well... let's correct it:
    > doc.tags[1] = 'test'
    test
    > // let's go get a cup of coffee...
    > // ....
    > // - aand now we're back - let's hit save
    > db.posts.save(doc)
    > db.posts.find()
    { "_id" : ObjectId("4b965229bf4a0000000043bc"), "headline" : "Just checking", "tags" : [ "nifty", "test" ] }
That is, if you go and save a document that already has an ID, any existing document with that ID will be updated.
This would work if we were the only client on the db. But what if someone was editing the post in that same moment, adding another tag as well? Well, if he was unfortunate enough to save his edits when we were out for coffee, his changes would be lost.
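To make the lost update concrete, here is a minimal sketch in plain JavaScript. It assumes an in-memory Map standing in for the posts collection; `save` and `load` are hypothetical helpers written for this illustration, not MongoDB shell functions.

```javascript
// In-memory stand-in for a collection (assumption; NOT the MongoDB API).
const store = new Map();

// Whole-document save: whatever we pass in replaces what is stored.
function save(doc) {
  store.set(doc._id, JSON.parse(JSON.stringify(doc)));
}

// Each client gets its own private copy, as it would from the server.
function load(id) {
  return JSON.parse(JSON.stringify(store.get(id)));
}

save({ _id: 1, headline: "Just checking", tags: ["nifty"] });

const ours = load(1);   // we read the post...
const theirs = load(1); // ...and so does someone else

theirs.tags.push("json");
save(theirs);           // their edit lands while we are out for coffee

ours.tags.push("test");
save(ours);             // our stale copy silently overwrites theirs

const finalTags = load(1).tags;
console.log(finalTags); // [ 'nifty', 'test' ] and the 'json' tag is gone
```

Nothing fails here, which is exactly the problem: the second save has no idea it is based on an outdated copy.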
One way to actually do it
By using the update function!
update accepts the following four arguments:
- criteria – document selector that specifies which document to be updated
- objNew – document to save
- upsert – bool to specify auto-insert if document does not exist (“update if present, insert if missing”)
- multi – bool to allow updating multiple documents that match the criteria (default is only first document)
Actually, as you can now probably see, save(doc) on a document that already has an _id is just a shorthand for update({'_id': doc._id}, doc, true, false) – an upsert with the document we’re saving.
This way, we could easily add an incrementing version field to our documents to make sure that the version we’re saving is the version we retrieved.
Let’s try it out:
    > post = {'headline': 'has a version field', 'tags': ['nifty'], 'version': 1}
    { "headline" : "has a version field", "tags" : [ "nifty" ], "version" : 1 }
    > db.posts.save(post)
    > // now we're editing it
    > post['tags'][1] = 'test'
    test
    > post['version']++
    1
    > post
    { "headline" : "has a version field", "tags" : [ "nifty", "test" ], "version" : 2, "_id" : ObjectId("4b9884d4b54e000000006c69") }
    > // now someone else retrieves the post
    > someoneElsesPost = db.posts.findOne()
    { "_id" : ObjectId("4b9884d4b54e000000006c69"), "headline" : "has a version field", "tags" : [ "nifty" ], "version" : 1 }
    > // and we save it, setting criteria to the _id and the version we retrieved
    > // (without the _id, the update could hit any other post that is at version 1)
    > db.posts.update({'_id': post._id, 'version': 1}, post); db.$cmd.findOne({getlasterror: 1})
    { "err" : null, "updatedExisting" : true, "n" : 1, "ok" : 1 }
    > // as you can probably tell, n:1 means that 1 document was updated...
    > //
    > // now that other guy makes an edit and tries to save it
    > someoneElsesPost['tags'][1] = 'json'
    json
    > db.posts.update({'_id': someoneElsesPost._id, 'version': 1}, someoneElsesPost); db.$cmd.findOne({getlasterror: 1})
    { "err" : null, "updatedExisting" : false, "n" : 0, "ok" : 1 }
    > // 0 documents were updated! Good!
– good thing he didn’t get through with that one! 🙂
As you can see, we can easily implement optimistic concurrency on each document by constraining updates to the version we checked out. But, as I will show next, you can actually do a lot of things on the server.
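What the losing client does next is up to the application. One option is to re-read the document, re-apply the edit, and try again. The following is a minimal sketch of that loop in plain JavaScript, assuming an in-memory Map in place of the collection; `updateIfVersion` and `editWithRetry` are hypothetical helpers, with `updateIfVersion` returning the number of documents updated in the same spirit as `n` from getlasterror.

```javascript
// In-memory stand-in for the posts collection (assumption; NOT the MongoDB API).
const posts = new Map();
posts.set(1, { _id: 1, headline: "has a version field", tags: ["nifty"], version: 1 });

const clone = (doc) => JSON.parse(JSON.stringify(doc));
const findOne = (id) => clone(posts.get(id));

// Hypothetical helper mirroring update({_id, version}, doc): replaces the
// document only if the stored version matches; returns the number updated.
function updateIfVersion(id, expectedVersion, newDoc) {
  const current = posts.get(id);
  if (!current || current.version !== expectedVersion) return 0;
  posts.set(id, clone(newDoc));
  return 1;
}

// Re-read, re-apply the edit, and retry until the versioned update succeeds.
function editWithRetry(id, edit, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = findOne(id);
    const readVersion = doc.version;
    edit(doc);
    doc.version = readVersion + 1;
    if (updateIfVersion(id, readVersion, doc) === 1) return attempt;
  }
  throw new Error("gave up after " + maxAttempts + " attempts");
}

// Simulate a competing writer that sneaks in during our first attempt.
let interfered = false;
const attempts = editWithRetry(1, (doc) => {
  if (!interfered) {
    interfered = true;
    const other = findOne(1);
    other.tags.push("json");
    other.version += 1;
    updateIfVersion(1, 1, other);
  }
  if (!doc.tags.includes("test")) doc.tags.push("test");
});

console.log(attempts, findOne(1).tags); // 2 [ 'nifty', 'json', 'test' ]
```

The first attempt is rejected because the stored version has moved on; the second attempt starts from the other writer's document, so both tags survive.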
Server-side document updates
Instead of retrieving an entire document, modifying it, and saving it back again with the risk of overwriting someone else’s edits, we can ask the server to make edits as smaller operations. E.g. our attempt to add a missing ‘test’ tag to the document could have been done like this:
    > db.posts.update({'_id': ObjectId("4b9884d4b54e000000006c69")}, { $push: { 'tags': 'test' } });
    > db.posts.find()
    { "_id" : ObjectId("4b9884d4b54e000000006c69"), "headline" : "has a version field", "version" : 2, "tags" : [ "nifty", "json", "test" ] }
See how the $push modifier was used to push a value into the array… this stuff is great. But here, we have a race condition again – what if someone added the ‘test’ tag at almost the same time as we did? Then two ‘test’ tags would be present in the array.
One way is to constrain the update by id and the absence of ‘test’ in the tags array:
    > db.posts.update({ '_id': ObjectId("4b9884d4b54e000000006c69"), 'tags': { $ne: 'test' } }, { $push: { 'tags': 'test' } });
– another is to use the $addToSet modifier, which makes MongoDB treat the array as a set:
    > db.posts.update({'_id': ObjectId("4b9884d4b54e000000006c69") }, { $addToSet: { 'tags': 'test' } });
Nifty!
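The difference between the two modifiers can be modelled in a few lines of plain JavaScript. This is only a sketch of the semantics: `push` and `addToSet` below are hypothetical local functions, not MongoDB calls, and the real $addToSet does its comparison on the server rather than with `Array.prototype.includes`.

```javascript
// Plain-JavaScript models of the two modifiers (hypothetical helpers).
function push(doc, field, value) {
  doc[field].push(value); // always appends, duplicates included
}

function addToSet(doc, field, value) {
  // appends only when the value is not already present
  if (!doc[field].includes(value)) doc[field].push(value);
}

const pushed = { tags: ["nifty"] };
const asSet = { tags: ["nifty"] };

// Two clients add the same tag, one right after the other.
for (let client = 0; client < 2; client++) {
  push(pushed, "tags", "test");
  addToSet(asSet, "tags", "test");
}

console.log(pushed.tags); // [ 'nifty', 'test', 'test' ]
console.log(asSet.tags);  // [ 'nifty', 'test' ]
```

Because the set-style operation gives the same result no matter how many times it is applied, neither client needs to know about the other.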
My conclusion (so far) is that an application can get a huge benefit from using the various modifier operations – performance-wise (obviously), but probably UX-wise as well… It’s a step in another direction from the usual CRUD scenarios that I compulsively associate with the word “update”, and I imagine it could be made to reflect the user’s interactions with the system.
I am thinking that the majority of the user’s interactions with the system could (and probably should) be expressed in the form
    db.collection.update(
        { [assumed state before] },
        { [modifier operations to "migrate" one or more documents to the new state] },
        upsert, multi);
    db.$cmd.findOne({getlasterror: 1})
– and then one could implement a more-or-less generic mechanism on the client side to handle unsuccessful updates (e.g. by asking the user what to do then, reloading some data to allow [assumed state before] to be something else, or by handling certain situations with some kind of merging function, etc.).
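Such a generic mechanism might look roughly like the sketch below. It is plain JavaScript with a single in-memory document standing in for the database; `conditionalUpdate` and `updateOrResolve` are hypothetical names, and the `resolve` callback is where a real application would ask the user, reload data, or merge.

```javascript
// In-memory stand-in for one stored document (assumption; NOT the MongoDB API).
let stored = { _id: 1, status: "draft", tags: ["nifty"] };

// Hypothetical stand-in for update(criteria, modifiers): applies `modify`
// only when every field in `assumed` matches the stored document, and
// returns the number of documents updated.
function conditionalUpdate(assumed, modify) {
  const matches = Object.keys(assumed).every((key) => stored[key] === assumed[key]);
  if (!matches) return 0;
  modify(stored);
  return 1;
}

// Generic wrapper: try the update; on a miss, hand a copy of the current
// state to `resolve`, which returns a new assumed state or null to give up.
function updateOrResolve(assumed, modify, resolve, maxAttempts = 3) {
  for (let i = 0; i < maxAttempts; i++) {
    if (conditionalUpdate(assumed, modify) === 1) return true;
    assumed = resolve({ ...stored });
    if (assumed === null) return false;
  }
  return false;
}

// Someone else has already moved the post from "draft" to "review".
stored.status = "review";

const published = updateOrResolve(
  { status: "draft" },                       // assumed state before
  (doc) => { doc.status = "published"; },    // the "migration" to the new state
  (current) => (current.status === "review" ? { status: "review" } : null)
);

console.log(published, stored.status); // true published
```

The first attempt misses because the assumed state is stale; the resolver decides that publishing from "review" is acceptable too, so the second attempt goes through.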
How to perform updates across multiple documents
My first thought is that this situation should be avoided when working with a document-oriented db. I think most people will agree with this one.
I am pretty unsure of this, actually… The rest of this post is just a few thoughts on my first take on this, should I need to do this. Comments are greatly appreciated!
The problem in updating multiple documents is that we can perform an update on one document at a time, each time checking if the update went well or not. But there’s no way to (consistently) roll back update #1 if update #2 fails. So this means that there’s only one way: Forward! But how to proceed then, when an update failed?
How do we usually do stuff reliably across boundaries of multiple things that may or may not succeed, allowing us to handle errors as gracefully as possible and proceed thereafter?
I’m thinking that asynchronous reliable one-way messaging is the answer to this.
So if an application ever needs to update multiple documents in the most reliable way possible, it should probably perform one document update per “transaction” – in NServiceBus terminology that would be updating one document per message handler. And then the handler should throw an exception if an update unexpectedly fails.
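As a rough sketch of that idea, the plain-JavaScript example below updates exactly one document per handler and lets a failing handler throw so the message gets retried. Everything here is an assumption made for illustration: the in-memory `docs` map, the `queue` array, the message names `AddTag` and `CountTag`, and the `drain` loop. It is not NServiceBus or any real messaging API.

```javascript
// In-memory documents and queue (assumptions; no real database or bus).
const docs = new Map([
  ["post:1", { tags: ["nifty"] }],
  ["tagcount:test", { count: 0 }],
]);
const queue = [];
const failed = [];

// One document update per handler; a handler throws when its update fails.
const handlers = {
  AddTag(msg) {
    const post = docs.get(msg.postId);
    if (!post) throw new Error("post not found: " + msg.postId);
    if (!post.tags.includes(msg.tag)) post.tags.push(msg.tag);
    // The second document is updated by a follow-up message, not here.
    queue.push({ type: "CountTag", tag: msg.tag });
  },
  CountTag(msg) {
    const counter = docs.get("tagcount:" + msg.tag);
    if (!counter) throw new Error("no counter for tag: " + msg.tag);
    counter.count += 1;
  },
};

// Process messages one at a time; retry failures, then park them.
function drain(maxRetries = 3) {
  while (queue.length > 0) {
    const msg = queue.shift();
    msg.attempts = (msg.attempts || 0) + 1;
    try {
      handlers[msg.type](msg);
    } catch (err) {
      if (msg.attempts < maxRetries) queue.push(msg);
      else failed.push({ msg, error: err.message });
    }
  }
}

queue.push({ type: "AddTag", postId: "post:1", tag: "test" });
queue.push({ type: "AddTag", postId: "post:404", tag: "test" });
drain();

console.log(docs.get("post:1").tags, docs.get("tagcount:test").count, failed.length);
```

The message for the missing post is retried and then parked in `failed` for someone to look at, while the other message still moves both documents forward. A real message bus would also have to make sure the follow-up message is not sent twice when a handler is retried, which this sketch does not attempt.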
But again: I’m thinking that this situation should be avoided at all costs with a document-oriented db. If ACID is required, the application should probably have an RDBMS on the side, or implement some kind of transactional mechanism in the document store.
Conclusion
That concludes my little learning series of MongoDB posts.
I must say that I am intrigued by all the NoSQL discussions currently going on in the communities, and I think it is always a sign of health that we question the technologies we use.
I am entirely convinced that document dbs could and should have been used for some parts of systems that I have experienced, and I am blown away by the lack of friction when starting up a project on top of a schemaless db.
As a .NET dude, I am convinced that the future will see more .NET systems built with more than one db underneath – e.g. with MongoDB for all the “soft parts” of the system, NHibernate on SQL Server for the few things that by nature require ACID, and then some NHibernate Search/Lucene/Solr.NET for full-text indexing and searching capabilities etc.