In my first post about MongoDB, I touched querying very lightly. Querying is of course pretty important to most systems, so it’s fair to dedicate a separate post to the subject.
Querying in MongoDB works by sending a document to the server, e.g. in the following snippet I create a document with a post ID
1 |
var somePost = db.posts.findOne({"_id": ObjectId("00112233445566778899aabb")}) |
– which can actually be even shorter, as the find and findOne functions accept an ObjectId directly as their argument, like so:
1 |
var somePost = db.posts.findOne(ObjectId("00112233445566778899aabb")) |
But how can I find a post with a specific slug? Easy, like so:
1 |
var somePost = db.posts.find({slug: "this-slug-probably-comes-from-a-url"}) |
But how does this perform? It’s easy to examine how queries are executed with the explain() function, like so:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
> db.posts.find({slug: "this-slug-probably-comes-from-a-url"}).explain() { "cursor" : "BasicCursor", "startKey" : { }, "endKey" : { }, "nscanned" : 10000, "n" : 1, "millis" : 11, "allPlans" : [ { "cursor" : "BasicCursor", "startKey" : { }, "endKey" : { } } ] } |
– yielding some info about the execution of the query. I don’t know exactly how to interpret all this, but I think I get that "nscanned": 10000 means 10000 documents were scanned – and in a collection with 10000 documents, that’s not entirely optimal as it implies a full table scan. Now, let’s make sure that our query will execute as fast as possible by creating an index on the field ( _id is always automatically indexed):
1 |
db.posts.ensureIndex({slug:1}) |
Now lets explain() again:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
> db.posts.find({slug: "post-no-454"}).explain() { "cursor" : "BtreeCursor slug_1", "startKey" : { "slug" : "post-no-454" }, "endKey" : { "slug" : "post-no-454" }, "nscanned" : 1, "n" : 1, "millis" : 0, "allPlans" : [ { "cursor" : "BtreeCursor slug_1", "startKey" : { "slug" : "post-no-454" }, "endKey" : { "slug" : "post-no-454" } } ] } |
Wow! That’s what I call an improvement!
What about posts with a specific tag? First I tried the following snippet, because I learned that the special $where field could be put in a query document to evaluate a predicate server-side:
1 |
var niftyPosts = db.posts.find({$where: function() { return this.tags != null && this.tags.indexOf("nifty") != -1; }}) |
– and this actually works. This syntax is sort of clunky though. Luckily, MongoDB provides a nifty mechanism for arrays that automagically checks if something is contained in it. So my query can be rewritten to this:
1 |
var niftyPosts = db.posts.find({tags: 'nifty'}) |
Nifty!
Now, to take advantage of indexes, the special query document fields should be used. I showed $where above, but there are more – to name a few: $gt (greater than), $gte (greater than or equal), $lt (less than), $lte (less than or equal), $ne (not equal), and many more. For example, to count the number of non-nifty posts in February 2010:
1 2 3 4 5 |
> db.posts.find({ date: {$gte: 20100201, $lt: 20100301}, tags: {$ne: 'nifty'} }).count() 0 |
Check out the manual for some great documentation on available operators.
Conclusion
Querying with MongoDB actually seems pretty cool and flexible. I like the idea that it’s possible to execute ad-hoc queries, and for most usage I think the supplied operators are adequate. The ability to supply a predicate function via $where seems really cool, but it should probably only be used in conjunction with one or more of the other operators to avoid a full table scan.