Tinkering with MongoDB and sharding

After my MongoDB presentation today, I was asked a few questions about MongoDB’s sharding capabilities. Like my interest so far, my talk was completely focused on the frictionless aspects of using MongoDB, so I have never tried any of the sharding and replica set configurations that MongoDB can run in.

That has got to end!

So, let’s try spinning up the simplest possible sharding scenario that we can think of: Two durable MongoDB instances on port 27017 and 27018 with one collection sharded across them:

MongoDB sharding requires that you spin up a special configuration server that stores the configuration of which shards are available – let’s spin this up on port 27019:

– and finally, we spin up one instance of mongos on port 27020, pointing it towards the configuration server:

To finalize the setup, we let the configuration database know of the two shards we have started by connecting to the admin database using the Mongo shell:

Now, let’s see if it understood this:

As you can see, the config database correctly stored information about our two shards – that was easy!

Finally, we need to enable sharding for one particular database and make sure that our collection of unicorns is sharded by the name field:

At this point, even though the fairytale database and unicorns collection did not exist, they have been created for me, and the required index has been created for the shard key. I can verify this like so:

Now, let’s go to C# with mongo-csharp and hammer 10 million randomly named unicorns, each carrying a payload of up to 8 KB of fairy dust in there:

now, let’s go to the Mongo shell and see if they’re there:

Great! 10 million unicorns in there. Let’s check out the disk and see if data was somehow distributed among the shards:

That seems to be pretty well balanced if you ask me. Let’s see what Mongo can tell us:

66 chunks on the 1st shard, and 65 chunks on the 2nd shard – it is in fact pretty well balanced.

Conclusion so far

It seems to be pretty easy to begin sharding data, which is perfectly in line with the usual MongoDB feeling. It’s definitely a subject I need to look more into though, so if you want to read more about it, I can really recommend Kristina Chodorow‘s blog: Snail In A Turtleneck.

5 thoughts on “Tinkering with MongoDB and sharding

    1. Yes – I like MongoDB more πŸ˜‰

      I just happened to start working with MongoDB some time ago when RavenDB was not available, and has caused no friction whatsoever that MongoDB is not based on .NET.

      Moreover, MongoDB is much more mature than Raven – #justsaying (not that it influences any of my decisions…)

      I am planning on giving RavenDB another spin though, because I don’t know too much about it yet.

  1. Excellent Post!!! I just ran step by step, and it works like a charm. 10 million records and I did a search on one particular record, and the speed of MongoDB is just amazing.

    Thanks,
    Vikas

Leave a Reply to Vikas Kumar Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.