Free geek night about MongoDB

At Goto Copenhagen this May, I gave a talk about MongoDB, which is a nifty document-oriented database that I find pretty interesting.

So, because I like to talk about MongoDB so much, I’ll give my talk again as a free Trifork geek night on Tuesday the 21st of June at the Trifork HQ in Aarhus (this time in Danish though).

If you’re a .NET person, possibly developing big enterprisey stuff and/or you’re interested in MongoDB or NoSQL in general, you should come to this one.

Tinkering with MongoDB and sharding

After my MongoDB presentation today, I was asked a few questions about MongoDB’s sharding capabilities. Like my interest in MongoDB so far, my talk focused entirely on the frictionless aspects of using it, so I have never tried any of the sharding and replica set configurations that MongoDB can run in.

That has got to end!

So, let’s try spinning up the simplest possible sharding scenario that we can think of: two durable MongoDB instances on ports 27017 and 27018, with one collection sharded across them.

MongoDB sharding requires that you spin up a special configuration server that stores the configuration of which shards are available – let’s spin this up on port 27019:

– and finally, we spin up one instance of mongos on port 27020, pointing it towards the configuration server:
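For reference, here is a rough sketch of what the command lines for the processes described so far could look like, written as a small Node script that just prints them. The ports come from the text; everything else is an assumption on my part: the data directories are made up, `--journal` is assumed to be what makes the instances durable, and the flags are the ones from MongoDB releases of that era (newer releases expect shards and config servers to be replica sets, so they need more than this).

```javascript
// Sketch: builds the command lines for the processes described above.
// The dbpath directories are hypothetical; the ports are the ones from the text.
function clusterCommandLines(dataRoot) {
  const shardPorts = [27017, 27018];
  const configPort = 27019;
  const mongosPort = 27020;

  // Two durable (journaled) shard instances.
  const lines = shardPorts.map(
    (port) =>
      `mongod --shardsvr --journal --port ${port} --dbpath ${dataRoot}/shard${port}`
  );

  // The configuration server that keeps track of the shards.
  lines.push(`mongod --configsvr --port ${configPort} --dbpath ${dataRoot}/config`);

  // The mongos router, pointed at the configuration server.
  lines.push(`mongos --port ${mongosPort} --configdb localhost:${configPort}`);

  return lines;
}

// Print the commands so each can be pasted into its own terminal.
clusterCommandLines("/data/fairytale").forEach((line) => console.log(line));
```

Each data directory has to exist before its `mongod` is started.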

To finalize the setup, we let the configuration database know of the two shards we have started by connecting to the admin database using the Mongo shell:
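A minimal sketch of that step, assuming a shell connected to mongos on port 27020. The helper name `addShards` is mine, and the command is spelled `addShard` as in the current documentation (older documentation wrote it all lowercase, so check what your version accepts).

```javascript
// Sketch: registers each shard with the cluster through mongos.
// adminDb is a handle on the admin database, e.g. db.getSiblingDB("admin").
function addShards(adminDb, hosts) {
  // One addShard command per shard; the replies are returned for inspection.
  return hosts.map((host) => adminDb.runCommand({ addShard: host }));
}

// In a shell connected to mongos:
//   addShards(db.getSiblingDB("admin"), ["localhost:27017", "localhost:27018"]);
```

Each reply should carry `ok: 1` if the shard was accepted.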

Now, let’s see if it understood this:
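One way to ask is the `listShards` admin command. This is a sketch with a hypothetical helper name, not necessarily the exact query I ran; browsing the `shards` collection in the `config` database should show the same information.

```javascript
// Sketch: asks mongos which shards it knows about and formats one line per shard.
// adminDb is a handle on the admin database, e.g. db.getSiblingDB("admin").
function listShardHosts(adminDb) {
  const reply = adminDb.runCommand({ listShards: 1 });
  // Each entry in reply.shards carries the shard's _id and its host string.
  return reply.shards.map((shard) => `${shard._id} -> ${shard.host}`);
}

// In a shell connected to mongos:
//   listShardHosts(db.getSiblingDB("admin")).forEach((line) => print(line));
```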

As you can see, the config database correctly stored information about our two shards – that was easy!

Finally, we need to enable sharding for one particular database and make sure that our collection of unicorns is sharded by the name field:
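Roughly, that boils down to two admin commands sent through mongos: `enableSharding` for the database and `shardCollection` for the collection, with the shard key given as an ordinary key document. The sketch below wraps them in a helper whose name is my own invention.

```javascript
// Sketch: enables sharding for a database and shards one collection by one field.
// adminDb is a handle on the admin database, e.g. db.getSiblingDB("admin").
function shardByField(adminDb, dbName, collectionName, field) {
  const key = { [field]: 1 }; // ascending shard key on the given field
  return [
    adminDb.runCommand({ enableSharding: dbName }),
    adminDb.runCommand({ shardCollection: `${dbName}.${collectionName}`, key }),
  ];
}

// In a shell connected to mongos:
//   shardByField(db.getSiblingDB("admin"), "fairytale", "unicorns", "name");
```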

At this point, even though the fairytale database and unicorns collection did not exist, they have been created for me, and the required index has been created for the shard key. I can verify this like so:
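A small sketch of such a check: fetch the collection's index definitions with `getIndexes()` and look for a single-field ascending index on the shard key. The helper `hasIndexOn` is hypothetical.

```javascript
// Sketch: returns true if the index list contains a single-field
// ascending index on the given field.
function hasIndexOn(indexes, field) {
  return indexes.some(
    (index) => Object.keys(index.key).length === 1 && index.key[field] === 1
  );
}

// In a shell connected to mongos:
//   hasIndexOn(db.getSiblingDB("fairytale").unicorns.getIndexes(), "name");
```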

Now, let’s go to C# with mongo-csharp and hammer 10 million randomly named unicorns, each carrying a payload of up to 8 KB of fairy dust in there:
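To give an idea of what such a load looks like, here is a sketch in JavaScript rather than the C# program I actually used. The `name` field is the shard key from above; the `fairyDust` field name, the 12-letter names and the batch size are assumptions made for illustration.

```javascript
// Sketch: generates randomly named unicorns with up to 8 KB of payload
// and inserts them in batches. Not the original C# loader.
const MAX_DUST_BYTES = 8 * 1024;
const LETTERS = "abcdefghijklmnopqrstuvwxyz";

// Random integer in [0, maxExclusive).
const randomInt = (maxExclusive) => Math.floor(Math.random() * maxExclusive);

function randomName(length) {
  let name = "";
  for (let i = 0; i < length; i++) {
    name += LETTERS[randomInt(LETTERS.length)];
  }
  return name;
}

function makeUnicorn() {
  return {
    name: randomName(12),
    // ASCII filler, so the string length equals the payload size in bytes.
    fairyDust: "*".repeat(randomInt(MAX_DUST_BYTES + 1)),
  };
}

// collection is anything with insertMany(), e.g. a shell collection handle.
function insertUnicorns(collection, total, batchSize) {
  let inserted = 0;
  while (inserted < total) {
    const size = Math.min(batchSize, total - inserted);
    collection.insertMany(Array.from({ length: size }, makeUnicorn));
    inserted += size;
  }
  return inserted;
}

// In a shell connected to mongos:
//   insertUnicorns(db.getSiblingDB("fairytale").unicorns, 10000000, 1000);
```

Inserting through mongos rather than directly into a shard is what lets the documents be routed by their `name`.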

Now, let’s go to the Mongo shell and see if they’re there:

Great! 10 million unicorns in there. Let’s check out the disk and see if data was somehow distributed among the shards:

That seems to be pretty well balanced if you ask me. Let’s see what Mongo can tell us:
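The shell's `db.printShardingStatus()` prints the whole picture. If you only want the number of chunks per shard, a sketch like the following tallies the documents in the `chunks` collection of the `config` database, assuming each one names its owning shard in a `shard` field (with only one sharded collection, as here, no further filtering is needed). The helper name is mine.

```javascript
// Sketch: counts how many chunk documents each shard owns.
function chunksPerShard(chunks) {
  const counts = {};
  for (const chunk of chunks) {
    counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  }
  return counts;
}

// In a shell connected to mongos:
//   chunksPerShard(db.getSiblingDB("config").chunks.find().toArray());
```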

66 chunks on the 1st shard, and 65 chunks on the 2nd shard – it is in fact pretty well balanced.

Conclusion so far

It seems to be pretty easy to begin sharding data, which is perfectly in line with the usual MongoDB feeling. It’s definitely a subject I need to look more into though, so if you want to read more about it, I can really recommend Kristina Chodorow’s blog: Snail In A Turtleneck.