Tinkering with MongoDB and sharding

After my MongoDB presentation today, I was asked a few questions about MongoDB’s sharding capabilities. Like my interest so far, my talk was completely focused on the frictionless aspects of using MongoDB, so I have never tried any of the sharding and replica set configurations that MongoDB can run in.

That has got to end!

So, let’s try spinning up the simplest possible sharding scenario that we can think of: Two durable MongoDB instances on port 27017 and 27018 with one collection sharded across them:

mkdir c:\data\db1
mongod --shardsvr --port 27017 --journal --dbpath c:\data\db1

1 2	mkdir c:\data\db1 mongod --shardsvr --port 27017 --journal --dbpath c:\data\db1

mkdir c:\data\db2
mongod --shardsvr --port 27018 --journal --dbpath c:\data\db2

1 2	mkdir c:\data\db2 mongod --shardsvr --port 27018 --journal --dbpath c:\data\db2

MongoDB sharding requires that you spin up a special configuration server that stores the configuration of which shards are available – let’s spin this up on port 27019:

mkdir c:\data\configdb
mongod --configsvr --port 27019 --dbpath c:\data\configdb

1 2	mkdir c:\data\configdb mongod --configsvr --port 27019 --dbpath c:\data\configdb

– and finally, we spin up one instance of mongos on port 27020, pointing it towards the configuration server:

mongos --configdb localhost:27019 --port 27020

1	mongos --configdb localhost:27019 --port 27020

To finalize the setup, we let the configuration database know of the two shards we have started by connecting to the admin database using the Mongo shell:

C:\mongodb\bin>mongo localhost:27020
MongoDB shell version: 1.8.1
connecting to: localhost:27020/test
> use admin
switched to db admin
> db.runCommand({addShard:"localhost:27017"})
> db.runCommand({addShard:"localhost:27018"})

C:\mongodb\bin>mongo localhost:27020

MongoDB shell version: 1.8.1

connecting to: localhost:27020/test

> use admin

switched to db admin

> db.runCommand({addShard:"localhost:27017"})

> db.runCommand({addShard:"localhost:27018"})

Now, let’s see if it understood this:

> db.runCommand({listShards: 1})
{
        "shards" : [
                {
                        "_id" : "shard0000",
                        "host" : "localhost:27017"
                },
                {
                        "_id" : "shard0001",
                        "host" : "localhost:27018"
                }
        ],
        "ok" : 1
}

> db.runCommand({listShards: 1})

{

"shards" : [

{

"_id" : "shard0000",

"host" : "localhost:27017"

{

"_id" : "shard0001",

"host" : "localhost:27018"

}

"ok" : 1

}

As you can see, the config database correctly stored information about our two shards – that was easy!

Finally, we need to enable sharding for one particular database and make sure that our collection of unicorns is sharded by the name field:

> db.runCommand({enableSharding: "fairytale"})
{ "ok" : 1 }
> db.runCommand({shardCollection: "fairytale.unicorns", key: {name: 1}})
{ "collectionsharded" : "fairytale.unicorns", "ok" : 1 }

> db.runCommand({enableSharding: "fairytale"})

{ "ok" : 1 }

> db.runCommand({shardCollection: "fairytale.unicorns", key: {name: 1}})

{ "collectionsharded" : "fairytale.unicorns", "ok" : 1 }

At this point, even though the fairytale database and unicorns collection did not exist, they have been created for me, and the required index has been created for the shard key. I can verify this like so:

> use fairytale
> db.unicorns.getIndexes()
[
        {
                "name" : "_id_",
                "ns" : "fairytale.unicorns",
                "key" : {
                        "_id" : 1
                },
                "v" : 0
        },
        {
                "ns" : "fairytale.unicorns",
                "key" : {
                        "name" : 1
                },
                "name" : "name_1",
                "v" : 0
        }
]

> use fairytale

> db.unicorns.getIndexes()

[

{

"name" : "_id_",

"ns" : "fairytale.unicorns",

"key" : {

"_id" : 1

"v" : 0

{

"ns" : "fairytale.unicorns",

"key" : {

"name" : 1

"name" : "name_1",

"v" : 0

}

]

Now, let’s go to C# with mongo-csharp and hammer 10 million randomly named unicorns, each carrying a payload of up to 8 KB of fairy dust in there:

class Program
{
    static readonly Random Random = new Random();

    static void Main()
    {
        var server = MongoServer.Create("mongodb://localhost:27020");
        var database = server.GetDatabase("fairytale");
        var unicorns = database.GetCollection("unicorns");

        using (database.RequestStart())
        {
            var batch = Enumerable.Range(0, 10000000)
                .Select(_ =>
                        new
                            {
                                name = GenerateRandomUnicornName(),
                                horns = 1,
                                likes = new[] {"eggs", "beer"},
                                fairy_dust = GenerateRandomBytes(Random.Next(8192))
                            });
                
            unicorns.InsertBatch(batch);

            server.GetLastError();
        }
            
        server.Disconnect();
    }

    static byte[] GenerateRandomBytes(int length)
    {
        var bytes = new byte[length];
        Random.NextBytes(bytes);
        return bytes;
    }

    static string GenerateRandomUnicornName()
    {
        Func<char> charFactory = () => (char)('a' + Random.Next(25));
        var funcs = Enumerable.Repeat(charFactory, 10 + Random.Next(30));
        return new string(funcs.Select(f => f()).ToArray());
    }
}

class Program

{

static readonly Random Random = new Random();

static void Main()

{

var server = MongoServer.Create("mongodb://localhost:27020");

var database = server.GetDatabase("fairytale");

var unicorns = database.GetCollection("unicorns");

using (database.RequestStart())

{

var batch = Enumerable.Range(0, 10000000)

.Select(_ =>

new

{

name = GenerateRandomUnicornName(),

horns = 1,

likes = new[] {"eggs", "beer"},

fairy_dust = GenerateRandomBytes(Random.Next(8192))

});

unicorns.InsertBatch(batch);

server.GetLastError();

}

server.Disconnect();

}

static byte[] GenerateRandomBytes(int length)

{

var bytes = new byte[length];

Random.NextBytes(bytes);

return bytes;

}

static string GenerateRandomUnicornName()

{

Func<char> charFactory = () => (char)('a' + Random.Next(25));

var funcs = Enumerable.Repeat(charFactory, 10 + Random.Next(30));

return new string(funcs.Select(f => f()).ToArray());

}

now, let’s go to the Mongo shell and see if they’re there:

> db.unicorns.count()
10000000

1 2	> db.unicorns.count() 10000000

Great! 10 million unicorns in there. Let’s check out the disk and see if data was somehow distributed among the shards:

That seems to be pretty well balanced if you ask me. Let’s see what Mongo can tell us:

> db.printShardingStatus()
--- Sharding Status ---
  sharding version: { "_id" : 1, "version" : 3 }
  shards:
      { "_id" : "shard0000", "host" : "localhost:27017" }
      { "_id" : "shard0001", "host" : "localhost:27018" }
  databases:
        { "_id" : "admin", "partitioned" : false, "primary" : "config" }
        { "_id" : "fairytale", "partitioned" : true, "primary" : "shard0000" }
                fairytale.unicorns chunks:
                                shard0001       66
                                shard0000       65
                        too many chunksn to print, use verbose if you want to force print

> db.printShardingStatus()

--- Sharding Status ---

sharding version: { "_id" : 1, "version" : 3 }

shards:

{ "_id" : "shard0000", "host" : "localhost:27017" }

{ "_id" : "shard0001", "host" : "localhost:27018" }

databases:

{ "_id" : "admin", "partitioned" : false, "primary" : "config" }

{ "_id" : "fairytale", "partitioned" : true, "primary" : "shard0000" }

fairytale.unicorns chunks:

shard0001 66

shard0000 65

too many chunksn to print, use verbose if you want to force print

66 chunks on the 1st shard, and 65 chunks on the 2nd shard – it is in fact pretty well balanced.

Conclusion so far

It seems to be pretty easy to begin sharding data, which is perfectly in line with the usual MongoDB feeling. It’s definitely a subject I need to look more into though, so if you want to read more about it, I can really recommend Kristina Chodorow‘s blog: Snail In A Turtleneck.

5 thoughts on “Tinkering with MongoDB and sharding”

Jesper Lund Stocholm says:

2011-05-16 at 14:27

Very interesting 🙂

Any specific reason for why you have not used Raven DB?

1. mookid says:
  
  2011-05-16 at 14:55
  
  Yes – I like MongoDB more 😉
  
  I just happened to start working with MongoDB some time ago when RavenDB was not available, and has caused no friction whatsoever that MongoDB is not based on .NET.
  
  Moreover, MongoDB is much more mature than Raven – #justsaying (not that it influences any of my decisions…)
  
  I am planning on giving RavenDB another spin though, because I don’t know too much about it yet.
  
Vikas Kumar says:

2011-10-05 at 23:53

Excellent Post!!! I just ran step by step, and it works like a charm. 10 million records and I did a search on one particular record, and the speed of MongoDB is just amazing.

Thanks,
Vikas

1. mookid says:
  
  2011-10-05 at 23:59
  
  Awesome 🙂
  
Srinivasan says:

2012-04-11 at 16:35

Thanks mate

Tinkering with MongoDB and sharding

Conclusion so far

Like this:

Related

5 thoughts on “Tinkering with MongoDB and sharding”

Leave a Reply Cancel reply

Conclusion so far

Share this:

Like this:

Related

5 thoughts on “Tinkering with MongoDB and sharding”

Leave a Reply Cancel reply