Category Archives: nosql

Why do all NoSQL discussions end up discussing CAP?

Having watched an equally interesting and entertaining panel discussion, “The Aarhus 6”, at GOTO Aarhus, it strikes me – again! – as weird and funny that NoSQL discussions almost always seem to end up discussing performance and availability.

In my experience – and Martin Fowler actually noted this towards the end – most development teams and organizations don’t care about the availability of their database. Not as in “don’t care at all”, but as in “availability demands are such that an ordinary ops team can make the database sufficiently available”.

In other words, almost no organizations need five nines, four nines, or something like that. Most organizations, if they’re honest and don’t pretend to be more important than they are, probably don’t even need two nines!
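To put numbers on the “nines”, here is a minimal sketch of the arithmetic. The class and method names are made up for illustration, and it assumes a 365-day year.

```csharp
using System;

public static class Nines
{
    // Allowed downtime per 365-day year for an availability of N nines,
    // e.g. nines = 2 means 99%, nines = 5 means 99.999%.
    public static TimeSpan DowntimePerYear(int nines)
    {
        double unavailability = Math.Pow(10, -nines);
        return TimeSpan.FromHours(365 * 24 * unavailability);
    }

    public static void Main()
    {
        foreach (var nines in new[] { 2, 4, 5 })
        {
            Console.WriteLine("{0} nines: {1:F2} hours of downtime per year",
                              nines, DowntimePerYear(nines).TotalHours);
        }
    }
}
```

Two nines works out to roughly 87.6 hours (about 3.65 days) of downtime per year, whereas five nines leaves only a little over five minutes.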

Thinking about this, it’s funny that the discussions almost never touch on how easy the databases are to get started with, and how easy it is to store data in them. Matt Dennis even managed to talk for almost his entire 50 minutes in his “Big Data OLTP with Apache Cassandra” without touching on how data is actually stored in Cassandra. It was only because someone in the audience asked about it that he said the words “column family”.

Chris Anderson did, however, comment at some point that “the winner” (i.e. the NoSQL database that development teams will end up choosing) might be selected “because it’s easy to install”.

This is actually spot on, in my opinion! I think most development teams are better off prioritizing ease of installation, operation, and usage far, far above operational quality attributes like insane scalability and availability.

Fun with RavenDB 3

This is the third post in a small series about RavenDB – a document database in .NET. I will try to touch the same areas as I did in my series on MongoDB, possibly comparing the two where I see fit.

Now, this time we will take a look at some more querying…

Introduction

After having chewed a bit on the concept of map/reduce queries from doing my MongoDB series, I am actually beginning to see the beauty of this kind of query – and of course RavenDB supports them as well, because it is currently the only sane way to structure aggregate queries in distributed databases.

All of RavenDB’s queries are actually map/reduce queries, but if you don’t supply a reduction function, the reduction is trivial, as every document is emitted in its entirety.

Now, let’s try specifying our own reduction function…

Simple map/reduce query

But first we’ll start out by stuffing some data in the DB:

using (var session = documentStore.OpenSession())
{
    session.Store(new Order
                        {
                            Items =
                                {
                                    new Item {Name = "beer", Amount = 12},
                                    new Item {Name = "peanuts", Amount = 3},
                                    new Item {Name = "cashew nuts", Amount = 4},
                                }
                        });
 
    session.Store(new Order
                        {
                            Items =
                                {
                                    new Item {Name = "beer", Amount = 6},
                                    new Item {Name = "just nuts", Amount = 8},
                                    new Item {Name = "peanuts", Amount = 6},
                                    new Item {Name = "cashew nuts", Amount = 2},
                                }
                        });
 
    session.Store(new Order
                        {
                            Items =
                                {
                                    new Item {Name = "beer", Amount = 5},
                                }
                        });
 
    session.SaveChanges();
}

Now, in order to aggregate each Item by name, summing up the amounts, let’s construct the following index:

 
public class AggregateAmountsPerItem : AbstractIndexCreationTask
{
    public override IndexDefinition CreateIndexDefinition()
    {
        return new IndexDefinition<Order, ItemAggregate>
                    {
                        Map = orders => from order in orders
                                        from item in order.Items
                                        select new {item.Name, item.Amount},
                        Reduce = items => from item in items
                                            group item by item.Name into i
                                            select new {Name = i.Key, Amount = i.Sum(x => x.Amount)}
                    }.ToIndexDefinition(DocumentStore.Conventions);
    }
}

Note how this way of structuring the map/reduce differs from MongoDB and CouchDB where the map operation decides which value to aggregate on by emitting it as the key – in RavenDB, this decision is made in the reduce function.

Note also that I need to create a type, ItemAggregate, that is used to tell which fields the reduce function should expect as input. It is important that this type’s fields correspond to those emitted from the map function, or else the serialization will fail silently, yielding no results.
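To make the shape of this concrete, here is a minimal in-memory sketch of the same map and reduce steps using plain LINQ to Objects. Nothing in it talks to RavenDB, the ItemAggregate class shown is only my assumption of what such a type could look like, and Order and Item are cut-down stand-ins for the classes used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public string Name { get; set; }
    public int Amount { get; set; }
}

public class Order
{
    public Order() { Items = new List<Item>(); }
    public List<Item> Items { get; set; }
}

// Assumed shape: the same fields that the map step emits.
public class ItemAggregate
{
    public string Name { get; set; }
    public int Amount { get; set; }
}

public static class MapReduceSketch
{
    // Map: flatten every order into one (Name, Amount) entry per item.
    public static IEnumerable<ItemAggregate> Map(IEnumerable<Order> orders)
    {
        return from order in orders
               from item in order.Items
               select new ItemAggregate { Name = item.Name, Amount = item.Amount };
    }

    // Reduce: group by name and sum. Input and output have the same shape,
    // so the step can be applied again to already reduced results.
    public static IEnumerable<ItemAggregate> Reduce(IEnumerable<ItemAggregate> items)
    {
        return from item in items
               group item by item.Name into g
               select new ItemAggregate { Name = g.Key, Amount = g.Sum(x => x.Amount) };
    }

    // The same three orders that were stored above.
    public static List<Order> SampleOrders()
    {
        return new List<Order>
        {
            new Order { Items = { new Item { Name = "beer", Amount = 12 },
                                  new Item { Name = "peanuts", Amount = 3 },
                                  new Item { Name = "cashew nuts", Amount = 4 } } },
            new Order { Items = { new Item { Name = "beer", Amount = 6 },
                                  new Item { Name = "just nuts", Amount = 8 },
                                  new Item { Name = "peanuts", Amount = 6 },
                                  new Item { Name = "cashew nuts", Amount = 2 } } },
            new Order { Items = { new Item { Name = "beer", Amount = 5 } } }
        };
    }

    public static void Main()
    {
        foreach (var aggregate in Reduce(Map(SampleOrders())))
        {
            Console.WriteLine("{0}: {1} pcs", aggregate.Name, aggregate.Amount);
        }
    }
}
```

Because the reduce step consumes and produces the same type, reducing two partial results and then reducing the combination gives the same totals as reducing everything at once. That is also why the fields of the aggregate type have to line up with what the map step emits.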

Now, let’s execute the index creation like so:

IndexCreation.CreateIndexes(typeof (AggregateAmountsPerItem).Assembly, documentStore);

and now I am ready to query the index:

using(var session = documentStore.OpenSession())
{
    var amountsPerItem = session.Query<ItemAggregate>(typeof(AggregateAmountsPerItem).Name);
 
    foreach(var amountPerItem in amountsPerItem)
    {
        Console.WriteLine("{0}: {1} pcs", amountPerItem.Name, amountPerItem.Amount);
    }
}

which yields the following results:

Executing query '' on index 'AggregateAmountsPerItem' in 'http://localhost:8080'
Query returned 4/4 results
beer: 23 pcs
peanuts: 9 pcs
cashew nuts: 6 pcs
just nuts: 8 pcs

and that was pretty much what I expected.

That was a quick look at map/reduce queries in RavenDB. I’m sure there’s no end to the fun you can have with this kind of stuff, but I have yet to use map/reduce for anything in a real project, so I can’t really comment further on those.

Fun with RavenDB 2

This is the second post in a small series about RavenDB – a document database in .NET. I will try to touch the same areas as I did in my series on MongoDB, possibly comparing the two where I see fit.

Now, this time we will take a look at querying…

Introduction

Querying with MongoDB is extremely easy and intuitive when you’re used to relational databases – just type in some queries in the form of JSON documents and send them to the server and let it do its work.

Querying with RavenDB is a different story, because the only thing that can be queried is an index [1]. Let’s see…

Simple query

Let’s start out by handing a couple of documents to the server. Let’s execute the following:

using (var session = documentStore.OpenSession())
{
    session.Store(new Movie {Title = "The Big Lebowski", ViewCount = 200});
    session.Store(new Movie {Title = "Fear And Loathing In Las Vegas", ViewCount = 100});
    session.Store(new Movie {Title = "Adaptation", ViewCount = 20});
 
    session.SaveChanges();
}

Now, in order to be able to query these documents, we need to tell RavenDB to create an index… I like to be code-driven when I can, so let’s do it with C#… first, make a class somewhere derived from AbstractIndexCreationTask:

public class MoviesByViewCount : AbstractIndexCreationTask
{
    public override IndexDefinition CreateIndexDefinition()
    {
        return new IndexDefinition<Movie>
                    {
                        Map = movies => from movie in movies
                                        select new {movie.ViewCount}
                    }.ToIndexDefinition(DocumentStore.Conventions);
    }
}

In this example, I build the index purely from mapping the collection – i.e. there’s no reduce step. Note that the map step is a LINQ query that maps each document into the fields that should be used to build the index. That means that the example above will allow me to query this index and constrain by ViewCount of each movie.

Next, upon initialization, we need to tell RavenDB to create our indexes (if they have not already been created – otherwise, they’ll be updated):

var documentStore = new DocumentStore{...};
documentStore.Initialize();
 
IndexCreation.CreateIndexes(typeof (MoviesByViewCount).Assembly, documentStore);

That was easy. Now, let’s take the index for a spin… first let’s just get everything [2]:

using(var session = documentStore.OpenSession())
{
    var movies = from m in session.Query<Movie>(typeof (MoviesByViewCount).Name)
                 select m;
 
    foreach(var movie in movies)
    {
        Console.WriteLine("Got {0} ({1} views)", movie.Title, movie.ViewCount);
    }
}

which results in the following output in the console:

Executing query '' on index 'MoviesByViewCount' in 'http://localhost:8080'
Query returned 3/3 results
Got The Big Lebowski (200 views)
Got Fear And Loathing In Las Vegas (100 views)
Got Adaptation (20 views)

which is pretty much what we expected. Note however that RavenDB is nice enough to tell us when a query is executed and how many results are returned.

Now let’s use our index and change the query into this:

var movies = from m in session.Query<Movie>(typeof (MoviesByViewCount).Name)
                where m.ViewCount <= 100
                select m;

which gives me the following output:

Executing query 'ViewCount_Range:[* TO 0x00000064]' on index 'MoviesByViewCount' in 'http://localhost:8080'
Query returned 2/2 results
Got Fear And Loathing In Las Vegas (100 views)
Got Adaptation (20 views)

Note how the query criteria are translated into a Lucene range query (0x00000064 is 100 in hexadecimal) – that’s because RavenDB uses Lucene.NET for all of its indexing work. Just for the fun of it, let’s try using the index to constrain by title:

var movies = from m in session.Query<Movie>(typeof (MoviesByViewCount).Name)
             where m.Title == "The Big Lebowski"
             select m;

which results in the following output:

Executing query 'Title:[["The Big Lebowski"]]' on index 'MoviesByViewCount' in 'http://localhost:8080'
Query returned 0/0 results

It appears RavenDB will just go ahead and query Lucene for it, even though the index doesn’t have the specified field. Kind of weird, but maybe it’s because Lucene is itself a document DB, and there’s no way to tell beforehand whether a given index contains a document with the specified field.

Now that was a couple of simple queries. Next time, let’s try building a map/reduce query!

  1. Or is it? I should point out that Ayende said that RavenDB would have dynamically generated temporary indexes in the future, allowing ad-hoc querying… but what’s more, those temporary indexes would “materialize” and become permanent if you hit them enough times… that actually sounds extremely cool, and should allow for some truly frictionless and agile-feeling development.
  2. Note that because of RavenDB’s excellent safe-by-default philosophy, at most 128 documents will be returned! Therefore, the usual .Skip(n) and .Take(m) methods should be used to properly page the result sets.

    Note also, that by default you can perform only 30 operations resulting in remote calls within one IDocumentSession. This is another constraint that will guide you away from blowing off that left foot of yours :)
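A minimal sketch of the paging pattern from note 2, run against an in-memory sequence instead of a RavenDB query. The GetPage helper is made up for illustration; against RavenDB the same Skip and Take operators would be applied to the session query, and each page fetched would then count towards the remote call limit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Returns page number pageIndex (0-based) holding at most pageSize elements.
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageIndex, int pageSize)
    {
        return source.Skip(pageIndex * pageSize).Take(pageSize).ToList();
    }

    public static void Main()
    {
        // 300 fake document numbers stand in for the query results.
        var documents = Enumerable.Range(1, 300);
        const int pageSize = 128;

        for (var page = 0; ; page++)
        {
            var chunk = GetPage(documents, page, pageSize);
            if (chunk.Count == 0) break; // an empty page means we are done

            Console.WriteLine("Page {0}: {1} documents, starting at #{2}",
                              page, chunk.Count, chunk[0]);
        }
    }
}
```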

Fun with RavenDB 1

This is the first post in a small series about RavenDB – a document database in .NET. I will try to touch the same areas as I did in my series on MongoDB, possibly comparing the two where I see fit.

Now, first – let’s see if the raven can fly…

Getting started

I am extremely happy to see that Ayende has created the same installation experience as I got with MongoDB… i.e., to get the server running, perform the following steps (assuming the .NET 4 framework is installed on your system):

  1. Grab a ZIP with the latest build here
  2. Unzip somewhere
  3. Go to /Server and run Raven.Server.exe

- and now the RavenDB server will be running on localhost:8080. That was easy. Now, try visiting http://localhost:8080 in your browser – you should see the administration interface of RavenDB.

By the way, have you ever tried installing Microsoft SQL Server? Shudder!! :)

Connecting with the .NET client

I’m old school, so I am still using Visual Studio 2008. If you’re old school like me, add a reference to /Client-3.5/Raven.Client-3.5.dll – otherwise add a reference to /Client/Raven.Client.Lightweight.dll.

Now, to open a connection, do this:

var documentStore = new DocumentStore {Url = "http://localhost:8080"};
documentStore.Initialize();
 
using (var session = documentStore.OpenSession())
{
    // ....
}

- and then store the DocumentStore as a singleton in your program.

Inserting a document

Now, let’s try inserting a document… say we have a POCO model representation of a person that looks like this (allowing Address to be either DomesticAddress or ForeignAddress):

public class Person
{
    public string Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public Address Address { get; set; }
}
 
public abstract class Address
{
    public abstract string ToString(string separator);
}
 
public class ForeignAddress : Address
{
    public string[] AddressLines { get; set; }
 
    public override string ToString(string separator)
    {
        return string.Join(separator, AddressLines ?? new string[0]);
    }
}
 
public class DomesticAddress : Address
{
    public string Street { get; set; }
    public string HouseNumber { get; set; }
    public string PostalCode { get; set; }
    public string City { get; set; }
 
    public override string ToString(string separator)
    {
        return string.Join(separator, new[]
                                          {
                                              string.Format("{0} {1}", Street, HouseNumber),
                                              string.Format("{0} {1}", PostalCode, City)
                                          });
    }
}

Then, do this:

using (var session = documentStore.OpenSession())
{
    session.Store(new Person
                      {
                          FirstName = "Mogens Heller",
                          LastName = "Grabe",
                          Address = new DomesticAddress
                                        {
                                            Street = "Torsmark",
                                            HouseNumber = "4",
                                            PostalCode = "8700",
                                            City = "Horsens"
                                        }
                      });
    session.SaveChanges();
}

Now, let’s visit http://localhost:8080/raven/documents.html in the browser… it will probably look something like this:

[Screenshot: Document in RavenDB]

As you can see, RavenDB stores all documents in a single collection. Right now, there’s one person in there, and then there’s a document that RavenDB uses to generate integer IDs based on the hi-lo-algorithm. Rob Ashton has an explanation here on the design decisions made for this particular piece of RavenDB.

I like this particular decision, because it makes for some really nice human-readable, human-typeable IDs.
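For those who have not met hi-lo before, here is a minimal sketch of the general idea. This is not RavenDB’s actual implementation: the class name, the delegate that stands in for the round-trip to the server, and the tiny range size are all assumptions made for illustration, and the sketch is not thread-safe.

```csharp
using System;

public class HiLoGenerator
{
    private readonly Func<long> _fetchNextHi; // stands in for a round-trip to the server
    private readonly int _capacity;           // how many IDs each "hi" value covers
    private long _hi;
    private int _lo;

    public HiLoGenerator(Func<long> fetchNextHi, int capacity)
    {
        _fetchNextHi = fetchNextHi;
        _capacity = capacity;
        _lo = capacity; // range counts as exhausted, forcing a fetch on first use
    }

    public long NextId()
    {
        if (_lo >= _capacity)
        {
            // Only here do we need to go to the server.
            _hi = _fetchNextHi();
            _lo = 0;
        }

        _lo++;
        return _hi * _capacity + _lo;
    }
}

public static class HiLoDemo
{
    public static void Main()
    {
        long serverHi = 0; // pretend this counter lives in a document on the server

        var clientA = new HiLoGenerator(() => serverHi++, 4);
        var clientB = new HiLoGenerator(() => serverHi++, 4);

        // Each client reserves a whole range, so the two never collide.
        Console.WriteLine("people/{0}", clientA.NextId());
        Console.WriteLine("people/{0}", clientB.NextId());
        Console.WriteLine("people/{0}", clientA.NextId());
    }
}
```

The point of the scheme is that a client only has to ask the server for a new “hi” value once per range, and can then hand out small, readable integer IDs locally.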

Note how the ID of the document is people/1 – RavenDB is smart enough to pluralize most names, which is pretty cool. Let’s click the document to see what’s in it:

[Screenshot: Document in RavenDB]

Note also how RavenDB puts type information in the document, allowing the proper subtype to be deserialized. Now, let’s try this out:

using (var session = documentStore.OpenSession())
{
    var me = session.Load<Person>("people/1");
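    // Address only declares ToString(string separator), so call it explicitly;
    // a bare {2} would use object.ToString() and print the type name instead
    Console.WriteLine(@"{0} {1}
{2}", me.FirstName, me.LastName, me.Address.ToString(Environment.NewLine));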
}

- which results in the following console output:

Loading document [people/1] from http://localhost:8080
Mogens Heller Grabe
Torsmark 4
8700 Horsens

How cool is that?! (pretty cool, actually…)

Note that the pretty UI is based on the actual RavenDB interface to the world, which is REST-based. That means we can go to a DOS prompt and do this:

C:\>curl -X GET localhost:8080/docs/people/1
{"FirstName":"Mogens Heller","LastName":"Grabe","Address":{"$type":"raventjek.Class1+DomesticAddress, raventjek","Street":"Torsmark","HouseNumber":"4","PostalCode":"8700","City":"Horsens"}}

Now, that was a short dive into storing documents and retrieving them again by ID. We need to do more than that, though – otherwise we would have been content using a simple key/value-store. Therefore, in the next post, I will take a look at querying.

Fun with NoRM 4

This time, a short post on how to model inheritance, which (at least in a class-oriented programming language) is one of the foundations of object-oriented programming.

Let’s take an example with a person, who has a home address that can be either domestic or foreign. Consider this:

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
 
    public Address HomeAddress { get; set; }
}
 
public abstract class Address
{
    public abstract string FormatAddress(string separator);
}
 
public class DomesticAddress : Address
{
    public string Street { get; set; }
    public string Number { get; set; }
    public string PostalCode { get; set; }
    public string City { get; set; }
    // and 17 other fields here, according to whatever is standard in your country
 
    public override string FormatAddress(string separator)
    {
        return string.Join(separator, new[] { Street + " " + Number, PostalCode + " " + City });
    }
}
 
public class ForeignAddress : Address
{
    public string[] AddressLines { get; set; }
 
    public override string FormatAddress(string separator)
    {
        return string.Join(separator, AddressLines);
    }
}

Now, when I create a Person with a DomesticAddress and save it to my local Mongo, it looks like this:

var people = mongo.GetCollection<Person>();
people.Insert(new Person
{ 
    FirstName = "Mogens Heller", 
    LastName = "Grabe",
    HomeAddress = new DomesticAddress 
    { 
        Street = "Torsmark", 
        Number = "4", 
        // etc
    }
});

which is all fine and dandy – and in the db:

> db.Person.findOne();
{
    "FirstName": "Mogens Heller",
    "LastName": "Grabe",
    "HomeAddress": {
        "Street": "Torsmark",
        "Number": "4",
        // etc...
    }
}

which looks pretty good as well. BUT when I try to load the person again by doing this:

var people = mongo.GetCollection<Person>();
var me = people.FindOne();

I get BOOM!!: Norm.MongoException: Could not find the type to instantiate in the document, and Address is an interface or abstract type. Add a MongoDiscriminatedAttribute to the type or base type, or try to work with a concrete type next time.

Why of course! JSON (and hence BSON) only specifies objects – even though we consider them to be logical instances of some class, they’re actually not! – they’re just objects!

So, we need to help NoRM a little bit. Actually the exception message says it all: Add a MongoDiscriminatedAttribute to the abstract base class, like so:

[MongoDiscriminated]
public abstract class Address
{
    public abstract string FormatAddress(string separator);
}

That was easy. Now, if I do a db.Person.drop(), followed by my people.Insert(...)-code from before, I get this in the db:

> db.Person.findOne();
{
    "FirstName": "Mogens Heller",
    "LastName": "Grabe",
    "HomeAddress": {
        "__type": "MongoTest.DomesticAddress, MongoTest",
        "Street": "Torsmark",
        "Number": "4",
        // etc...
    }
}

See the __type field that NoRM added to the object? As you can see, it contains the assembly-qualified name of the concrete type that resulted in that particular object, allowing NoRM to deserialize properly when loading from the db.

Now, this actually makes working with inheritance hierarchies and specialization pretty easy – just add [MongoDiscriminated] to a base class, resulting in concrete type information being saved along with objects of any derived type.
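To show why the discriminator is enough, here is a minimal sketch of the general technique, with a plain dictionary standing in for the BSON document. It is not NoRM’s implementation: the class and method names are made up, the address classes are cut-down stand-ins, and the only thing borrowed from above is the __type field name.

```csharp
using System;
using System.Collections.Generic;

public abstract class Address { }

public class DomesticAddress : Address
{
    public string Street { get; set; }
    public string City { get; set; }
}

public class ForeignAddress : Address
{
    public string[] AddressLines { get; set; }
}

public static class DiscriminatorSketch
{
    public const string TypeKey = "__type";

    // Saving: store the concrete type's name next to the public properties.
    public static Dictionary<string, object> ToDocument(Address address)
    {
        var doc = new Dictionary<string, object>();
        doc[TypeKey] = address.GetType().AssemblyQualifiedName;

        foreach (var property in address.GetType().GetProperties())
        {
            doc[property.Name] = property.GetValue(address, null);
        }
        return doc;
    }

    // Loading: the stored type name tells us which subclass to instantiate.
    public static Address FromDocument(Dictionary<string, object> doc)
    {
        object typeName;
        if (!doc.TryGetValue(TypeKey, out typeName))
        {
            throw new InvalidOperationException(
                "No type discriminator in the document, and Address is abstract.");
        }

        var type = Type.GetType((string)typeName, true);
        var address = (Address)Activator.CreateInstance(type);

        foreach (var entry in doc)
        {
            if (entry.Key == TypeKey) continue;
            type.GetProperty(entry.Key).SetValue(address, entry.Value, null);
        }
        return address;
    }

    public static void Main()
    {
        var doc = ToDocument(new DomesticAddress { Street = "Torsmark", City = "Horsens" });
        var loaded = FromDocument(doc);
        Console.WriteLine(loaded.GetType().Name);
    }
}
```

Without the stored type name, the loading side has nothing to go on but the abstract Address, which is exactly the situation the exception above complains about.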

The only thing that would be better is if NoRM issued a warning or an exception when saving something that could not be properly deserialized – this way, one would not easily get away with saving stuff that could not (easily) be retrieved again.

Fun with NoRM 3

Third post in “Fun With NoRM” will be about how “the dynamism” of JavaScript and JSON is bridged into the rigid and statically typed world of C#. The thing is, in principle there’s no way to be certain that a JSON object returned from MongoDB will actually fit into our static object model.

Consider a situation where, for some reason, some of our orders have a field, PlacedBy, containing the name of the person who placed the order. Let’s see how things will go when adding the field and then querying all orders:

> use dbname
> var order = db.Order.findOne();
> order.PlacedBy = "El Duderino";
> db.Order.save(order);

- and then, from C#:

var orders = mongo.GetCollection<Order>();
 
foreach(var order in orders.Find())
{
    Console.WriteLine("Order #{0}", order.Number);
}

- and BOOM! – Cannot deserialize!: Norm.MongoException: Deserialization failed: type MongoTest.Order does not have a property named PlacedBy

This is actually pretty good, because this way we will never accidentally load a document with un-deserializable properties and save it back, thus truncating the document. But how can we handle this?

Well, NoRM makes it pretty easy: Make your model class inherit Expando, thus effectively becoming a dictionary. E.g. like so:

public class Order : Expando
{
   // ...
}

Now we can do this:

var orders = mongo.GetCollection<Order>();
 
foreach(var order in orders.Find())
{
    Console.WriteLine("Order #{0}", order.Number);
 
    if (order.AllProperties().Any())
    {
        var props = order.AllProperties().Select(p => string.Format("{0}: {1}", p.PropertyName, p.Value));
        Console.WriteLine("\t{0}", string.Join(", ", props.ToArray()));
    }
}

- which yields:

Order #1
    PlacedBy: El Duderino
Order #2
Order #3

when run with a small DB containing three orders. Nifty, huh?

If you’re sad that you’ve given up your single opportunity to inherit something by deriving from Expando, just go ahead and implement IExpando instead. Then you need to supply a few members, but you can just redirect to an internal Expando in your class.

Next up, a post on how to model inheritance hierarchies… one of my favorites! :)

Fun with NoRM 2

This second post in “Fun With NoRM” will be about querying…

How to get everything

Querying collections can be done easily with the anonymous types of C# 3 – e.g. the Order collection from my previous post can be queried for all orders like so:

var orders = mongo.GetCollection<Order>();
 
var allOrders = orders.Find();

How to use criteria

If we’re looking for some particular order, we can query by field values like so:

var orderNumber2 = orders.Find(new { Number = 2 });

or by using the query operators residing in the static class Q:

var ordersWithNumberGreaterThan2 = orders.Find(new { Number = Q.GreaterThan(2) });

More advanced criteria

The query operators can even be combined, like so:

var ordersWithNumberBetween5And10 = orders.Find(new { Number = Q.GreaterThan(5).And.LessThan(10) });

Now, what about the nifty dot notation? This is an example where C#’s capabilities don’t cut it anymore, as everything on the left side in an anonymous type needs to be a valid identifier – so no dots in property names!

This is solved in NoRM by introducing Expando! (not to be confused with ExpandoObject of .NET 4, even though they have similarities…)

Expando is just a dictionary, so to query by the field of an embedded object, do it like so:

var q = new Expando();
q["Items.Name"] = "beer";
var ordersWithBeer = orders.Find(q);

As you can see, querying with NoRM is pretty easy – and I think the NoRM guys have found a pretty decent workaround in the case of dot notation, where C#’s syntax could not be bent further.

Stay tuned for more posts…

Fun with NoRM 1

My previous posts on MongoDB have been pretty un-.NETty, in that I have focused almost entirely on how to work the DB through its JavaScript API. To remedy that, I shall write a few short posts on how to get rolling with MongoDB using NoRM, the coolest C# driver for MongoDB at the moment.

First post will be on how to connect and shove data into MongoDB.

Short introduction to NoRM

NoRM is “No Object-Relational Mapping”. It’s a .NET driver that allows you to map objects, their fields, and their aggregated objects into documents. I like NoRM because it has successfully preserved that low-friction MongoDB feeling, bridging C#’s capabilities nicely to those of JavaScript and providing some extra C# goodies along the way. Please read on, you’ll see…

Connect to MongoDB

Easy – can be done like so:

using(var mongo = Mongo.Create("mongodb://hostname/dbname"))
{
   // go crazy in here!!1
}

Inserting a few documents

Inserting documents with NoRM is easy – just create a class with fields and aggregated objects, and make sure the class either has a property named something like “Id” or has a property decorated with [MongoIdentifier], e.g. like so:

public class Order
{
    public Order()
    {
        Id = ObjectId.NewObjectID();
        Items = new List<Item>();
    }
 
    public ObjectId Id { get; set; }
    public int Number { get; set; }
    public List<Item> Items { get; set; }
}
 
public class Item
{
    public string Name { get; set; }
    public int Amount { get; set; }
}

- and then go ahead and pull a strongly typed collection and insert documents into it:

var orders = mongo.GetCollection<Order>();
 
orders.Insert(new Order {
    Number = 1,
    Items = {
        new Item { Name = "beer" },
        new Item { Name = "nuts" },
    }
});

Now, to make this work I need to create five 200-line XML-files with mapping info etc. </kidding> no, seriously – that’s all it takes to persist an entire aggregate root!!

Pretty cool, eh? That’s what I meant when I said low friction. Stay tuned for more posts, e.g. on how to query a collection…

I will be speaking about NoSQL and MongoDB

as seen from the eyes of a .NET developer at two events in June (in Danish).

The first event is a JAOO Geek Night at Dong Energy in Skærbæk on Tuesday June 29th at 4:30 pm. You can read more about the free JAOO Geek Nights here.

The other event is a meeting in Århus .NET User Group, which is the day after, on Wednesday June 30th at 6 pm – you can sign up via Facebook here.

I’m really looking forward to it, because I think we will have some interesting discussions. And perhaps we can widen a few people’s horizons :)

Hope to see a lot of engaged people at both events.