Initial Thoughts on MongoDB

I’ve been playing around with MongoDB extensively for the past month or so.  If you haven’t heard of it yet, MongoDB is Yet Another Way to Persist Data, but it’s particularly compelling because:

  • It’s so easy, both to set up and develop against
  • For certain persistence scenarios, it’s really fast
  • It’s schema-less, which means you can store and query different-looking objects together
  • Unlike most relational databases, but like its fellow NoSQL ilk, it's built to scale horizontally, meaning (basically) that you can throw more servers at a persistence problem to increase capacity and query performance

It’s called a document database, which tripped me up initially.  People usually ask if that means SharePoint, but thankfully, it doesn’t.  ‘Documents’ in this sense are collections of (usually nested) properties, and each one has a single identifier stored in a field called _id (if you don’t supply one, Mongo makes one up).  These properties are stored in a binary format called BSON (Binary JSON), but you’ll usually see documents written out as JSON, which really just means squiggly braces with names and values.

Here’s an example… the document, representing the state of two collections of entities, has just two string properties, ActiveCollection and BackupCollection, plus a date called LastSwap (stored here as a string):

{
    "_id":
    {
        "$oid": "4e4ab3ab6829a51c9c61d561"
    },
    "ActiveCollection": "Collection_B",
    "BackupCollection": "Collection_A",
    "LastSwap": "09/14/2011 17:43 -04:00"
}

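
The "$oid" value in that sample is an ObjectId, which is the 12-byte identifier Mongo generates when you don't supply an _id. Its first four bytes are a big-endian count of seconds since the Unix epoch, so you can get a rough idea of when a document was created without storing a separate field. Here's a small standard-library Python sketch. The helper name objectid_created_at is just illustrative and isn't part of any driver:

```python
from datetime import datetime, timezone


def objectid_created_at(oid_hex: str) -> datetime:
    """Return the creation time encoded in a 24-hex-digit ObjectId string."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex digits)")
    # Bytes 0-3 hold seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = objectid_created_at("4e4ab3ab6829a51c9c61d561")
print(created.isoformat())  # 2011-08-16T18:15:07+00:00
```

If you have pymongo installed, its bson.ObjectId type already exposes this value as generation_time.
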
How easy is it to set up?  You basically:

From then on, you can point your code at it, and as long as your application can reach the box and port, you can insert, upsert (insert or update with one command), create collections, drop a whole collection (basically a table in relational nomenclature), and query.  It’s wonderful.
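
To make that vocabulary a little more concrete, here's a toy in-memory model of a collection in Python. This isn't Mongo or a driver, and the class and method names are hypothetical. It only imitates a few behaviors. Documents of any shape can go in. An _id gets made up if you leave it out. An upsert merges fields into the matching document, roughly like an upsert with $set, or inserts a new one if no match exists. With the real Python driver (pymongo), the equivalent upsert call is collection.update_one(filter, update, upsert=True).

```python
import uuid
from typing import Any, Dict, Optional

Doc = Dict[str, Any]


class ToyCollection:
    """Hypothetical in-memory stand-in for a Mongo collection (not a real driver)."""

    def __init__(self) -> None:
        self._docs: Dict[Any, Doc] = {}

    def insert(self, doc: Doc) -> Any:
        doc = dict(doc)
        # Like Mongo, make up an _id when the caller doesn't supply one
        # (a random hex string here, not a real ObjectId).
        doc.setdefault("_id", uuid.uuid4().hex)
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def upsert(self, doc_id: Any, fields: Doc) -> bool:
        """Update the document with this _id, or insert it. Returns True if inserted."""
        existing = self._docs.get(doc_id)
        if existing is not None:
            existing.update(fields)
            return False
        self._docs[doc_id] = {"_id": doc_id, **fields}
        return True

    def find_one(self, doc_id: Any) -> Optional[Doc]:
        return self._docs.get(doc_id)

    def drop(self) -> None:
        self._docs.clear()

    def __len__(self) -> int:
        return len(self._docs)


# Made-up vehicle data: differently shaped documents share one collection.
cars = ToyCollection()
car_id = cars.insert({"Make": "Ford", "Model": "Focus"})
cars.insert({"_id": "truck-1", "Make": "Ram", "TowingCapacityLbs": 10000})

truck_inserted = cars.upsert("truck-1", {"Color": "Red"})  # exists, so updated
van_inserted = cars.upsert("van-1", {"Make": "Honda"})  # missing, so inserted
```
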

People who have any ORM background will probably be wondering how you map things into and out of Mongo.  You don’t.  You just take the object you want to save, and say, Save!  And it saves it.  No making every property virtual, no mapping files, no strange bi-directional association patterns.
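
Here's a rough picture of what "just save it" means. The driver serializes your object graph, nested children included, straight into one document. The Python sketch below shows that step with dataclasses.asdict, using hypothetical Vehicle and Owner classes. A real driver would then encode the resulting structure as BSON and send it to the server. In pymongo specifically, you hand it a dict like this one.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Owner:
    name: str
    phone_numbers: List[str] = field(default_factory=list)


@dataclass
class Vehicle:
    make: str
    model: str
    year: int
    owner: Owner


car = Vehicle("Subaru", "Outback", 2009, Owner("Pat", ["555-0100"]))

# No mapping file: the nested object graph becomes a nested document as-is.
document = asdict(car)
print(json.dumps(document))
```
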

There are also a handful of free and paid ways to manage Mongo data.  My two current favorites are MongoVUE and MongoExplorer.  MongoVUE is a bit more full-featured: it lets you create indexes and add users from the UI, and it has a cool “learning” interface that shows you the command-line statements you could run in a terminal window to get the same output.  MongoExplorer just looks cool, is free, and makes it pretty easy to see and query your data.  Below is a screenshot of MongoVUE with some made-up vehicle data.

[Screenshot: MongoVUE showing made-up vehicle data]

Basically, if you have objects that have a natural hierarchy, and you can get to and from them using some sort of identity, you’ll probably love Mongo.

To understand why this is the ideal Mongo scenario, think about how we would achieve the same thing in a relational database.  Let’s say it’s for tracking mortgage applications.  You’d most likely have a Mortgage table, and then a Customer table, which you could tie by ID to the Mortgage table for a signer and co-signer.  But those customers could have many addresses and phone numbers, so now there’s an Address table and a PhoneNumber table, plus any number of other child tables.  Just to load a single mortgage application, you’re talking about a significant amount of joining and reading.

In Mongo, it’s one logical read.  Give me Mortgage number 5, please.  No joins on potentially massive tables, just one distinct document (object).
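
Here's the mortgage example as a Python sketch with made-up data. In the relational shape, one application is spread over four hypothetical tables that have to be stitched back together. In the document shape, it's a single nested record fetched by its ID. The table and field names are illustrative only, and plain lists stand in for tables, so each "join" is a scan.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Relational shape: one application spread over four tables.
mortgages: List[Row] = [{"id": 5, "amount": 250000, "signer_id": 1, "cosigner_id": 2}]
customers: List[Row] = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
addresses: List[Row] = [
    {"customer_id": 1, "street": "1 Elm St"},
    {"customer_id": 2, "street": "9 Oak Ave"},
]
phones: List[Row] = [{"customer_id": 1, "number": "555-0101"}]


def load_customer(customer_id: int) -> Row:
    # Every child table is another join to perform.
    person = next(c for c in customers if c["id"] == customer_id)
    return {
        "name": person["name"],
        "addresses": [a["street"] for a in addresses if a["customer_id"] == customer_id],
        "phones": [p["number"] for p in phones if p["customer_id"] == customer_id],
    }


def load_mortgage_relational(mortgage_id: int) -> Row:
    m = next(r for r in mortgages if r["id"] == mortgage_id)
    return {
        "_id": m["id"],
        "amount": m["amount"],
        "signer": load_customer(m["signer_id"]),
        "cosigner": load_customer(m["cosigner_id"]),
    }


# Document shape: the whole application is one nested record keyed by _id.
mortgage_docs: Dict[int, Row] = {
    5: {
        "_id": 5,
        "amount": 250000,
        "signer": {"name": "Ann", "addresses": ["1 Elm St"], "phones": ["555-0101"]},
        "cosigner": {"name": "Bob", "addresses": ["9 Oak Ave"], "phones": []},
    }
}

# "Give me Mortgage number 5, please": one lookup versus a chain of joins.
same_result = mortgage_docs[5] == load_mortgage_relational(5)
print(same_result)
```
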

So as with all things in life and development, there are always tradeoffs, right?

Here’s what you’re trading away to get all of this cool stuff:

  • NoSQL really means Not Relational.  Any queries that aren’t “give me this entire document by ID” but rather, “give me documents that link to these other documents” or “give me the objects with these distinct properties” are not going to perform like relational databases, at least at any non-trivial scale.
  • As a corollary, you wouldn’t let analysts loose on Mongo data.  It’s definitely not for business intelligence scenarios.  You would need to save that data separately in a relational form that they like.  And they probably wouldn’t know how to query Mongo anyway :)
  • Remember ACID?  Well… it’s not, particularly when it comes to Consistency: once data is replicated across servers, it’s Eventually Consistent.  As such, you should think long and hard about storing mission-critical data in Mongo.
  • Transactions are only at the document level.  You can’t update Object A and Object B in such a way that, if you edit A first and the edit to B then fails, A’s change is rolled back.  You could roll your own, but at that point you should stop and ask yourself if Mongo is right for the data (hint: the answer is likely “no”).  Alternatively, RavenDB allows multi-document transactions if you can guarantee it’s not scaling across physical servers.
  • Security exists, but Mongo is really meant to be used in a trusted environment.  You can set up users and give them access to certain databases, but there’s no hardcore encryption or trusted-access functionality (e.g., what SQL Server offers).
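
For two documents, "roll your own" usually means a compensating write. You remember A's old state, update A, try B, and put A back if B fails. The toy Python sketch below works on plain dicts, and every function and field name in it is hypothetical. It shows both the idea and the catch. If the process dies between the two writes, nothing ever runs the compensation, which is exactly why that's the moment to ask whether Mongo fits the data.

```python
import copy
from typing import Any, Callable, Dict

Doc = Dict[str, Any]
Change = Callable[[Doc], None]


def atomic_update(doc: Doc, change: Change) -> None:
    """Model a single-document atomic update: all of `change` applies, or none of it."""
    draft = copy.deepcopy(doc)
    change(draft)  # if this raises, `doc` is untouched
    doc.clear()
    doc.update(draft)


def update_pair(a: Doc, b: Doc, change_a: Change, change_b: Change) -> None:
    """Hand-rolled 'transaction' over two documents using a compensating write."""
    before_a = copy.deepcopy(a)

    def restore_a(d: Doc) -> None:
        d.clear()
        d.update(before_a)

    atomic_update(a, change_a)
    try:
        atomic_update(b, change_b)
    except Exception:
        # Compensating write. If the process crashed before reaching this
        # line, nothing would undo A's change: that's the fragile part.
        atomic_update(a, restore_a)
        raise


# Hypothetical example: move 40 from A to B, but B refuses the change.
account_a = {"_id": "A", "balance": 100}
account_b = {"_id": "B", "balance": 0, "frozen": True}


def debit(d: Doc) -> None:
    d["balance"] -= 40


def credit(d: Doc) -> None:
    if d.get("frozen"):
        raise RuntimeError("account B is frozen")
    d["balance"] += 40


try:
    update_pair(account_a, account_b, debit, credit)
except RuntimeError as err:
    print(f"B failed ({err}); A restored to {account_a}")
```
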

There’s lots more to Mongo, and it’s really exciting that this and other NoSQL technologies are providing ways to store data that really work beautifully for some specific problem sets.  It’s not a replacement for relational databases, but given the right problem – Mongo rocks.