
First impressions of CouchDB

Sunday, March 29th, 2009

I’m spending my weekend playing with CouchDB. It’s a “document” database (storing JSON objects) which is pitching itself as more suitable for some web applications than traditional relational databases like MySQL. There are other resources that explain it well, so I won’t go into its feature set; this is more a note of what I thought of it after a play.

The first thing you notice is that the built-in UI is pretty slick. Think of a really slimmed-down “phpMyAdmin” that’s built into the database. This let me start playing around with the new concepts of CouchDB (documents, map/reduce) without getting bogged down with hooking it up to PHP.

I started creating some documents and it hits you pretty fast that this isn’t anything like MySQL. All the documents are stored within a database (you can have many documents inside a database, and many databases) but there’s no type-differentiation between them: it really is just a big bag full of JSON objects. The convention is to create a “type” property on your documents yourself and once I did that things started to click better.
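To make the “big bag of JSON objects” idea concrete, here is a minimal sketch in plain JavaScript (not CouchDB’s own API). The documents, their field names and the countByType helper are all made up for illustration; the only thing taken from the convention above is the “type” property.

```javascript
// Hypothetical documents living side by side in one database.
// Nothing but the self-assigned "type" property tells them apart.
const database = [
  { _id: "a1", type: "person", name: "Alice" },
  { _id: "b2", type: "person", name: "Bob" },
  { _id: "c3", type: "project", title: "Website redesign", owner: "a1" },
];

// Illustrative helper: count how many documents carry each "type" value.
function countByType(docs) {
  const counts = {};
  for (const doc of docs) {
    // Documents that skip the convention are grouped under a placeholder.
    const type = doc.type || "(untyped)";
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

console.log(countByType(database));
```

The database itself enforces none of this: a document without a “type” is just as valid as one with it, which is why the convention only helps if you apply it consistently.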

So having put some documents in, how the heck do I query them? There’s no SQL; instead you need to create views, which use a “map” function and, optionally, a “reduce” function to provide the logic for the query. It took me a good few hours to read up on this and understand the basics. The idea is that for a given view, all documents will be sequentially passed into your “map” function and it’s up to your logic (programmed in JavaScript) to decide if any data from that document is going to be added to the view’s output. A simple example of this is a view to extract documents of a certain “type” (assuming I’ve used the convention of giving my documents a type property).

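function(doc){
  // Only documents following the "type" convention as people match.
  if(doc.type === "person"){
    // Key: the document's _id; value: the whole document.
    emit(doc._id, doc);
  }
}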

This map function will “emit” (add to the view’s output) all my “person” documents, indexed by the _id property (a unique ID that each document must have, like a primary key; CouchDB can generate a UUID for you). This gives you a fine level of control over the output data (I think more than SQL would) because you’ve got a full programming language available to define the logic, although this worries me a little: with great power comes great responsibility! It took me a while to understand that there will be no query logic in my application any more. That’s a double-edged sword I suppose (good: no dodgy SQL written by juniors; bad: you lose the flexibility of writing arbitrary queries).
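To get a feel for what happens when a view is built, here is a rough simulation in plain JavaScript. It is only a sketch of the idea, not how CouchDB is implemented: buildView, personMap and the sample documents are hypothetical names of my own, emit is passed in as an argument because there is no CouchDB runtime here to supply it, and a simple string comparison stands in for CouchDB’s real key ordering.

```javascript
// Simulated view build: run the map function over every document
// and collect whatever it emits as rows.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // Each emit call adds one row for the document currently being mapped.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  // Rows are kept ordered by key; plain string comparison is an assumption
  // standing in for CouchDB's own collation rules.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// Same logic as the map function above, with emit as an explicit parameter.
function personMap(doc, emit) {
  if (doc.type === "person") {
    emit(doc._id, doc);
  }
}

// Hypothetical sample documents, deliberately out of order.
const docs = [
  { _id: "c3", type: "project", title: "Website redesign" },
  { _id: "b2", type: "person", name: "Bob" },
  { _id: "a1", type: "person", name: "Alice" },
];

const view = buildView(docs, personMap);
console.log(view.map((row) => row.key));
```

With these sample documents the “project” is skipped and the two “person” rows come out sorted by key, which is roughly what you get back when you query the view.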

I tried to set up a sample database for a project I’m working on and got stuck really fast. Getting your head around “documents” when you’ve been in “relational database” mode for 10 years is difficult. I found a useful trick was to think about how I’d implement my project in the offline world, with paper-based forms and documents. It definitely feels “wrong” to be breaking the database rules and I think it’s going to take some more time to understand the best ways to set up my data in documents.

I do really like the general concept of a document/object database though and I think it’s a better fit for the majority of the web application work that I do. Currently I feel I’m doing a disproportionate amount of fighting with my relational database (MySQL), particularly on very active projects where I’m often touching the database to implement new functionality; I’m hoping that something like CouchDB will “go with the grain” better.

It’s definitely something I’ll be spending some more time getting to know.