Developer Happiness and MongoDB

2012-05-23

MongoDB continues to be the most popular of the NoSQL data stores, and it continues to generate disgruntled blog posts from former users. I’ve been using MongoDB since the summer of 2009, and my experiences have been consistently positive. In fact, MongoDB is my current default data store when building a new web app, and has been since shortly after I started using it. I understand all the criticisms being leveled at Mongo, but they’re not going to change my position.

Why not? Because using MongoDB makes me happy.

Matz’s Revolution

“Ruby is designed to make programmers happy.”

Yukihiro “Matz” Matsumoto created a programming language optimized for developer happiness. Ruby wasn’t optimized for technical goals like speed or scale (insert Rails-can’t-scale joke here). It was optimized for the happiness of the human beings who would spend all day looking at it.

DHH, creator of Ruby on Rails, adopted Matz’s philosophy. The rise in popularity of Rails had a lot to do with how good the out-of-the-box experience was. Developer happiness leads to observable benefits in productivity. Here’s an excerpt from the 37signals ebook, Getting Real:

Happiness has a cascading effect. Happy programmers do the right thing. They write simple, readable code. They take clean, expressive, readable, elegant approaches. They have fun.

Left out of that quote is that happy programmers are more creative. And happiness and creativity build off one another, creating a virtuous cycle of collaboration. The Django committers were already quite content with their toolset when they borrowed some ideas from the Rails core team, and vice versa. Sinatra’s creator wasn’t happy with some aspects of Rails, so he made his own thing, and shared his happiness with like-minded Ruby developers who realized they felt the same way. The creator of CoffeeScript borrowed aspects of Ruby and Python and drastically improved the experience for those of us who weren’t having much fun writing JavaScript.

The Freedom of Choice

Programmers tend to be quite passionate about the tools they use: operating systems, text editors, IDEs, languages, frameworks, libraries. And they want to switch when they’ve discovered a better tool for the job. Freedom of choice is incredibly important. But strangely for web developers, the choice for data store was always made for them: an RDBMS.

The NoSQL movement introduced genuine choice in data stores for the first time in the Web era. The differences between MySQL and Postgres and Oracle started to look small when compared to the new paradigms available. The realization that a relational database isn’t the only way to persist data is an epiphany.

Decades of RDBMS dominance led to dogma. Incontrovertible truths. Data belongs in the rows and columns of a table. Tables must be normalized. Repeating data is bad. Tables are joined. Transactions are necessary and schemas must be defined, otherwise chaos will ensue. NoSQL freed developers from those chains. And the newly liberated are happy.

Choice and Control in MongoDB

MongoDB is schemaless, which means the data schema is defined by the application, not the database. Data migrations are run to actually transform data, not merely for a simple change in the database schema. As a result, they’re run far less often, which in turn speeds up the development process. If your application needs a middle_initial field, you add it to your application. MongoDB doesn’t need to be told about it.
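As a minimal sketch of what that looks like from the application’s side, the snippet below models two user documents as plain Python dictionaries, one written before the middle_initial field existed and one after. Every name other than middle_initial, including the display_name helper, is made up for illustration, and no database is involved.

```python
# Two documents that could sit side by side in the same collection.
# Only the newer one has the middle_initial field; nothing was migrated.
users = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Grace", "middle_initial": "B", "last_name": "Hopper"},
]


def display_name(user):
    """Build a display name, treating middle_initial as optional."""
    parts = [user["first_name"]]
    if user.get("middle_initial"):
        parts.append(user["middle_initial"] + ".")
    parts.append(user["last_name"])
    return " ".join(parts)


names = [display_name(u) for u in users]
```

The only place that knows about the new field is the application code that reads it.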

MongoDB is document-oriented. Data isn’t confined to rows. It instead lives in a richly structured JSON document. Thanks to that, the dogma of normalization can be thrown out the window. A blog post has many comments. Well then, those comments should be stored inside the blog post, not in a separate table. A blog post has many tags. Store those tags inside the blog post and forget about the rule that says you need a join table named blog_posts_tags.
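Here is roughly what such a blog post document might look like, written as a Python dictionary and serialized to JSON. The field names and values are invented for illustration; the point is only that comments and tags are nested inside the post rather than split across tables.

```python
import json

# Hypothetical blog post document: comments and tags live inside the post.
post = {
    "title": "Developer Happiness and MongoDB",
    "tags": ["mongodb", "nosql", "happiness"],
    "comments": [
        {"author": "alice", "body": "Agreed."},
        {"author": "bob", "body": "What about joins?"},
    ],
}

# Everything needed to render the page arrives in one document.
comment_count = len(post["comments"])
as_json = json.dumps(post, indent=2)
```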

The schemaless, document-oriented nature of MongoDB translates into a newfound sense of developer control. Your data schema mirrors the actual look of your application. You can glance at a mockup for a new feature, and quickly envision how the data will be persisted.

For example, a blog post has an author. If the application has user accounts, those authors are persisted in a users collection. The old dogmatic way of thinking dictates that in order to display that user’s name on a blog post, there must be a foreign key and a join query. But in MongoDB, the developer has control over the way the data is stored. I’d store a reference to the user, but I’d also store the user’s full name right there, inside the blog post. The blog post document is a faithful representation of what makes it to the screen. It’s not bound to the rules and conventions of the data store it’s contained in.

The tradeoff with such denormalization is that you’ll have to write a little bit of code to update all the blog posts by the author if the name is ever changed. But that’s a trivial bit of housekeeping that quickly becomes second nature. It’s much more natural than adhering to Boyce-Codd Normal Form.
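That bit of housekeeping can be sketched in a few lines. The example below uses an in-memory list as a stand-in for a posts collection, and the field layout and the rename_author helper are assumptions made for illustration, not a prescribed schema.

```python
# In-memory stand-in for a posts collection. Each post keeps a reference
# to its author plus a denormalized copy of the author's name.
posts = [
    {"title": "First", "author": {"id": 1, "name": "Pat Smith"}},
    {"title": "Second", "author": {"id": 2, "name": "Sam Jones"}},
    {"title": "Third", "author": {"id": 1, "name": "Pat Smith"}},
]


def rename_author(posts, author_id, new_name):
    """Update the copied name on every post written by author_id."""
    updated = 0
    for post in posts:
        if post["author"]["id"] == author_id:
            post["author"]["name"] = new_name
            updated += 1
    return updated


changed = rename_author(posts, 1, "Pat Smith-Lee")
```

Against a real database, the loop would typically collapse into a single multi-document update issued through the driver.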

MongoDB is flexible. It puts the developer in control, instead of dictating how it must be used. A breath of fresh air.

JSON

JSON is a near-perfect format for persisting data in object-oriented applications. Because it’s literally an object notation, there’s little impedance mismatch between the native objects in the programming language and how they’re stored in Mongo. ORMs were created because objects needed mapping to relational tables. That’s no longer the case with MongoDB, though object-document mappers do exist, and provide useful utilities like callbacks and validations.

JSON is a joy to work with because it eliminates so much friction. Unlike XML, it’s simple, succinct, and quickly scannable. It’s self-documenting. A JSON document represents a fully-formed object, so once it’s parsed there’s nothing left to map. It’s become the lingua franca for exchanging data on the Web. And it’s universal: virtually every programming language has a library for it.

Queries and commands in the MongoDB shell are done via calls to JavaScript functions, which take in a JSON object for the various conditions and options. This results in a magical side effect when interfacing with MongoDB from a driver in another language. Querying in any object-oriented language remains straightforward and natural because the ideas in JavaScript and JSON are easily translated. A JSON object easily maps to a hash in Ruby, or a dictionary in Python. There’s little mental overhead needed.
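To illustrate, here is roughly how a shell-style query document carries over to Python. The collection and field names are invented, and the snippet only builds the query document and serializes it; actually sending it to a server would require a driver such as PyMongo.

```python
import json

# In the JavaScript shell the query might be written as:
#   db.posts.find({"tags": "mongodb", "comment_count": {"$gte": 5}})
# In Python the same conditions are an ordinary dictionary.
query = {"tags": "mongodb", "comment_count": {"$gte": 5}}

# Serializing the dictionary gives back the JSON the shell version used.
query_json = json.dumps(query)
```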

Every developer I know loves JSON. It makes them happy. When using MongoDB, you get to think in JSON all day long.

Minimum Viable Features

MongoDB ships with an assortment of useful utilities that make working with it just a little bit nicer. There’s GridFS for file storage. Geospatial indexing to do lat/long calculations. Map/reduce for aggregation. There’s even a way to get full-text search capabilities using an index on an array of strings.
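The array-of-strings trick for search can be sketched as follows. This is one plausible reading of the idea, not a definitive recipe: the _keywords field name and both helper functions are hypothetical, and the matching is done in memory here, whereas in MongoDB the array field would be indexed and queried through the database.

```python
import re


def keywords(text):
    """Split text into a sorted list of unique lowercase words."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))


# Hypothetical documents, each given a precomputed keyword array.
docs = [
    {"title": "Scaling MongoDB", "body": "Sharding and replica sets."},
    {"title": "Happy Ruby", "body": "Why Ruby makes programmers happy."},
]
for doc in docs:
    doc["_keywords"] = keywords(doc["title"] + " " + doc["body"])


def search(docs, phrase):
    """Return titles of documents whose keyword array has every search term."""
    terms = set(keywords(phrase))
    return [d["title"] for d in docs if terms <= set(d["_keywords"])]


hits = search(docs, "ruby happy")
```

There is no stemming, ranking, or stop-word handling here, which is exactly the kind of gap a dedicated search engine fills.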

These aren’t full replacements for more robust solutions. They’re more like minimum viable features. There’s always a networked filesystem or S3 for file storage and PostGIS for serious geospatial work. There’s Hadoop for map/reduce and Elastic Search for full-text search. But the ability to do those things in MongoDB for simple use cases makes the development process a little more pleasant.

There are also neat little tidbits scattered around. Documents in MongoDB have a unique identifier, automatically indexed, called an ObjectId. By default it’s generated as a 12-byte, GUID-like value with the creation timestamp embedded in its first four bytes. No longer do you have to store a separate created_at timestamp. Or you can if you want. You have the choice, after all.
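An ObjectId is 12 bytes long, and its first four bytes hold the creation time as seconds since the Unix epoch. The sketch below decodes that timestamp from the 24-character hex form using only the standard library. The objectid_created_at helper and the sample ObjectId are made up for illustration; official drivers expose their own accessors for the same value.

```python
from datetime import datetime, timezone


def objectid_created_at(oid_hex):
    """Decode the creation time from a 24-character hex ObjectId string."""
    if len(oid_hex) != 24:
        raise ValueError("expected 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes: seconds since the epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Made-up ObjectId whose leading bytes encode midnight UTC on 2012-05-23.
created = objectid_created_at("4fbc2880e4b0a1b2c3d4e5f6")
```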

Unintentionally Delivering Happiness

It seems to me that the creators of MongoDB never set out to optimize for happiness. I’d guess that Redis is the one NoSQL data store where the creators actively think about developer happiness.

Instead, 10gen’s marketing has been focused on scaling. The brand itself, Mongo, came out of the word humongous. And much of the criticism of MongoDB has come from the technology choices made in the name of scaling. But the reality is that scaling any technology is difficult, and there’s no silver bullet.

The criticism that most strikes fear, uncertainty, and doubt in the hearts of developers is that MongoDB loses data. I’ve never experienced it. Turn on journaling, use a replica set in production, and make solid backups. You’ll be fine.

I build web software for a living, and I want to choose the tools that will make me happy while doing it. That’s the value most important to me, and I find myself sticking with MongoDB.