MongoDB 2.0 Should Have Been 1.0

No open source project has received more criticism in recent years than MongoDB. Most of the flak has revolved around technical implementation decisions made for the project, and perceptions of how the company behind MongoDB, 10gen, has used advantages gained from those design decisions in marketing their product. It seems that more blog posts have been written on why one should not use MongoDB than on why one should use it.

In a now-404ed July 2010 entry (mirror), Mikeal Rogers, at the time an employee of the company behind CouchDB, wrote a highly-trafficked critique of MongoDB’s asynchronous writes and lack of single-server durability. Because of the design decisions made regarding persisting to disk, sending a hard kill signal to a standalone MongoDB instance could result in the database becoming corrupted beyond recovery. This was no good, of course. To their credit, engineers at 10gen worked to address the problem and added single-server durability in the 1.8 release.

A few months later, Ethan Gunderson wrote a post titled “Two Reasons You Shouldn’t Use MongoDB”. Copping to an over-the-top title, he described the implications of the design decision to have a global write lock, as well as the lack, at the time, of single-server durability. His message was that developers need to understand MongoDB’s design decisions and the associated tradeoffs.

Being document-based datastores, Riak and CouchDB are the most direct competitors to MongoDB. The employees of Basho, the company behind Riak, seem to have several bones to pick with both MongoDB and 10gen. At JSConf, Basho employee Sean Cribbs was giving a presentation on Riak. When asked a question comparing Riak to MongoDB, he responded, “Mongo loses data.” UPDATE: Sean has just written a post titled MongoDB and Riak, In Context (and an apology).

A week later, in a weird self-congratulatory blog post, Basho COO Antony Falco laid into the marketing of MongoDB by 10gen. Strangely, he never once named the technology or the company while simultaneously accusing them of spreading FUD. His message: 10gen is flat-out lying in its marketing of MongoDB, and it’s hurting the entire NoSQL community. Light on technical specifics but heavy on philosophical generalities, Falco’s screed seems to function as a rallying cry for his troops at Basho.

Several days ago, Michael Schurter, an engineer at Urban Airship, detailed his company’s trials and tribulations with MongoDB. Among them were the global write lock, the daunting complexity associated with running a MongoDB cluster, and the possibility of data loss.

Most recently, an anonymous ex-user, apparently compelled by social responsibility, left a Pastebin titled “Don’t use MongoDB”. Eight reasons are given, all of which rehash points made in the posts above. Most damaging are the accusations that MongoDB randomly loses data. But there’s no hard evidence given. And because of the anonymous nature of a Pastebin posting, the 10gen CTO is left to respond point-by-point on the comment thread in Hacker News with no real dialogue possible.

And so the critiques range from “understand the design tradeoffs and take appropriate steps” to “you’d be crazy to fall for the marketing hype – never ever use MongoDB”. Evaluating who is most correct is left as an exercise for the reader.

There is no doubt that MongoDB has benefitted from an aggressive marketing push. There are more MongoDB conferences held (organized by 10gen) and MongoDB books written (mainly by 10gen employees) than for the other NoSQL datastores combined. Here’s a chart from Google that compares relative search interest over the last 12 months:

[Chart: NoSQL search terms]

I’ll close with a criticism of my own: MongoDB 2.0, released in September, should have been version 1.0. Looking back, it’s clear that certain features added post-1.0 should have been added before a 1.0 was declared:

  • Configurable fsync time in 1.2.
  • Background index creation in 1.4.
  • Sharding and replica sets in 1.6.
  • Single-server durability as an option in 1.8.
  • Single-server durability as the default in 2.0, as well as a standalone compact command.

The single-server durability issue was a huge one. Declaring a 1.0 version well before the issue was addressed was a major misstep by 10gen. Replica sets were the second attempt at a replication solution, much improved from the previous master/slave setup. However, the two replication solutions were drastically different, and the transition should have happened between 0.X releases, not 1.X releases. In an alternate universe, holding off on the 1.0 release might have tempered some of the criticism MongoDB has received.

MongoDB is on its way to becoming the default datastore for web apps. At version 2.0, it is finally a stable product free of nasty surprises. That is, a proper 1.0 release. With this stability, developers should seriously consider working with it. Its developer experience is unmatched in the world of datastores, though Redis comes in at a close second.

The focus of my next blog post will be on developer happiness and MongoDB.

Disclosure: I have given presentations at three 10gen-sponsored MongoDB conferences. At the New York City event in May 2010, I ate a 10gen-sponsored turkey sandwich for lunch. At the D.C. event in November 2010, I had pre-conference drinks on the company’s tab and received the O’Reilly MongoDB book as a thank-you gift.
