December 18th, 2009

Posted by John

Strong statement, eh? The more I work with Mongo the more I am coming around to this way of thinking. I tell no lie when I say that I now approach Mongo with the same kind of excitement I first felt using Rails. For some, that may be enough, but others probably require more than a feeling to check out a new technology.

Below are 7 Mongo and MongoMapper related features that I have found to be really awesome while working on switching Harmony, a new website management system by my company, Ordered List, to Mongo from MySQL.

1. Migrations are Dead

Remember the first time you created and ran a migration in Rails? Can you? Think back to the exuberance of the moment when you realized tempting fate on a production server was a thing of the past. Well, I have news for you, Walter Cronkite: migrations are so last year.

Yep, you don’t migrate when you want to add or remove columns with Mongo. Heck, you don’t even add or remove columns. Need a new piece of data? Throw a new key into any model and you can start adding data to it. No need to bring your app to a screeching halt, migrate and then head back to normal land. Just add a key and start collecting data.
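To make the idea concrete, here is a minimal plain-Ruby sketch in which hashes stand in for Mongo documents. It does not use MongoMapper or a live database, and the subtitle key and the read_key helper are made up for the illustration.

```ruby
# Plain hashes stand in for documents in one collection (no database involved).
PAGES = [
  { 'title' => 'About' },                        # saved before the new key existed
  { 'title' => 'Blog', 'subtitle' => 'Latest' }  # saved after the new key was added
].freeze

# Hypothetical helper: read a key, falling back to a default when an older
# document simply does not have it.
def read_key(doc, key, default = nil)
  doc.fetch(key, default)
end

# Old and new documents live side by side; nothing had to be migrated.
subtitles = PAGES.map { |doc| read_key(doc, 'subtitle', '') }
puts subtitles.inspect
```

Older documents are never rewritten. They just lack the key until the next time they are saved with a value for it.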

2. Single Collection Inheritance Gone Wild

There are times when inheritance is sweet. Let’s take Harmony for example. Harmony is all about managing websites. Websites have content. Content does not equal pages. Most website management tools are called content management systems and all that means is that you get a title field and a content field. There, you can now manage content. Wrong!

Pages are made up of content. Each piece of content could be as tiny as a number or as large as a massive PDF. Also, different types of pages behave differently. Technically a blog and a page are both pages, but a page has children that are most likely ordered intentionally, whereas a blog has children that are ordered by publish date.

So how did Mongo help us with this? Well, we created a base Item model. Sites have many items. Items have many custom pieces of data. So, we have an Item model that acts as the base for our Page, Blog, Link, BlogPost and such models. Then each of those defines specific keys and behaviors that they do not have in common with the other items.

By using inheritance, they all share the same base keys, validations, callbacks and collection. Then for behaviors and keys that are shared by some, but not all, we are creating modules and including them. One such module is SortableItem. This gets included in Page, Blog and Link as those can all be sorted and have previous and next items. The SortableItem module defines a position key and keeps the position order in check when creating and destroying items that include it. Think of it as acts_as_list.
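For a feel of what such a module does, here is a rough in-memory sketch of acts_as_list-style position handling. It is not Harmony's SortableItem: the names SortableSketch and PageSketch are invented, and a plain array of siblings stands in for the collection.

```ruby
# In-memory sketch of an acts_as_list-style module; not Harmony's SortableItem.
module SortableSketch
  attr_accessor :position

  # Append the item to the end of its sibling list.
  def add_to(siblings)
    self.position = siblings.size + 1
    siblings << self
  end

  # Remove the item and close the gap it leaves behind.
  def remove_from(siblings)
    siblings.delete(self)
    siblings.each_with_index { |item, index| item.position = index + 1 }
  end

  # The sibling before this one, or nil if this is the first.
  def previous_item(siblings)
    index = siblings.index(self)
    index && index > 0 ? siblings[index - 1] : nil
  end

  # The sibling after this one, or nil if this is the last.
  def next_item(siblings)
    index = siblings.index(self)
    index ? siblings[index + 1] : nil
  end
end

class PageSketch
  include SortableSketch
  attr_reader :title

  def initialize(title)
    @title = title
  end
end
```

Any class that includes the module gets the position bookkeeping, which is the same reason a shared module works well for Page, Blog and Link.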

This has been so handy. Steve was building the doc site and said he wished he had a link type, something that shows up in the navigation, but cross links to another section or another site. I was like, so make it! Here it is in all its glory.

class Link < Item
  include SortableItem

  key :url, String, :required => true, :allow_blank => false

  def permalink
    Harmony.escape_url(title)
  end
end

Yep, barely any code. We inherit from item, include the sortable attributes, define a new key named url (where the link should go to) and make sure the permalink is always set to the title. Nothing to it. This kind of flexibility is huge when you get new feature ideas.

All these completely different documents are stored in the same collection and follow a lot of the same rules but none of them has any more data stored with it than is absolutely needed. No creating a column for any key that could be in any row. Just define the keys that go with specific document types. Sweet!

3. Array Keys

Harmony has sites and users. Users are unique across all of Harmony. One username and password and you can access any specific site or all sites of a particular account. Normally this would require a join table, maybe even some polymorphism. What we decided to do is very simple. Mongo natively understands arrays. Our Site model has an array key named authorizations and our Account model has one named memberships. These two array keys store arrays of user ids. We could de-normalize even more and just have a sites array key on user, but we decided not to.

class Site
  include MongoMapper::Document
  key :authorizations, Array, :index => true

  # def add_user, remove_user, authorized?
end

class Account
  include MongoMapper::Document
  key :memberships, Array, :index => true

  # def add_user, remove_user, member?
end

What is cool about this is that it is still simple to get all the users for a given site.

class Site
  def users
    user_ids = [authorizations, memberships].flatten.compact.uniq
    User.all(:id => user_ids, :order => 'last_name, first_name')
  end
end

The sweet thing about this is that not only does Mongo know how to store arrays in documents, but you can even index the values and perform efficient queries on arrays.
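The add_user, remove_user and authorized? methods are only hinted at in the comments above. A minimal plain-Ruby sketch of what they could look like follows. SiteSketch is a made-up stand-in that keeps the ids in an ordinary array, with no MongoMapper or database involved.

```ruby
# Plain-Ruby stand-in for a Site document; authorizations holds user ids.
class SiteSketch
  attr_reader :authorizations

  def initialize
    @authorizations = []
  end

  # Add a user id once; adding it again is a no-op.
  def add_user(user_id)
    @authorizations << user_id unless authorized?(user_id)
    self
  end

  # Remove a user id if it is present.
  def remove_user(user_id)
    @authorizations.delete(user_id)
    self
  end

  # True when the user id is in the array.
  def authorized?(user_id)
    @authorizations.include?(user_id)
  end
end

site = SiteSketch.new
site.add_user(1).add_user(2).add_user(1)
puts site.authorizations.inspect
```

On the database side, a Mongo query that matches a single value against an array field finds the documents whose array contains that value, which is what makes the indexed array key useful for lookups like "all sites this user can access".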

Eventually, I want to roll array key stuff like this into MongoMapper supported associations, but I just haven’t had a chance to abstract them yet. Look for that on the horizon.

4. Hash Keys

As if array keys were not enough, hash keys are just as awesome. Harmony has a really intelligent activity stream. Let’s face it, most activity streams out there suck. Take Github’s for example. I will pick on them because I know the guys and they are awesome. They are so successful, they can take it. :)

It may be handy that I can see every single user who follows or forks MongoMapper, but personally I would find it way more helpful if their activity stream just put in one entry that was more like this.

“14 users started watching MongoMapper today and another 3 forked it. Oh, and you had 400 pageviews.”

Am I right? Maybe I have too many projects, but their feed is overwhelming for me at times. What we did to remedy this in Harmony is make the activity stream intelligent. When actions happen, it checks if the same action has happened recently and just increments a count. What you end up with are things in the activity stream like:

“Mongo is to Databases what Rails was to Frameworks was updated 24 times today by John Nunemaker.”

On top of that, we use a hash key named source to store all of the attributes from the original object right in the activity stream collection. This means we do 0, yes 0, extra queries to show each activity. Our activity model looks something like this (obviously this is really pared down):

class Activity
  include MongoMapper::Document
  key :source, Hash
  key :action, String
  key :count, Integer, :default => 1
end

Then, we define an API in that model to normalize the different attributes that could be there. For example, here is the title method:

class Activity
  def title
    source['title'] || source['name'] || source['filename']
  end
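end

Since Mongo can query on keys inside a hash, checking whether the same action has already happened to an object today is a single query on source._id:

Activity.first({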
  'source._id'   => id, 
  :action        => 'updated', 
  :created_at.gt => Time.zone.now.beginning_of_day.utc
})

How fricken sweet is that? Major. Epic.
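The roll-up logic described earlier (check whether the same action already happened today, then bump a count) can be sketched in memory like this. It is only an illustration under assumptions: ActivitySketch and ActivityStreamSketch are invented names, an array replaces the collection, and "recently" is taken to mean "since the start of the current UTC day".

```ruby
# In-memory sketch of rolling repeated actions into one activity entry.
class ActivitySketch
  attr_reader :source, :action, :created_at
  attr_accessor :count

  def initialize(source, action, created_at)
    @source = source
    @action = action
    @created_at = created_at
    @count = 1
  end
end

class ActivityStreamSketch
  attr_reader :entries

  def initialize
    @entries = []
  end

  # Record an action. If the same action on the same source was already
  # recorded today, bump its count; otherwise add a new entry.
  def record(source, action, now = Time.now)
    now = now.getutc
    day_start = Time.utc(now.year, now.month, now.day)
    existing = @entries.find do |entry|
      entry.source['_id'] == source['_id'] &&
        entry.action == action &&
        entry.created_at >= day_start
    end

    if existing
      existing.count += 1
      existing
    else
      entry = ActivitySketch.new(source, action, now)
      @entries << entry
      entry
    end
  end
end
```

Because the whole source hash travels with the entry, rendering the stream needs nothing beyond the entries themselves.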

5. Embedding Custom Objects

With Mongo, we just embed custom data right with the item. Anytime we get an item, all the custom data comes with it. This is great as there is never a time where we would get an attribute without the item it is related to. For example, here is part of an item document with some custom data in it:

{
  "_id"   =>..., 
  "_type" =>"Page", 
  "title" =>"Our Writing", 
  "path"  =>"/our-writing/", 
  "data"  =>[
    {"_id" =>..., "file_upload"=>false, "value"=>"", "key"=>"content"}, 
    {"_id" =>..., "file_upload"=>true, "value"=>"", "key"=>"pic"}
  ], 
}

Now anytime we get an item, we already have the data. No need to query for it. This will help performance so much in the future that it alone had the weight to convince us to switch to Mongo, despite being almost 90% done in MySQL.

The great part is embedded objects are just arrays of hashes in Mongo, but MongoMapper automatically turns them into pure Ruby objects.

class Item
  include MongoMapper::Document

  many :data do
    def [](key)
      detect { |d| d.key == key.to_s }
    end
  end
end

class Datum
  include MongoMapper::EmbeddedDocument

  key :key, String
  key :value
end

Just like that, each piece of custom data gets embedded in the item on save and converted to a Datum object when fetched from the database. The association extension on data even allows for getting data by its key quite easily like so:

Item.first.data['foo'] # return datum instance if foo key present
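Conceptually, the conversion from an array of hashes to objects looks something like the plain-Ruby sketch below. This is not MongoMapper's implementation. DatumSketch and DataListSketch are invented names, and the raw array mimics the data part of the item document shown earlier.

```ruby
# Sketch of what an embedded-document mapper does conceptually; this is not
# MongoMapper's implementation.
DatumSketch = Struct.new(:key, :value, :file_upload)

class DataListSketch
  include Enumerable

  # raw is the array of hashes as it would come back from the database.
  def initialize(raw)
    @data = raw.map do |hash|
      DatumSketch.new(hash['key'], hash['value'], hash['file_upload'])
    end
  end

  def each(&block)
    @data.each(&block)
  end

  # Look up a datum by key, accepting strings or symbols.
  def [](key)
    detect { |datum| datum.key == key.to_s }
  end
end

raw = [
  { 'key' => 'content', 'value' => 'Hello', 'file_upload' => false },
  { 'key' => 'pic',     'value' => '',      'file_upload' => true }
]
data = DataListSketch.new(raw)
puts data[:content].value
```

The lookup returns nil for a key that is not present, matching the behavior of detect in the association extension above.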

6. Incrementing and Decrementing

A decision we made the moment we switched to Mongo was to take advantage of its awesome parts as much as we could. One way we do that is storing published post counts on year, month and day archive items and label items. Anytime a post is published, unpublished, etc. we use Mongo’s increment modifier to bump the count up or down. This means that there is no query at all needed to get the number of posts published in a given year, month or day or of a certain label if we already have that document.

We have several callbacks related to a post’s publish status that call methods that perform stuff like this under the hood:

# ids is array of item ids
conditions = {:_id => {'$in' => ids}}

# amount is either 1 or -1 for increment and decrement
increments = {'$inc' => {:post_count => amount}}

collection.update(conditions, increments, :multi => true)

For now, we drop down to the Ruby driver (collection.update), but I have tickets (inc, the rest) to abstract this out of Harmony and into MongoMapper. Modifiers like this are super handy for us and will be even more handy when we roll out statistics in Harmony as we’ll use increments to keep track of pageviews and such.
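To show what the update above does to the matched documents, here is an in-memory imitation in plain Ruby. The increment_all helper is made up for this sketch and works on an array of hashes. The real work happens inside Mongo, where $inc creates the field with the given amount when it does not exist yet.

```ruby
# In-memory imitation of a multi-document $inc update; no database involved.
# docs is an array of hashes, ids the _id values to match.
def increment_all(docs, ids, field, amount)
  docs.each do |doc|
    next unless ids.include?(doc['_id'])
    # Like $inc, a missing field is treated as starting from zero.
    doc[field] = doc.fetch(field, 0) + amount
  end
end

archives = [
  { '_id' => 'year-2009',  'post_count' => 41 },
  { '_id' => 'month-12',   'post_count' => 6 },
  { '_id' => 'label-ruby' }
]

# Publishing a post bumps every archive and label item it belongs to.
increment_all(archives, ['year-2009', 'month-12', 'label-ruby'], 'post_count', 1)
puts archives.inspect
```

The point of doing this in the database rather than in Ruby is that no document has to be loaded and saved back. The counts change in place with one call.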

7. Files, aka GridFS

Man, with all the awesome I’ve mentioned above, some of you may be tired, but I need you to hang with me for one more topic. Mongo actually has a really cool GridFS specification that is implemented for all the drivers to allow storing files right in the database. I remember when storing files in the database was a horrible idea, but with Mongo this is really neat.

We currently store all theme files and assets right in Mongo. This was handy when in development for passing data around and was nice for building everything in stage before our move to production. When we were ready to move to production, we literally just dumped stage and restored it on production. Just like that all data and files were up and running.

No need for S3 or separate file backup processes. Just store the files in Mongo and serve them right out of there. We then heavily use etags and HTTP caching and intend on doing more in the future to make sure that serving these files stays performant, but that is for another day. :) As of now, it is plenty fast and sooooo convenient.
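As a rough idea of the etag side of this, here is a small plain-Ruby sketch of a conditional response. It is not Harmony's code. The helper names are invented, the MD5-based tag and the one hour max-age are arbitrary choices for the example, and the return value is a bare status, headers and body triple.

```ruby
require 'digest/md5'

# Hypothetical helper: derive an ETag from the stored file's bytes.
def etag_for(bytes)
  %("#{Digest::MD5.hexdigest(bytes)}")
end

# Return [status, headers, body] for a conditional GET.
# if_none_match is the value of the request's If-None-Match header, if any.
def serve_asset(bytes, if_none_match = nil)
  etag = etag_for(bytes)
  headers = { 'ETag' => etag, 'Cache-Control' => 'public, max-age=3600' }

  if if_none_match == etag
    # The client already has this exact version; skip sending the body.
    [304, headers, '']
  else
    [200, headers, bytes]
  end
end

status, headers, _body = serve_asset('body { color: red }')
puts status
puts headers['ETag']
```

With this in place, a repeat request for an unchanged file costs a lookup and a hash comparison instead of a full transfer.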

Conclusion

We have been amazed at how much code we cut out of Harmony with the switch from MySQL to Mongo. We’re also really excited about the features mentioned above and how they are going to help us grow our first product, Harmony. I can’t imagine building some of the flexibility we’ve built into Harmony or some of the ideas we have planned for the future with a relational database.

I am truly as excited about the future of Mongo as I once was (and still am) about the future of Rails.

28 Responses to “Why I think Mongo is to Databases what Rails was to Frameworks”

  1. Great list. While points 2-5 touch on this, I think the best thing about MongoDB is that it’s just a more natural fit for web applications than an RDBMS ever could be. Mapping your application’s objects to documents in Mongo makes you say, “Wow, if only this was the way we always did it”. There’s no fighting with a competing paradigm.

    I think it’s something you have to work with to really grok, so hopefully this post will get more people to try out Mongo.

  2. I was just using that as a reason while talking with someone recently. There is a lot less work moving your ruby objects to the database. Thanks for specifically mentioning that.

  3. Dave Woodward

    Dec 18, 2009

    Welcome back to Object Oriented Programming! :)

    This is the awesomeness I’ve been rambling about while using Smalltalk with an object database all summer.

    MongoDB is an object database of sorts (at least it can now transparently serialize objects). This is going to open up a whole new class of applications for Rails (one that you’re already building).

    I read somebody who had a metaphor for an object’s life cycle in the Rails stack. They said it is “frozen” in the database, turned to “liquid” while in a mongrel, and is “steam” if its stuck in memcache. MongoDB makes it so the objects are still liquid even when they’re in the database!

    Maglev is another technology (not as mature/stable as MongoDB yet) where all your objects are always a liquid! All cool stuff!

    P.S. congrats on launching Harmony!

  4. Excellent post, John!

    Thanks for championing the MongoDB movement in the Rails community!

    I’m very thankful that we happened to sit at the same table at lunch this past RailsConf to get that ball rolling. ;-)

  5. @Dave Thanks! Very true about liquid vs frozen/steam.

    @Jim You’re thankful? I’m thankful! Who knows how long it would have taken me to stumble across Mongo if it hadn’t been for that fateful day. ;)

  6. Given how much of Harmony happens in Javascript on the client, have you tried piping JSON straight from the database to the client without an extra serialization/deserialization step in Ruby?

    I wrote an internal app (in Cappuccino.org) that skips the server altogether and sends JSON straight to CouchDB. It was another “aha” moment that opened my eyes to the value of the schema-free database.

    Of course, one needs some authentication in the middle. But the need for a server-based webapp may now be optional in some cases.

  7. @Geoff That is the next step. We don’t feel that step is quite there yet, but browsers keep getting better. I totally think something like that is the future.

  8. John,

    What types of issues did you run into when switching to Mongo? Was it a 100% full conversion, or do you still have some data in MySQL? Do you have any performance comparisons? Does NewRelic track db performance with Mongo?

    You have me intrigued with this post and now I am thinking I might try to switch an app over.

  9. @Josh 100%. New relic does not support mongo that I know of. We (railsmachine and orderedlist) have some ideas for a scout mongo plugin, but haven’t created it yet.

    As far as issues, the biggest one is freeing your mind (ala the matrix). Rails/AR are so built on conventions that you get used to just building to those conventions. It takes some time before you start to think creatively about how you want to store your data.

    Worrying about tools and such is valid, but those will spring up fast, I believe.

  10. Great writeup John. Thanks. For the GridFS system did you use the carrierwave gem’s implementation or your own custom implementation?

  11. @Scott Note that I didn’t actually say we were using GridFS. :) We actually are just storing files directly in a collection. We have an asset model with a key of type Binary.

    We are ok with a max size of uploads being 4MB. If you have a file bigger than that we are just going to suggest you put that file somewhere else as most website related files are small images and pdfs.

  12. Fantastic article John. I had played with MongoDB a few weeks ago on a tiny pet project after reading one of your earlier articles, but this particular write-up has inspired me to dig deeper and work on something a bit more substantial using Mongo, to get a feel for everything that it can really offer. As discussed in the comments above, it certainly feels like a better fit for a lot of web development than traditional RDBMS, and so I can’t wait to get stuck in.

    I’d also like to help out with MongoMapper if I can along the way. What’s the best place to get started with dev there? Just picking some issues on the GitHub issues list and giving them a go, or are there any other areas that need specific attention?

    Congratulations on Harmony by the way, that’s shaping up to be a great app from the looks of things.

  13. @Elliot – Glad you found it interesting. The best way to get involved with MM is to start building apps with it and reporting back to me. :) There are still pain points and those kinds of reports from the field help me know where to focus and help shape solutions. The mailing list is a good place for conversation too.

  14. Cameron

    Dec 18, 2009

    Great insight. As a beginner to ruby, rails and programming in general, I was excited to read this, as a project I am working on totally needs the flexibility of a document store.

    I’m just not sure how to get it working with rails. How easy is it to swap out AR for something like Mongo? I hear Rails3 will be more agnostic, but does that include stuff like Mongo and CouchDB?

  15. @Cameron – MongoMapper is insanely easy to use with Rails. See this gist for an initializer and example database.yml. You just config.gem ‘mongo_mapper’ and optionally you can remove active record.

  16. Yay, another awesome overly excitable graphic designer cum Rubyist gets excited by some half-assed hack of a database, and all the other 20-something Rubyists get excited. While I am very happy for you that your small scale CRUD apps and silly toy databases help with your productivity, you might want to dial-down the level of “AWESOME” you spew like a fat kid in a candy store.

  17. I have been using mongo and MM for a project the last few months. I was initially very enthusiastic but as my data model has gotten more complex, I struggled to map it to the mongo way of thinking. Mongo does not do joins, so you are encouraged to store things hierarchically. So if I have a site with departments, and departments have products, the department can contain all the products, which might be nice for showing a department page which lists the products – 1 trip to the database. But to show a product page, guess what – you have to load the entire department again. And if products are in multiple departments, you have to make products a top level collection and then you’re doing multiple trips to the database to fetch them all for your product list page. I am having doubts now that document-oriented databases and web applications are a good match. Has anyone else struggled with this?

  18. @Brian, wow.

  19. What’s the index story like on Mongo? Can you create/drop indexes on the fly, or does it lock the collection?

  20. @Brian – Way to keep it constructive. You are my hero!

    @Greg – I would definitely do different collections for Departments and Products. Just store a department_id on Product and you are good to go. Or if a product can be in multiple departments, you could use an array key. :)

    @rick – You’d have to hit the mailing list. I assume lots of data plus a new index on an existing key would lock/slow things up. Good question.

  21. @rick – I asked for you as I am curious. I’ll post back here when I receive word.

  22. @rick – From the 10gen crew: “Creating a new index blocks. It doesn’t matter as all objects go into the index even if an object doesn’t have a given field.”

  23. This was awesome and I can’t wait to learn more.

    (Apologies but: when describing the GitHub feed, you used “there” twice where you meant “their.” Their activity stream, their feed.)

  24. @Giles – Thanks! No apologies needed. I wrote it in a hurry and by the end was too tired to proofread. Someone else just proofread it for me and I fixed several typos. Should be good now. :)

  25. I have a question – what about Cassandra ?

    Cassandra seems to be more “production ready” than Mongo and offers a lot of the same features ( http://www.engineyard.com/blog/2009/cassandra-and-ruby-a-love-affair/ ). Did you compare the two?

  26. @Sandeep – I’ve only grazed over cassandra. For whatever reason it didn’t really click with me like Mongo did. Also, what are you basing the “more production ready” claim on?

  27. Sandeep is probably referring to Cassandra being used at Facebook.

    Thanks for the great article, I am downloading MongoDB as I type (my skepticism discarded).

  28. Chris Kimm

    Dec 19, 2009

    Thanks for the informative post. Are you using MongoDB on a 64-bit system? If not, are you concerned at all by the ~2.5 GB data limit on 32-bit systems?

About

Authored by John Nunemaker (Noo-neh-maker), a web developer and programmer who has fallen deeply in love with Ruby. More about John.
