This seems to be the season for talking about Rails migrations. A lot of people are finally discovering them and finding that they’re very useful for maintaining your database schema over time. I’m a big fan of Rails migrations; we’ve been using them with Typo since the middle of July, when they were all new and shiny. We’re currently up to 24 migrations in the Typo source tree. We’re even using migrations to create our initial database, via my Schema Generator. I haven’t done a formal survey, but I suspect that Typo is the biggest open-source user of migrations, and may actually be the biggest user overall.

The big problem is that we’ve been using migrations wrong the whole time, and we just realized it.

There are probably a dozen bugs in Typo’s bug tracker that boil down to “I fell behind the trunk and now rake migrate throws exceptions and I can’t upgrade anymore.” The problem is that migrations are designed to run against an earlier version of your database, but they use the current version of your code. The first time that this caused problems was with the migration from Typo 2.0 to 2.5–we’d added two new fields to articles. Migration number 7 added the permalink field and a before_save hook to make sure that all saved articles have permalinks. Then migration number 9 added GUIDs and a second before_save hook to fill the guid field. Both migrations did Articles.find(:all).each { |a| a.save } to update each Article and populate the new fields.

This worked great for developers who frequently upgraded. A few days after the GUID migration went in, though, we started getting weird bug reports–users who tried to do both upgrades at the same time found that migration number 7 was dying. What was happening was that migration number 7 added the new permalink field to articles, but when it went to run the save loop both before_save hooks ran, and Typo tried to add a GUID to each article. However, the guid field didn’t exist yet, so the migration threw a bunch of exceptions and died.

This caused a bunch of grumbling on the Typo IRC channel. We threw around a bunch of possible fixes. Our favorite was separating migrations into two parts–a schema change part and a data change part. First we’d run all of the schema changes, and then update all of the data. As a work-around, we added a hack that checked the current schema version and disabled specific before_save filters for older versions.

We managed to keep this little bandaid working until a couple weeks ago, when a huge set of new migrations went it; they renamed the articles table and merged several other tables into the new contents table using STI. And, again, we found that older migrations broke when users tried to upgrade from Typo 2.5.6 to the current dev tree. Unlink the permalink/guid case, this time there was no simple workaround. We couldn’t just add a couple if statements in a filter and make it all go away.

The fundamental problem is that we were using the wrong mental model for migrations. I saw migrations as a one-dimensional thing–a list of steps for migrating old data into the new format. In this view, the migration for going from schema version 6 to schema version 7 is constant–once it’s been written, the only reason to change it is if a bug turns up in the logic for that migration. Otherwise, the migration code should remain unchanged over time.

And that’s the problem–migrations aren’t one-dimensional. They are (and need to be) two dimensional–the schema version is one dimension and the code version is the other. Individual migrations exist to migrate from a specific old schema version to the current version, using the current code. Each migration should change over time to adapt to the changes in the code. So, the right fix for the permalink migration that caused so many problems wasn’t to add a bunch of logic to before_save. Instead, we should have deleted the entire save loop from the migration, and trusted the GUID migration to update both fields. If that wasn’t good enough, then we should have added a new migration at the end to do permalink cleanup after the GUIDs were added.

Once I came to grips with this, the migration changes needed to allow 2.5.x users to upgrade to the current trunk were pretty simple, and took about 5 minutes to write and test.

Or was I the only person in the Rails universe who thought about migrations this way?