Best practices with data migrations

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the databases category.

Last Updated: 2023-12-03

Compress your old migration files

After about 10 years of maintaining my app, it got to the point where I had perhaps 100's of migration files in my project.

Therefore on "Sep 2019" I removed most of these (already migrated) migrations.

Some time later, when I recreated my database locally, the command for migrating (rake db:migrate) failed.

This was because one of the surviving migrations was 20181023152914updateusers.rb, assumed no-longer existent tables were present

def change
  change_column :users, :subscribed, :boolean, default: true

I was able to get around this issue in Rails by migrating the database from the schema doc $ rake db:schema:load. But it was still annoying to break the migrations command.

So, my lesson here is: Instead of deleting migrations outright, create a new condensed migration create_current_state that contains everything from the schema up to this point.

NB: I need to be careful here to avoid losing data when this migration is run in production, therefore I should get rid of any force: true statements. Then I would remove all the old migrations except this current one, e.g. with:

$ cd db/migrate/ && ls | grep -v '20100427083434_create_current_state.rb' | xargs rm

Never do data transformations in a migration

In the past, I made the mistake of including data transformation stuff in migrations. This proved a bad idea because it slowed down future migrations (or down-right broke them), and it introduced a dependency on old code (e.g. in the User) model, which increases maintenance costs and scope for error.

Instead use one-off scripts in rails/runners.