The Problem
You may skip to the solution part at the bottom if you feel brave
Branching means you diverge from the main line of development and continue to do work without messing with that main line. - git
Branches are one of the more powerful features of a VCS, you can work on new features and test changes without touching your master code. But many times those changes implies to do some changes on the database too, and well, your database certainly doesn’t live within your repository, so if you want to go back to master or to another branch you need to discard the changes you made to the database.
For this you have many options:
- Rails Rollback any migration you made
- Drop your db and create (or restore) a clean one bye bye changes~
- Backup your current db and create (or restore a clean one) each time you switch branches so you have a standalone copy for each branch
As we can see each of this options has it’s drawbacks
- Rails When you rollback your migrations you may lose any data that is not properly handled in the rollback, face the fact that some data is not easily recoverable or even worse, you may face the ugly and troublesome
ActiveRecord::IrreversibleMigration
. - Dropping your db and creating a new one each time you need to switch to another branch can take a lot of time; specially when you have to download and import a production database dump with millions of records (I’m working on a project in which this process easily takes about 20~ minutes), besides that you also loses any changes you did in your branch, changes that may take some extra time to regenerate when you go back to the branch you where working on.
- This one is the least dangerous for your data, the only drawback I can think of is that you may need quite a lot storage if your database is big. But it ain’t such a big trouble.
But the biggest drawbacks of all of this, is that none of them is automatic and you can easily forget to do it once and end up with a messed up schema.rb
in your master
branch. Happened to me so many times before.
Luckily for us, Git gives us a way to do automated tasks whenever we change branches thanks to Git Hooks.
Git has a way to fire off custom scripts when certain important actions occur. There are two groups of these hooks: client-side and server-side. Client-side hooks are triggered by operations such as committing and merging, while server-side hooks run on network operations such as receiving pushed commits. - git
The solution
Taking advantage of this hooks I made a script that automatically creates a backup of your currently active database and switch between backups whenever you switch branches.
Requirements
- PostgreSQL (obviously)
- Git
- psql
- createdb
- dropbp
Ensure git
, psql
, createdb
and dropdb
are available in your path, also ensure you have a database user without password preferably named postgres
. (This should be the default setup in most of the development environments)
Installation
Run the following commands in your projects root path
1.- Download the script in your project’s hooks directory and make it executable
> $ wget -O .git/hooks/post-checkout https://gist.githubusercontent.com/yknx4/470ec4e1378a48382fdbaf6b5896f1d0/raw/393a1faad80795631e08c994505f0a706404cedb/post-checkout
> $ chmod +x .git/hooks/post-checkout
2.- Tweak the configuration
> $ vim .git/hooks/post-checkout
...
# --- Constants
SEPARATOR="/" # This is the separator that is going to be used to
MAIN_BACKUP="_backup" # This is the suffix that is going to be used for a main backup
MAIN_DATABASE="NAME_OF_YOUR_PROJECT_DB" # Here you should put the name of the db
DB_USER="postgres" # In case your user is not postgres you may change it
...
3.- Thats all! You should have you database branches working now.
Note: Due to the way branch information is passed to the hook, if you create branch A and then you create another branch B without making changes and then switch back to branch B it will actually not trigger a database backup because the hook will think branch B is branch A.
How it works?
some code has been trimmed to slim down the post size In the first section of the script we just get the parameters received by the hook and assign them to variables, also we set the constants we are going to use throughout the script.
#!/bin/sh
# --- Command line
PREVIOUS_HEAD="$1"
NEW_HEAD="$2"
IS_BRANCH_CHECKOUT="$3"
# --- Constants
SEPARATOR="/"
MAIN_BACKUP="_backup"
MAIN_DATABASE="awesome_project_db"
DB_USER="postgres"
I defined some functions to make the code more readable. Most of these functions receive the database name as parameter, is important to know that all the names will be trimmed to 63 characters because that is a hard limit imposed by postgres for the database names. Some really long branches names may end up with colliding backup names.
# --- Functions
database_exists () {
# logic to check if a database exists
}
kill_connections() {
# logic to kill al the connections to the active db,
}
duplicate_database_to() {
# logic to duplicate the main database to another one
}
drop_database() {
# logic to drop a database
}
rename_database() {
# logic to rename a database
}
This alias makes the best guess about a branch name with the revision SHA
alias branch_name='git name-rev --name-only'
I define these variables to have some useful information about git in a readable way
# --- Variables
PREVIOUS_BRANCH=`branch_name $PREVIOUS_HEAD`
NEW_BRANCH=`git branch | grep -e "^*" | cut -d' ' -f 2`
OLD_DATABASE=$MAIN_DATABASE$SEPARATOR$PREVIOUS_BRANCH
NEW_DATABASE=$MAIN_DATABASE$SEPARATOR$NEW_BRANCH
MAIN_BACKUP=$MAIN_DATABASE$MAIN_BACKUP
And here it comes the actual magic
# --- Script
if [ $IS_BRANCH_CHECKOUT -eq 0 ]; then
exit 0 # Exit script if it is not a branch change
fi
echo "Switching from $PREVIOUS_BRANCH to $NEW_BRANCH"
echo "Killing existing connections to $MAIN_DATABASE"
kill_connections # We need to kill all the connections or the commands here will fail
if ! database_exists $MAIN_BACKUP; then
echo "Creating a main backup, if something goes wrong you can manually restore from this backup."
duplicate_database_to $MAIN_BACKUP # We should always have a master backup
fi
if [ "$PREVIOUS_BRANCH" == "$NEW_BRANCH" ]; then
exit 0 # We didn't change branch, so lets just exit
fi
if database_exists $OLD_DATABASE; then
# We need to remove old backups if they exists (they shouldn't most of the times) but
# sometime when we didn't had changes in the branch when we created another one, so a
# backup for the parent branch is created instead of one for the new branch, this is
# to delete that wrong backup.
drop_database $OLD_DATABASE
fi
if database_exists $NEW_DATABASE; then
# When a backup for the new branch already exists
# If there is already a backup for the branch we are checking to, lets just rename the active database to create a new backup
rename_database $MAIN_DATABASE $OLD_DATABASE
# and then lets rename the existing backup to the active database name
rename_database $NEW_DATABASE $MAIN_DATABASE
else
# If the branch we are checking to doesn't have a backup let's just copy the db to still have an active database
duplicate_database_to $OLD_DATABASE
fi
I hope this is as useful as it was to me, switching to a new branch only takes about 15 seconds up from about 2 seconds, but refreshing my database from a production copy takes like 20 minutes so it’s manageable.