Database branches with Git Hooks

The Problem

You may skip to the solution part at the bottom if you feel brave

Branching means you diverge from the main line of development and continue to do work without messing with that main line. - git

Branches are one of the more powerful features of a VCS, you can work on new features and test changes without touching your master code. But many times those changes implies to do some changes on the database too, and well, your database certainly doesn’t live within your repository, so if you want to go back to master or to another branch you need to discard the changes you made to the database.

For this you have many options:

As we can see each of this options has it’s drawbacks

But the biggest drawbacks of all of this, is that none of them is automatic and you can easily forget to do it once and end up with a messed up schema.rb in your master branch. Happened to me so many times before.

Luckily for us, Git gives us a way to do automated tasks whenever we change branches thanks to Git Hooks.

Git has a way to fire off custom scripts when certain important actions occur. There are two groups of these hooks: client-side and server-side. Client-side hooks are triggered by operations such as committing and merging, while server-side hooks run on network operations such as receiving pushed commits. - git

The solution

Taking advantage of this hooks I made a script that automatically creates a backup of your currently active database and switch between backups whenever you switch branches.

Requirements

Ensure git, psql, createdb and dropdb are available in your path, also ensure you have a database user without password preferably named postgres. (This should be the default setup in most of the development environments)

Installation

Run the following commands in your projects root path

1.- Download the script in your project’s hooks directory and make it executable

> $ wget -O .git/hooks/post-checkout https://gist.githubusercontent.com/yknx4/470ec4e1378a48382fdbaf6b5896f1d0/raw/393a1faad80795631e08c994505f0a706404cedb/post-checkout
> $ chmod +x .git/hooks/post-checkout

2.- Tweak the configuration
> $ vim .git/hooks/post-checkout

...
# --- Constants
SEPARATOR="/" # This is the separator that is going to be used to
MAIN_BACKUP="_backup" # This is the suffix that is going to be used for a main backup
MAIN_DATABASE="NAME_OF_YOUR_PROJECT_DB" # Here you should put the name of the db
DB_USER="postgres" # In case your user is not postgres you may change it
...

3.- Thats all! You should have you database branches working now.

Note: Due to the way branch information is passed to the hook, if you create branch A and then you create another branch B without making changes and then switch back to branch B it will actually not trigger a database backup because the hook will think branch B is branch A.

How it works?

some code has been trimmed to slim down the post size In the first section of the script we just get the parameters received by the hook and assign them to variables, also we set the constants we are going to use throughout the script.

#!/bin/sh

# --- Command line
PREVIOUS_HEAD="$1"
NEW_HEAD="$2"
IS_BRANCH_CHECKOUT="$3"

# --- Constants
SEPARATOR="/"
MAIN_BACKUP="_backup"
MAIN_DATABASE="awesome_project_db"
DB_USER="postgres"

I defined some functions to make the code more readable. Most of these functions receive the database name as parameter, is important to know that all the names will be trimmed to 63 characters because that is a hard limit imposed by postgres for the database names. Some really long branches names may end up with colliding backup names.

# --- Functions
database_exists () {
  # logic to check if a database exists
}

kill_connections() {
  # logic to kill al the connections to the active db,
}

duplicate_database_to() {
  # logic to duplicate the main database to another one
}

drop_database() {
  # logic to drop a database
}

rename_database() {
  # logic to rename a database
}

This alias makes the best guess about a branch name with the revision SHA

alias branch_name='git name-rev --name-only'

I define these variables to have some useful information about git in a readable way

# --- Variables
PREVIOUS_BRANCH=`branch_name $PREVIOUS_HEAD`
NEW_BRANCH=`git branch | grep -e "^*" | cut -d' ' -f 2`
OLD_DATABASE=$MAIN_DATABASE$SEPARATOR$PREVIOUS_BRANCH
NEW_DATABASE=$MAIN_DATABASE$SEPARATOR$NEW_BRANCH
MAIN_BACKUP=$MAIN_DATABASE$MAIN_BACKUP

And here it comes the actual magic

# --- Script

if [ $IS_BRANCH_CHECKOUT -eq 0 ]; then
  exit 0 # Exit script if it is not a branch change
fi

echo "Switching from $PREVIOUS_BRANCH to $NEW_BRANCH"
echo "Killing existing connections to $MAIN_DATABASE"
kill_connections # We need to kill all the connections or the commands here will fail

if ! database_exists $MAIN_BACKUP; then
  echo "Creating a main backup, if something goes wrong you can manually restore from this backup."
  duplicate_database_to $MAIN_BACKUP # We should always have a master backup
fi

if [ "$PREVIOUS_BRANCH" == "$NEW_BRANCH" ]; then
  exit 0 # We didn't change branch, so lets just exit
fi

if database_exists $OLD_DATABASE; then
  # We need to remove old backups if they exists (they shouldn't most of the times) but
  # sometime when we didn't had changes in the branch when we created another one, so a
  # backup for the parent branch is created instead of one for the new branch, this is
  # to delete that wrong backup.
  drop_database $OLD_DATABASE
fi

if database_exists $NEW_DATABASE; then
  # When a backup for the new branch already exists
  # If there is already a backup for the branch we are checking to, lets just rename the active database to create a new backup
  rename_database $MAIN_DATABASE $OLD_DATABASE
  # and then lets rename the existing backup to the active database name
  rename_database $NEW_DATABASE $MAIN_DATABASE
else
  # If the branch we are checking to doesn't have a backup let's just copy the db to still have an active database
  duplicate_database_to $OLD_DATABASE
fi

I hope this is as useful as it was to me, switching to a new branch only takes about 15 seconds up from about 2 seconds, but refreshing my database from a production copy takes like 20 minutes so it’s manageable.


development