Lefthook, Crystalball, and git magic for smooth development experience

envek

Andrey Novikov

Posted on July 10, 2019

In this step-by-step tutorial, you will learn how to set up the Lefthook git hooks manager and the Crystalball test selection library, and how to automatically install missing gems and migrate your database when you switch to a feature branch and back to master.

Every project should have CI set up. But CI builds can queue up, and you have to wait for the notification. Is there a way to shorten the "implement-test-fix" cycle and save keystrokes and mouse clicks, while still keeping broken code out of the project repository? Yes, it has existed for a long time and is well known: git hooks, plain executable scripts in your local .git/hooks/ directory. But they are bothersome to set up, to update, and to sync with your collaborators, because you can't commit them to the repository itself.

Setting up Lefthook

Lefthook is our own git hook manager written in Go. Single dependency-free binary. Fast, reliable, feature-rich, language-agnostic.

Arkweid / lefthook

Fast and powerful Git hooks manager for any type of projects.

Lefthook

The fastest polyglot Git hooks manager out there

Fast and powerful Git hooks manager for Node.js, Ruby or any other type of projects.

  • Fast. It is written in Go. Can run commands in parallel.
  • Powerful. With a few lines in the config you can check only the changed files on pre-push hook.
  • Simple. It is a single dependency-free binary that can work in any environment.

📖 Read the introduction post

# On `git push` lefthook will run spelling and links check for all of the changed files
pre-push:
  parallel: true
  commands:
    spelling:
      files: git diff --name-only HEAD @{push}
      glob: "*.md"
      run: npx yaspeller {files}
    check-links:
      files: git diff --name-only HEAD @{push}
      glob: "*.md"
      run: npx markdown-link-check {files}
Sponsored by Evil Martians

  1. Install it with gem, npm, or from source or your OS package manager.

  2. Define your hooks in config file lefthook.yml

    pre-push:
      parallel: true
      commands:
        rubocop:
          tags: backend
          run: bundle exec rubocop
        rspec:
          tags: rspec backend
          run: bundle exec rspec --fail-fast
    
  3. And run lefthook install.

And voilà, from now on rspec and rubocop will run in parallel on every git push, and the push will be aborted if they find any issues.

But why did you choose pre-push over pre-commit?

First of all, sometimes, during refactorings, you want to make a lot of small commits locally. Most of them won't pass linters. Why execute all this machinery if you know that it won't pass? Afterward, you will squash and reorder them with git rebase --interactive and push clean code to the repo.

More importantly, some checks are slow. Waiting a minute on every git commit is meh. You lose speed. So why not move long operations to the much rarer push event?

However, running the whole test suite may take too long (and we have CI exactly for that anyway), so we need a way to run only the specs that our changes are likely to break.

Setting up Crystalball

Crystalball is a Ruby library by Toptal that implements a Regression Test Selection mechanism. Its main purpose is to select a minimal subset of your test suite that should be run to ensure your changes didn't break anything. This is a tricky problem in Ruby applications in general, and especially in Rails applications because of the Ruby on Rails constant autoloading mechanism.

Crystalball solves this problem by tracing code execution and tracking dependencies between files while your test suite is running. Using this profiling data, it can tell which files affect which. Take a look at the slides about Crystalball from the talk at RubyKaigi 2019.
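To make the idea concrete, here is a toy sketch of regression test selection in plain Ruby. It is not Crystalball's implementation or API: `EXECUTION_MAP` and `select_specs` are hypothetical names, and the map is written by hand instead of being collected by tracing a test run.

```ruby
require "set"

# Hypothetical execution map: which source files each spec file touched
# during a traced run. Written by hand here for illustration only.
EXECUTION_MAP = {
  "spec/models/user_spec.rb"     => ["app/models/user.rb"],
  "spec/models/order_spec.rb"    => ["app/models/order.rb", "app/models/user.rb"],
  "spec/services/mailer_spec.rb" => ["app/services/mailer.rb"],
}.freeze

# Returns the spec files that should run for the given changed files:
# any spec that was changed itself, plus any spec whose recorded
# dependencies overlap the changed files.
def select_specs(changed_files, map = EXECUTION_MAP)
  changed = changed_files.to_set
  map.select { |spec, deps| changed.include?(spec) || deps.any? { |dep| changed.include?(dep) } }
     .keys
     .sort
end

p select_specs(["app/models/user.rb"])
# prints ["spec/models/order_spec.rb", "spec/models/user_spec.rb"]
```

A change to app/models/user.rb selects two of the three specs here; a change to a file that no spec touched selects nothing, which is why the map has to be kept fresh.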

toptal / crystalball

Regression Test Selection library for your RSpec test suite

Crystalball

Crystalball is a Ruby library which implements a Regression Test Selection mechanism originally published by Aaron Patterson. Its main purpose is to select a minimal subset of your test suite which should be run to ensure your changes didn't break anything.

Installation

Add this line to your application's Gemfile:

group :test do
  gem 'crystalball'
end

And then execute:

$ bundle

Or install it yourself as:

$ gem install crystalball

Usage

Please see our official documentation.

Versioning

We use semantic versioning for our releases.

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/toptal/crystalball This project is intended…

  • Install it
   # Gemfile
   gem "crystalball"
  • Configure for our pre-push case (by default crystalball is configured to be used in pre-commit hooks)
   # config/crystalball.yml
   ---
   map_expiration_period: 604800 # 1 week
   diff_from: origin/master
  • Set up your test suite to collect code coverage information:
   # spec/spec_helper.rb
   if ENV['CRYSTALBALL'] == 'true'
     require 'crystalball'
     require 'crystalball/rails'

     Crystalball::MapGenerator.start! do |config|
       config.register Crystalball::MapGenerator::CoverageStrategy.new
       config.register Crystalball::Rails::MapGenerator::I18nStrategy.new
       config.register Crystalball::MapGenerator::DescribedClassStrategy.new
     end
   end
  • Generate code execution maps:
   CRYSTALBALL=true bundle exec rspec
  • Replace RSpec with crystalball in lefthook.yml:
   -      run: bundle exec rspec --fail-fast
   +      run: bundle exec crystalball --fail-fast

From now on, every push will be much faster whenever your changes are small.

But Crystalball needs up-to-date code execution maps to work correctly. Can we automate refreshing these maps, too? Sure, we can!

Keeping Crystalball up-to-date

Git's post-checkout hook fits this purpose very well. We can use it to run the logic that updates Crystalball data. “Logic” implies complexity, as there is no single command for that. To cover such cases, Lefthook allows having separate executable script files. We can put our logic into the .lefthook/post-checkout/crystalball-update file, make it executable, and declare it in the Lefthook configuration like this:

# lefthook.yml

post-checkout:
  scripts:
    crystalball-update:
      tags: rspec backend

And there, in the crystalball-update script, we need a bit of magic.

First of all, we don't need to do anything when a developer uses the git checkout -- path command to discard some changes from the working tree, because this is not switching between commits and the repository is possibly "dirty." Yes, the Git CLI can sometimes feel weird, as the checkout command is used for two different tasks.

Per the git docs, git always passes three arguments to the post-checkout hook: the previous HEAD commit identifier (SHA-1 hash), the current HEAD commit identifier, and a flag indicating whether it was a checkout between branches (1) or a file checkout to the state of another commit (0). Lefthook catches these arguments and passes them on to every managed script.

#!/usr/bin/env ruby

_prev_head, _curr_head, branch_change, * = ARGV

exit if branch_change == "0" # Don't run on file checkouts
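The three arguments can be interpreted in one place with a small helper. This is a minimal sketch, and `checkout_kind` is a hypothetical name used only for illustration; the sample values stand in for what git would pass in ARGV.

```ruby
# Interprets the three arguments git passes to a post-checkout hook.
# Returns :file_checkout, :same_commit, or :branch_switch.
def checkout_kind(argv)
  prev_head, curr_head, branch_flag = argv
  return :file_checkout if branch_flag == "0" # e.g. `git checkout -- path`
  return :same_commit if prev_head == curr_head
  :branch_switch
end

# Sample values; a real hook would pass ARGV instead.
p checkout_kind(["1111111", "2222222", "1"]) # prints :branch_switch
p checkout_kind(["1111111", "1111111", "0"]) # prints :file_checkout
```

A hook script can then bail out early unless the result is :branch_switch.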

Next, we want to update crystalball profiling data only on the master branch as recommended in the Crystalball docs. To do so, we need to ask git what branch we've checked out:

# Rails.root if we look from .lefthook/post-checkout dir
app_dir = File.expand_path("../..", __dir__)
ENV["BUNDLE_GEMFILE"] ||= File.join(app_dir, "Gemfile")
require "bundler/setup"

require "git"
exit unless Git.open(app_dir).current_branch == "master"

The git gem is a dependency of Crystalball, so we don't have to install it separately.

And finally, we need to do the heaviest part: ask Crystalball, “Are your profiling data up-to-date?”

require "crystalball"
config = Crystalball::RSpec::Runner.config
prediction_builder = Crystalball::RSpec::Runner.prediction_builder

# File.exists? is deprecated and was removed in Ruby 3.2; use File.exist?
exit if File.exist?(config["execution_map_path"]) && !prediction_builder.expired_map?
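Conceptually, the freshness check boils down to "is there a map, and is it younger than map_expiration_period?". The sketch below illustrates that idea using the file's modification time as a stand-in. It is not how Crystalball itself decides: `map_stale?` and the file name are hypothetical.

```ruby
require "tmpdir"

ONE_WEEK = 604_800 # seconds, the same value as map_expiration_period above

# Hypothetical helper, not part of Crystalball: treats the map as stale
# when the file is missing or older than the expiration period (in seconds).
def map_stale?(path, expiration_period, now: Time.now)
  return true unless File.exist?(path)
  now - File.mtime(path) > expiration_period
end

Dir.mktmpdir do |dir|
  map_path = File.join(dir, "execution_map.yml") # hypothetical file name
  puts map_stale?(map_path, ONE_WEEK)            # true: no map yet
  File.write(map_path, "--- {}\n")
  puts map_stale?(map_path, ONE_WEEK)            # false: just written
  puts map_stale?(map_path, ONE_WEEK, now: Time.now + 2 * ONE_WEEK) # true: two weeks later
end
```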

And if it is not fresh, we need to run the whole test suite with a special environment variable set:

puts "Crystalball Ruby code execution maps are out of date. Performing full test suite to update them…"

ENV["CRYSTALBALL"] = "true"
RSpec::Core::Runner.run([app_dir])

And we're done. Are we?

Automate other routine tasks

But running specs requires that we have:

a. installed gems and
b. an up-to-date database state.

And in an actively developed application, gems are frequently updated, added, and removed, and the database schema can change several times a day in different branches. It is typical to pull fresh master in the morning and get updated gems and new database migrations. In that case, RSpec would fail, and the Crystalball execution maps wouldn't be complete. So we need to ensure beforehand that our specs can always run.

Install missing gems on a git checkout

This task is quite simple and can be achieved with a short bash script. Most of it consists of checks to avoid calling Bundler when it's not needed, because Bundler is fairly heavy and takes a noticeable amount of time to run.

The first two of these checks are familiar, just rewritten in shell: is this a branch checkout? Did we actually move between commits?

#!/bin/bash

BRANCH_CHANGE=$3
[[ $BRANCH_CHANGE -eq 0 ]] && exit

PREV_HEAD=$1
CURR_HEAD=$2
# Quote the variables so the test doesn't break if one of them is empty
[ "$PREV_HEAD" = "$CURR_HEAD" ] && exit

The next one is trickier:

# Don't run bundler if there were no changes in gems
git diff --quiet --exit-code $PREV_HEAD $CURR_HEAD -- Gemfile.lock && exit;

Here we're asking, “Did the set of required gems change between commits?” If Gemfile.lock has changed, we need to check whether all of the gems are installed by invoking Bundler.

bundle check || bundle install

Again, if you have up-to-date gems (and in most checkouts, it will be so), only bundle check will be executed.
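The same sequence of checks can also be written in Ruby, which may be handy if you prefer to keep all hook scripts in one language. This is only a sketch: `lockfile_changed?` and `bundle_needed?` are hypothetical helper names, and the `changed:` parameter exists solely so the decision logic can be tried without a real repository.

```ruby
# Returns true when Gemfile.lock differs between the two commits.
# `git diff --quiet` exits with status 0 when nothing changed.
# If git cannot be run at all, `system` returns nil and we err on the
# side of reporting a change.
def lockfile_changed?(prev_head, curr_head)
  !system("git", "diff", "--quiet", prev_head, curr_head, "--", "Gemfile.lock")
end

# Decides whether it is worth invoking Bundler after a checkout.
def bundle_needed?(prev_head, curr_head, branch_flag, changed: method(:lockfile_changed?))
  return false if branch_flag == "0"      # file checkout, not a branch switch
  return false if prev_head == curr_head  # still on the same commit
  changed.call(prev_head, curr_head)
end

# In a real hook the arguments would come from ARGV:
#   system("bundle check || bundle install") if bundle_needed?(*ARGV.first(3))
```

The cheap checks come first, so git and Bundler are only invoked when a branch switch actually moved HEAD.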

Automatically rollback and apply database migrations

The next task is much more interesting.

When we switch from branch A to branch B, we need to ensure that the database schema is actually compatible with our specs. To do so, we must roll back every migration that exists in branch A but not in branch B, and then apply every migration that exists in B and hasn't been applied yet. The rollback part is required because migrations that remove or rename columns and tables are not backward compatible.
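The rule above is essentially a set difference over migration versions. The following sketch shows it on plain file lists. The names `MIGRATION_FILENAME`, `migration_versions`, and `migration_plan` are hypothetical, the regexp only approximates the pattern ActiveRecord uses for migration file names, and a real script would also ask the database which migrations are actually pending.

```ruby
# Approximation of ActiveRecord's migration file name pattern
# (an assumption made for this sketch).
MIGRATION_FILENAME = /\A(\d+)_[_a-z0-9]*\.rb\z/

# Extracts version numbers (as strings) from migration file paths,
# skipping anything that doesn't look like a migration.
def migration_versions(paths)
  paths.map { |path| File.basename(path)[MIGRATION_FILENAME, 1] }.compact
end

# Returns the versions to roll back (newest first) and the versions
# to apply (oldest first) when moving from one branch to another.
def migration_plan(from_branch_files, to_branch_files)
  from = migration_versions(from_branch_files)
  to   = migration_versions(to_branch_files)
  { down: (from - to).sort.reverse, up: (to - from).sort }
end

branch_a = ["db/migrate/20190701000000_create_users.rb",
            "db/migrate/20190705000000_rename_users_name.rb"]
branch_b = ["db/migrate/20190701000000_create_users.rb",
            "db/migrate/20190708000000_add_accounts.rb"]

plan = migration_plan(branch_a, branch_b)
puts "roll back: #{plan[:down].join(', ')}" # roll back: 20190705000000
puts "apply:     #{plan[:up].join(', ')}"   # apply:     20190708000000
```

Sorting version strings works here because timestamp-based versions all have the same length.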

The problem here is that git has no pre-checkout hook, only a post-checkout one. And after the checkout, the migration files that existed only in the branch we've switched from are gone. How do we roll them back?

But this is git! The files are out there. Why not just take them from git itself?

To do so programmatically, let's use the git gem to access our git repository. Crystalball already uses it under the hood, so there will be no new dependency, but it is a good idea to add it to the Gemfile explicitly.

Let's start with a check that we really have any migrations to run (either up or down):

require "git"

# git passes these arguments to the post-checkout hook (see above)
prev_head, curr_head, * = ARGV

# Rails.root if we look from .lefthook/post-checkout dir
app_dir = File.expand_path("../..", __dir__)

git = Git.open(app_dir)

# Don't run if there were no database changes between revisions
diff = git.diff(prev_head, curr_head).path("db/migrate")
exit if diff.size.zero?

Then, to be able to use migrations, we need to load our Rails application and connect to the database:

require File.expand_path("config/boot", app_dir)
require File.expand_path("config/application", app_dir)
require "rake"
Rails.application.load_tasks

Rake::Task["db:load_config"].invoke

Then we can fetch the deleted migration files from git and save them to a temporary directory:

# migrations added in prev_head (deleted in curr_head) that we need to rollback
rollback_migration_files = diff.select { |file| file.type == "deleted" }

if rollback_migration_files.any?
  require "tmpdir"
  MigrationFilenameRegexp = ActiveRecord::Migration::MigrationFilenameRegexp
  versions = []

  Dir.mktmpdir do |directory|
    rollback_migration_files.each do |diff_file|
      filename = File.basename(diff_file.path)
      contents = git.gblob("#{prev_head}:#{diff_file.path}").contents
      File.write(File.join(directory, filename), contents)
      version = filename.scan(MigrationFilenameRegexp).first&.first
      versions.push(version) if version
    end

And now that we have the files for the migrations that need to be rolled back, we can roll them back:

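    # Still inside the Dir.mktmpdir block opened in the previous snippet
    begin
      # dup, because push below mutates the array in place
      old_migration_paths = ActiveRecord::Migrator.migrations_paths.dup
      ActiveRecord::Migrator.migrations_paths.push(directory)

      # Roll back the newest migrations first
      versions.sort.reverse_each do |version|
        ENV["VERSION"] = version
        Rake::Task["db:migrate:down"].execute
      end
    ensure
      ENV.delete("VERSION")
      ActiveRecord::Migrator.migrations_paths = old_migration_paths
    end
  end # Dir.mktmpdir
end # if rollback_migration_files.any?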

Here we're adding our temporary directory with the other branch's migrations to ActiveRecord's migrations_paths. This setting has been available since Rails 5 but is not widely known. Now ActiveRecord can see our ghost migration files, and we can simply invoke rake db:migrate:down VERSION=number for every migration to roll it back.

And after that, we can simply run the migrations that haven't been applied yet:

Rake::Task["db:migrate"].invoke

And that's it!

Composing it together

Now we only need to invoke these scripts in the right order: install gems, run migrations, and run specs (if required). To do so, we name the files so that they sort alphabetically in that order, place them in the .lefthook/post-checkout directory, and declare them in lefthook.yml:

post-checkout:
  piped: true
  scripts:
    01-bundle-checkinstall:
      tags: backend
    02-db-migrate:
      tags: backend
    03-crystalball-update:
      tags: rspec backend

The piped option will abort the rest of the commands if the preceding command fails. For example, if you forget to launch a database server, the second step will fail, and lefthook will skip the third step altogether.
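The behaviour described for piped can be pictured as a loop that stops at the first failing step. The sketch below is an illustration in Ruby only, not Lefthook's implementation (Lefthook is written in Go); `run_piped` is a hypothetical name, and each step is modelled as a callable that returns true on success.

```ruby
# Runs named steps in order and stops at the first failure.
# Returns the names of the steps that were actually started.
def run_piped(steps)
  ran = []
  steps.each do |name, step|
    ran << name
    break unless step.call # a failed step aborts everything after it
  end
  ran
end

steps = {
  "01-bundle-checkinstall" => -> { true },
  "02-db-migrate"          => -> { false }, # e.g. the database server is down
  "03-crystalball-update"  => -> { true },
}

p run_piped(steps)
# prints ["01-bundle-checkinstall", "02-db-migrate"]
```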

Frontend devs who are not interested in running RSpec can exclude the rspec tag in their lefthook.local.yml, and they will still get installed gems and a migrated database. Automagically.

Any gotchas?

From now on, you always have to write reversible migrations. Look at the example application to learn how to do it.

Conclusion

Now we not only have checks that prevent us from pushing broken code to the repository (and that run really fast) but also, as a side effect, we always have our gems installed and our database migrated. You will forget about bundle install (unless you are the one who updates gems). And no more “Okay, now I need to rollback X and Y first, and checkout back to master only then.”

Experiment with example application published on GitHub: https://github.com/Envek/lefthook-crystalball-example

Envek / lefthook-crystalball-example

Learn how to make git hooks to do most routine tasks for you: install gems, migrate the database, run tests, and linters.

Example application for Lefthook + Crystalball guide

Please read “Lefthook, Crystalball, and git magic for smooth development experience” blog post first.

Sponsored by Evil Martians

Installation

  1. Ensure that you have Ruby 2.6.3 and SQLite installed (with development libraries)

  2. Fork and clone this repository locally:

    git clone https://github.com/[YOUR-NAME]/lefthook-crystalball-example
  3. Go to directory with it:

    cd lefthook-crystalball-example
  4. Install lefthook from Rubygems explicitly:

    gem install lefthook
  5. Install missing hooks to your local .git/hooks/ directory:

    lefthook install

    Every developer needs to do this only once. After that, Lefthook will sync hooks automatically after changes in lefthook.yml.

  6. Install other dependencies:

    bundle install
    yarn install --check-files
  7. Prepare database

    rails db:setup
  8. Generate code execution maps for Crystalball:

    CRYSTALBALL=true bundle exec rspec

    This needs to be done only once, too. The post-checkout hook will run this automatically once a week.

Usage

  1. Checkout to feature/multiaccount branch, see how migrations are applied:

    git checkout feature/multiaccount
  2. Checkout to feature/variations branch, see how new gems…

Happy coding!
