Runbook: A Ruby DSL for Gradual System Automation
Patrick Blesi
Posted on August 14, 2019
At Braintree, we like to write tools to automate our work. Our latest tool is Runbook, a Ruby DSL for gradually automating system operations.
I know what you’re thinking: Why build yet another tool to automate an engineer’s job? We already have bash scripts!
First, anyone who has tried writing a for-loop in bash will admit it's not intuitive (I have to look it up every time!). Second, even when scripting out solutions to common maintenance operations, there are often setup, teardown, and verification steps required to ensure the operation ran successfully. How many times have you forgotten to run a setup or cleanup step that your maintenance script depends on? How many times have you forgotten to verify that an operation succeeded?
We can often mitigate these kinds of issues with good documentation. The problem with software documentation, as we know, is that it can become outdated over time if the maintainers neglect to update it.
How often have you scripted a maintenance operation only to have it become outdated and break six months later? Inevitably, you break out the editor and perform script surgery in an effort to recover from the failed state.
Runbook addresses these types of issues by providing a framework that tightly couples the documentation and code for an operation. It also allows you to progressively automate your operations, finding the right balance between full automation and human involvement.
The philosophy of Runbook is heavily aligned with Dan Slimmon's Do-nothing scripting and Atul Gawande's The Checklist Manifesto. It is designed to minimize Toil.
Runbook is not intended to replace more special-purpose automation solutions such as configuration management (Puppet, Chef, Ansible, Salt), deployment (Capistrano, Kubernetes, Docker Swarm), monitoring (Nagios, Datadog), or local command execution (Rake tasks, Make). Instead, Runbook is best used as glue for tasks that cut across these domains.
A simple runbook
A runbook outlines a list of steps required to perform an operation.
```ruby
# restart_nginx.rb
Runbook.book "Restart Nginx" do
  description <<-DESC
This is a simple runbook to restart nginx
  DESC

  section "Restart Nginx" do
    step "Stop Nginx"
    step "Wait for requests to drain"
    step "Start Nginx"
  end
end
```
A runbook can be compiled into a Markdown checklist or executed interactively. Compiling this one produces:
```markdown
# Restart Nginx

This is a simple runbook to restart nginx

## 1. Restart Nginx

1. [] Stop Nginx
2. [] Wait for requests to drain
3. [] Start Nginx
```
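Runbook's own compiler is not shown in this post. As a rough illustration of the idea only, the following toy, stdlib-only sketch evaluates a block-based book definition with `instance_eval` and renders it as a Markdown checklist in the same shape as the output above. The `ToyBook` class and its `to_markdown` method are hypothetical and are not part of Runbook.

```ruby
# Hypothetical toy DSL that mimics the shape of a runbook definition.
# It is NOT Runbook's implementation; it only shows how instance_eval
# lets a block-based definition be collected and rendered as a checklist.
class ToyBook
  Section = Struct.new(:title, :steps)

  attr_reader :title, :sections

  def initialize(title, &block)
    @title = title
    @description = nil
    @sections = []
    @current = nil
    instance_eval(&block) # evaluate the DSL block with self set to this book
  end

  def description(text)
    @description = text.strip
  end

  def section(title, &block)
    @current = Section.new(title, [])
    instance_eval(&block) # `step` calls inside the block append to @current
    @sections << @current
  end

  def step(title)
    @current.steps << title
  end

  # Render the collected outline as numbered sections and checklist items.
  def to_markdown
    lines = ["# #{@title}", @description].compact
    @sections.each.with_index(1) do |sec, i|
      lines << "## #{i}. #{sec.title}"
      sec.steps.each.with_index(1) { |s, j| lines << "#{j}. [] #{s}" }
    end
    lines.join("\n")
  end
end

book = ToyBook.new("Restart Nginx") do
  description "This is a simple runbook to restart nginx"
  section "Restart Nginx" do
    step "Stop Nginx"
    step "Wait for requests to drain"
    step "Start Nginx"
  end
end

puts book.to_markdown
```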
Adding automation
Once the outline is in place, you can start building automation into your runbook.
```ruby
# restart_nginx.rb
Runbook.book "Restart Nginx" do
  description <<-DESC
This is a simple runbook to restart nginx and
verify it starts successfully
  DESC

  section "Restart Nginx" do
    server "app01.prod"
    user "root"

    step "Stop Nginx" do
      note "Stopping Nginx..."
      command "service nginx stop"
      assert %q{service nginx status | grep "not running"}
    end

    step { wait 30 }

    step "Start Nginx" do
      note "Starting Nginx..."
      command "service nginx start"
      assert %q{service nginx status | grep "is running"}
      confirm "Nginx is taking traffic?"
      notice "Make sure to report why you restarted nginx"
    end
  end
end
```
Notice that this runbook includes the statement `confirm "Nginx is taking traffic?"`. You can easily put off scripting steps that are harder to automate by delegating them to the person executing the runbook.
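Conceptually, a confirmation is a yes/no prompt that pauses the automation until a human answers. The stdlib-only sketch below illustrates that idea with a hypothetical `confirm` helper and `ConfirmationDeclined` error; input and output are injectable so it runs without a terminal. It is not Runbook's implementation of `confirm`.

```ruby
require "stringio"

# Hypothetical error raised when the operator answers "no".
class ConfirmationDeclined < StandardError; end

# Hypothetical helper: ask a yes/no question and stop the run on anything but yes.
# `input` and `output` default to the terminal but can be swapped for StringIO.
def confirm(question, input: $stdin, output: $stdout)
  output.print "#{question} (y/n) "
  answer = input.gets.to_s.strip.downcase
  return true if %w[y yes].include?(answer)

  raise ConfirmationDeclined, "Operator declined: #{question}"
end

# Simulate an operator answering "y" and then "n".
answers = StringIO.new("y\nn\n")
prompts = StringIO.new
puts confirm("Nginx is taking traffic?", input: answers, output: prompts) # prints "true"
begin
  confirm("Nginx is taking traffic?", input: answers, output: prompts)
rescue ConfirmationDeclined => e
  puts e.message # prints "Operator declined: Nginx is taking traffic?"
end
```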
Features
Some of Runbook's features include:
SSH integration
Runbook integrates with SSH using SSHKit to execute commands on remote servers, download and upload files, and capture output from remotely executed commands. You can also control the parallelization strategy, executing commands in parallel, serially, or in groups.
```ruby
Runbook.book "Restart Nginx" do
  section "Restart Services" do
    # app01.prod through app50.prod
    servers (1..50).map { |n| "app#{n.to_s.rjust(2, "0")}.prod" }
    parallelization(strategy: :groups, limit: 5, wait: 2)

    step "Restart services" do
      command "service nginx restart"
    end
  end
end
```
The above example executes `service nginx restart` across `app{01..50}.prod`, five servers at a time, waiting 2 seconds between each group.
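One way to picture the `:groups` strategy is to split the host list into batches of `limit` hosts, run each batch concurrently, and pause `wait` seconds between batches. The stdlib-only sketch below shows that scheduling idea; `run_in_groups` is a hypothetical helper, not SSHKit's or Runbook's code, and the sleep function is injectable so the example finishes instantly.

```ruby
# Hypothetical sketch of a "groups" execution strategy:
# run `limit` hosts concurrently, then pause `wait` seconds before the next batch.
def run_in_groups(hosts, limit:, wait:, sleeper: ->(seconds) { sleep(seconds) })
  results = {}
  mutex = Mutex.new
  batches = hosts.each_slice(limit).to_a
  batches.each_with_index do |batch, index|
    threads = batch.map do |host|
      Thread.new do
        output = yield(host) # stand-in for the remote command on this host
        mutex.synchronize { results[host] = output }
      end
    end
    threads.each(&:join) # finish the whole batch before moving on
    sleeper.call(wait) unless index == batches.size - 1
  end
  results
end

hosts = (1..50).map { |n| "app#{n.to_s.rjust(2, "0")}.prod" }
pauses = []
results = run_in_groups(hosts, limit: 5, wait: 2, sleeper: ->(s) { pauses << s }) do |host|
  "restarted nginx on #{host}"
end
puts results.size # 50
puts pauses.size  # 9 pauses between 10 batches
```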
Dynamic control flow
We designed Runbook's control flow to be dynamic; at any point you can skip steps, jump to any step (even a previous step), or exit.
Runbook saves its state between each step of the runbook, and it can restart from where it left off if an error occurs while executing the runbook. In fact, you can resume a stopped runbook at any point in its execution.
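A simple way to get resumable execution is to persist the index of the next step after each step succeeds and to read it back on startup. The sketch below illustrates that idea with a hypothetical `ResumableRun` class that stores its position in a JSON file; it does not reflect Runbook's actual state format or storage location.

```ruby
require "json"
require "tmpdir"

# Hypothetical sketch: persist progress after each step so a failed run can resume.
class ResumableRun
  def initialize(steps, state_file)
    @steps = steps # array of [title, callable] pairs
    @state_file = state_file
  end

  # Index of the first step that has not completed yet.
  def next_index
    return 0 unless File.exist?(@state_file)

    JSON.parse(File.read(@state_file)).fetch("next_step")
  end

  def run
    (next_index...@steps.size).each do |i|
      _title, action = @steps[i]
      action.call
      # Record progress only after the step succeeds.
      File.write(@state_file, JSON.generate("next_step" => i + 1))
    end
    File.delete(@state_file) if File.exist?(@state_file) # a finished run leaves no state
  end
end

log = []
flaky = true
Dir.mktmpdir do |dir|
  state = File.join(dir, "restart_nginx.json")
  steps = [
    ["Stop Nginx", -> { log << "stop" }],
    ["Wait", -> { raise "timeout" if flaky; log << "wait" }], # fails on the first attempt
    ["Start Nginx", -> { log << "start" }],
  ]
  begin
    ResumableRun.new(steps, state).run
  rescue RuntimeError
    flaky = false # the operator fixes the problem...
  end
  ResumableRun.new(steps, state).run # ...and the run resumes at "Wait"
end
p log # ["stop", "wait", "start"]
```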
Noop and auto modes
Runbook provides both a noop and an auto mode. Noop mode allows you to verify the operations your runbook will run before you execute it. Auto mode executes your runbook without any human interaction: prompts use their provided default values, and execution fails immediately if a prompt has no default.
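The auto-mode rule for prompts (use the default if there is one, otherwise fail immediately) can be sketched as a small function. `ask_value` and `MissingDefaultError` are hypothetical names for illustration only, not Runbook's API.

```ruby
# Hypothetical error raised when auto mode meets a prompt without a default.
class MissingDefaultError < StandardError; end

# Hypothetical prompt resolution: in auto mode nobody is at the keyboard,
# so fall back to the default or stop right away.
def ask_value(question, default: nil, auto: false, input: $stdin)
  if auto
    raise MissingDefaultError, "No default for prompt: #{question}" if default.nil?

    return default
  end
  answer = input.gets.to_s.strip
  answer.empty? ? default : answer
end

puts ask_value("Which environment?", default: "staging", auto: true) # staging
begin
  ask_value("Which host?", auto: true)
rescue MissingDefaultError => e
  puts e.message # No default for prompt: Which host?
end
```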
Execution lifecycle hooks
Runbook provides support for before, around, and after execution hooks. You can alter and augment your runbook behavior by hooking into the execution of entities and statements in your runbook. Hooks can be used to provide a rich set of behavior such as timing the execution of steps of a runbook or the runbook as a whole, tracking the frequency of execution of a runbook, and notifying Slack when a runbook has completed.
```ruby
Runbook::Runs::SSHKit.register_hook(
  :notify_slack_of_execution_time,
  :around,
  Runbook::Entities::Book
) do |object, metadata, block|
  start = Time.now
  block.call(object, metadata) # execute the wrapped book
  duration = Time.now - start
  # Skip the notification for noop runs
  unless metadata[:noop]
    message = "Runbook #{object.title}: took #{duration} seconds to execute!"
    notify_slack(message) # stands in for your own Slack integration
  end
end
```
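To see how an around hook wraps execution, here is a tiny stdlib-only hook registry in the same spirit. `HookRegistry`, its `register` and `run` methods, and the `notifications` array (standing in for a Slack call) are hypothetical; Runbook's real entry point is `register_hook`, used in the example above.

```ruby
# Hypothetical, minimal hook registry illustrating before/around/after hooks.
class HookRegistry
  def initialize
    @hooks = { before: [], around: [], after: [] }
  end

  def register(type, &hook)
    @hooks.fetch(type) << hook
  end

  # Run `work`, wrapped by every registered hook.
  def run(name, metadata = {}, &work)
    @hooks[:before].each { |h| h.call(name, metadata) }
    # Fold the around hooks so the first registered hook is the outermost wrapper.
    wrapped = @hooks[:around].reverse.reduce(work) do |inner, hook|
      ->(n, m) { hook.call(n, m, inner) }
    end
    wrapped.call(name, metadata)
    @hooks[:after].each { |h| h.call(name, metadata) }
  end
end

notifications = []
registry = HookRegistry.new
registry.register(:around) do |name, metadata, block|
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  block.call(name, metadata)
  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  notifications << format("Runbook %s: took %.2f seconds", name, duration) unless metadata[:noop]
end

registry.run("Restart Nginx") { |_name, _meta| :done }
registry.run("Restart Nginx", { noop: true }) { |_name, _meta| :done }
puts notifications.size # 1 (the noop run is not reported)
```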
First-class tmux support
At Braintree we live on a steady diet of vim and tmux. Consequently, Runbook provides first-class support for executing commands within tmux. When specifying your runbook, you can define a tmux layout. This flexible and intuitive interface allows you to send commands to panes by name.
Executing commands in separate panes is ideal for monitoring, commands that require user interaction, or commands that are prone to failure. You can then interact with the command directly, troubleshooting and resolving issues before continuing the runbook.
```ruby
Runbook.book "Restart Nginx" do
  layout [[
    [{name: :top_left, runbook_pane: true}, :top_right],
    :middle,
    {name: :bottom, directory: "/var/log", command: "tail -Fn 100 nginx.log"},
  ]]

  section "Setup monitoring" do
    step do
      tmux_command "watch 'service nginx status'", :top_right
      tmux_command "vim /etc/nginx/nginx.conf", :middle
    end
  end
end
```
Runbooks remember their tmux layouts between executions. If a runbook stops unexpectedly, it will connect to the existing tmux layout when resumed as long as the tmux panes have not been altered. Additionally, runbooks offer to automatically close their tmux panes when the runbook finishes executing.
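At the tmux level, typing a command into a particular pane is what `tmux send-keys -t <pane> <keys> Enter` does. The sketch below only builds that argument list from a hypothetical pane-name-to-pane-ID map (the `PANES` constant and the `%3`-style IDs are illustrative assumptions); it does not claim to be how Runbook stores or addresses its layout.

```ruby
require "shellwords"

# Hypothetical mapping from layout pane names to tmux pane IDs.
PANES = { top_right: "%3", middle: "%4", bottom: "%5" }.freeze

# Build the argv for `tmux send-keys`, which types the command into the
# target pane and then presses Enter.
def tmux_send_keys_argv(command, pane_name, panes: PANES)
  target = panes.fetch(pane_name) { raise ArgumentError, "unknown pane #{pane_name}" }
  ["tmux", "send-keys", "-t", target, command, "Enter"]
end

argv = tmux_send_keys_argv("watch 'service nginx status'", :top_right)
puts argv.shelljoin
# Inside a real tmux session you could run it with: system(*argv)
```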
Ruby commands
Runbook provides a `ruby_command` statement to dynamically define runbook statements and their arguments. You can, for example, hit a JSON endpoint to retrieve a list of servers and then execute a command on each of those servers. Because you are working in Ruby, you have access to all the parsing and processing capabilities it provides.
```ruby
require 'json'
require 'time'

Runbook.book "Restart Old Services" do
  section "Restart week-old services" do
    step do
      server "monitor01.prod"
      capture "curl localhost:9200/host_ages.json", into: :host_ages

      ruby_command do |rb_cmd, metadata|
        # Plain-Ruby equivalent of ActiveSupport's 1.week.ago
        one_week_ago = Time.now - 7 * 24 * 60 * 60
        # Assumes each "started" value is a timestamp string Time.parse understands
        old_hosts = JSON.parse(host_ages).select { |host| Time.parse(host["started"]) < one_week_ago }
        old_host_names = old_hosts.map { |host| host["name"] }
        # Define one reboot command per stale host
        old_host_names.each do |name|
          command "shutdown -r now", ssh_config: {servers: [name], user: "root"}
        end
      end
    end
  end
end
```
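The filtering inside that `ruby_command` is ordinary Ruby, so it can be pulled into a standalone method and tried without a server. This sketch assumes, hypothetically, that the endpoint returns an array of objects with a `name` and an ISO 8601 `started` timestamp; `week_old_hosts` is an illustrative name, and `now` is a parameter so the result is deterministic.

```ruby
require "json"
require "time"

ONE_WEEK = 7 * 24 * 60 * 60 # seconds

# Return the names of hosts whose "started" timestamp is more than a week old.
def week_old_hosts(json, now: Time.now)
  cutoff = now - ONE_WEEK
  JSON.parse(json)
      .select { |host| Time.iso8601(host["started"]) < cutoff }
      .map { |host| host["name"] }
end

# Hypothetical payload in the assumed shape.
sample = <<~JSON
  [
    {"name": "app01.prod", "started": "2019-08-01T12:00:00Z"},
    {"name": "app02.prod", "started": "2019-08-13T12:00:00Z"}
  ]
JSON

now = Time.iso8601("2019-08-14T12:00:00Z")
p week_old_hosts(sample, now: now) # ["app01.prod"]
```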
Generators
Runbook provides generators, similar to Rails, for generating runbooks, runbook extensions, and runbook-focused projects. You can even define your own generators for including team-specific customizations in your generated runbooks.
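At its core, a generator renders a template into a new file. The stdlib-only sketch below shows just that templating step with ERB; the `TEMPLATE` and the `render_runbook` helper are hypothetical and are not Runbook's own generator templates or command-line interface.

```ruby
require "erb"

# Hypothetical template for a new runbook skeleton.
TEMPLATE = <<~'ERB'
  Runbook.book "<%= title %>" do
    description <<-DESC
  <%= description %>
    DESC

    section "<%= section %>" do
      step "TODO: first step"
    end
  end
ERB

# Fill the template's placeholders from keyword arguments.
def render_runbook(title:, description:, section:)
  ERB.new(TEMPLATE).result_with_hash(title: title, description: description, section: section)
end

puts render_runbook(title: "Rotate Logs", description: "Rotate nginx logs", section: "Rotate")
```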
Adaptability
Runbook is designed to seamlessly integrate into existing infrastructure. It can be used as a Ruby library, a command line tool, or to create self-executable runbooks. Runbook adheres to universal interfaces such as the command line and SSH. Runbooks can be invoked via cron jobs and integrated into Docker containers.
Further, Runbook is extensible, so you can augment the DSL with your own statements and functionality. The example below aliases `section` to `s` in the `Book` DSL.
```ruby
module MyRunbook::Extensions
  module Aliases
    module DSL
      def s(title, &block)
        section(title, &block)
      end
    end
  end

  Runbook::Entities::Book::DSL.prepend(Aliases::DSL)
end
```
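The extension relies on `Module#prepend`, which places the alias module ahead of the DSL module in the method lookup chain. This self-contained sketch demonstrates the same mechanism on a stand-in `ToyDSL` module; `ToyDSL`, `Aliases`, and `AliasDemoBook` are hypothetical substitutes for Runbook's classes.

```ruby
# Hypothetical stand-in for a DSL module such as Runbook::Entities::Book::DSL.
module ToyDSL
  def section(title, &block)
    sections << title
    instance_eval(&block) if block
  end

  def sections
    @sections ||= []
  end
end

# The extension: a short alias that delegates to the original method.
module Aliases
  def s(title, &block)
    section(title, &block)
  end
end

ToyDSL.prepend(Aliases) # prepend before ToyDSL is included anywhere

class AliasDemoBook
  include ToyDSL
end

book = AliasDemoBook.new
book.s("Restart Nginx")
p book.sections                     # ["Restart Nginx"]
p AliasDemoBook.ancestors.take(3)   # [AliasDemoBook, Aliases, ToyDSL]
```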
This flexibility allows you to adapt Runbook to meet any use case you encounter.
Check it out
At Braintree, we use Runbook for automating our app deployment preflight checklists, on-call playbooks, system maintenance operations, SDK deployments, and more. We've found it to be instrumental in streamlining production operations, reducing human error, and increasing overall quality of life.
Check out Runbook on GitHub for more information on how you can use Runbook to streamline production operations and increase developer happiness!
This post was originally published on Medium.