How did we upgrade and containerize our largest Rails monolith in one quarter?
Cristian
Posted on November 13, 2019
Early days
In the early days, we built Aircall with a single Ruby on Rails app. Using Rails was great because it let us bootstrap development quickly and focus on iterating until we found market fit. As we grew, though, it became clear this app was getting too big, so we split it into four Ruby on Rails apps.
The first of the four apps contained most of the product, such as user accounts, number settings, and both the internal and public APIs. The second contained all logic used to connect Aircall to custom integrations such as Salesforce and Zendesk. The third handled all real-time call logic, such as routing and conferences. Lastly, the fourth handled all billing-related tasks such as processing payments and generating invoices.
While this helped our engineering team manage different parts of the product in a more organized manner, the main app containing most of the product continued to grow. We were also hosting these applications on AWS Elastic Beanstalk, which let us build our infrastructure and serve our customers quickly. It would later prove to be a major pain point in our CI pipeline.
Now
Fast forward a few short years, and we now have a rapidly growing engineering team with an elaborate fleet of containerized microservices. That being said, at the heart of it all is still our very first Ruby on Rails app. Granted, the responsibility of this app has significantly diminished over the years, but it still serves an essential role within our company as the single source of truth for most of our data.
During Q2 of 2019, we decided to show a little love to this app, which we'll call Web for the rest of this post. Web was running Rails 4.2 on Elastic Beanstalk. Rails 6 was on the horizon, and many of our other applications were already running in Docker containers on ECS, so it was logical to upgrade Web to a newer Rails version and containerize it. Though this process is much easier said than done, we managed to achieve it in just a single quarter.
Upgrading Rails from 4.2 to 5.2
It was the second-to-last day of RailsConf 2019, and I was sitting in on a talk given by Eileen M. Uchitelle, a Software Engineer at GitHub and a member of the Rails Core Team. Her talk highlighted how GitHub managed to upgrade both Ruby and Rails. If they managed to upgrade from Rails 2.3 to 5.2 and ditch their custom fork of Ruby while maintaining uptime, why couldn't we do the same? I mean, we were only going from Rails 4.2 to 5.2, so this should be a piece of cake. I was wrong. Here's what we learned and how we did it.
We decided early on that those who managed the Rails upgrade would continuously merge master into the feat/upgrade-rails branch and retroactively fix any deprecation issues as we progressed. This process left us with our master branch running Rails 4.2 and a feat/upgrade-rails branch running Rails 5.2. Next came the fun part: fixing all the deprecation warnings. Since we went straight to Rails 5.2, most of them surfaced as exceptions rather than warnings.
It's worth noting that skipping versions isn't recommended; the suggested path is to upgrade to 5.0 first, then 5.1, and finally 5.2. Ultimately, we decided to jump straight to 5.2 because it helped us smoke out problems more quickly and saved us from having to scrub test logs for deprecation warnings.
There are many changes from Rails 4.2 to 5.2, some breaking and some not. Below are the ones we think are the most important, along with the steps needed to upgrade.
Ruby
Rails 5 requires Ruby 2.2.2 or later. If you're using rvm, rbenv, or asdf, installing it should be as simple as running the matching command below from your shell.
rvm install 2.2.2
rbenv install 2.2.2
asdf install ruby 2.2.2
Dependencies
Dependencies in your Gemfile might not be compatible with newer versions of Rails. I suggest locking your Rails version in your Gemfile to a newer version: gem 'rails', '~> 5.2'. Then run bundle update rails, incrementally bumping conflicting gems until you get a successful bundle.
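To see which releases a constraint such as '~> 5.2' actually admits, you can evaluate it with the classes RubyGems itself ships. This is only a minimal sketch: the helper name versions_allowed_by and the candidate version strings are illustrative assumptions, not the releases we tested against.

```ruby
require "rubygems"

# Hypothetical helper: returns the candidate versions that satisfy a
# Gemfile-style constraint, using RubyGems' own requirement matching.
def versions_allowed_by(constraint, candidates)
  requirement = Gem::Requirement.new(constraint)
  candidates.select { |v| requirement.satisfied_by?(Gem::Version.new(v)) }
end

# Illustrative version strings only.
candidates = %w[4.2.11 5.1.7 5.2.0 5.2.3 6.0.0]
puts versions_allowed_by("~> 5.2", candidates).inspect
# prints ["5.2.0", "5.2.3"]
```

In other words, the constraint accepts any 5.x release from 5.2 onward but stops short of 6.0, which keeps bundle update rails from jumping a major version.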
Configuration
After you've achieved a successful bundle, run rails app:update to interactively upgrade the configuration files in bin/ and config/.
Schema
In db/schema.rb, index definitions, including unique indexes and indexes on foreign key columns, now live inside the create_table block. Running rails db:migrate regenerates db/schema.rb with these changes regardless of whether you have any pending migrations.
Migrations
ActiveRecord migration files need to be tagged with the version of Rails in which they were generated. CreateCalls < ActiveRecord::Migration now becomes CreateCalls < ActiveRecord::Migration[4.2].
class CreateCalls < ActiveRecord::Migration[4.2]
  def change
  end
end
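With many existing migrations, tagging each file by hand gets tedious. The sketch below shows one way to automate it with a plain string rewrite. The tag_migration helper and its regular expression are hypothetical, written for this post rather than taken from Rails, and they only cover the common case of a single-line class declaration.

```ruby
# Matches a class that inherits from the unversioned base, e.g.
# "class CreateCalls < ActiveRecord::Migration" at the end of a line.
UNVERSIONED_MIGRATION =
  /^([ \t]*class[ \t]+\w+[ \t]*<[ \t]*ActiveRecord::Migration)[ \t]*$/

# Hypothetical helper: appends the version tag to unversioned migrations.
# Already-tagged declarations do not match and are returned unchanged.
def tag_migration(source, version = "4.2")
  source.gsub(UNVERSIONED_MIGRATION) { "#{Regexp.last_match(1)}[#{version}]" }
end

# Possible usage, assuming the conventional db/migrate directory:
#   Dir.glob("db/migrate/*.rb").each do |path|
#     File.write(path, tag_migration(File.read(path)))
#   end
```

Because tagged declarations no longer match the pattern, running the rewrite a second time leaves the files untouched.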
ActionController
Strong parameters have replaced protected attributes. If you've been using strong parameters already, then this shouldn't cause problems. It's important to note that ActionController::Parameters is now its own object and no longer behaves like a Hash.
To get the parameters as a Hash, permit the keys you need and then call #to_h on the result. In Rails 5.2, calling #to_h on unpermitted parameters raises an error.
params.permit(:page, :per_page).to_h.reverse_merge(page: 1, per_page: 15)
ActiveRecord
ActiveRecord models now inherit from ApplicationRecord instead of ActiveRecord::Base.
# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
end

# app/models/call.rb
class Call < ApplicationRecord
  belongs_to :number
end
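belongs_to associations are now required by default (new Rails 5 apps enable this through config.active_record.belongs_to_required_by_default). The required: true option in belongs_to :company, required: true gives way to optional: false, which is the default; pass optional: true when the association may be blank.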
# app/models/call.rb
class Call < ApplicationRecord
  belongs_to :number, optional: false
end
ActiveRecord and ActiveModel callbacks no longer halt when returning false. You need to use throw(:abort) instead.
class Number < ApplicationRecord
  before_create do
    throw(:abort) if you_need_to_stop_creation
  end
end
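The difference between returning false and throwing :abort can be illustrated without Rails, using Ruby's built-in catch and throw. The run_callbacks method below is a simplified stand-in written for this post, not Rails' actual callback implementation.

```ruby
# Simplified stand-in for a callback chain (not Rails' real implementation).
# Runs each callback in order and reports whether the chain completed.
def run_callbacks(callbacks)
  completed = catch(:abort) do
    callbacks.each(&:call)
    true # reached only if no callback threw :abort
  end
  completed == true
end

# A callback that merely returns false no longer stops the chain.
puts run_callbacks([-> { false }, -> { :saved }])
# prints true

# A callback that throws :abort halts the chain immediately.
puts run_callbacks([-> { throw(:abort) }, -> { :saved }])
# prints false
```

This is why a before_create that ends in a falsy expression, which quietly cancelled the save on Rails 4.2, lets the record through after the upgrade unless it is rewritten to throw.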
Tests
The request helpers in ActionController::TestCase now only accept keyword arguments. Inside your specs, get :show, id: 1 becomes get :show, params: { id: 1 }.
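For a large suite, the simplest of these calls can be rewritten mechanically. The sketch below is a rough, regex-based take on that idea: convert_requests is a hypothetical helper, the keyword list reflects our understanding of the arguments the Rails 5 controller test helpers take, and anything beyond a single-line call with symbol-style keys is deliberately left alone for manual review.

```ruby
# Keyword arguments assumed to be understood by the Rails 5 test helpers.
RAILS5_KEYWORDS = %w[params session body flash format xhr as].freeze

# A single-line request call such as "get :show, id: 1".
REQUEST_CALL =
  /^([ \t]*(?:get|post|put|patch|delete|head)[ \t]+:\w+),[ \t]*(.+)$/

# Hypothetical helper: wraps bare request parameters in params: { ... }.
def convert_requests(source)
  source.gsub(REQUEST_CALL) do
    match = Regexp.last_match # keep it; the lookup below resets last_match
    call = match[1]
    args = match[2]
    first_key = args[/\A(\w+):(?!:)/, 1]
    if first_key.nil? || RAILS5_KEYWORDS.include?(first_key)
      match[0] # already converted, or too unusual to rewrite safely
    else
      "#{call}, params: { #{args} }"
    end
  end
end

puts convert_requests("get :show, id: 1")
# prints get :show, params: { id: 1 }
```

Calls that pass explicit hashes, hash rockets, or a positional session hash fall through unchanged, so the failing specs that remain point to the cases that need a human.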
What didn’t work?
Tests
As mentioned, Web has been around for a while, and many hands have touched it over the years. In any codebase, things can get out of hand, and we knew some parts might not have extensive tests in place. This lack of tests caused some issues because even once we got the existing RSpec suite green, we knew some things might still be broken by the upgrade.
At this point, we had two solutions. First, manually testing the product and trying to reproduce any edge cases that came to mind. Second, deploying the upgraded app to only a percentage of our production traffic and acting fast to patch issues. Smoke testing this way was not an ideal situation, but it worked for us and kept our uptime at 99.99%.
To prevent this from happening in the future, we've stressed the importance of code coverage within our engineering team. SimpleCov is excellent for this and is now a part of our CI pipeline, seamlessly calculating test coverage for every branch of Web. I'll be releasing a blog post in the coming weeks showing how we aggregate code coverage across multiple CPU cores and containers on CircleCI to produce a single report.
Communication
Besides testing, communication is essential when executing a migration of this scale. To sum it up, communication was good, not great. We're a multinational engineering team with 10 engineers in NYC and 50 in Paris. In retrospect, a project of this scale is better suited to the team with the largest number of engineers, but the NYC team handled it instead, given the availability of resources at the time.
The communication failures became apparent in the daily handoff: the Paris team would leave for the day just as the NYC team arrived and worked vigorously on the upgrade. Then, after the NYC team went home, the Paris team would return the next morning and occasionally face an unstable staging environment. Since Web is at the core of our ecosystem, an unstable environment hindered the development of some projects. In most cases, the unstable portions were those with no tests. You could say this was beneficial because it allowed us to spot issues before they reached our customers. At the same time, the Paris team was not familiar with the procedures or status of the upgrade and often had to troubleshoot in the dark until the NYC team arrived. To make matters worse, in some cases, the Paris team didn't know whether an issue was related to their own project or to the upgrade of Web.
This upgrade quickly pointed out an area in which we needed to improve as a team. Additionally, we think that when undertaking an upgrade of this scale, you need to make sure all engineering teams are looped in on every aspect, regardless of location or perceived interactions.
Conclusion
All in all, this project was a success. Our legacy monolith is now ready to run another few years while we continue to grow. We learned a lot and continue to apply our findings to every new project here at Aircall. This upgrade wasn’t easy, but it was worth it. Now we can better serve our customers by releasing new features and bug fixes quicker than ever, thanks to our improved CI pipeline.