Analyzing and Editing Ruby Source Code in Bullet Train

Hello, this is Gabe from the Bullet Train Core team. I’ve been on the team since December 2020, and it’s been a blast facing a lot of different challenges that don’t show up in traditional Rails application development. I say that because, if you’re familiar with our service, Bullet Train provides powerful tooling like Super Scaffolding to develop Rails applications quickly (please read this article in the Bullet Train blog to find out more about Super Scaffolding). This means that there are a lot of moving parts under the hood like transforming and copying templates for Tailwind-powered views, nested models, controllers, and more. For example, say you want to create a new model called Project.

> rails generate model Project team:references title:string
> bin/super-scaffold crud Project Team title:text_field

By just running these two commands, Bullet Train will add a new controller, a model, and views with Tailwind styling which all belong to a Team and can perform CRUD actions. Super Scaffolding will also update your tests with new attributes to ensure everything is working properly, and it will even update your routes files so you can start using your new endpoints right away.

Because there’s so much automated generation that takes place, it comes with its own set of challenges. In this article, I’d like to talk specifically about…

A really difficult bug that I faced when working with the routes file scaffolding logic.
A new gem I’ve been making to improve the quality of our source code analysis and replacement logic, which grew out of the issues behind that bug.

Brain Twister

This bug was one of the most challenging programming problems I’ve faced, yet it was also one of the most fun ones to work on. If I remember correctly, I had to sit for about an hour and just take notes before I looked at any code again just to organize my thoughts before attacking the problem. The problem might not be too difficult for some, but it sure was for me. Here's the original pull request for reference.

Basically, there was an issue where we were encountering namespace collisions when the same namespace existed within another part of the routes file. Take for instance the following model names.

> rails g model Insight team:references name:string
> rails g model Personality::CharacterTrait insight:references name:string
> rails g model Personality::Disposition team:references name:string

Personality::CharacterTrait belongs to Insight, so it should be namespaced under Insight. However, Personality::Disposition belongs to Team, so we should be creating a new namespace block entirely like so:

resources :teams do
  resources :insights do
    namespace :personality do
      resources :character_traits
    end
  end

  namespace :personality do
    resources :dispositions
  end
end

You can probably already see how messy this can get. When Super Scaffolding these models, we were getting something like this:

resources :teams do
  resources :insights do
    namespace :personality do
      resources :character_traits
      resources :dispositions
    end
  end
end

Just imagine trying to run integration tests on a whole set of these models with duplicated namespaces. My head was spinning!

To make a long story short, I ended up fixing this by taking each namespace and scoping it to its proper parent before actually trying to scaffold the proper routing.

Granted, the code works fine now, but it’s bugs like this that made me reconsider how we should be analyzing and editing our Ruby source code. For example, in the pull request I mentioned earlier, you can find this in the namespace_blocks_directly_under_parent method:

if lines[line_number].match?(/^#{" " * (parent_indentation_size + 2)}namespace/)

We use the standard gem for linting our Ruby code which takes away a lot of problems when trying to Super Scaffold new files, but you can probably already see why this code is problematic. In the regular expression here, we’re grabbing the indentation of the parent, and then just adding two spaces to it to find the namespace block that exists under it.

What if developers are using four spaces instead of two? What if for some reason they added an extra space? Even if they ran the correct command to Super Scaffold a new model, it would break and it would take too long to debug the problem for something that isn’t even their fault.
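
To make the indentation problem concrete, here is a minimal sketch. The helper name `namespace_directly_under_parent?` and the sample lines are made up for illustration; only the regular expression is taken from the line quoted above.

```ruby
# Hypothetical helper wrapping the indentation-based check quoted above.
def namespace_directly_under_parent?(line, parent_indentation_size)
  line.match?(/^#{" " * (parent_indentation_size + 2)}namespace/)
end

# A routes file indented with two spaces: the parent sits at 2, the child at 4.
two_space_line = (" " * 4) + "namespace :personality do"

# A routes file indented with four spaces: the parent sits at 4, the child at 8.
four_space_line = (" " * 8) + "namespace :personality do"

puts namespace_directly_under_parent?(two_space_line, 2)  # true
puts namespace_directly_under_parent?(four_space_line, 4) # false, it expects exactly 6 spaces
```

The second call fails even though the namespace block really is a direct child of its parent, simply because the file does not use two-space indentation.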

Also consider this line in the scope_to_namespace_parent method in the same pull request.

namespace_line_number if lines[namespace_line_number].match?(/ +namespace :#{namespace}/)

By the time we get to this line we’re already within the scope of the parent, so matching the namespace to this regular expression / +namespace :#{namespace}/ gets the job done for us. The problem is, using regular expressions on strings like this can grab so many other things. What if the namespace is :project and we had another namespace just called :project_sites already? The regular expression would match the wrong namespace and it would be a pain to debug.
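
The collision is easy to reproduce in isolation. In the sketch below the `lines` array is invented sample data, and the `\b` word boundary is only a partial mitigation added for illustration; it is not something taken from the pull request.

```ruby
namespace = "project"

# Sample routes lines (made up for this example).
lines = [
  "  namespace :project_sites do",
  "  namespace :project do"
]

# The pattern from the article: it also matches any namespace that merely
# starts with "project".
loose_pattern = / +namespace :#{namespace}/
loose_match = lines.index { |line| line.match?(loose_pattern) }
puts loose_match # 0, the :project_sites line, which is the wrong one

# Adding a word boundary rules out :project_sites, because "_" counts as a
# word character and so there is no boundary between "project" and "_sites".
strict_pattern = / +namespace :#{Regexp.escape(namespace)}\b/
strict_match = lines.index { |line| line.match?(strict_pattern) }
puts strict_match # 1, the :project line
```

A word boundary helps here, but it is still string matching: the same text inside a comment or a string literal would match just as happily.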

Learning About Abstract Syntax Trees

And well, I hadn’t really been proactively looking for a solution to all of this, but it had been sitting in the back of my mind for a while because this isn’t the only place where we use match? or gsub to edit files like this. What actually led me to start working on this new gem was the book Ruby Under a Microscope. I learned about the abstract syntax tree data structure, Ruby’s LALR algorithm for parsing its source code, and some other cool things along the way. I talked to some Ruby committers at RubyKaigi 2023 and they said Ripper is a semi-outdated library for looking at abstract syntax trees of your source code, and they suggested syntax_tree. Someone on the Bullet Train team also suggested looking into referral. They’re both awesome tools, but taking a quick look at them, I didn't quite find what I was looking for.

syntax_tree

For example, I really like this about syntax_tree. Imagine we have a file named add.rb with the following content.

2 + 2

Simple enough. By running the following CLI command, we get the following output about the nodes in the abstract syntax tree.

> stree expr add.rb
SyntaxTree::Binary[
  left: SyntaxTree::Int[value: "2"],
  operator: :+,
  right: SyntaxTree::Int[value: "2"]
]

Ripper doesn't provide this kind of labeled output, and what it does return can be really hard to read, so having this CLI command available is pretty great.

Also, the abstract syntax tree that Ripper.sexp generates looks like this:

irb(main):001:0> Ripper.sexp("2 + 2")
=> [:program, [[:binary, [:@int, "2", [1, 0]], :+, [:@int, "2", [1, 4]]]]]

syntax_tree’s AST output is much cleaner.

> stree ast add.rb
(program (statements ((binary (int "2") + (int "2")))))

This is great and all, and maybe I haven't delved deep enough into syntax_tree itself to find the tooling I was looking for, but I was finding that I wanted to analyze my Ruby source code in “real-time” so to speak and apply changes directly to the source code all within Bullet Train's code. Ripper also provided the line numbers I was looking for, which aren’t present in syntax_tree’s ast output. I wanted a library that I could require in Bullet Train to pinpoint each and every variable, symbol, or method invocation to transform on the spot and give back to the developers using Bullet Train.
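
As a rough illustration of the position data Ripper exposes, the sketch below uses `Ripper.lex` from the standard library, which returns each token together with its `[line, column]` position. The hash keys are chosen to resemble the output shown in the next section, but this is not how Masamune is implemented; it is only a small stand-alone example.

```ruby
require "ripper"

code = "java = \"java\"\nputs java\n"

# Each element of Ripper.lex looks like [[line, column], type, token, state].
identifiers = Ripper.lex(code)
  .select { |_position, type, _token| type == :on_ident }
  .map { |(line, column), _type, token| {line_number: line, index_on_line: column, token: token} }

# Keep only the identifier we care about. The string "java" on line 1 is
# lexed as string content, so it is not picked up here.
java_tokens = identifiers.select { |identifier| identifier[:token] == "java" }

java_tokens.each { |token| p token }
```

Because the lexer separates identifiers from string contents and comments, the positions that come back point only at real uses of the name.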

Masamune

This led me to start making Masamune. Masamune takes the output from Ripper and translates it into a meaningful collection of nodes to pinpoint whatever data types or keywords we want to grab from our code. Take the following code snippet for example.

require "masamune"

code = <<CODE
java = "java"
javascript = java + "script"
puts java + " is not " + javascript
# java
CODE

msmn = Masamune::AbstractSyntaxTree.new(code)

msmn.variables
#=> [{:line_number=>1, :index_on_line=>0, :token=>"java"},
#=> {:line_number=>2, :index_on_line=>0, :token=>"javascript"},
#=> {:line_number=>2, :index_on_line=>13, :token=>"java"},
#=> {:line_number=>3, :index_on_line=>5, :token=>"java"},
#=> {:line_number=>3, :index_on_line=>25, :token=>"javascript"}]

msmn.strings
#=> [{:line_number=>1, :index_on_line=>8, :token=>"java"},
#=> {:line_number=>2, :index_on_line=>21, :token=>"script"},
#=> {:line_number=>3, :index_on_line=>13, :token=>" is not "}]

msmn.variables(name: "java")
#=> [{:line_number=>1, :index_on_line=>0, :token=>"java"},
#=> {:line_number=>2, :index_on_line=>13, :token=>"java"},
#=> {:line_number=>3, :index_on_line=>5, :token=>"java"}]

Knowing exactly what lines these tokens are on is extremely valuable information when trying to edit files in Bullet Train, and if we can isolate the tokens to the exact word we’re looking for, we won’t have to worry about regular expressions matching similar strings that we don’t want (like the :project and :project_sites example I mentioned above).

Since Bullet Train has been in development for a long time, I don’t foresee Masamune replacing all of this logic overnight. This also doesn’t account for the YAML and ERB files which we also have to transform when using Super Scaffolding. However, I think this is a step in the right direction to replace code with more precision. I submitted a pull request to implement the new gem recently, and I’m excited to see how it will be used in the future. You can see here where Masamune pinpoints the line each namespace is on by grabbing each namespace method call.

# `@msmn` here represents a Masamune::AbstractSyntaxTree
# object instantiated with the contents of config/routes.rb in a Rails app.
namespaces = @msmn.method_calls(name: "namespace")
namespace_line_numbers = namespaces.map { |namespace| namespace[:line_number] }

We then go through each line in the file, check if that line has a namespace invocation on it, and then retrieve the first symbol that comes right after the namespace.

if namespace_line_numbers.include?(line_index)
  namespace_name = @msmn.symbols.find { |sym| sym[:line_number] == line_index }[:token]
  # …
end
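
For readers without the gem installed, the same idea can be approximated with nothing but Ripper from the standard library. This is a simplified sketch and not Masamune's implementation: the sample routes are invented, it requires Ruby 2.7+ for `filter_map`, and it only handles the plain `namespace :name` form.

```ruby
require "ripper"

# Invented sample routes containing two namespaces with a shared prefix.
routes = <<~ROUTES
  resources :teams do
    namespace :project_sites do
      resources :pages
    end

    namespace :project do
      resources :tasks
    end
  end
ROUTES

# Lex the file and drop whitespace so related tokens sit next to each other.
tokens = Ripper.lex(routes).reject { |_position, type, _token| type == :on_sp }

# Look for the token sequence: identifier "namespace", then ":", then a name.
namespaces = tokens.each_cons(3).filter_map do |call, symbol_start, name|
  next unless call[1] == :on_ident && call[2] == "namespace"
  next unless symbol_start[1] == :on_symbeg && name[1] == :on_ident

  {line_number: call[0][0], token: name[2]}
end

p namespaces
# [{:line_number=>2, :token=>"project_sites"}, {:line_number=>6, :token=>"project"}]

# Comparing whole tokens means :project can no longer be confused with
# :project_sites.
project_line = namespaces.find { |ns| ns[:token] == "project" }[:line_number]
puts project_line # 6
```

Since the comparison is made on complete tokens rather than substrings, the prefix collision described earlier cannot happen in this form.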

This puts my mind at ease when I think of trying to retrieve and edit tokens in our Bullet Train files. There’s still a lot more work to do on Masamune and we’re only scratching the surface of how to use it in Bullet Train, but it’s challenges like this that make me enjoy programming, and I would be glad if this allows us to produce a better developer experience and help others by making our code generation tools more precise.

Give syntax_tree Another Chance

In Masamune, I’ve based all of my node classes off of the abstract syntax trees that Ripper.sexp generates. I think there’s a lot of good information that can be gleaned from the syntax_tree gem though, so I might give it another chance and base my node classes off of the output from there instead. Either way, there’s a lot of work that can be done here, and if you decide to check out Bullet Train, I hope you enjoy how helpful Super Scaffolding is!


Gabriel Zayas
