ASTs in Ruby - Node Pattern and Introducing RuboCop

baweaver

Brandon Weaver

Posted on June 15, 2022


In the last article I was not being quite honest when I said that regex would not work when trying to manipulate Ruby files. Strictly speaking, no, regex is not a good idea for working with tree-like structures and tokens but there is something there we need to take a look at.

Regexp stands for Regular Expressions or, as I tend to define it, a set of expressions that can describe the shape of text in such a way that you can match against anything with a similar shape. We could be pedantic and go into regular languages, but let's instead focus on this part:

describe the shape of text

Who's to say we couldn't have an expression language of some sort that would match against other shapes, like say... ASTs? That, my friends, is where NodePattern comes in and is the start of what we'll be covering in this article.

Tools of the Trade

To start out with though, you'll want a few methods and helpers to work with here:

require "rubocop"

# Turn a string into a RuboCop AST (based on WhiteQuark's AST)
def ast_from(string)
  RuboCop::ProcessedSource.new(string, RUBY_VERSION.to_f).ast
end

# Useful for debugging and seeing how the nodes deconstruct
def deep_deconstruct(node)
  return node unless node.respond_to?(:deconstruct)

  node.deconstruct.map { deep_deconstruct(_1) }
end

# Create a NodePattern from a string
def node_pattern_from(string)
  RuboCop::NodePattern.new(string)
end

With these three we'll have most of what we need to get started working with NodePatterns. I may wrap these in a gem later, and if I do I'll update this article to reflect that.
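If you want a feel for what deep_deconstruct does before installing anything, here is a minimal sketch in plain Ruby. FakeNode is a made-up stand-in, not a RuboCop class; it only assumes that a node deconstructs into its type followed by its children, which is the shape the examples in this article show for most nodes.

```ruby
# Hypothetical stand-in for an AST node: a type plus children.
# Real RuboCop nodes are far richer; this only mimics `deconstruct`.
FakeNode = Struct.new(:type, :children) do
  def deconstruct
    [type, *children]
  end
end

# Same helper as above, repeated so this snippet runs on its own
def deep_deconstruct(node)
  return node unless node.respond_to?(:deconstruct)

  node.deconstruct.map { deep_deconstruct(_1) }
end

# Hand-built tree for `1 + 1`
tree = FakeNode.new(:send, [
  FakeNode.new(:int, [1]), :+, FakeNode.new(:int, [1])
])

deep_deconstruct(tree)
# => [:send, [:int, 1], :+, [:int, 1]]
```

Anything that does not respond to deconstruct (Symbols, Integers) is returned as-is, which is what stops the recursion at the leaves.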

Naturally the best source of information is the official documentation for NodePattern, and you'll find it to be quite comprehensive.

Oh, and that regex bit? Remember how we have Rubular for regex? We have the NodePattern Debugger for ASTs, which you will find incredibly helpful. In fact, you might open it now and try out some of these examples to make sure I'm not pulling a Fast one. (No, I'm not apologizing for that one.)

Quickstart on NodePattern

Before we get too deep, let's start with a basic example of NodePattern and what that might look like with this code example:

code = "a = 1 + 1"
ast_from(code)

# returns

s(:lvasgn, :a,
  s(:send,
    s(:int, 1), :+,
    s(:int, 1)))

After reading the previous article on pattern matching this is going to look very familiar, but an exact match in NodePattern might look something like this:

node_pattern = node_pattern_from <<~PATTERN
  (lvasgn :a
    (send (int 1) :+ (int 1)))
PATTERN

node_pattern.match(ast_from("a = 1 + 1"))
# => true

If we were to use pattern matching for this it might instead look like this:

ast_from("a = 1 + 1") in [:lvasgn, :a,
  [:send, [:int, 1], :+, [:int, 1]]
]
# true

We could certainly say the NodePattern is more succinct, and you might notice that node types like lvasgn, send, and int are written bare, while values, variable names, and method names like :a and :+ keep their Symbol syntax (+ is a method, fun fact.)

But we said regexen, which means that's only the start of what NodePattern can do. Let's say we wanted to make sure that both of those values being added were the same, like with pinning in pattern matching:

node_pattern = node_pattern_from <<~PATTERN
  (lvasgn :a
    (send _value :+ _value))
PATTERN

node_pattern.match(ast_from("a = 1 + 1"))
# => true

So underscore variables with the same name have to match the same value, without requiring a ^ pin the way pattern matching does:

ast_from("a = 1 + 1") in [:lvasgn, :a, [:send, v, :+, ^v]]
# => true

Ah, and before we get too far into it: you can use the two somewhat interchangeably. NodePattern does have a slight edge in power, but readability tends to be a matter of preference. ASTs are hard to work with in general, especially for larger tasks, so I'm not going to call either one easy.
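Every example so far has matched, so it's worth seeing a pin reject something. The sketch below uses hand-written Arrays in place of real ASTs (they mirror what deep_deconstruct would give us), and same_operands? is a made-up helper name for illustration.

```ruby
# Hand-written Array shapes, standing in for the deconstructed ASTs
# of `a = 1 + 1` and `a = 1 + 2`
SAME_OPERANDS      = [:lvasgn, :a, [:send, [:int, 1], :+, [:int, 1]]]
DIFFERENT_OPERANDS = [:lvasgn, :a, [:send, [:int, 1], :+, [:int, 2]]]

# Hypothetical helper: true when both sides of the `+` are the same
def same_operands?(tree)
  case tree
  in [:lvasgn, _, [:send, value, :+, ^value]]
    true
  else
    false
  end
end

same_operands?(SAME_OPERANDS)      # => true
same_operands?(DIFFERENT_OPERANDS) # => false
```

The first value binds [:int, 1], and the pinned ^value then has to equal it, so [:int, 2] on the right-hand side fails the match.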

Back to Shorthand

Let's take a look back at the code examples from the previous article:

[1, 2, 3].select { |v| v.even? } # What we expect
[1, 2, 3].map { |v| v.to_s(2) } # An argument
[1, 2, 3].select { |v| v.next.even? } # Many methods
[1, 2, 3].map { |v| v.next + 1 } # Perhaps both
x = 4; [1, 2, 3].select { |v| x.even? } # Why though?

...or, more specifically, let's focus back in on the first one for now:

code = "[1, 2, 3].select { |v| v.even? }"
ast = ast_from(code)

deep_deconstruct(ast)
# => [:block,
#      [:send, [:array, [:int, 1], [:int, 2], [:int, 3]], :select],
#      [[:arg, :v]], [:send, [:lvar, :v], :even?]]

You'll see in a moment, much like with the pattern matching variants, why the Array representation can make this much easier to reason about. In fact, let's recall how we solved this with pattern matching before:

def shorthandable?(ast)
  ast in [:block, _,
    [[:arg, a]], [:send, [:lvar, ^a], _]
  ]
end

If we were to convert that to NodePattern it might look like this instead:

SHORTHAND_PATTERN = node_pattern_from <<~PATTERN
  (block $_receiver
    (args (arg _a)) (send (lvar _a) $_method_name))
PATTERN

def shorthandable_np?(ast)
  !!SHORTHAND_PATTERN.match(ast)
end

shorthandable_np?(ast_from("[1, 2, 3].select { |v| v.even? }"))
# => true

SHORTHAND_PATTERN.match(ast_from("[1, 2, 3].select { |v| v.even? }"))

# returns

[s(:send,
  s(:array,
    s(:int, 1),
    s(:int, 2),
    s(:int, 3)), :select), :even?]

One insidious little trick you might notice is that args is conspicuously absent from the deconstructed Array representation while the NodePattern spells it out, so you do need to be careful with edge cases when moving between the two.

The other interesting thing here is the $, which is NodePattern for "capture". While you could certainly capture with $_ I prefer to name those captures so I know what they were. Out the other side of match you'll notice that both captures were returned as well: the receiver node and the method name.
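For comparison, here is roughly how the same two captures could be pulled out with plain pattern matching. This is a sketch over hand-written Arrays shaped like the deep_deconstruct output above, and shorthand_captures is a name I made up for it.

```ruby
# Hypothetical helper: returns [receiver, method_name] when the block
# could use shorthand syntax, or nil when it could not
def shorthand_captures(tree)
  case tree
  in [:block, receiver, [[:arg, arg_name]], [:send, [:lvar, ^arg_name], method_name]]
    [receiver, method_name]
  else
    nil
  end
end

# Hand-written shape for `[1, 2, 3].select { |v| v.even? }`
EVEN_BLOCK = [:block,
  [:send, [:array, [:int, 1], [:int, 2], [:int, 3]], :select],
  [[:arg, :v]], [:send, [:lvar, :v], :even?]]

# Hand-written shape for `[1, 2, 3].select { |v| x.even? }`
OTHER_VAR_BLOCK = [:block,
  [:send, [:array, [:int, 1], [:int, 2], [:int, 3]], :select],
  [[:arg, :v]], [:send, [:lvar, :x], :even?]]

receiver, method_name = shorthand_captures(EVEN_BLOCK)
method_name # => :even?
receiver    # => [:send, [:array, [:int, 1], [:int, 2], [:int, 3]], :select]

shorthand_captures(OTHER_VAR_BLOCK) # => nil
```

Plain variable names do the capturing here and ^ does the unifying, where NodePattern uses $ and repeated underscore names.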

That's More Code Though?

You are absolutely correct there my clever clever reader, that it is. NodePattern on its own isn't quite as powerful as it might be when it's used in conjunction with RuboCop. Granted sometimes for testing I'll still use something very similar to the above, but the true power starts showing up here in the next section.

What About the Rest of NodePattern?

The documentation goes into much more comprehensive detail, and we've barely scratched the surface of what's possible with NodePattern. That said, we did look at some of its most commonly used parts, which serves as a decent introduction for the moment.

Introducing RuboCop

So where are we going with this? We've shown how to match against and rewrite code, sure, but how do we actually apply that? That's where RuboCop comes in. The unfortunate part is that so many only think of RuboCop as a tool to nag you on stylistic items, but it's far more than that.

RuboCop is also a set of tools for identifying and potentially even replacing code that matches a certain pattern using custom cops. There's even an entire section of the docs which covers this in Development.

You'll notice much of what we've covered so far is going to get you pretty far down that page, but let's take a detour into our shorthand syntax again and focus on that.

Custom Shorthand Cop

A RuboCop cop will look something like this:

module RuboCop
  module Cop
    module Style # Namespace
      class UseShorthandBlockSyntax < Base # Our name
        # If we only want to see if something matches, excluding captures
        def_node_matcher :shorthand?, <<~PATTERN
          (block _receiver
            (args (arg _a)) (send (lvar _a) _method_name))
        PATTERN

        # If we still want those captures, we'll get to this next article
        SHORTHAND_PATTERN = RuboCop::NodePattern.new <<~PATTERN
          (block $_receiver
            (args (arg _a)) (send (lvar _a) $_method_name))
        PATTERN

        # On any block node RuboCop encounters, send it to this method
        def on_block(node)
          # Using that helper method above, if it's not shorthand bail out
          return false unless shorthand?(node)

          # If it is, mark the node as an offense
          add_offense(node)
        end
      end
    end
  end
end

The items which will look particularly strange to you are def_node_matcher, on_block, and add_offense.

Node Matcher

def_node_matcher creates a predicate method from a NodePattern which we called with return false unless shorthand?(node) and takes care of all the match handling for us. If we used a constant we'd still have to manually create the NodePattern from a String and call match on it later.
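If class-level macros like this feel a bit magic, here is a toy version in plain Ruby. To be clear about the assumptions: this is not how RuboCop implements def_node_matcher, ToyMatcherMacros, def_toy_matcher, and ToyCop are all made-up names, and a lambda over Array shapes stands in for a compiled NodePattern.

```ruby
# Toy macro, NOT RuboCop's implementation: defines an instance method
# that runs a matcher lambda against whatever it is given
module ToyMatcherMacros
  def def_toy_matcher(name, matcher)
    define_method(name) { |tree| matcher.call(tree) }
  end
end

class ToyCop
  extend ToyMatcherMacros

  # A lambda over Array shapes stands in for a compiled NodePattern
  def_toy_matcher :shorthand?, ->(tree) {
    case tree
    in [:block, _, [[:arg, a]], [:send, [:lvar, ^a], _]]
      true
    else
      false
    end
  }
end

# Hand-written shape for `list.map { |v| v.to_s }`
ToyCop.new.shorthand?(
  [:block, [:send, [:lvar, :list], :map], [[:arg, :v]], [:send, [:lvar, :v], :to_s]]
)
# => true
```

The point is only the shape of the idea: the macro runs once at class definition time and leaves behind a predicate method you can call from on_block.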

On Methods

on_block is interesting in that any block node that RuboCop catches will be sent to any on_block method to check if it's a match. The on_ methods exist for every type of node, and are typically the entry point to your checks. Frequently this will be on_send instead.
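The callback idea itself is simple enough to sketch in plain Ruby. Everything below is a toy: ToyNode, ToyBlockCounter, and walk are made-up names, and RuboCop's real traversal is more involved than this.

```ruby
# Toy illustration only; not RuboCop's actual node or traversal code
ToyNode = Struct.new(:type, :children)

class ToyBlockCounter
  attr_reader :blocks_seen

  def initialize
    @blocks_seen = 0
  end

  # Called once for every :block node the walker visits
  def on_block(_node)
    @blocks_seen += 1
  end
end

# Visit every node, calling `on_<type>` when the visitor defines it
def walk(node, visitor)
  return unless node.is_a?(ToyNode)

  callback = :"on_#{node.type}"
  visitor.public_send(callback, node) if visitor.respond_to?(callback)

  node.children.each { |child| walk(child, visitor) }
end

# Hand-built tree: one block nested inside another block's body
inner = ToyNode.new(:block, [ToyNode.new(:send, [nil, :puts])])
outer = ToyNode.new(:block, [ToyNode.new(:send, [nil, :each]), inner])

counter = ToyBlockCounter.new
walk(outer, counter)
counter.blocks_seen # => 2
```

The visitor never hears about the send nodes because it defines no on_send, which is the same reason a cop only needs to implement the callbacks it cares about.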

Add Offense

This is where we tell RuboCop that the node is a match for our rule, and that we want to flag it for reporting. Interestingly there are more options here around formatting the violation message, choosing where the violation occurred (entire node, part of it, where?), and a few more options.

Testing

Now what makes this really powerful is the suite of testing tools to verify that various types of code are or are not matches for your new rule, and when we get to automatic correction? You can even test that the replacement came out correctly.

You can see why this might be real handy indeed.

Ah, and RSpec has an inline autorun mode, so let's sneak that into our script right quick:

require "rubocop"
require "rspec/autorun"
require "rubocop/rspec/support"

module RuboCop
  module Cop
    module Style
      class UseShorthandBlockSyntax < Base
        def_node_matcher :shorthand?, <<~PATTERN
          (block _receiver
            (args (arg _a)) (send (lvar _a) _method_name))
        PATTERN

        # Make a custom message, if you want. `add_offense` also accepts one
        MSG = "BAD!"

        def on_block(node)
          return false unless shorthand?(node)

          add_offense(node)
        end
      end
    end
  end
end

# Quick config - This gives us the helpers like `expect_offense`
RSpec.configure do |config|
  config.include RuboCop::RSpec::ExpectOffense
end

# Make SURE to include the `:config` here, or `cop` will be undefined on run
RSpec.describe RuboCop::Cop::Style::UseShorthandBlockSyntax, :config do
  it "catches a match" do
    # Specify which code meets the rule, and what the error should look like
    expect_offense(<<~RUBY)
      [1, 2, 3].select { |v| v.even? }
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ BAD!
    RUBY
  end

  it "does not catch a non-match" do
    # Or that there are no offenses from something
    expect_no_offenses(<<~RUBY)
      [1, 2, 3].map { |v| v.to_s(2) }
    RUBY
  end
end

Granted you should probably nest that in your spec folder somewhere, but you can see how that could become very useful for quick tests.

The major point here is that you can test against multiple variations of code you might expect, and define whether or not they should match your rule, and that's exceptionally powerful in large code bases.

Now that all said, we'll save the really fun stuff for next time.

Wrap Up

The point of this article was to introduce you to NodePattern, some of the tooling around it, and how that might look when integrated into RuboCop. It's very much meant as an introduction rather than a definitive guide. The documentation will do a better job of that, but it's nice to know these things exist.

Next time, however, we're going to take a look into the single most powerful secret that very few know of:

Autocorrection.

If we're just talking lint and style, sure, that's interesting but not highly useful beyond prettying things up. No no, we're talking something far more incredible, and that would be code migrations. If you have, say, a giant monorepo and you want to migrate a pattern of code everywhere, with the added benefit of testability, you'll be real fond of the next piece in this series.
