Changing the Approach to Debugging in Ruby with TracePoint

st0012

Stan Lo

Posted on April 9, 2020

Changing the Approach to Debugging in Ruby with TracePoint

Ruby has always been known for the productivity it brings to its developers. Alongside features such as elegant syntax, rich meta-programming support, etc. that make you productive when writing code, it also has another secret weapon called TracePoint that can help you "debug" faster.

In this post, I'll use a simple example to show you 2 interesting facts I found out about debugging:

  1. Most of the time, finding the bug itself isn't hard, but understanding how your program works in detail is. Once you have a deep understanding of this, you can usually spot the bug right away.
  2. Observing your program down to the method call level is time-consuming, and is the major bottleneck of our debugging process.

Then, I'll show you how TracePoint could change the way we approach debugging by making the program "tell us" what it's doing.

Debugging Is about Understanding Your Program and Its Design

Let's assume we have a Ruby program called plus_1 and it's not functioning correctly. How do we debug this?

# plus_1.rb
def plus_1(n)
  n + 2
end

input = ARGV[0].to_i
puts(plus_1(input))
Enter fullscreen mode Exit fullscreen mode
$ ruby plus_1.rb 1
3
Enter fullscreen mode Exit fullscreen mode

Ideally, we should be able to address the bug in 3 steps:

  1. Learn the expectations from the design
  2. Understand the current implementation
  3. Trace the bug

Learning the Expectations from the Design

What's the expected behavior here? plus_1 should add 1 to its argument, which is our input from the command line. But how do we "know" this?

In a real-world case, we can understand the expectations by reading test cases, documents, mockups, asking other people for feedback, etc. Our understanding depends on how the program is "designed".

This step is the most crucial part of our debugging process. If you don't understand how the program should work, you'll never be able to debug it.

However, there are many factors that can be part of this step, such as team coordination, development workflow, etc. TracePoint won't be able to help you with those, so we won't dwell on these problems today.

Understanding the Current Implementation

Once we've understood the expected behavior of the program, we need to learn how it functions at the moment.

In most cases, we need the following information to fully understand how a program works:

  • The methods called during the program's execution
  • The call and return order of the method calls
  • Arguments passed to each method call
  • Values returned from each method call
  • Any side effects that happened during each method call, e.g. data mutation or database requests

Let's describe our example with the above information:

# plus_1.rb
def plus_1(n)
  n + 2
end

input = ARGV[0].to_i
puts(plus_1(input))
Enter fullscreen mode Exit fullscreen mode
$ ruby plus_1.rb 1
3
Enter fullscreen mode Exit fullscreen mode
  1. Defines a method called plus_1
  2. Retrieves the input ("1") from ARGV
  3. Calls to_i on "1", which returns 1
  4. Assigns 1 to local variable input
  5. Calls plus_1 method with input(1) as its argument. The parameter n now carries a value of 1
  6. Calls + method on 1 with an argument 2, and returns the result 3
  7. Returns 3 for step 5
  8. Calls puts
  9. Calls to_s on 3, which returns "3"
  10. Passes "3" to the puts call from step 8, which triggers a side effect that prints the string to Stdout. Then it returns nil.

The description isn't 100% accurate, but it's enough for a simple explanation.

Addressing the Bug

Now that we've learned how our program should work and how it actually works, we can start looking for the bug. With the information we have, we can search for the bug by following the method calls upward (start from step 10) or downward (start from step 1). In this case, we can do it by tracing back to the method that returned 3 in the first place⁠—which is the 1 + 2 in step 6.

This Is Far from Reality!

Of course, we all know that real debugging isn't as simple as the example makes it out to be. The critical difference between real-life programs and our example is the size. We used 10 steps to explain a 5-line program. How many steps would we need for a small Rails app? It's basically impossible to break down a real program as detailed as we did for the example.
Without a detailed understanding of your program, you won't be able to track down the bug through an obvious path, so you'll need to make assumptions or guesses.

Information Is Expensive

As you probably already noticed, the key factor in debugging is how much information you have. But what does it take to retrieve that much information? Let's see:

# plus_1_with_tracing.rb
def plus_1(n)
  puts("n = #{n}")
  n + 2
end

raw_input = ARGV[0]
puts("raw_input: #{raw_input}")
input = raw_input.to_i
puts("input: #{input}")

result = plus_1(input)
puts("result of plus_1 #{result}")

puts(result)
Enter fullscreen mode Exit fullscreen mode
$ ruby plus_1_with_tracing.rb 1
raw_input: 1
input: 1
n = 1
result of plus_1: 3
3
Enter fullscreen mode Exit fullscreen mode

As you can see, we only get 2 types of information here: values of some variables and the evaluation order of our puts (which implies the program's execution order).

How much does this information cost us?

 def plus_1(n)
+  puts("n = #{n}")
   n + 2
 end

-input = ARGV[0].to_i
-puts(plus_1(input))
+raw_input = ARGV[0]
+puts("raw_input: #{raw_input}")
+input = raw_input.to_i
+puts("input: #{input}")
+
+result = plus_1(input)
+puts("result of plus_1: #{result}")
+
+puts(result)
Enter fullscreen mode Exit fullscreen mode

Not only do we need to add 4 puts into the code, but, in order to print values separately, we also need to split our logic in order to access the intermediate states of some values. In this case, we got 4 additional outputs for the internal states with 8 lines of changes. That's 2 lines of changes for 1 line of output, on average! And since the number of changes grows linearly with the size of the program, we can compare it to an O(n) operation.

Why Is Debugging Expensive?

Our programs can be written with many goals in mind: maintainability, performance, simplicity, etc. but usually not for "Traceability", meaning, getting the values for inspection, which usually requires a modification of the code, e.g. splitting chained method calls.

  • The more the information you get, the more additions/changes you need to make to the code.

However, once the amount of information you get reaches a certain point, you won't be able to process it efficiently. So we either need to filter the information out or label it to help us understand it.

  • The more precise the information, the more additions/changes you need to make to the code.

Finally, because the work involves touching the codebase⁠—which can be very different between bugs (e.g. controller vs. model logic)⁠—it's hard to automate it. Even if your codebase is tracing-friendly (e.g. it follows "Law of Demeter" strictly), most of the time, you'll need to type different variable/method names manually.

(Actually, in Ruby, there are some tricks to avoid this⁠—like __method__. But let's not complicate things here.)

TracePoint: The Savior

However, Ruby provides us an exceptional tool that can largely reduce the cost: TracePoint. I bet most of you have already heard of it or used it before. But in my experience, not many people use this powerful tool in daily debugging practices.

Let me show you how to use it to collect information quickly. This time, we don't need to touch any of our existing logic, we just need some code before it:

TracePoint.trace(:call, :return, :c_call, :c_return) do |tp|
  event = tp.event.to_s.sub(/(.+(call|return))/, '\2').rjust(6, " ")
  message = "#{event} of #{tp.defined_class}##{tp.callee_id} on #{tp.self.inspect}"
  # if you call `return` on any non-return events, it'll raise error
  message += " => #{tp.return_value.inspect}" if tp.event == :return || tp.event == :c_return
  puts(message)
end

def plus_1(n)
  n + 2
end

input = ARGV[0].to_i
puts(plus_1(input))
Enter fullscreen mode Exit fullscreen mode

If you run the code, you'll see:

return of #<Class:TracePoint>#trace on TracePoint => #<TracePoint:c_return `trace'@plus_1_with_trace_point.rb:1>
  call of Module#method_added on Object
return of Module#method_added on Object => nil
  call of String#to_i on "1"
return of String#to_i on "1" => 1
  call of Object#plus_1 on main
return of Object#plus_1 on main => 3
  call of Kernel#puts on main
  call of IO#puts on #<IO:<STDOUT>>
  call of Integer#to_s on 3
return of Integer#to_s on 3 => "3"
  call of IO#write on #<IO:<STDOUT>>
3
return of IO#write on #<IO:<STDOUT>> => 2
return of IO#puts on #<IO:<STDOUT>> => nil
return of Kernel#puts on main => nil
Enter fullscreen mode Exit fullscreen mode

Our code is much more readable now. Isn't it amazing? It prints out most of the program execution with lots of details! We can even map it with my earlier execution breakdown:

  1. Defines a method called plus_1
  2. Retrieves the input ("1") from ARGV
  3. Calls to_i on "1", which returns 1
  4. Assigns 1 to local variable input
  5. Calls plus_1 method with input(1) as its argument. The parameter n now carries a value 1
  6. Calls + method on 1 with an argument 2, and returns the result 3
  7. Returns 3 for step 5
  8. Calls puts
  9. Calls to_s on 3, which returns "3"
  10. Passes "3" to the puts call from step 8, which triggers a side effect that prints the string to Stdout. And then it returns nil.
# ignore this, it's TracePoint tracing itself ;D
return of #<Class:TracePoint>#trace on TracePoint => #<TracePoint:c_return `trace'@plus_1_with_trace_point.rb:1>

  call of Module#method_added on Object         # 1. Defines a method called `plus_1`.
return of Module#method_added on Object => nil
  call of String#to_i on "1"                    # 3-1. Calls `to_i` on `"1"`
return of String#to_i on "1" => 1               # 3-2. which returns `1`
  call of Object#plus_1 on main                 # 5. Calls `plus_1` method with `input`(`1`) as its argument.
return of Object#plus_1 on main => 3            # 7. Returns `3` for step 5
  call of Kernel#puts on main                   # 8. Calls `puts`
  call of IO#puts on #<IO:<STDOUT>>
  call of Integer#to_s on 3                     # 9. Calls `to_s` on `3`, which returns `"3"`
return of Integer#to_s on 3 => "3"
  call of IO#write on #<IO:<STDOUT>>            # 10-1. Passes `"3"` to the `puts` call from step 8
                                                # 10-2. which triggers a side effect that prints the string to Stdout
3 # original output
return of IO#write on #<IO:<STDOUT>> => 2
return of IO#puts on #<IO:<STDOUT>> => nil
return of Kernel#puts on main => nil            # 10-3. And then it returns `nil`.
Enter fullscreen mode Exit fullscreen mode

We can even say it's more detailed than what I said earlier! However, you may notice that steps 2, 4 and 6 are missing from the output. Unfortunately, they are not trackable by TracePoint for the following reasons:

  • 2. Retrieves the input ("1") from ARGV
    • ARGV and the following [] aren't considered as call/c_call at the moment
  • 4. Assigns 1 to local variable input
    • Currently, there's no event for variable assignments. We can (sort of) track it with line event + regex, but it won't be accurate
  • 6. Calls + method on 1 with an argument 2, and returns the result 3
    • Certain method calls like built-in + or attributes accessor methods aren't trackable at the moment

From O(n) to O(log n)

As you can see from the previous example, with proper usage of TracePoint, we can almost make the program "tell us" what it's doing. Now, because of the number of lines we need, TracePoint doesn't grow linearly with the size of our program. I'd say the whole process becomes an O(log(n)) operation.

Next Steps

In this article, I've explained the main difficulty with debugging. Hopefully, I've also convinced you of how TracePoint could be a game-changer. But if you try TracePoint right now, it'll probably frustrate you more than help you.

With the amount of information that comes from TracePoint, you'll soon be swamped by the noise. The new challenge is to filter out the noise, leaving valuable information. For example, in most cases, we only care about specific models or service objects. In these cases, we can filter calls by the receiver's class, like this:

TracePoint.trace(:call) do |tp|
  next unless tp.self.is_a?(Order)
  # tracing logic
end
Enter fullscreen mode Exit fullscreen mode

Another thing to keep in mind is that the block you define for TracePoint could be evaluated tens of thousands of times. At this scale, how you implement the filtering logic can have a great impact on your app's performance. For example, I don't recommend this:

TracePoint.trace(:call) do |tp|
  trace = caller[0]
  next unless trace.match?("app")
  # tracing logic
end
Enter fullscreen mode Exit fullscreen mode

For these 2 problems, I've prepared another article to let you know of some tricks and gotchas I found with some useful boilerplates for typical Ruby/Rails applications.

And if you find this concept interesting, I also created a gem called tapping_device that hides all the implementation hassles.

Conclusion

Debugger and tracing are both great tools for debugging, and we have been using them for many years. But as I've demonstrated in this article, using them requires many manual operations during the debugging process. However, with the help of TracePoint, you can automate many of them and thus boost your debugging performance. I hope you can now add TracePoint to your debugging toolbox and give it a try.

Our guest writer Stan is a Senior Software Engineer @ticketsolve. He loves cats, Ruby on Rails and boxing. He created Goby and works on tapping_device during the weekends.

💖 💪 🙅 🚩
st0012
Stan Lo

Posted on April 9, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related