Stan Lo
Posted on April 9, 2020
Ruby has always been known for the productivity it brings to its developers. Alongside features such as elegant syntax, rich meta-programming support, etc. that make you productive when writing code, it also has another secret weapon called TracePoint
that can help you "debug" faster.
In this post, I'll use a simple example to show you 2 interesting facts I found out about debugging:
- Most of the time, finding the bug itself isn't hard, but understanding how your program works in detail is. Once you have a deep understanding of this, you can usually spot the bug right away.
- Observing your program down to the method call level is time-consuming, and is the major bottleneck of our debugging process.
Then, I'll show you how TracePoint
could change the way we approach debugging by making the program "tell us" what it's doing.
Debugging Is about Understanding Your Program and Its Design
Let's assume we have a Ruby program called plus_1
and it's not functioning correctly. How do we debug this?
# plus_1.rb
def plus_1(n)
n + 2
end
input = ARGV[0].to_i
puts(plus_1(input))
$ ruby plus_1.rb 1
3
Ideally, we should be able to address the bug in 3 steps:
- Learn the expectations from the design
- Understand the current implementation
- Trace the bug
Learning the Expectations from the Design
What's the expected behavior here? plus_1
should add 1
to its argument, which is our input from the command line. But how do we "know" this?
In a real-world case, we can understand the expectations by reading test cases, documents, mockups, asking other people for feedback, etc. Our understanding depends on how the program is "designed".
This step is the most crucial part of our debugging process. If you don't understand how the program should work, you'll never be able to debug it.
However, there are many factors that can be part of this step, such as team coordination, development workflow, etc. TracePoint
won't be able to help you with those, so we won't dwell on these problems today.
Understanding the Current Implementation
Once we've understood the expected behavior of the program, we need to learn how it functions at the moment.
In most cases, we need the following information to fully understand how a program works:
- The methods called during the program's execution
- The call and return order of the method calls
- Arguments passed to each method call
- Values returned from each method call
- Any side effects that happened during each method call, e.g. data mutation or database requests
Let's describe our example with the above information:
# plus_1.rb
def plus_1(n)
n + 2
end
input = ARGV[0].to_i
puts(plus_1(input))
$ ruby plus_1.rb 1
3
- Defines a method called
plus_1
- Retrieves the input (
"1"
) fromARGV
- Calls
to_i
on"1"
, which returns1
- Assigns
1
to local variableinput
- Calls
plus_1
method withinput
(1
) as its argument. The parametern
now carries a value of1
- Calls
+
method on1
with an argument2
, and returns the result3
- Returns
3
for step 5 - Calls
puts
- Calls
to_s
on3
, which returns"3"
- Passes
"3"
to theputs
call from step 8, which triggers a side effect that prints the string to Stdout. Then it returnsnil
.
The description isn't 100% accurate, but it's enough for a simple explanation.
Addressing the Bug
Now that we've learned how our program should work and how it actually works, we can start looking for the bug. With the information we have, we can search for the bug by following the method calls upward (start from step 10) or downward (start from step 1). In this case, we can do it by tracing back to the method that returned 3 in the first place—which is the 1 + 2
in step 6
.
This Is Far from Reality!
Of course, we all know that real debugging isn't as simple as the example makes it out to be. The critical difference between real-life programs and our example is the size. We used 10 steps to explain a 5-line program. How many steps would we need for a small Rails app? It's basically impossible to break down a real program as detailed as we did for the example.
Without a detailed understanding of your program, you won't be able to track down the bug through an obvious path, so you'll need to make assumptions or guesses.
Information Is Expensive
As you probably already noticed, the key factor in debugging is how much information you have. But what does it take to retrieve that much information? Let's see:
# plus_1_with_tracing.rb
def plus_1(n)
puts("n = #{n}")
n + 2
end
raw_input = ARGV[0]
puts("raw_input: #{raw_input}")
input = raw_input.to_i
puts("input: #{input}")
result = plus_1(input)
puts("result of plus_1 #{result}")
puts(result)
$ ruby plus_1_with_tracing.rb 1
raw_input: 1
input: 1
n = 1
result of plus_1: 3
3
As you can see, we only get 2 types of information here: values of some variables and the evaluation order of our puts
(which implies the program's execution order).
How much does this information cost us?
def plus_1(n)
+ puts("n = #{n}")
n + 2
end
-input = ARGV[0].to_i
-puts(plus_1(input))
+raw_input = ARGV[0]
+puts("raw_input: #{raw_input}")
+input = raw_input.to_i
+puts("input: #{input}")
+
+result = plus_1(input)
+puts("result of plus_1: #{result}")
+
+puts(result)
Not only do we need to add 4 puts
into the code, but, in order to print values separately, we also need to split our logic in order to access the intermediate states of some values. In this case, we got 4 additional outputs for the internal states with 8 lines of changes. That's 2 lines of changes for 1 line of output, on average! And since the number of changes grows linearly with the size of the program, we can compare it to an O(n)
operation.
Why Is Debugging Expensive?
Our programs can be written with many goals in mind: maintainability, performance, simplicity, etc. but usually not for "Traceability", meaning, getting the values for inspection, which usually requires a modification of the code, e.g. splitting chained method calls.
- The more the information you get, the more additions/changes you need to make to the code.
However, once the amount of information you get reaches a certain point, you won't be able to process it efficiently. So we either need to filter the information out or label it to help us understand it.
- The more precise the information, the more additions/changes you need to make to the code.
Finally, because the work involves touching the codebase—which can be very different between bugs (e.g. controller vs. model logic)—it's hard to automate it. Even if your codebase is tracing-friendly (e.g. it follows "Law of Demeter" strictly), most of the time, you'll need to type different variable/method names manually.
(Actually, in Ruby, there are some tricks to avoid this—like __method__
. But let's not complicate things here.)
TracePoint: The Savior
However, Ruby provides us an exceptional tool that can largely reduce the cost: TracePoint
. I bet most of you have already heard of it or used it before. But in my experience, not many people use this powerful tool in daily debugging practices.
Let me show you how to use it to collect information quickly. This time, we don't need to touch any of our existing logic, we just need some code before it:
TracePoint.trace(:call, :return, :c_call, :c_return) do |tp|
event = tp.event.to_s.sub(/(.+(call|return))/, '\2').rjust(6, " ")
message = "#{event} of #{tp.defined_class}##{tp.callee_id} on #{tp.self.inspect}"
# if you call `return` on any non-return events, it'll raise error
message += " => #{tp.return_value.inspect}" if tp.event == :return || tp.event == :c_return
puts(message)
end
def plus_1(n)
n + 2
end
input = ARGV[0].to_i
puts(plus_1(input))
If you run the code, you'll see:
return of #<Class:TracePoint>#trace on TracePoint => #<TracePoint:c_return `trace'@plus_1_with_trace_point.rb:1>
call of Module#method_added on Object
return of Module#method_added on Object => nil
call of String#to_i on "1"
return of String#to_i on "1" => 1
call of Object#plus_1 on main
return of Object#plus_1 on main => 3
call of Kernel#puts on main
call of IO#puts on #<IO:<STDOUT>>
call of Integer#to_s on 3
return of Integer#to_s on 3 => "3"
call of IO#write on #<IO:<STDOUT>>
3
return of IO#write on #<IO:<STDOUT>> => 2
return of IO#puts on #<IO:<STDOUT>> => nil
return of Kernel#puts on main => nil
Our code is much more readable now. Isn't it amazing? It prints out most of the program execution with lots of details! We can even map it with my earlier execution breakdown:
- Defines a method called
plus_1
- Retrieves the input (
"1"
) fromARGV
- Calls
to_i
on"1"
, which returns1
- Assigns
1
to local variableinput
- Calls
plus_1
method withinput
(1
) as its argument. The parametern
now carries a value1
- Calls
+
method on1
with an argument2
, and returns the result3
- Returns
3
for step 5 - Calls
puts
- Calls
to_s
on3
, which returns"3"
- Passes
"3"
to theputs
call from step 8, which triggers a side effect that prints the string to Stdout. And then it returnsnil
.
# ignore this, it's TracePoint tracing itself ;D
return of #<Class:TracePoint>#trace on TracePoint => #<TracePoint:c_return `trace'@plus_1_with_trace_point.rb:1>
call of Module#method_added on Object # 1. Defines a method called `plus_1`.
return of Module#method_added on Object => nil
call of String#to_i on "1" # 3-1. Calls `to_i` on `"1"`
return of String#to_i on "1" => 1 # 3-2. which returns `1`
call of Object#plus_1 on main # 5. Calls `plus_1` method with `input`(`1`) as its argument.
return of Object#plus_1 on main => 3 # 7. Returns `3` for step 5
call of Kernel#puts on main # 8. Calls `puts`
call of IO#puts on #<IO:<STDOUT>>
call of Integer#to_s on 3 # 9. Calls `to_s` on `3`, which returns `"3"`
return of Integer#to_s on 3 => "3"
call of IO#write on #<IO:<STDOUT>> # 10-1. Passes `"3"` to the `puts` call from step 8
# 10-2. which triggers a side effect that prints the string to Stdout
3 # original output
return of IO#write on #<IO:<STDOUT>> => 2
return of IO#puts on #<IO:<STDOUT>> => nil
return of Kernel#puts on main => nil # 10-3. And then it returns `nil`.
We can even say it's more detailed than what I said earlier! However, you may notice that steps 2, 4 and 6 are missing from the output. Unfortunately, they are not trackable by TracePoint
for the following reasons:
- 2. Retrieves the input (
"1"
) fromARGV
-
ARGV
and the following[]
aren't considered as call/c_call at the moment
-
- 4. Assigns
1
to local variableinput
- Currently, there's no event for variable assignments. We can (sort of) track it with
line
event + regex, but it won't be accurate
- Currently, there's no event for variable assignments. We can (sort of) track it with
- 6. Calls
+
method on1
with an argument2
, and returns the result3
- Certain method calls like built-in
+
or attributes accessor methods aren't trackable at the moment
- Certain method calls like built-in
From O(n) to O(log n)
As you can see from the previous example, with proper usage of TracePoint
, we can almost make the program "tell us" what it's doing. Now, because of the number of lines we need, TracePoint
doesn't grow linearly with the size of our program. I'd say the whole process becomes an O(log(n))
operation.
Next Steps
In this article, I've explained the main difficulty with debugging. Hopefully, I've also convinced you of how TracePoint
could be a game-changer. But if you try TracePoint
right now, it'll probably frustrate you more than help you.
With the amount of information that comes from TracePoint
, you'll soon be swamped by the noise. The new challenge is to filter out the noise, leaving valuable information. For example, in most cases, we only care about specific models or service objects. In these cases, we can filter calls by the receiver's class, like this:
TracePoint.trace(:call) do |tp|
next unless tp.self.is_a?(Order)
# tracing logic
end
Another thing to keep in mind is that the block you define for TracePoint
could be evaluated tens of thousands of times. At this scale, how you implement the filtering logic can have a great impact on your app's performance. For example, I don't recommend this:
TracePoint.trace(:call) do |tp|
trace = caller[0]
next unless trace.match?("app")
# tracing logic
end
For these 2 problems, I've prepared another article to let you know of some tricks and gotchas I found with some useful boilerplates for typical Ruby/Rails applications.
And if you find this concept interesting, I also created a gem called tapping_device that hides all the implementation hassles.
Conclusion
Debugger and tracing are both great tools for debugging, and we have been using them for many years. But as I've demonstrated in this article, using them requires many manual operations during the debugging process. However, with the help of TracePoint
, you can automate many of them and thus boost your debugging performance. I hope you can now add TracePoint
to your debugging toolbox and give it a try.
Our guest writer Stan is a Senior Software Engineer @ticketsolve. He loves cats, Ruby on Rails and boxing. He created Goby and works on tapping_device during the weekends.
Posted on April 9, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.