Build A Simple Tracing System in Elixir

Marcos Ramos

Posted on February 6, 2024

In this post, we'll cover how Elixir applications can be traced using OpenTelemetry and how macros can make this process super easy and streamlined.

First, we'll talk about tracing and OpenTelemetry in Elixir. Then we'll improve our custom tracing layer step-by-step until we get an easy and seamless tool to trace our application.

Let's get started!

On Tracing in Elixir

Note: All the research done to write this article resulted in the creation of the abstracing library. It's far from complete, but it encapsulates all the ideas written here.

Think of instances when your app crashes in production. A bunch of artifacts are generated: stacktraces, logs, reports, etc.

When using proper tracing, developers can link all these artifacts in a sequence of events, from a starting point down to the response, including any operations with side effects.

Here is a simple example of a trace for a POST /users request:

[Figure: POST users trace]

From left to right, we can see that the request was received, decoded, validated, saved, and then, finally, a response was encoded and sent. Each little block in this trace is called a span. Spans are the building blocks of tracing, as they represent events inside an application.

Spans require this basic information: the start, the end, and the status (success or error). We can enrich each span with more data, and spans can even be correlated directly with logs and errors if we add the appropriate metadata to them.

Meet OpenTelemetry for Elixir

The OpenTelemetry project homepage states that it is a "collection of APIs, SDKs, and tools" that can be used to "instrument, generate, collect, and export telemetry data (metrics, logs, and traces)".

For Elixir, the OpenTelemetry library has everything we need to perform distributed tracing.

Here is a very simple example of how we can use it:

# example.ex
defmodule Example do
  require OpenTelemetry.Tracer

  def some_fun() do
    OpenTelemetry.Tracer.with_span "example.some_fun" do
      # ...
      OpenTelemetry.Tracer.set_attribute("key", "value")
      # ...
    end
  end
end

What's Happening Here?

As you can see, it's pretty straightforward. To start using all macros inside OpenTelemetry.Tracer, we first require it at the top of our module.

When we want to start a span, we just need to call OpenTelemetry.Tracer.with_span/2 and write our code.

Under the hood, :otel_tracer.with_span/4 is used to actually start the span — even though the OpenTelemetry API does provide Elixir modules to interact with, all the heavy lifting is actually written in Erlang.

One really cool thing about spans is that we can add metadata to reported data. This gives us more contextual information to investigate issues. We can do this by calling OpenTelemetry.Tracer.set_attribute/2. It only accepts a small set of basic types (such as numbers, binaries, atoms, and booleans, plus lists of those), so we need to be mindful when using it.

Now for a brief overview of how I ended up building an abstraction layer for OpenTelemetry.

Why I Built an Abstraction Layer for OpenTelemetry

When I first started using OpenTelemetry, I noticed that I was constantly creating small private functions to translate data and help me with the setup. As I did more and more of that, I eventually extracted all this boilerplate to its own feature. I created an abstraction layer for OpenTelemetry.

A few pain points during my use of OpenTelemetry that were solved by this abstraction layer:

  • If your code throws an unexpected exception, the span will not be collected.
  • Adding complex data requires transforming it first.
  • Long namespaces (not really a problem, but I just don't like them 😁).

Luckily, the OpenTelemetry library also has a low-level API that can be used to customize our tracing tooling, and that's exactly what we'll be doing now!

Breaking Down the Pain Points of OpenTelemetry

Now that we know some of the direct pain points of using OpenTelemetry, let's break them down into separate categories and solve each one. We can group the areas that attract boilerplate code into:

  • Setup: how we prepare a module to be traced
  • Start/stop spans: the steps required to actually create spans
  • Modifying spans: adding more attributes to spans
  • Exception handling: collecting errors and changing the span status

In the end, we want to cover all these features with as little boilerplate code as possible.
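
To make the exception handling category more concrete, here is a minimal sketch in plain Elixir that makes no OpenTelemetry calls at all. SpanSketch.run/2 is a hypothetical helper, and the map it sends to the calling process is only a stand-in for a real span. It shows the general idea: record the start, the end, and the status, then re-raise so the caller still sees the original error.

```elixir
defmodule SpanSketch do
  # Hypothetical helper (not part of OpenTelemetry): runs `fun`, reports a
  # span-like map to the calling process, and re-raises any exception
  # after it has been reported.
  def run(name, fun) when is_function(fun, 0) do
    started_at = System.monotonic_time()

    try do
      result = fun.()
      report(name, started_at, :ok)
      result
    rescue
      # `rescue` only handles exceptions; throws and exits would need `catch`.
      exception ->
        report(name, started_at, {:error, Exception.message(exception)})
        reraise exception, __STACKTRACE__
    end
  end

  # Stand-in for exporting a span: we simply send ourselves a message.
  defp report(name, started_at, status) do
    span = %{
      name: name,
      start: started_at,
      end: System.monotonic_time(),
      status: status
    }

    send(self(), {:span, span})
  end
end
```

Calling SpanSketch.run("example", fn -> raise "boom" end) still raises the RuntimeError, but a {:span, ...} message with an error status lands in the caller's mailbox first.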

A Setup for a Setup in Elixir

The very first thing we need to do is prepare our module to use the tracing macros and libraries. We'll use Elixir's special macro __using__/1 to automate some of this stuff for us:

# lib/tracing.ex
# ...
defmacro __using__(_opts) do
  quote do
    require OpenTelemetry.Tracer

    require unquote(__MODULE__)
    import unquote(__MODULE__), only: [span: 1, span: 2, span: 3]
  end
end
# ...

Now, whenever we need to trace our code, we just need to call use Tracing at the top of our module.

defmodule MyModule do
  use Tracing
  # ...
end

So far, so good — but nothing too exciting. However, by using this simple setup macro we don't have to modify the modules using it if we ever make changes to the setup process, as all changes will be automatically replicated. That's a good start!
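
To see the setup macro work end to end without any dependencies, here is a small self-contained sketch. TracingSketch and its placeholder span/2 macro are hypothetical stand-ins, not the article's Tracing module: the placeholder only tags the block's result with the span name, so we can check that use really did bring the macro into scope.

```elixir
defmodule TracingSketch do
  # Hypothetical stand-in for the Tracing module; a real version would
  # also `require OpenTelemetry.Tracer` inside the quote.
  defmacro __using__(_opts) do
    quote do
      require unquote(__MODULE__)
      import unquote(__MODULE__), only: [span: 2]
    end
  end

  # Placeholder macro: evaluates the block and pairs the result with the
  # span name. It does not create a real span.
  defmacro span(name, do: block) do
    quote do
      {unquote(name), unquote(block)}
    end
  end
end

defmodule MyModule do
  use TracingSketch

  def add(a, b) do
    # `span` is available here without any module prefix.
    span "my_module.add" do
      a + b
    end
  end
end

MyModule.add(1, 2)
# => {"my_module.add", 3}
```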

In the next section, we'll start to remove some more meaningful boilerplate.

Translating Elixir Application Data to Span Attributes

OpenTelemetry allows applications to include attributes in spans. An attribute consists of a key and a value. This helps us to include useful information that can later be used to either create monitoring triggers or investigate a crash.

But here's the catch: OpenTelemetry only accepts numbers, strings, atoms, booleans, and lists (as long as their elements are of a supported basic type). Applications work with a richer set of data types: not only numbers and strings but also complex lists, maps, structs, and tuples.

Of course, we can use inspect/1 on the variable values and have everything in there. However, this makes searching for spans a much harder task, as we would need to use complex regexes to search for them.

Convert Complex Types to Basic Types

It's possible, however, to convert the complex types to more basic (and supported) types. Let's define a few rules:

  • Lists will use their indexes to name the values
  • Tuples will be converted to lists
  • Maps will use their keys to name values
  • Structs will be converted to maps

So, a simple list like [1, 2, 3] would be transformed into a list of pairs:

my_list = [1, 2, 3]
Tracing.set_attribute("numbers", my_list)
# numbers.0 = 1
# numbers.1 = 2
# numbers.2 = 3

For maps, we can use their keys to generate the pairs:

my_map = %{first_name: "John", last_name: "Wick"}
Tracing.set_attribute("user", my_map)
# user.first_name = "John"
# user.last_name = "Wick"

Using Elixir's defguard

Since we're handling maps, lists, and tuples as a set of values, we can use Elixir's defguard to create a custom function guard:

# lib/tracing.ex
defmodule Tracing do
  # ...
  defguard is_set(value) when is_map(value) or is_list(value) or is_tuple(value)
  # ...
end
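
As a quick illustration of what this guard matches, the sketch below reuses the same is_set definition in a throwaway module. GuardSketch and kind/1 are hypothetical names used only for the demonstration. Two details are worth noticing: structs are maps, so they count as sets, and so do charlists and keyword lists, since both are lists.

```elixir
defmodule GuardSketch do
  defguard is_set(value) when is_map(value) or is_list(value) or is_tuple(value)

  # Hypothetical function that only shows which clause gets picked.
  def kind(value) when is_set(value), do: :set
  def kind(_value), do: :basic
end

GuardSketch.kind(%{a: 1})         # => :set
GuardSketch.kind([1, 2, 3])       # => :set
GuardSketch.kind({:ok, 1})        # => :set
GuardSketch.kind(~D[2024-02-06])  # => :set (a Date struct is a map)
GuardSketch.kind("hello")         # => :basic
GuardSketch.kind(42)              # => :basic
```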

Now we can start building our custom set_attribute:

# lib/tracing.ex
defmodule Tracing do
  # ...
  def set_attribute(key, value) when is_set(value) do
    set_attributes(key, value) # Yet to be defined!
  end

  def set_attribute(key, value) do
    OpenTelemetry.Tracer.set_attribute(key, value)
  end
  # ...
end

It's simple: if the value is a set, we call set_attributes (plural!); otherwise, we just delegate to OpenTelemetry.Tracer.set_attribute/2.

The next piece is where all the transformation happens:

# lib/tracing.ex
defmodule Tracing do
  # ...
  def set_attributes(key, values) do
    key
    |> enumerable_to_attrs(values) # Yet to be defined!
    |> OpenTelemetry.Tracer.set_attributes()
  end
  # ...
end

Here, we receive the set of values, convert them, and then call OpenTelemetry.Tracer.set_attributes/1 to do the actual work of adding the attributes to the span.

The data conversion happens in the enumerable_to_attrs/2 function. It works by recursively going into each element of the collection and converting it to the appropriate basic type supported by OpenTelemetry. Adding its code here is beyond the scope of this post, but feel free to check it out on GitHub!
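
For readers who want a rough picture of the recursion, here is one possible sketch that follows the four rules listed earlier. It is not the library's actual implementation: AttrsSketch and to_pairs/1 are hypothetical names, and returning a list of {key, value} tuples with dotted string keys is an assumption made for the example. It also assumes that map keys are atoms, strings, or numbers, and it treats keyword lists and charlists as plain lists.

```elixir
defmodule AttrsSketch do
  defguard is_set(value) when is_map(value) or is_list(value) or is_tuple(value)

  # Flattens a nested value into a list of {"dotted.key", basic_value} pairs.
  def enumerable_to_attrs(key, values) when is_set(values) do
    values
    |> to_pairs()
    |> Enum.flat_map(fn {sub_key, value} ->
      full_key = "#{key}.#{sub_key}"

      if is_set(value) do
        # Nested collection: recurse with the extended key.
        enumerable_to_attrs(full_key, value)
      else
        [{full_key, value}]
      end
    end)
  end

  # Structs are converted to maps; maps are named by their keys.
  defp to_pairs(%_{} = struct), do: struct |> Map.from_struct() |> to_pairs()
  defp to_pairs(map) when is_map(map), do: Map.to_list(map)

  # Tuples are converted to lists; lists are named by their indexes.
  defp to_pairs(tuple) when is_tuple(tuple), do: tuple |> Tuple.to_list() |> to_pairs()

  defp to_pairs(list) when is_list(list) do
    list
    |> Enum.with_index()
    |> Enum.map(fn {value, index} -> {index, value} end)
  end
end

AttrsSketch.enumerable_to_attrs("numbers", [1, 2, 3])
# => [{"numbers.0", 1}, {"numbers.1", 2}, {"numbers.2", 3}]
```

A map such as %{first_name: "John", last_name: "Wick"} under the key "user" yields the pairs user.first_name and user.last_name. Map ordering is an implementation detail, so the order of those pairs should not be relied upon.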

Wrapping Up

In this post, we discussed the basics of tracing and began to explore how we can utilize OpenTelemetry in Elixir. We laid the foundation for an abstraction layer that will simplify the creation and manipulation of spans, making the process seamless and straightforward.

Happy coding!
