OpenTelemetry NodeJS: All you need to know

austinlparker

austin

Posted on December 3, 2020

Hi all, tedsuo here. We’re passing an important milestone on OpenTelemetry: the tracing specification is about to be frozen, and release candidates for OpenTelemetry tracing implementations will be coming soon, with metrics following in the next couple of months.

While we are putting our core documentation together, I thought now would be a good time to point out how simple it is to actually use distributed tracing in JavaScript. OpenTelemetry is a large framework with a lot of options and a lot of surface area. But, as an end user, you don’t have to worry about all of that. So, forget the details: this walkthrough contains all you need to know to actually use OpenTelemetry in Node. Apply this walkthrough to your application, and you are good to go.

TL;DR

All you need to know is:

  • Initialization: How to start and shutdown cleanly.
  • Tracer methods: getTracer, getCurrentSpan, startSpan, and withSpan.
  • Span methods: setAttribute, addEvent, recordException, setStatus, and end.

Seriously, that’s it. If you want to try it out, follow the guide below. A heavily commented version of the finished tutorial can be found at https://github.com/tedsuo/otel-node-basics.

Hello, world

For this tutorial, we’re going to make a very, very simple application: an express service that responds to http://localhost:9000/hello with “Hello World.” It’s as basic as it is original!

First, make a directory to contain your project, and install express:

npm i express

Once we have that, let’s get to coding. Make a file called server.js and serve up some hello world:

const express = require('express');

const app = express();

app.get('/hello', (req, res) => {
 res.status(200).send('Hello World');
});

app.listen(9000);

Then, make a file called client.js which sends 5 requests to the server and then exits.

const http = require('http');

function makeRequest() {
   http.get({
     host: 'localhost',
     port: 9000,
     path: '/hello',
   }, (response) => {
     const body = [];
     response.on('data', (chunk) => body.push(chunk));
     response.on('end', () => {
       // join the received chunks before converting them to text
       console.log(Buffer.concat(body).toString());
     });
   });
}

for (let i = 0; i < 5; i++) {
 makeRequest();
}

Boot up the server and check that it works:

> node server.js

With the server running, test the client in another tab:

> node client.js
Hello World
Hello World
Hello World
Hello World
Hello World

OpenTelemetry Architecture in 30 seconds

Ok, I said no details, but here is one that is actually helpful. OpenTelemetry clients have two major components: the SDK and the API. The SDK is the actual framework; the API is what you use to instrument your code.

This separation provides loose coupling: your application code only depends on the API, which has virtually no dependencies and acts like a no-op when the SDK is not installed. This allows packages to add instrumentation without automatically pulling in the implementation’s dependency chain (think grpc, etc). This separation of concerns is especially helpful for OSS libraries that want to bake in instrumentation, but don’t want to create overhead or dependency conflicts when OpenTelemetry is not being used.
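To make the no-op behavior concrete, here is a minimal sketch of the idea. It is an illustrative stand-in, not the real @opentelemetry/api package: the names api, registerProvider, and handleRequest are made up for this example.

```javascript
// Illustrative stand-in for the API/SDK split; NOT the real @opentelemetry/api.

// A span that accepts every call and does nothing.
const NOOP_SPAN = {
  setAttribute() { return this; },
  addEvent() { return this; },
  end() {},
};

// Until an SDK registers itself, the "API" hands out no-op spans.
let provider = { startSpan: () => NOOP_SPAN };

// The "API" surface: the only thing library code depends on.
const api = {
  startSpan: (name) => provider.startSpan(name),
  // Called once, by the "SDK", during application setup.
  registerProvider(p) { provider = p; },
};

// Library code instrumented against the API only.
function handleRequest() {
  const span = api.startSpan('handleRequest');
  span.setAttribute('projectID', '123').addEvent('working');
  span.end();
}

// Without an SDK: runs normally and records nothing.
handleRequest();

// A toy "SDK" that keeps finished spans in memory.
const finished = [];
api.registerProvider({
  startSpan(name) {
    const record = { name, attributes: {}, events: [] };
    return {
      setAttribute(key, value) { record.attributes[key] = value; return this; },
      addEvent(eventName) { record.events.push(eventName); return this; },
      end() { finished.push(record); },
    };
  },
});

// With the "SDK" installed, the same library code now produces a span.
handleRequest();
console.log(finished.length); // 1
```

The library function never changes; only the presence of a registered provider decides whether any work is done.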

Tip: Never reference any SDK package outside of installation and setup. All other packages and application code should only depend on the API.

In case you were wondering, while there are two JavaScript SDKs - one for NodeJS and one for the browser - there is only one JavaScript API. Instrumented code remains portable between both environments.

Pick an OpenTelemetry backend

Ok, let’s add OpenTelemetry to this application. To test our tracing, you’ll need a place to send the data.

At Lightstep, we created free-for-life community accounts specifically for making OpenTelemetry easy to experiment with. If you don’t already have one, please grab an account.

If you’d like to use Zipkin or Jaeger instead, this getting started guide will walk you through the setup. Once you’re set up, you can come back here and follow the rest of the tutorial.

Install the NodeJS OpenTelemetry Launcher

Since we’re connecting to Lightstep, we’ll also be using the Lightstep Distro of OpenTelemetry, the OpenTelemetry Launchers. Distros package up any plugins and configuration needed to talk to a particular backend. At the moment, we’re still fleshing out the full definition of a Distro (what is allowed, and what isn’t), but the basic point is to make getting started easier by reducing configuration boilerplate. If you want more detail, you can check out this blog post where I initially proposed the concept.

Installing the OpenTelemetry Launcher package will also install OpenTelemetry, plus all currently available instrumentation.

npm i lightstep-opentelemetry-launcher-node

Create an OpenTelemetry initialization file

To instrument your server, you need to start the OpenTelemetry SDK before loading your application. As of v0.12, OpenTelemetry NodeJS loads asynchronously. This is actually the trickiest bit of OpenTelemetry right now, and future versions will move to a simpler, synchronous startup. However, for now you can copy and paste the approach below, and it will work for any application.

Create a file called server_init.js. This will serve as your new entry point. You can copy and paste the below code.

const {
 lightstep,
 opentelemetry,
} = require('lightstep-opentelemetry-launcher-node');

const sdk = lightstep.configureOpenTelemetry({
 accessToken: '<ACCESS_TOKEN>',
 serviceName: 'hello-server-1',
 serviceVersion: 'v1.2.3',
 propagators: 'tracecontext,b3',
});

sdk.start().then(() => {
 require('./server');
});

function shutdown() {
 sdk.shutdown().then(
   () => console.log("SDK shut down successfully"),
   (err) => console.log("Error shutting down SDK", err),
 ).finally(() => process.exit(0))
};

process.on('exit', shutdown);
process.on('SIGINT', shutdown);
process.on('SIGTERM', shutdown);

Configure the launcher with your Lightstep Access Token (You can find your Access Token on the settings page). Create a client_init.js file in the same manner, only change the serviceName to ‘hello-client’ and the required startup file to ‘./client’.

Use the launcher to start the SDK. Once the SDK has completed its setup, require your original entry point to start your application.

Why load your application in two phases like this? If your application begins requiring packages (or running) before OpenTelemetry is set up, it can create issues. By initializing OpenTelemetry in a separate file, and only requiring the rest of your application after the SDK is started, OpenTelemetry has an opportunity to automatically apply any available instrumentation, as well as auto-detect any available system resources before your application starts to run. It also ensures that your application loads normally.
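Here is a simplified sketch of why load order matters. It assumes instrumentation works by wrapping functions on a module, and it uses a toy object (lib) in place of a real package; the actual OpenTelemetry plugins are more involved than this.

```javascript
// A toy module standing in for something like `http`.
const lib = { get: (path) => `response for ${path}` };

// Application code that ran "too early" and captured the original function.
const earlyGet = lib.get;

// Toy instrumentation: wrap the function so every call is observed.
const seen = [];
const original = lib.get;
lib.get = (path) => {
  seen.push(path);
  return original(path);
};

// Code that looks the function up after patching is instrumented...
lib.get('/late');
// ...but the reference captured before patching bypasses the wrapper.
earlyGet('/early');

console.log(seen); // [ '/late' ]
```

Requiring your application only after the SDK has started avoids the "captured too early" case entirely.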

Run your application with OpenTelemetry

Start your newly auto-instrumented server and client. Let’s also turn the debug logs on, so we can see what OpenTelemetry is doing.

export OTEL_LOG_LEVEL=debug
node server_init.js

Then, in a second terminal:

export OTEL_LOG_LEVEL=debug
node client_init.js

At startup, the debug logs will print out the configuration, and list every successfully loaded instrumentation library. Every time the tracer flushes data, all of the spans which have been exported are printed out. This can be really helpful for debugging when you are setting up.

Check out what automatic instrumentation gives you

Switch over to Lightstep, or your backend of choice, and confirm the spans were received:

[Screenshot: OTel Node (1)]

Yup, we see spans. Click through and look at a trace:

[Screenshot: Example trace in Lightstep]

Notice that we see a client span from hello-client, a server span from hello-server, and several internal spans representing built-in express components. Also, notice that the client and server spans are already populated with HTTP, network, and other attributes.

All of this common information is standardized across instrumentation as semantic conventions. An HTTP request will always be described with the same keys and values, regardless of what language or package it comes from.

This is a lot of really useful information. We already have a complete trace, with a lot of detail, and we haven’t written any instrumentation yet. When rolling out OpenTelemetry, this is the approach I recommend. Get OpenTelemetry installed into every service and ensure that context is propagating correctly, before adding any further detail. This will be enough information to set up error monitoring and identify latency issues.
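One way to check that context is propagating is to look at the headers arriving at your server. With the tracecontext propagator configured above, requests should carry a W3C Trace Context traceparent header. The sketch below parses the version 00 layout of that header; parseTraceparent is a hypothetical helper written for this example, not part of OpenTelemetry.

```javascript
// Parse a W3C Trace Context `traceparent` header value (version 00 layout):
//   version-traceId-parentId-flags, all lowercase hex.
function parseTraceparent(header) {
  const match =
    /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;

  const [, version, traceId, parentId, flags] = match;
  // Version ff and all-zero IDs are invalid according to the spec.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentId)) {
    return null;
  }
  // The lowest bit of the flags byte is the "sampled" flag.
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

const example = parseTraceparent(
  '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'
);
console.log(example.traceId); // 4bf92f3577b34da6a3ce929d0e0e4736
console.log(example.sampled); // true
```

If the client and server are wired up correctly, the traceId seen by the server should match the trace ID of the client span in your backend.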

The OpenTelemetry JavaScript API

Ok, so the out-of-the-box experience will get you a long way, but of course, you will eventually want to add additional application data.

Spans should ideally be managed by your application framework. In this case, the express framework manages the span for you. In your application code, you can continue to decorate these spans with more information. There are two primary types of data you will want to add: attributes and events.

Span attributes are indexes for segmenting your data. For example, you may want to add project.id or account.id in order to understand if slow requests and errors are specific to a certain set of accounts, or affecting everyone.
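To illustrate what "segmenting" means, here is a rough sketch of the kind of grouping a backend can do once spans carry an attribute such as account.id. The span records and the averageDurationBy function are hypothetical and heavily simplified; real exported spans contain far more fields.

```javascript
// Hypothetical, simplified span records.
const spans = [
  { name: 'GET /hello', durationMs: 320, attributes: { 'account.id': 'a1' } },
  { name: 'GET /hello', durationMs: 300, attributes: { 'account.id': 'a1' } },
  { name: 'GET /hello', durationMs: 40, attributes: { 'account.id': 'a2' } },
];

// Average span duration for each distinct value of one attribute key.
function averageDurationBy(records, key) {
  const groups = new Map();
  for (const record of records) {
    const value = record.attributes[key];
    if (value === undefined) continue; // span was never tagged with this key
    const group = groups.get(value) || { count: 0, totalMs: 0 };
    group.count += 1;
    group.totalMs += record.durationMs;
    groups.set(value, group);
  }
  const averages = {};
  for (const [value, group] of groups) {
    averages[value] = group.totalMs / group.count;
  }
  return averages;
}

console.log(averageDurationBy(spans, 'account.id')); // { a1: 310, a2: 40 }
```

Here the slow requests clearly belong to account a1, which is the kind of answer an attribute makes possible.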

Fine grain logging can be added as span events. Events are a form of structured logging - use them like you would logs. The advantage with span events is that you can automatically find all of the logs associated with a particular transaction, rather than having to go hunting with a bunch of searches and filters. As you scale up, this becomes a lifesaver (or, at least, a big time saver).

First, require the OpenTelemetry API. At the package level, create a tracer and name it after your package:

const opentelemetry = require('@opentelemetry/api');
const express = require('express');

// create a tracer and name it after your package
const tracer = opentelemetry.trace.getTracer('@otel-node-basics/server');

const app = express();

The name of the tracer appears on every span as the instrumentation.name attribute. This is useful for investigating instrumentation issues.

Once you have a tracer, you can use it to access the server span created by the express instrumentation. Calling tracer.getCurrentSpan() will return the span for the current context. Once you have access to the span, you can add attributes and events.

const app = express();

app.get('/hello', (req, res) => {
 // access the span created by express instrumentation
 const span = tracer.getCurrentSpan();
  // add an attribute to segment your data by projectID
 span.setAttribute('projectID', '123');
 // log an event and include some structured data.
 span.addEvent('setting timeout', { sleep: 300 });

 setTimeout(()=> {
   span.addEvent('responding after timeout');
   res.status(200).send('Hello World');
 }, 300);
});

app.listen(9000);

You can also chain these methods, which can be a little more concise.

app.get('/hello', (req, res) => {
 tracer.getCurrentSpan()
       .setAttribute('projectID', '123')
       .addEvent('setting timeout', { sleep: 300 });

 setTimeout(()=> {
   tracer.getCurrentSpan().addEvent('sending response');
   res.status(200).send('Hello World');
 }, 300);
});

Run your server and client again, and you will see these new attributes and events show up on the same spans.

Creating your own spans

You can also create your own spans. These spans will automatically become children of the current span and are added to the trace.

Span management involves three steps: starting the span, setting it as the current span, and ending the span.

To start a child span, grab the tracer again, and call tracer.startSpan( name ). Name the span after the operation you are measuring. Advice on naming can be found in the tracing specification.

IMPORTANT: make sure to end the span when your operation finishes, or you will have a leak!

After span.end() is called, spans are queued up to be exported in the next flush. Calls to setAttribute and addEvent become no-ops after span.end() is called.

app.get('/hello', (req, res) => {
  // start a new span named “sleeper”
 const childSpan = tracer.startSpan("sleeper");

 setTimeout(()=> {
   // childSpan works normally when referenced
   childSpan.addEvent('finished sleeping');
   // However, starting a span does not automatically
   // set it to the current span. getCurrentSpan still 
   // returns the parent span.
   tracer.getCurrentSpan();
   res.status(200).send('Hello World');
   // Ending the span is a requirement. It measures the duration 
   // of the operation, and then sends the span to the exporter.
   childSpan.end();
 }, 300);
});

So, the above “works,” except the child span has not been set as the current span. In almost all circumstances, this is critical. You want the rest of your code to be able to access the span without handing it around as a parameter. And unless you set the new span as current, getCurrentSpan will return the parent span, which would be incorrect.

So, after you start a span, create a closure in which the span is active by calling tracer.withSpan(span, cb). Within the callback, the new span will now be active.

app.get('/hello', (req, res) => {
  // start a new span named “sleeper”
 const childSpan = tracer.startSpan("sleeper");

 // use withSpan to create a new context
 tracer.withSpan(childSpan,()=> {
   setTimeout(()=> {
     // getCurrentSpan now correctly returns childSpan
     const span = tracer.getCurrentSpan();
     span.addEvent('sending response');
     res.status(200).send('Hello World');
     span.end();
   }, 300);
 });
});

My advice is to avoid creating child spans, except when you truly require a new context - separating out database operations from application code, for example. Ideally, span management should happen in some kind of framework, rather than scattered about your application code. Favor adding events over creating child spans. If you pool all of your attributes on to the same span, you will get better indexing.
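If you do need child spans in several places, one option is to keep the start / withSpan / end sequence in a single helper. The sketch below is one possible shape for that, using only the tracer and span methods covered in this walkthrough; traced is a hypothetical name, and the tracer is passed in as an argument so the helper itself does not import anything.

```javascript
// Hypothetical helper: run `fn` inside a child span and always end the span.
// `tracer` is whatever opentelemetry.trace.getTracer(...) returned.
function traced(tracer, name, fn) {
  const span = tracer.startSpan(name);

  // Record the exception, end the span, and rethrow.
  const fail = (err) => {
    span.recordException(err);
    span.end();
    throw err;
  };

  let result;
  try {
    // Make the new span current while `fn` runs.
    result = tracer.withSpan(span, () => fn(span));
  } catch (err) {
    fail(err);
  }

  // If `fn` returned a promise, end the span when it settles.
  if (result && typeof result.then === 'function') {
    return result.then((value) => {
      span.end();
      return value;
    }, fail);
  }

  span.end();
  return result;
}
```

A call such as traced(tracer, 'sleeper', () => doWork()) then replaces the manual bookkeeping, and forgetting span.end() is no longer possible at the call site. Note that the helper records the exception but does not set a status, since a thrown exception does not always mean the operation failed.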

Error Handling

There is one final type of event that deserves special attention: exceptions. In OpenTelemetry, exceptions are recorded as events. But, to ensure that the exception is properly formatted, the span.recordException(error) method should be used instead of addEvent.

app.get('/hello', (req, res) => {
 // access the span created by express instrumentation
 const span = tracer.getCurrentSpan();
 try {
   throw new Error('ooops');
 } catch (error) {
   // Add the exception as a properly formatted event.
   span.recordException(error);

   // Set the status code to make the exception count 
   // as an error.
   span.setStatus({ code: 
     opentelemetry.CanonicalCode.UNKNOWN });
   res.status(500).send('Something went wrong');
 }
});

By default, exceptions do not count as errors. In OpenTelemetry, an error means that the overall operation did not complete. Plenty of exceptions are expected, and a handled exception does not automatically mean the entire operation failed to complete. In other cases, an operation could fail without an exception being thrown.

In order to declare an operation a failure, call span.setStatus() and pass in an error code. Status codes are used by analysis tools to automatically trigger alerting, measure error rates, etc.

Note: status codes will be simplified in the next version of OpenTelemetry.

That’s all, folks!

And that is that. All you need to know to get started with tracing in NodeJS. Hopefully, that was pretty straightforward, and clears up any mysteries about how to use OpenTelemetry.

If you stick with the above patterns, you can get a great deal of visibility with very little work. Of course, there are many more details and options; you can check out the API documentation for more information. I also have a more involved getting started guide; it works as a handy reference for all of the procedures described above.

OpenTelemetry is still in beta due to API changes, but it is also already in production across many organizations. If you stick to a Distro and automated instrumentation, you can use OpenTelemetry today without much fear of a breaking change, as those changes will most likely involve the API.

If you are writing manual instrumentation during the beta, consider creating helper functions that simplify the API for your use cases, and give you a centralized place to manage any potential breakage.
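As a rough sketch of what such helpers could look like: the two functions below wrap the calls used in this walkthrough, so that a change in the API only has to be absorbed in one place. The names annotate and markFailed are hypothetical, and the guard assumes getCurrentSpan may return undefined when no span is active.

```javascript
// Hypothetical helper module; every function takes the tracer or span
// as an argument so the OpenTelemetry import lives in one place only.

// Add attributes (and optionally an event) to the current span, if any.
// Returns false when there is no active span to decorate.
function annotate(tracer, attributes = {}, eventName) {
  const span = tracer.getCurrentSpan();
  if (!span) return false;
  for (const [key, value] of Object.entries(attributes)) {
    span.setAttribute(key, value);
  }
  if (eventName) span.addEvent(eventName);
  return true;
}

// Record an exception and declare the whole operation a failure.
// `errorCode` is supplied by the caller, e.g.
// opentelemetry.CanonicalCode.UNKNOWN in the API version used here.
function markFailed(span, error, errorCode) {
  span.recordException(error);
  span.setStatus({ code: errorCode });
}
```

If the status codes are simplified in a later release, only markFailed and its callers' errorCode argument would need to change.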

Also: consider joining our community! There are plenty of libraries left to instrument. You can find us on GitHub, or say hi on Gitter.
