Distributed Tracing and OpenTelemetry Guide
Daniele Iasella
Posted on September 29, 2023
Microservices have become popular for modern web applications since they provide many benefits over traditional monolithic architectures. However, microservices are not a silver bullet; they also have a fair share of challenges. For example, debugging and troubleshooting errors in microservices can be challenging since tracking the request flow across multiple services is difficult.
That's where distributed tracing and OpenTelemetry come in. OpenTelemetry is an observability framework designed to create and manage telemetry data such as traces, metrics, and logs from distributed systems. In this article, I will take you through the steps of using OpenTelemetry within a Node.js ecosystem to trace your microservices applications effectively.
What is distributed tracing?
The complexity of microservices makes it difficult to follow the path of a request through multiple services. Distributed tracing is an observability technique used to track these requests across microservices. You can think of it as a flashlight that shows how a request flows through your system.
Distributed tracing is beneficial for developers in many scenarios. For example, there can be a single microservice with a slow response time, slowing down the whole application. Tracing data lets you pinpoint the exact origin and easily troubleshoot the issue.
Benefits of using distributed tracing:
- Identifies performance bottlenecks.
- Provides a comprehensive view of the system.
- Provides insights into the dependencies between different services.
- Helps identify potential security vulnerabilities.
- Supports both synchronous (gRPC, REST, GraphQL) and asynchronous (event sourcing, pub-sub) application architectures.
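To make the first and third benefits concrete, here is a minimal sketch of how span data can reveal a bottleneck and the dependencies between services. The span records, field names, and timings are all hypothetical; a real tracing backend such as Jaeger does this analysis for you and works with far richer data.

```javascript
// Hypothetical span records for one trace (times in milliseconds).
const spans = [
  { id: "a", parentId: null, service: "client", name: "main", startMs: 0, endMs: 480 },
  { id: "b", parentId: "a", service: "shipping", name: "POST /api/shipments", startMs: 10, endMs: 470 },
  { id: "c", parentId: "b", service: "courier", name: "GET /api/parcels", startMs: 20, endMs: 420 },
  { id: "d", parentId: "b", service: "shipping", name: "kafka send", startMs: 425, endMs: 460 },
];

// Self time = span duration minus the time spent in its direct children.
// This assumes the children of a span do not overlap each other.
function selfTimes(allSpans) {
  return allSpans.map((span) => {
    const childTime = allSpans
      .filter((s) => s.parentId === span.id)
      .reduce((sum, s) => sum + (s.endMs - s.startMs), 0);
    return { ...span, selfMs: span.endMs - span.startMs - childTime };
  });
}

// The span with the largest self time is the most likely bottleneck.
function slowestSpan(allSpans) {
  return selfTimes(allSpans).reduce((worst, s) => (s.selfMs > worst.selfMs ? s : worst));
}

// A parent and child span in different services imply a service dependency.
function serviceDependencies(allSpans) {
  const byId = new Map(allSpans.map((s) => [s.id, s]));
  const edges = new Set();
  for (const span of allSpans) {
    const parent = byId.get(span.parentId);
    if (parent && parent.service !== span.service) {
      edges.add(`${parent.service} -> ${span.service}`);
    }
  }
  return [...edges];
}

console.log(slowestSpan(spans).service);
console.log(serviceDependencies(spans));
```

In this made-up trace, almost all of the time is spent inside the courier call, which is exactly the kind of finding a trace view makes visible at a glance.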
Components of distributed tracing
A typical distributed tracing system is built from the following components:
- Trace: End-to-end path of a single user request as it moves through various services.
- Span: A single operation or unit of work within a distributed system. It captures information such as start time, end time, metadata, and annotations that help you understand what is happening.
- Context Propagation: Passing contextual information between different services within a distributed system. It is essential for connecting spans to construct a complete trace of a request.
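The three components above can be sketched in a few lines of plain Node.js. This is an illustration only, not how the OpenTelemetry SDK is implemented: the function names are hypothetical, and the header layout follows the W3C Trace Context `traceparent` format (the tracing code later in this article configures B3 propagation instead, which carries the same kind of identifiers in a different header).

```javascript
const { randomBytes } = require("node:crypto");

// Span: one unit of work. A root span creates a new trace ID;
// a child span reuses the trace ID of its parent context.
function startSpan(name, parentContext = null) {
  return {
    name,
    traceId: parentContext ? parentContext.traceId : randomBytes(16).toString("hex"),
    spanId: randomBytes(8).toString("hex"),
    parentSpanId: parentContext ? parentContext.spanId : null,
  };
}

// Context propagation (outbound): serialize the span identity into a header.
// Layout: version "00", 32-hex trace ID, 16-hex span ID, flags "01" (sampled).
function inject(span, headers = {}) {
  headers.traceparent = `00-${span.traceId}-${span.spanId}-01`;
  return headers;
}

// Context propagation (inbound): rebuild the parent context from the header.
function extract(headers) {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(
    headers.traceparent || ""
  );
  return match ? { traceId: match[1], spanId: match[2] } : null;
}

// Trace: both spans share one trace ID, so a backend can stitch them together.
const clientSpan = startSpan("client request");
const headers = inject(clientSpan);
const serverSpan = startSpan("handle request", extract(headers));

console.log(serverSpan.traceId === clientSpan.traceId); // same trace
console.log(serverSpan.parentSpanId === clientSpan.spanId); // linked spans
```

Without the header, the receiving service would start a fresh trace and the two spans could never be connected, which is why context propagation is essential.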
Since you now have a brief idea of what distributed tracing is, let's see how to implement distributed tracing with Node.js.
Instrumenting Node.js app with OpenTelemetry
In this example, I will create 3 Node.js services (shipping, notification, and courier) using Amplication, add traces to all services, and show how to analyze trace data using Jaeger.
Step 1: Generating services using Amplication
As the first step, you must create the Node.js services with Amplication. In this example, I will be using three already created Prisma schemas. You can find those schemas in this GitHub repository.
Once you are ready with schemas, go to the Amplication dashboard and create a new Project.
Then, select the project from the dashboard and connect the GitHub repository with Prisma schemas to that project.
Now, you can start creating services. For that, return to the Amplication dashboard and click the Add Resources button.
Then, enter the necessary information to create the service. In this case, I have named the service "courier gateway service" and used the following settings:
- Git Repository: I've used the GitHub repo, which I connected earlier.
- REST or GraphQL: I've enabled both options to show the file structure generated by Amplication.
- Repo type: Monorepo.
- Database: PostgreSQL
- Authentication: Included
It will take a few seconds to generate the service.
After that, you need to modify a few database settings to avoid collision between databases when sharing the same Docker service. For that, navigate to the Plugins tab, select the PostgreSQL DB plugin, and click the Settings button.
There, you will see the plugin settings as JSON, and you need to update the dbName property. Here, I have renamed it to courier.
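To see why a distinct dbName per service avoids collisions, consider the connection URLs the services end up with when they share one PostgreSQL instance. The sketch below is only an illustration: the helper is hypothetical, the credentials and port are taken from the Docker Compose file used later in this article, and the shipping and notification database names are assumptions (only courier is named here).

```javascript
// Hypothetical helper: builds a PostgreSQL connection URL for one service.
function buildDbUrl({ user, password, host = "localhost", port = 5432, dbName }) {
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `postgres://${credentials}@${host}:${port}/${encodeURIComponent(dbName)}`;
}

// Same server and credentials for every service; only the database name differs.
const shared = { user: "admin", password: "admin" };
const dbNames = ["courier", "shipping", "notification"]; // last two are assumed names
const urls = dbNames.map((dbName) => buildDbUrl({ ...shared, dbName }));

console.log(urls);
```

If all three services kept the same database name, their migrations would run against the same database and could overwrite each other's tables.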
Then, go back to the Entities tab and import the courier Prisma schema to generate the entities related to the courier service.
Once the schema is imported, you will see 2 new entities named Parcel and Quote in the Entities tab.
Now, perform the same steps again for the other two services.
Step 2: Adding a Kafka integration
In this example, I will use a Message Broker to communicate between these services. You can easily generate a Message Broker through Amplication by clicking the Add Resource button and selecting the Message Broker option.
Then, go back to the shipping service and install the Kafka plugin to allow the shipping service to use the Message Broker.
Then, go to the Connections tab and select the Message pattern as Send to allow the shipping service to send messages.
Similarly, go to the notification service and select Message pattern as Receive to subscribe to the Message Broker.
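The Send and Receive patterns can be pictured with a tiny in-memory stand-in for the broker. This is not Kafka and not the code Amplication generates; the class, the topic name, and the message shape are assumptions chosen to resemble the shipment event used later in this article.

```javascript
const { EventEmitter } = require("node:events");

// In-memory stand-in for a message broker (for illustration only).
class InMemoryBroker {
  constructor() {
    this.emitter = new EventEmitter();
  }

  // "Send" pattern: a service publishes a message to a topic.
  send(topic, message) {
    this.emitter.emit(topic, message);
  }

  // "Receive" pattern: a service subscribes to a topic.
  receive(topic, handler) {
    this.emitter.on(topic, handler);
  }
}

const broker = new InMemoryBroker();
const received = [];

// The notification service subscribes (hypothetical topic name).
broker.receive("shipment.create.v1", (message) => received.push(message));

// The shipping service publishes.
broker.send("shipment.create.v1", {
  key: "42",
  value: { Message: "Shipment id: 42", CustomerId: "1b2c" },
});

console.log(received);
```

The sender never calls the receiver directly, which is why asynchronous flows like this one are hard to follow without tracing.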
Step 3: Building the application
Click the Commit change & build button to finalize the changes. It will start the build process, generate the new files in the Git repo, and create a pull request.
Note: Make sure to merge the pull request to the main branch to get the latest updates.
Step 4: Configuring Docker compose
Each service generated by Amplication contains a separate Docker Compose file. But, in this example, I want to share the same database with all services. Hence, I created a new Docker Compose file by copying the content of the docker-compose files generated by Amplication.
version: "3"
name: otel-workshop
services:
  # Shared DB for all services
  db:
    image: postgres:12
    ports:
      - 5432:5432
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
    volumes:
      - postgres:/var/lib/postgresql/data
  # Jaeger
  jaeger-all-in-one:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"
      - "14268"
      - "14250"
  # Collector
  collector-gateway:
    image: otel/opentelemetry-collector:latest
    volumes:
      - ./collector-gateway.yml:/etc/collector-gateway.yaml
    command: ["--config=/etc/collector-gateway.yaml"]
    ports:
      - "1888:1888" # pprof extension
      - "13133:13133" # health_check extension
      - "4317:4317" # OTLP gRPC receiver
      - "4318:4318" # OTLP HTTP receiver
      - "55670:55679" # zpages extension
    depends_on:
      - jaeger-all-in-one
  kafka-ui:
    container_name: kafka-ui
    image: provectuslabs/kafka-ui:latest
    ports:
      - "8080:8080"
    depends_on:
      - zookeeper
      - kafka
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
      KAFKA_CLUSTERS_0_ZOOKEEPER: zookeeper:2181
      KAFKA_CLUSTERS_0_JMXPORT: 9997
  zookeeper:
    image: confluentinc/cp-zookeeper:7.3.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - "2181:2181"
  kafka:
    image: confluentinc/cp-kafka:7.3.1
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "9997:9997"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_MESSAGE_MAX_BYTES: 10485760
      JMX_PORT: 9997
      KAFKA_JMX_OPTS: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=kafka -Dcom.sun.management.jmxremote.rmi.port=9997
    healthcheck:
      test: nc -z localhost 9092 || exit -1
      start_period: 15s
      interval: 30s
      timeout: 10s
      retries: 10
volumes:
  postgres: ~
I don't need to specify a collector gateway in the above configuration since I'm using jaeger-all-in-one. However, I have specified a collector gateway to highlight the receiver's components and ports.
Run the docker-compose up --detach command to start the containers.
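Once the containers are up, it can be handy to confirm that the published ports are reachable before moving on. The following is a small sketch using only Node.js built-ins; the endpoint names are labels I made up, and the port numbers are the host ports published in the Compose file above.

```javascript
const net = require("node:net");

// Host ports published by the Compose file above (names are just labels).
const endpoints = [
  { name: "postgres", port: 5432 },
  { name: "jaeger-ui", port: 16686 },
  { name: "collector-otlp-grpc", port: 4317 },
  { name: "collector-otlp-http", port: 4318 },
  { name: "kafka", port: 9092 },
  { name: "kafka-ui", port: 8080 },
];

// Resolves to true if a TCP connection can be opened, false otherwise.
function isPortOpen(port, host = "127.0.0.1", timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (open) => {
      socket.destroy();
      resolve(open);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}

// Checks every endpoint in parallel and reports which ones are reachable.
async function checkAll() {
  return Promise.all(
    endpoints.map(async (endpoint) => ({
      ...endpoint,
      open: await isPortOpen(endpoint.port),
    }))
  );
}

// Usage: checkAll().then((results) => console.table(results));
```

An open port only shows that something is listening; it does not prove the service inside the container has finished starting.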
Step 5: Configuring services for local development
Now, you need to set up all 3 services for local development. For that, you just need to follow the instructions given in the README.md file.
Note: You don't need to run the npm run docker:dev command since Docker is already running.
Once all the databases are initialized and dependencies are installed, start each service using the npm run start:watch command and the courier-gateway-service-admin using the npm run start command.
Note: You need to update the port of each service in its .env file to avoid clashes between the services.
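If you want to double-check the ports before starting everything, a short script can flag clashes. This sketch assumes each .env file defines the port in a line of the form PORT=<number>, which may not match your generated files exactly; the helper names and the sample values are hypothetical.

```javascript
// Extracts the port from .env text, assuming a line like "PORT=3002".
function parsePort(envText) {
  const match = /^PORT=(\d+)\s*$/m.exec(envText);
  return match ? Number(match[1]) : null;
}

// Returns every port that is used by more than one service.
function findPortClashes(envByService) {
  const servicesByPort = new Map();
  for (const [service, envText] of Object.entries(envByService)) {
    const port = parsePort(envText);
    if (port === null) continue;
    servicesByPort.set(port, [...(servicesByPort.get(port) || []), service]);
  }
  return [...servicesByPort]
    .filter(([, services]) => services.length > 1)
    .map(([port, services]) => ({ port, services }));
}

// Sample .env contents (made up); notification clashes with courier here.
const clashes = findPortClashes({
  courier: "PORT=3002\n",
  shipping: "PORT=3004\n",
  notification: "PORT=3002\n",
});

console.log(clashes);
```

In a real project you would read the three .env files from disk with fs.readFileSync instead of passing strings.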
Step 6: Creating a Parcel through admin view
You can easily create a new Parcel by logging into the admin view.
Step 7: Connecting the services
First, navigate to the shipping service and install axios using the npm install axios command. Then, add the below code to the shipping-service/src/shipment/shipment.service.ts file to get parcel details.
import { Injectable } from "@nestjs/common";
import { PrismaService } from "../prisma/prisma.service";
import { ShipmentServiceBase } from "./base/shipment.service.base";
import { Prisma, Shipment } from "@prisma/client";
import axios from "axios";
import { KafkaProducerService } from "../kafka/kafka.producer.service";
import { ShippingEvent } from "./shipping.event";
import { MyMessageBrokerTopics } from "../kafka/topics";

@Injectable()
export class ShipmentService extends ShipmentServiceBase {
  constructor(
    protected readonly prisma: PrismaService,
    private readonly kafkaProducerService: KafkaProducerService
  ) {
    super(prisma);
  }

  async create<T extends Prisma.ShipmentCreateArgs>(
    args: Prisma.SelectSubset<T, Prisma.ShipmentCreateArgs>
  ): Promise<Shipment> {
    // Log in to the courier service to obtain an access token.
    const {
      data: { accessToken },
    } = await axios.post("http://localhost:3002/api/login", {
      username: "admin",
      password: "admin",
    });

    // Fetch the parcels from the courier service.
    const { data: parcels } = await axios.get(
      "http://localhost:3002/api/parcels",
      {
        params: {},
        headers: {
          Authorization: `Bearer ${accessToken}`,
        },
      }
    );

    // Use the price of a random parcel for the new shipment.
    const randomParcel = Math.floor(Math.random() * parcels.length);
    const shipment = await super.create<T>({
      ...args,
      data: {
        ...args.data,
        price: parcels[randomParcel].price,
      },
    });

    // Publish an event so that subscribers learn about the new shipment.
    const event: ShippingEvent = {
      Message: `Shipment id: ${shipment.id}`,
      CustomerId: "1b2c",
    };
    await this.kafkaProducerService.emitMessage(
      MyMessageBrokerTopics.ShipmentCreateV1,
      {
        key: shipment.id,
        value: event,
      }
    );
    return shipment;
  }
}
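One detail worth noting in the code above: if the courier service returns no parcels, parcels[randomParcel] is undefined and reading its price throws a TypeError. A small guard makes that failure explicit. The helper below is a hypothetical sketch, not part of the generated code; the random source is injectable only so the choice can be tested.

```javascript
// Picks a random element from a non-empty array.
// `random` defaults to Math.random and can be replaced in tests.
function pickRandom(items, random = Math.random) {
  if (!Array.isArray(items) || items.length === 0) {
    throw new Error("No parcels available to price the shipment");
  }
  return items[Math.floor(random() * items.length)];
}

// Usage with made-up parcel data.
const parcels = [{ price: 5 }, { price: 9 }];
console.log(pickRandom(parcels).price);
```

In the shipment service, the create method could call such a helper in place of the manual index calculation, so an empty list surfaces as a clear error in the trace.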
Step 8: Creating a client app
Before adding instrumentation, let's create a client application that sends requests to the shipping service. This can be a simple Node.js project with a main.js file containing the code below.
// main.js
"use strict";
const axios = require("axios");

const url = "http://localhost:3004/api/shipments";
const numberOfRequests = 5;

const makeRequest = async (requestId) => {
  const result = await axios.post(url);
  return result;
};

const main = async () => {
  for (let i = 0; i < numberOfRequests; i++) {
    const res = await makeRequest(i);
    console.log("Response", res.data);
  }
};

main();
Step 9: Adding tracing
Create a new file named tracing.js in the same directory where you created the main.js file to fetch shipment data. Then, install all the OpenTelemetry dependencies using the below command:
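npm install @opentelemetry/sdk-node \
  @opentelemetry/api \
  @opentelemetry/resources \
  @opentelemetry/semantic-conventions \
  @opentelemetry/instrumentation-http \
  @opentelemetry/sdk-trace-base \
  @opentelemetry/exporter-trace-otlp-http \
  @opentelemetry/propagator-b3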
Add the below code to the tracing.js file.
const { SimpleSpanProcessor } = require("@opentelemetry/sdk-trace-base");
const { Resource } = require("@opentelemetry/resources");
const {
  SemanticResourceAttributes,
} = require("@opentelemetry/semantic-conventions");
const { trace } = require("@opentelemetry/api");
const {
  OTLPTraceExporter,
} = require("@opentelemetry/exporter-trace-otlp-http");
const { NodeSDK } = require("@opentelemetry/sdk-node");
const { HttpInstrumentation } = require("@opentelemetry/instrumentation-http");
const { B3Propagator } = require("@opentelemetry/propagator-b3");

// Sends spans over OTLP/HTTP using the exporter's default settings.
const exporter = new OTLPTraceExporter({});

// Returns a tracer; the name identifies the code that creates the spans.
const getTracer = (name = "default") => {
  return trace.getTracer(name);
};

const sdk = new NodeSDK({
  // Resource attributes identify this application in the trace data.
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: "fake-client-app",
    [SemanticResourceAttributes.SERVICE_VERSION]: "0.1.0",
  }),
  spanProcessor: new SimpleSpanProcessor(exporter),
  traceExporter: exporter,
  // Creates spans for HTTP calls, including those made by axios.
  instrumentations: [new HttpInstrumentation()],
  // Propagates the trace context using B3 headers.
  textMapPropagator: new B3Propagator(),
});

sdk.start();

module.exports = { getTracer };
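The exporter above is created with an empty configuration, so it relies on defaults. As far as I understand the OpenTelemetry specification for OTLP over HTTP, the traces endpoint is resolved roughly as in the sketch below: a signal-specific environment variable wins, then a base endpoint with /v1/traces appended, then http://localhost:4318/v1/traces. The function name is hypothetical and this is not the exporter's actual source code.

```javascript
// Sketch of OTLP/HTTP traces endpoint resolution (assumed from the spec).
function resolveTracesEndpoint(env = process.env) {
  // 1. A traces-specific endpoint is used exactly as given.
  if (env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT) {
    return env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT;
  }
  // 2. A base endpoint gets the signal path appended.
  if (env.OTEL_EXPORTER_OTLP_ENDPOINT) {
    const base = env.OTEL_EXPORTER_OTLP_ENDPOINT.replace(/\/+$/, "");
    return `${base}/v1/traces`;
  }
  // 3. Default: a collector listening locally on the OTLP HTTP port.
  return "http://localhost:4318/v1/traces";
}

console.log(resolveTracesEndpoint({}));
```

Port 4318 is the OTLP HTTP receiver port published by the collector-gateway container in the Docker Compose file, which is why the client works without any explicit URL.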
To start tracing, you need to import the above file into the main.js file and make a few modifications. The updated main.js file will look like this:
"use strict";
// Load tracing first so the SDK starts before axios is required.
const { getTracer } = require("./tracing");
const axios = require("axios");
const { trace } = require("@opentelemetry/api");

const tracer = getTracer("fake-client");
const url = "http://localhost:3004/api/shipments";
const numberOfRequests = 1;

const makeRequest = async (requestId) => {
  // Each request runs inside its own span.
  return tracer.startActiveSpan("makeRequests", async (span) => {
    span.updateName(`makeRequests-${requestId}`);
    const result = await axios.post(url);
    span.end();
    return result;
  });
};

// The "main" span is the parent of every request span.
tracer.startActiveSpan("main", async (span) => {
  for (let i = 0; i < numberOfRequests; i++) {
    const res = await makeRequest(i);
    console.log("Response", res.data);
  }
  span.end();
});
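In the code above, a span is never ended if the request throws. One way to handle that is a small wrapper that always ends the span and records the failure. The helper below is a hypothetical sketch: it expects a tracer like the one returned by getTracer, uses the span methods recordException, setStatus, and end from the OpenTelemetry API, and hard-codes 2, which is the numeric value of SpanStatusCode.ERROR in @opentelemetry/api, so that the snippet has no dependencies.

```javascript
// Numeric value of SpanStatusCode.ERROR in @opentelemetry/api.
const SPAN_STATUS_ERROR = 2;

// Runs `fn` inside a new active span, records failures, and always ends the span.
function withSpan(tracer, name, fn) {
  return tracer.startActiveSpan(name, async (span) => {
    try {
      return await fn(span);
    } catch (error) {
      // Attach the error to the span and mark the span as failed.
      span.recordException(error);
      span.setStatus({ code: SPAN_STATUS_ERROR, message: error.message });
      throw error;
    } finally {
      // Ending the span in `finally` guarantees it is exported either way.
      span.end();
    }
  });
}

// Usage (assuming `tracer`, `axios`, and `url` from main.js):
// const result = await withSpan(tracer, "makeRequest-0", () => axios.post(url));
```

With this wrapper, a failed request still produces a complete span, and the error shows up on that span in Jaeger.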
Now, you can run the client application with the node main.js command and monitor the trace data with Jaeger.
That's it. You have successfully created a Node.js-based microservices application using Amplication, configured tracing, and monitored trace data through Jaeger. You can find the complete code example on GitHub and watch the video below to understand the code used for tracing.
Step 10: Adding tracing to the generated services
As Amplication now supports OpenTelemetry through a plugin, we will leverage the plugin to integrate all the services without much effort.
Go to each service starting from the shipping service and install the OpenTelemetry plugin.
Click the Commit change & build button to finalize the changes. It will start the build process again, generate the new files and update existing ones in the Git repo, and create/update a pull request.
Now try to perform new requests as before and observe the tracing data in Jaeger!
Watch Webinar
I ran a live workshop a few weeks ago on Distributed Tracing and OpenTelemetry. You can watch it here: https://www.youtube.com/watch?v=Pu-HiD2QksI
Best practices to follow
- Prioritize critical paths and high-impact services.
- Use consistent and meaningful naming conventions for spans and services.
- Ensure that trace context is propagated across service boundaries. This typically involves adding trace headers to HTTP requests or message headers.
- Use tags and annotations to add additional metadata to spans.
- Implement adaptive sampling strategies that adjust the sampling rate based on the service's load and error rates.
- Automatically capture and log errors.
- Retain trace data for an appropriate period.
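As an illustration of the adaptive sampling practice above, here is a minimal sketch of a sampling rate that reacts to load and error rate. The thresholds, option names, and functions are all hypothetical, and this is not an OpenTelemetry sampler implementation; it only shows the idea.

```javascript
// Picks a sampling rate between 0 and 1 from current load and error rate.
function adaptiveSampleRate(
  { requestsPerSecond, errorRate },
  { baseRate = 0.1, targetTracesPerSecond = 10, errorThreshold = 0.05 } = {}
) {
  // Keep every trace while errors are elevated, since those are the valuable ones.
  if (errorRate >= errorThreshold) return 1;
  if (requestsPerSecond <= 0) return baseRate;
  // Under heavy load, cap the number of sampled traces per second.
  return Math.min(baseRate, targetTracesPerSecond / requestsPerSecond);
}

// Deterministic decision derived from the trace ID, so that every service
// using the same rate makes the same choice for the same trace.
function shouldSample(traceId, rate) {
  const bucket = parseInt(traceId.slice(-8), 16) / 0x100000000; // in [0, 1)
  return bucket < rate;
}

const rate = adaptiveSampleRate({ requestsPerSecond: 1000, errorRate: 0.01 });
console.log(rate, shouldSample("4bf92f3577b34da6a3ce929d0e0e4736", rate));
```

Deriving the decision from the trace ID rather than from a random number avoids traces where only some of the spans were kept.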
Conclusion
This guide provided an overview of implementing tracing for Node.js-based microservices applications. As you can see, enabling tracing for your application requires little effort. But it can save you a whole lot of troubleshooting and debugging time. Thank you for reading.