OpenTelemetry Sampling Techniques in Python/Java/Go: Optimizing Observability with Selective Data Collection
Jeremiah Adepoju
Posted on March 25, 2024
OpenTelemetry is a vendor-agnostic, open-source observability framework that provides a standard way to generate, collect, and export telemetry data (metrics, logs, and traces) from applications. Sampling reduces the volume of telemetry data by capturing only a subset of it, which is particularly useful in high-traffic environments or when resources are limited.
In Python, Java and Go, OpenTelemetry provides several sampling techniques that can be applied to traces. In this article, we will explore these techniques and provide code samples for each.
- ParentBasedSampler
The ParentBased sampler respects the sampling decision made for a span's parent. If a parent span is sampled, its child spans are sampled as well; by default, if the parent is not sampled, its children are dropped. If there is no parent span (a root span), the sampler delegates the decision to the root sampler you pass in, such as an always-on sampler or a trace-ID ratio sampler. Here's an example:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace import sampling
# Create a ParentBased sampler whose root sampler samples every new trace
sampler = sampling.ParentBased(sampling.ALWAYS_ON)
tracer_provider = TracerProvider(sampler=sampler)
tracer = tracer_provider.get_tracer(__name__)
# Start a root span (sampled by ALWAYS_ON)
with tracer.start_as_current_span("root_span"):
    # The child span inherits the root span's sampling decision
    with tracer.start_as_current_span("child_span"):
        pass
import io.opentelemetry.sdk.trace.samplers.Sampler;
// Respect the parent's decision; sample every root span
Sampler sampler = Sampler.parentBased(Sampler.alwaysOn());
import (
"go.opentelemetry.io/otel/sdk/trace"
)
sampler := trace.ParentBased(trace.AlwaysSample())
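The delegation rule above can be modeled in a few lines of plain Python. This is a minimal sketch, not the SDK's code: the function name `parent_based_decision` and the idea of passing the root sampler as a simple callable are assumptions made for illustration.

```python
from typing import Callable, Optional

# Hypothetical model: a "root sampler" is any function that takes a trace ID
# and returns True (sample) or False (drop). This is not the SDK's API.
RootSampler = Callable[[int], bool]


def always_on(trace_id: int) -> bool:
    """Samples every trace, like the SDKs' always-on sampler."""
    return True


def always_off(trace_id: int) -> bool:
    """Drops every trace, like the SDKs' always-off sampler."""
    return False


def parent_based_decision(
    parent_sampled: Optional[bool], trace_id: int, root: RootSampler
) -> bool:
    """Model of the parent-based rule.

    parent_sampled is None for a root span (no parent); otherwise it is the
    parent span's sampled flag.
    """
    if parent_sampled is None:
        # Root span: delegate to the configured root sampler.
        return root(trace_id)
    # Child span: inherit the parent's decision, ignoring the root sampler.
    return parent_sampled


root_sampled = parent_based_decision(None, trace_id=0x1234, root=always_on)
child_sampled = parent_based_decision(root_sampled, trace_id=0x1234, root=always_off)
print(root_sampled, child_sampled)  # True True
```

The real ParentBased samplers also let you configure separate behavior for local versus remote parents, and for sampled versus unsampled ones; the model above collapses those cases into a single rule.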
- TraceIdRatioBasedSampler
The TraceIdRatioBased sampler is a root sampler that makes sampling decisions based on the trace ID and a specified sampling ratio between 0 and 1. Because the decision depends only on the trace ID, every service configured with the same ratio reaches the same decision for a given trace, so a trace is either sampled or not sampled consistently across the entire system. Here's an example:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace import sampling
# Create a TraceIdRatioBased sampler that keeps about 25% of traces
sampler = sampling.TraceIdRatioBased(0.25)
tracer_provider = TracerProvider(sampler=sampler)
tracer = tracer_provider.get_tracer(__name__)
# Start a root span
with tracer.start_as_current_span("root_span"):
    # The sampling decision is based on the trace ID
    with tracer.start_as_current_span("child_span"):
        pass
import io.opentelemetry.sdk.trace.samplers.Sampler;
Sampler sampler = Sampler.traceIdRatioBased(0.25);
import (
"go.opentelemetry.io/otel/sdk/trace"
)
sampler := trace.TraceIDRatioBased(0.25)
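The consistency property comes from making the decision a pure function of the trace ID. The stdlib-only sketch below shows one way to do that: treat the low 64 bits of the trace ID as a number and keep the trace if it falls below a threshold proportional to the ratio. The names `ratio_bound` and `should_sample` are hypothetical, and while the Python SDK's TraceIdRatioBased is similar in spirit, check the SDK source before relying on the exact details shown here.

```python
import random

# Assumption for this sketch: only the low 64 bits of the 128-bit trace ID
# are compared against the threshold.
TRACE_ID_LIMIT = (1 << 64) - 1


def ratio_bound(ratio: float) -> int:
    """Convert a ratio in [0, 1] into an integer threshold over 64 bits."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return round(ratio * (TRACE_ID_LIMIT + 1))


def should_sample(trace_id: int, ratio: float) -> bool:
    """Deterministic decision: depends only on the trace ID and the ratio."""
    return (trace_id & TRACE_ID_LIMIT) < ratio_bound(ratio)


rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(100_000)]

# Two independent "services" using the same ratio agree on every trace.
service_a = [should_sample(t, 0.25) for t in trace_ids]
service_b = [should_sample(t, 0.25) for t in trace_ids]
print(service_a == service_b)  # True

# Roughly a quarter of the randomly generated traces are kept.
print(round(sum(service_a) / len(service_a), 2))
```

No state or randomness is involved in the decision itself, which is why separate processes need no coordination beyond agreeing on the ratio.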
- ProbabilitySampler
In early releases of the OpenTelemetry SDKs, the ProbabilitySampler made sampling decisions based on a specified sampling probability, which suits scenarios where you want to keep a fixed percentage of traces. It has since been renamed TraceIdRatioBased, and current Python, Java and Go SDKs no longer provide a ProbabilitySampler, so a fixed percentage of traces is sampled with the trace-ID ratio sampler. Here's the same idea with a probability of 0.75:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace import sampling
# Keep about 75% of traces (this replaces the former ProbabilitySampler)
sampler = sampling.TraceIdRatioBased(0.75)
tracer_provider = TracerProvider(sampler=sampler)
tracer = tracer_provider.get_tracer(__name__)
# Start a root span
with tracer.start_as_current_span("root_span"):
    # The decision is derived from the trace ID and the 0.75 ratio
    with tracer.start_as_current_span("child_span"):
        pass
import io.opentelemetry.sdk.trace.samplers.Sampler;
Sampler sampler = Sampler.traceIdRatioBased(0.75);
import (
	"go.opentelemetry.io/otel/sdk/trace"
)
sampler := trace.TraceIDRatioBased(0.75)
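A probability of 0.75 means roughly three traces in four are kept. If the decision is made by comparing the trace ID against a threshold, raising the probability only ever adds traces. The standalone sketch below checks that every trace kept at 0.25 is also kept at 0.75; the `keep` function and its low-64-bit comparison are assumptions for illustration, not the SDK's code.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # assumption: compare only the low 64 bits


def keep(trace_id: int, ratio: float) -> bool:
    """Hypothetical threshold rule: keep if the low 64 bits < ratio * 2**64."""
    return (trace_id & TRACE_ID_LIMIT) < round(ratio * (TRACE_ID_LIMIT + 1))


rng = random.Random(7)
trace_ids = [rng.getrandbits(128) for _ in range(50_000)]

kept_25 = {t for t in trace_ids if keep(t, 0.25)}
kept_75 = {t for t in trace_ids if keep(t, 0.75)}

# Every trace kept at 0.25 is also kept at 0.75.
print(kept_25 <= kept_75)  # True
# About three quarters of the traces are kept at 0.75.
print(round(len(kept_75) / len(trace_ids), 2))
```

In this model, a service configured with a higher ratio keeps a superset of the traces kept by a service with a lower ratio, so traces sampled by the more selective service are not missing spans from the less selective one.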
- Custom sampler
There is no SamplingDecisionSampler class in the Python, Java or Go SDKs. To plug in your own sampling logic, you implement the SDK's Sampler interface instead. Its should-sample method (should_sample in Python, shouldSample in Java, ShouldSample in Go) receives the parent context, the trace ID, the span name and kind, the span attributes and links, and returns a SamplingResult indicating whether the span should be dropped, recorded only, or recorded and sampled. Each sampler also provides a short description of itself. Here's an example:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace import sampling
class CustomSampler(sampling.Sampler):
    def should_sample(self, parent_context, trace_id, name, kind=None, attributes=None, links=None, trace_state=None):
        # Your custom sampling logic here
        # Return sampling.Decision.RECORD_AND_SAMPLE to sample the span
        # or sampling.Decision.DROP to drop the span
        return sampling.SamplingResult(sampling.Decision.RECORD_AND_SAMPLE, attributes)
    def get_description(self):
        return "CustomSampler"
# Use the custom sampler
sampler = CustomSampler()
tracer_provider = TracerProvider(sampler=sampler)
tracer = tracer_provider.get_tracer(__name__)
# Start a root span
with tracer.start_as_current_span("root_span"):
    # The sampling decision is based on the custom logic
    with tracer.start_as_current_span("child_span"):
        pass
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.data.LinkData;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import io.opentelemetry.sdk.trace.samplers.SamplingResult;
import java.util.List;
class CustomSampler implements Sampler {
  public SamplingResult shouldSample(Context parentContext, String traceId, String name, SpanKind spanKind, Attributes attributes, List<LinkData> parentLinks) {
    // Your custom sampling logic here
    return SamplingResult.recordAndSample();
  }
  public String getDescription() { return "CustomSampler"; }
}
Sampler sampler = new CustomSampler();
import (
	"go.opentelemetry.io/otel/sdk/trace"
	oteltrace "go.opentelemetry.io/otel/trace"
)
type customSampler struct{}
func (customSampler) ShouldSample(p trace.SamplingParameters) trace.SamplingResult {
	// Your custom sampling logic here; keep the parent's trace state
	return trace.SamplingResult{
		Decision:   trace.RecordAndSample,
		Tracestate: oteltrace.SpanContextFromContext(p.ParentContext).TraceState(),
	}
}
func (customSampler) Description() string { return "customSampler" }
var sampler trace.Sampler = customSampler{}
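Whatever the surrounding API, the body of a custom sampling decision is ordinary code. As a hedged illustration, the standalone function below shows the kind of rules one might put there: drop health-check spans, always keep spans flagged as errors, and fall back to a 10% trace-ID ratio for everything else. The span name `/healthz`, the `error` attribute key, the 10% fallback and the function name `custom_decision` are all hypothetical, and the function returns plain strings rather than the SDK's decision values so it runs without the SDK installed.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # assumption: ratio rule uses the low 64 bits


def custom_decision(name: str, attributes: dict, trace_id: int) -> str:
    """Hypothetical rule set for a custom sampler.

    Returns "DROP" or "RECORD_AND_SAMPLE"; in a real sampler these would map
    to the SDK's decision values.
    """
    # Rule 1: never sample health checks (hypothetical span name).
    if name == "/healthz":
        return "DROP"
    # Rule 2: always sample spans flagged as errors (hypothetical attribute).
    if attributes.get("error") is True:
        return "RECORD_AND_SAMPLE"
    # Fallback: keep about 10% of the remaining traces, decided by trace ID,
    # so spans of the same trace that reach this rule get the same answer.
    bound = round(0.10 * (TRACE_ID_LIMIT + 1))
    return "RECORD_AND_SAMPLE" if (trace_id & TRACE_ID_LIMIT) < bound else "DROP"


print(custom_decision("/healthz", {}, trace_id=0))  # DROP
print(custom_decision("GET /orders", {"error": True}, TRACE_ID_LIMIT))  # RECORD_AND_SAMPLE
print(custom_decision("GET /orders", {}, trace_id=0))  # RECORD_AND_SAMPLE
print(custom_decision("GET /orders", {}, TRACE_ID_LIMIT))  # DROP
```

Rules that look at per-span data, such as names and attributes, can split a trace into sampled and unsampled parts, so rules like these are usually applied to root spans or combined with parent-based sampling.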
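- Composite (first-match) sampler
A composite sampler combines multiple samplers. It applies each sampler in the order they are provided, and the first sampler that decides to sample the span is used. This can be useful when you want to apply different sampling strategies based on specific criteria. The core Python, Java and Go SDKs do not ship a CompositeAddIteratorSampler or any other built-in sampler with this behavior, so the combination is written as a custom sampler that wraps the others. Here's an example:
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace import sampling
class FirstMatchSampler(sampling.Sampler):
    def __init__(self, delegates):
        self._delegates = delegates
    def should_sample(self, parent_context, trace_id, name, kind=None, attributes=None, links=None, trace_state=None):
        result = sampling.SamplingResult(sampling.Decision.DROP)
        for delegate in self._delegates:
            result = delegate.should_sample(parent_context, trace_id, name, kind, attributes, links)
            if result.decision == sampling.Decision.RECORD_AND_SAMPLE:
                break  # first delegate that samples wins
        return result
    def get_description(self):
        return "FirstMatchSampler"
sampler1 = sampling.ParentBased(sampling.ALWAYS_ON)
sampler2 = sampling.TraceIdRatioBased(0.25)
composite_sampler = FirstMatchSampler([sampler1, sampler2])
tracer_provider = TracerProvider(sampler=composite_sampler)
tracer = tracer_provider.get_tracer(__name__)
with tracer.start_as_current_span("root_span"):
    with tracer.start_as_current_span("child_span"):
        pass
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.data.LinkData;
import io.opentelemetry.sdk.trace.samplers.*;
import java.util.List;
record FirstMatchSampler(List<Sampler> delegates) implements Sampler {
  public SamplingResult shouldSample(Context parentContext, String traceId, String name, SpanKind spanKind, Attributes attributes, List<LinkData> parentLinks) {
    SamplingResult result = SamplingResult.drop();
    for (Sampler delegate : delegates) {
      result = delegate.shouldSample(parentContext, traceId, name, spanKind, attributes, parentLinks);
      if (result.getDecision() == SamplingDecision.RECORD_AND_SAMPLE) break;
    }
    return result;
  }
  public String getDescription() { return "FirstMatchSampler"; }
}
Sampler compositeSampler = new FirstMatchSampler(List.of(Sampler.parentBased(Sampler.alwaysOn()), Sampler.traceIdRatioBased(0.25)));
import "go.opentelemetry.io/otel/sdk/trace"
type firstMatch []trace.Sampler
func (f firstMatch) ShouldSample(p trace.SamplingParameters) (res trace.SamplingResult) {
	for _, s := range f {
		if res = s.ShouldSample(p); res.Decision == trace.RecordAndSample {
			return res
		}
	}
	return res
}
func (firstMatch) Description() string { return "firstMatch" }
var compositeSampler trace.Sampler = firstMatch{trace.ParentBased(trace.AlwaysSample()), trace.TraceIDRatioBased(0.25)}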
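It is worth being precise about what "the first sampler that decides to sample" implies. The standalone sketch below models samplers as plain functions of the trace ID; `first_match`, `even_ids` and `small_ids` are hypothetical toy names, and the string label stands in for the extra data (attributes, trace state) that a real sampling result carries.

```python
from typing import Callable, List, Tuple

# Hypothetical model, not SDK code: a sampler is a function of the trace ID
# returning (sampled, label).
SamplerFn = Callable[[int], Tuple[bool, str]]


def first_match(delegates: List[SamplerFn], trace_id: int) -> Tuple[bool, str]:
    """Return the first delegate result that samples; otherwise the last one."""
    result = (False, "none")  # used only when there are no delegates
    for delegate in delegates:
        result = delegate(trace_id)
        if result[0]:
            break
    return result


def even_ids(trace_id: int) -> Tuple[bool, str]:
    """Toy sampler: keeps traces with an even ID."""
    return trace_id % 2 == 0, "even"


def small_ids(trace_id: int) -> Tuple[bool, str]:
    """Toy sampler: keeps traces with an ID below 10."""
    return trace_id < 10, "small"


for tid in (4, 7, 15):
    print(tid, first_match([even_ids, small_ids], tid), first_match([small_ids, even_ids], tid))
# 4 (True, 'even') (True, 'small')
# 7 (True, 'small') (True, 'small')
# 15 (False, 'small') (False, 'even')
```

In this model the sampled flag is the same in either order, because a first-match rule over sample/drop decisions acts as a logical OR of the delegates. The order decides only which delegate's result is returned, and therefore which attributes and trace state accompany it, and how early evaluation stops.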
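ParentBased and TraceIdRatioBased, together with the always-on and always-off samplers, are the built-in samplers that the core OpenTelemetry SDKs provide in Python, Java and Go; any other policy is written as a custom Sampler. By choosing and combining these techniques, you can reduce the amount of telemetry data collected and the overhead on your system while still maintaining adequate observability.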