OpenTelemetry Tail Sampling Replacement
Our team was interested in sending our traces and spans from our Elixir services to an opentelemetry-collector service, which would then sample and send them to Datadog. A typical web request, for example, should result in a single trace. A trace is a record of all the notable calls across services involved in that web request, represented by spans. A span is a call or event nested within a trace or another span of that trace.
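To make that concrete, here is a purely illustrative sketch of one web-request trace, written as YAML with invented names and IDs; real spans are stored flat and reference their parent via parent_span_id:

# Illustrative only: one web-request trace, with invented names and IDs.
trace_id: 4bf92f3577b34da6a3ce929d0e0e4736
spans:
  - name: "GET /checkout"        # root span for the web request
    span_id: a1b2c3d4e5f60718
    parent_span_id: null         # no parent; this is the root of the trace
  - name: "Repo.get_cart"        # database call made during the request
    span_id: b2c3d4e5f6071829
    parent_span_id: a1b2c3d4e5f60718
  - name: "POST /payments"       # call out to another service
    span_id: c3d4e5f607182930
    parent_span_id: a1b2c3d4e5f60718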
We wanted a few policies:
- All errors and high-latency traces are sent
- Sample a tiny fraction of traces related to our most called endpoints
- Sample about 2/3 of everything else
We would also want Datadog to be able to show the true numbers of traces, regardless of the ingestion rate we applied locally.
Tail Sampling (did not work)
The pattern of making sampling decisions based on features of a completed trace, such as its HTTP status or how long it took to execute, is called Tail Sampling.
For a while, we attempted to use OpenTelemetry’s
tailsamplingprocessor.
For some reason, we were able to send traces but not link them to operations
suitable for Datadog views, even when using the always_sample
configuration
like this:
config:
  processors:
    batch:
      send_batch_max_size: 500
      send_batch_size: 50
      timeout: 10s
    datadog:
    tail_sampling:
      policies:
        - name: always
          type: always_sample
  service:
    pipelines:
      traces:
        receivers:
          - otlp
        processors:
          - batch
          - datadog
          - tail_sampling
        exporters:
          - datadog
This configuration instructs the opentelemetry-collector
to receive OTLP
messages. We batch incoming traces so that downstream components handle groups of traces rather than one at a time. Then
the datadog processor
analyzes these traces and sends metrics about them to Datadog; this
is how the graphs that show hits per second are populated. Next, we apply
tail sampling to group spans by trace and push forward all traces. Finally, we
export the traces to Datadog with the
datadogexporter.
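The snippets in this post leave out the receivers and exporters definitions that the pipelines reference. A minimal sketch of those sections might look like the following; the endpoints and the environment variable holding the Datadog API key are placeholders, not our exact values:

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317   # placeholder listen addresses
        http:
          endpoint: 0.0.0.0:4318
  exporters:
    datadog:
      api:
        key: ${env:DD_API_KEY}     # placeholder; supply your Datadog API key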
If we removed tail_sampling
from the processors
section, everything would
come through as expected. This led to several days of frustration, trying various tail_sampling policies without success while reading numerous recent blog posts reporting that this processor works for other people.
Debugging OpenTelemetry Collector
It is possible to see debug-level logging by adding a telemetry
map to the service
section of the configuration
that sets the log level:
service:
  telemetry:
    logs:
      level: debug
  pipelines:
    traces:
      receivers:
        - otlp
      processors:
        - batch
        - datadog
        - tail_sampling
      exporters:
        - datadog
This showed us that the tail_sampling processor reported it was working. With always_sample, it marked every trace as Sampled. It told us how many traces it received and how many it forwarded onward. The datadog exporter exported the same number of traces. They simply were not connected to the operation in Datadog.
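Another trick worth knowing about, though not something shown in our configuration, is the collector's debug exporter (named logging on older releases), which prints received telemetry to the collector's own logs so you can inspect exactly what is being exported. A sketch:

config:
  exporters:
    debug:                 # called "logging" on older collector releases
      verbosity: detailed  # print each span with its attributes
  service:
    pipelines:
      traces:
        receivers:
          - otlp
        processors:
          - batch
        exporters:
          - datadog
          - debug          # send to Datadog and to the collector's log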
Deprecating Tail Sampling
Then we ran into this GitHub issue, which indicates that the community wants to deprecate the tailsamplingprocessor:
- It is inefficient, consuming too much CPU (confirmed in our system)
- It could be broken up into multiple composable processors
- Under pressure, it has a memory leak
Unfortunately, the only work toward this deprecation so far has been separating out span aggregation for a trace; users would need to rely on the groupbytrace processor for that aspect.
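We did not pursue that route, but for reference, the groupbytrace processor buffers spans in memory until it believes all spans for a trace have arrived and then releases the trace as a unit. A sketch, with placeholder sizes:

config:
  processors:
    groupbytrace:
      wait_duration: 10s   # how long to wait for late spans of a trace
      num_traces: 10000    # placeholder: maximum traces buffered at once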
How to Tail Sample without Tail Sampling
It turns out that the OpenTelemetry Collector has a composable solution for what we wanted to do in the form of multiple pipelines. The tradeoff is that each pipeline must be careful to exclude the traces handled by the other pipelines. This approach consumes a little more memory but is vastly more efficient than using the tailsamplingprocessor. Here is a version of our solution:
config:
  processors:
    datadog:
    batch:
      send_batch_max_size: 500
      send_batch_size: 50
      timeout: 10s
    # Exclude everything captured in other pipelines
    filter/default:
      error_mode: ignore
      traces:
        span: >
          (
            ( attributes["http.status_code"] >= 400 ) or
            ( end_time_unix_nano - start_time_unix_nano > 5000000000 ) or
            ( attributes["http.route"] == "/super/common/endpoint" )
          )
    # We always pass through interesting things, like errors, so exclude
    # anything that isn't interesting.
    filter/all:
      error_mode: ignore
      traces:
        span: >
          not (
            ( attributes["http.status_code"] >= 400 ) or
            ( end_time_unix_nano - start_time_unix_nano > 5000000000 )
          )
    # Spammy spans
    filter/spammy:
      error_mode: ignore
      traces:
        span: >
          not (
            ( attributes["http.route"] == "/super/common/endpoint" )
          )
    probabilistic_sampler/default:
      hash_seed: 23
      sampling_percentage: 66.6
    probabilistic_sampler/spammy:
      hash_seed: 23
      sampling_percentage: 5.0
    attributes/default:
      actions:
        - action: upsert
          key: otel.pipeline
          value: default
    attributes/all:
      actions:
        - action: upsert
          key: otel.pipeline
          value: all
    attributes/spammy:
      actions:
        - action: upsert
          key: otel.pipeline
          value: spammy
  service:
    pipelines:
      traces/all:
        receivers:
          - otlp
        processors:
          - batch
          - filter/all
          - datadog
          - attributes/all
        exporters:
          - datadog
      traces/spammy:
        receivers:
          - otlp
        processors:
          - batch
          - filter/spammy
          - datadog
          - probabilistic_sampler/spammy
          - attributes/spammy
        exporters:
          - datadog
      traces/default:
        receivers:
          - otlp
        processors:
          - batch
          - filter/default
          - datadog
          - probabilistic_sampler/default
          - attributes/default
        exporters:
          - datadog
In this configuration, we have three pipelines:
- traces/all - Sample all errors and high-latency traces
- traces/spammy - Sample a tiny fraction of spammy traces
- traces/default - Sample 2/3 of everything else
The pipelines process in this fashion:
- Batch the incoming traces
- Filter the traces to what we want to handle in this pipeline
- Aggregate traces into metrics
- Use probabilistic sampling to sample the traces according to our policy
- Add a debugging attribute to each trace so we can validate the policies
Because we needed a bit of math to calculate latency and compare HTTP status codes, we had to use the OTTL capabilities of the filter processor. This puts filters in an exclusion-only mode, so the logic is the inverse of how a person would think of each policy. We solved this by wrapping each policy's conditions in a not block.
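For example, a hypothetical policy of "keep only spans slower than one second" has to be expressed as "drop everything that is not slower than one second":

# Hypothetical example, not part of our configuration
filter/slow_only:
  error_mode: ignore
  traces:
    span: >
      not (
        ( end_time_unix_nano - start_time_unix_nano > 1000000000 )
      )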
Success
The opentelemetry-collector service lets us offload the concern of sampling traces to a dedicated service in our cluster. The tailsamplingprocessor did not work for us, but the more composable multi-pipeline approach did.