Tracing with OpenTelemetry

Tracing in Fission

Up to the 1.14.1 release, Fission supports collecting traces to an OpenTracing Jaeger-formatted trace collection endpoint. Tracing provides insight into what Fission is doing, and how it is doing it. OpenTelemetry provides a new tracing system that is more flexible and powerful. As we add support for OpenTelemetry, OpenTracing support will be marked deprecated and eventually removed. OpenTelemetry is backward compatible with OpenTracing.

If you are starting fresh with Fission, we recommend using OpenTelemetry. This is primarily because OpenTelemetry makes robust, portable telemetry a built-in feature of cloud-native software. OpenTelemetry provides a single set of APIs, libraries, agents, and collector services to capture distributed traces and metrics from your application.

OpenTelemetry

OpenTelemetry is a set of APIs, SDKs, tooling and integrations that are designed for the creation and management of telemetry data such as traces, metrics, and logs. The project provides a vendor-agnostic implementation that can be configured to send telemetry data to the backend(s) of your choice. It supports a variety of popular open-source projects including Jaeger and Prometheus.

Fission OpenTelemetry Integration

If you have OpenTelemetry installed, you can use it to collect traces and metrics from Fission.

The openTelemetry section in the Helm chart is used to configure the OpenTelemetry SDK used by the different Fission components. Based on this configuration, the Fission chart passes the corresponding environment variables to each pod. A sample values snippet is shown after the table below.

Option                                 Description
openTelemetry.otlpCollectorEndpoint    Collector endpoint for OpenTelemetry
openTelemetry.otlpInsecure             Use an insecure (non-TLS) connection to the collector (true/false)
openTelemetry.otlpHeaders              Key-value pairs to be used as headers associated with gRPC or HTTP requests
openTelemetry.tracesSampler            Sampler for traces
openTelemetry.tracesSamplingRate       Argument for the sampler (e.g. the sampling rate)
openTelemetry.propagators              Propagator(s) used to generate the trace id header
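
A minimal values snippet corresponding to the options above (the values shown are only illustrative; adjust them for your setup):

openTelemetry:
  otlpCollectorEndpoint: "otel-collector.opentelemetry-operator-system.svc:4317"
  otlpInsecure: true
  otlpHeaders: ""
  tracesSampler: "parentbased_traceidratio"
  tracesSamplingRate: "0.1"
  propagators: "tracecontext,baggage"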

If you have not configured a collector endpoint, you won’t be able to visualize traces. Depending on the sampler configuration, you can observe a trace_id in Fission component logs. You can search by trace_id across Fission service logs when debugging or troubleshooting.
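
For example, a quick way to search for a trace_id across component logs (a sketch assuming the default fission namespace and the router/executor deployment names):

# replace <trace_id> with the id seen in the logs or in a response header
kubectl logs -n fission deploy/router | grep "<trace_id>"
kubectl logs -n fission deploy/executor | grep "<trace_id>"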

Many observability platforms such as DataDog, Dynatrace, Honeycomb, Lightstep, New Relic, Signoz, Splunk, etc. support OpenTelemetry out of the box; otlpHeaders can be used to configure the headers required by the observability platform. In that case you don’t need to set up an OpenTelemetry Collector from scratch.
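
For example, a sketch of pointing Fission directly at such a platform, assuming Fission is already installed as the release fission from fission-charts/fission-all (the endpoint and header below are placeholders; use the values documented by your platform):

helm upgrade --namespace fission --reuse-values \
  fission fission-charts/fission-all \
  --set openTelemetry.otlpCollectorEndpoint="<vendor-otlp-endpoint>:443" \
  --set openTelemetry.otlpInsecure=false \
  --set openTelemetry.otlpHeaders="<header-name>=<api-key>"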

If you feel any of the above options are not adequate, feel free to raise an issue or open a pull request.

Types of samplers

  • always_on - Sampler that always samples spans, regardless of the parent span’s sampling decision.
  • always_off - Sampler that never samples spans, regardless of the parent span’s sampling decision.
  • traceidratio - Sampler that samples probabilistically based on a rate.
  • parentbased_always_on - (default if empty) Sampler that respects its parent span’s sampling decision, but otherwise always samples.
  • parentbased_always_off - Sampler that respects its parent span’s sampling decision, but otherwise never samples.
  • parentbased_traceidratio - (default in chart) Sampler that respects its parent span’s sampling decision, but otherwise samples probabilistically based on a rate.

Sampler Arguments

Each sampler type defines its own expected input, if any. Currently, a trace ratio argument applies only to the following samplers (see the example below):

  • traceidratio
  • parentbased_traceidratio

The argument is a sampling probability: a number in the [0..1] range, e.g. “0.1”. The default is 0.1.
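
For example, to sample roughly 25% of traces regardless of the parent’s decision, assuming Fission is already installed as the release fission (the rate is illustrative; pick one that suits your traffic volume):

helm upgrade --namespace fission --reuse-values \
  fission fission-charts/fission-all \
  --set openTelemetry.tracesSampler="traceidratio" \
  --set openTelemetry.tracesSamplingRate="0.25"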

Types of propagators

Based on the propagator type, OpenTelemetry will generate a trace id header.

  • tracecontext - W3C Trace Context
  • baggage - W3C Baggage
  • b3 - B3 Single
  • b3multi - B3 Multi
  • jaeger - Jaeger uber-trace-id header
  • xray - AWS X-Ray (third party)
  • ottrace - OpenTracing Trace (third party)

Propagator configuration is useful if you want to use a header other than W3C Trace Context. For example, if you are using OpenTracing/Jaeger, you can set the propagator to jaeger.
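
For example, to emit Jaeger-style uber-trace-id headers instead of the W3C Trace Context header, assuming Fission is already installed as the release fission (a sketch using the chart option from the table above):

helm upgrade --namespace fission --reuse-values \
  fission fission-charts/fission-all \
  --set openTelemetry.propagators="jaeger"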

Sample OTEL Collector

We will be using the OpenTelemetry Operator for Kubernetes to set up the OTEL collector. To install the operator in an existing cluster, cert-manager is required.

Use the following commands to install cert-manager and the operator:

# cert-manager
kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml

# open telemetry operator
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

Once the opentelemetry-operator deployment is ready, we need to create an OpenTelemetry Collector instance.
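
To check that the operator deployment is ready (the deployment name below assumes a default operator installation):

kubectl wait --for=condition=available --timeout=120s \
  deployment/opentelemetry-operator-controller-manager \
  -n opentelemetry-operator-system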

The following configuration provides a good starting point; however, you may change it as per your requirements:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-conf
  namespace: opentelemetry-operator-system
  labels:
    app: opentelemetry
    component: otel-collector-conf
data:
  otel-collector-config: |
    receivers:
      # Make sure to add the otlp receiver.
      # This will open up the receiver on port 4317
      otlp:
        protocols:
          grpc:
            endpoint: "0.0.0.0:4317"
    processors:
    extensions:
      health_check: {}
    exporters:
      jaeger:
        endpoint: "jaeger-collector.observability.svc.cluster.local:14250"
        insecure: true
      prometheus:
        endpoint: 0.0.0.0:8889
        namespace: "testapp"
      logging:

    service:
      extensions: [health_check]
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [jaeger]

        metrics:
          receivers: [otlp]
          processors: []
          exporters: [prometheus, logging]
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: opentelemetry-operator-system
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  ports:
    - name: otlp # Default endpoint for otlp receiver.
      port: 4317
      protocol: TCP
      targetPort: 4317
      nodePort: 30080
    - name: metrics # Default endpoint for metrics.
      port: 8889
      protocol: TCP
      targetPort: 8889
  selector:
    component: otel-collector
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: opentelemetry-operator-system
  labels:
    app: opentelemetry
    component: otel-collector
spec:
  selector:
    matchLabels:
      app: opentelemetry
      component: otel-collector
  minReadySeconds: 5
  progressDeadlineSeconds: 120
  replicas: 1 #TODO - adjust this to your own requirements
  template:
    metadata:
      annotations:
        prometheus.io/path: "/metrics"
        prometheus.io/port: "8889"
        prometheus.io/scrape: "true"
      labels:
        app: opentelemetry
        component: otel-collector
    spec:
      containers:
        - command:
            - "/otelcol"
            - "--config=/conf/otel-collector-config.yaml"
            # Memory Ballast size should be max 1/3 to 1/2 of memory.
            - "--mem-ballast-size-mib=683"
          env:
            - name: GOGC
              value: "80"
          image: otel/opentelemetry-collector:0.6.0
          name: otel-collector
          resources:
            limits:
              cpu: 1
              memory: 2Gi
            requests:
              cpu: 200m
              memory: 400Mi
          ports:
            - containerPort: 4317 # Default endpoint for otlp receiver.
            - containerPort: 8889 # Default endpoint for querying metrics.
          volumeMounts:
            - name: otel-collector-config-vol
              mountPath: /conf
          # - name: otel-collector-secrets
          #   mountPath: /secrets
          livenessProbe:
            httpGet:
              path: /
              port: 13133 # Health Check extension default port.
          readinessProbe:
            httpGet:
              path: /
              port: 13133 # Health Check extension default port.
      volumes:
        - configMap:
            name: otel-collector-conf
            items:
              - key: otel-collector-config
                path: otel-collector-config.yaml
          name: otel-collector-config-vol
EOF

Note: The above configuration is borrowed from the OpenTelemetry Collector traces example, with some minor changes.
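
You can confirm the collector pod is running before wiring Fission to it (the label selector matches the Deployment above):

kubectl get pods -n opentelemetry-operator-system -l component=otel-collector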

Jaeger

We will be using the Jaeger Operator for Kubernetes to deploy Jaeger. To install the operator, run:

kubectl create namespace observability
kubectl create -n observability -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.39.0/jaeger-operator.yaml

Note that you’ll need to download and customize the Role Bindings if you are using a namespace other than observability.
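
You can wait for the operator to become available before creating an instance (the deployment name below assumes the default manifest):

kubectl wait --for=condition=available --timeout=120s \
  deployment/jaeger-operator -n observability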

Once the jaeger-operator deployment in the namespace observability is ready, create a Jaeger instance, like:

kubectl apply -n observability -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
EOF

Check if the otel-collector and jaeger-query services have been created:

kubectl get svc --all-namespaces

NAMESPACE                       NAME                                                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                  AGE
cert-manager                    cert-manager                                                ClusterIP   10.96.228.4     <none>        9402/TCP                                 7m50s
cert-manager                    cert-manager-webhook                                        ClusterIP   10.96.214.220   <none>        443/TCP                                  7m50s
default                         kubernetes                                                  ClusterIP   10.96.0.1       <none>        443/TCP                                  9m35s
kube-system                     kube-dns                                                    ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP                   9m33s
observability                   jaeger-agent                                                ClusterIP   None            <none>        5775/UDP,5778/TCP,6831/UDP,6832/UDP      3s
observability                   jaeger-collector                                            ClusterIP   10.96.48.27     <none>        9411/TCP,14250/TCP,14267/TCP,14268/TCP   3s
observability                   jaeger-collector-headless                                   ClusterIP   None            <none>        9411/TCP,14250/TCP,14267/TCP,14268/TCP   3s
observability                   jaeger-operator-metrics                                     ClusterIP   10.96.164.206   <none>        8383/TCP,8686/TCP                        61s
observability                   jaeger-query                                                ClusterIP   10.96.186.29    <none>        16686/TCP,16685/TCP                      3s
opentelemetry-operator-system   opentelemetry-operator-controller-manager-metrics-service   ClusterIP   10.96.29.83     <none>        8443/TCP                                 6m11s
opentelemetry-operator-system   opentelemetry-operator-webhook-service                      ClusterIP   10.96.74.0      <none>        443/TCP                                  6m11s
opentelemetry-operator-system   otel-collector                                              NodePort    10.96.107.99    <none>        4317:30080/TCP,8889:30898/TCP            2m22s

Now, set up a port forward to the jaeger-query service:

kubectl port-forward service/jaeger-query -n observability 8080:16686 &

You should now be able to access Jaeger at http://localhost:8080/.

Installing Fission

At the time of writing this document, the Fission installation does not have OpenTelemetry enabled by default. To enable the OpenTelemetry collector, we need to explicitly set the value of openTelemetry.otlpCollectorEndpoint:

export FISSION_NAMESPACE=fission
helm install --namespace $FISSION_NAMESPACE \
  fission fission-charts/fission-all \
  --set openTelemetry.otlpCollectorEndpoint="otel-collector.opentelemetry-operator-system.svc:4317" \
  --set openTelemetry.otlpInsecure=true \
  --set openTelemetry.tracesSampler="parentbased_traceidratio" \
  --set openTelemetry.tracesSamplingRate="1"

Note: You may have to change the openTelemetry.otlpCollectorEndpoint value as per your setup.
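
To confirm the settings were picked up, you can inspect a Fission component such as the router (a sketch; it assumes the chart surfaces these options as OTEL_* environment variables on the component pods):

kubectl describe deployment router -n $FISSION_NAMESPACE | grep OTEL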

Testing

To verify that our setup is working and that we are able to receive traces, we will deploy and test a Fission function. For this test we will be using a simple NodeJS-based function.

# create an environment
fission env create --name nodejs --image fission/node-env

# get hello world function
curl https://raw.githubusercontent.com/fission/examples/main/nodejs/hello.js > hello.js

# register the function with Fission
fission function create --name hello --env nodejs --code hello.js

# run the function
fission function test --name hello
hello, world!
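
If your sampling rate is below 1, a single request may not be sampled; you can generate a few more requests to make it likely that at least one trace is recorded (an illustrative loop):

for i in $(seq 1 10); do fission function test --name hello; done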

Traces with Jaeger

If you have been following along, you should be able to access Jaeger at http://localhost:8080/. Refresh the page and you should see multiple services listed in the Service dropdown. Select the Fission-Router and click the Find Traces button. You should see the spans created for the function request we just tested.

Select the trace and on the next page expand the spans.

You should be able to see the request flow similar to the one below:

(Screenshot: Fission OpenTelemetry request flow in Jaeger)

If you enable OpenTelemetry tracing within your function, you can capture spans and events for the function request.

Following are a few samples of spans and events captured by invoking a Go-based function:

(Screenshot: Fission spans)

(Screenshot: Fission Executor span events)