@@ -107,20 +107,16 @@ helm --namespace mlrun \
107107 --set opentelemetry-operator.enabled=true \
108108 --set opentelemetry.namespaceLabel.enabled=true \
109109 --set opentelemetry.collector.enabled=true \
110- --set opentelemetry.collector.scrapeMode=otel \
111110 --set opentelemetry.instrumentation.enabled=true \
112111 mlrun/mlrun-ce
113112```
114113
115- > **Important:** When enabling OpenTelemetry, set `opentelemetry.collector.scrapeMode=otel` to collect metrics
116- > via the OTEL sidecar and prevent duplicate metrics. The default is `direct` (for when OTEL is disabled).
117-
118114The installation will:
119115- Deploy the OpenTelemetry Operator
120- - Create an OpenTelemetryCollector CR (sidecar mode)
116+ - Create an OpenTelemetryCollector CR (deployment mode: one collector per namespace)
121117- Create an Instrumentation CR for Python auto-instrumentation
122- - Label the namespace with `opentelemetry.io/inject=enabled`
123- - Configure Prometheus to scrape OTEL sidecar metrics (port 8889)
118+ - Label and annotate the namespace so that all Python pods in it are auto-instrumented
119+ - Configure Prometheus to scrape OTEL collector metrics (port 8889)
124120
125121#### Step 5: Verify OpenTelemetry Installation
126122
@@ -140,21 +136,14 @@ kubectl -n mlrun get instrumentations
140136kubectl -n mlrun get pods | grep opentelemetry
141137```
142138
143- #### Step 6: Verify Jupyter has OTEL Sidecar Annotations
139+ #### Step 6: Verify OTel Pod Labels and Namespace Annotation
144140
145141``` bash
146- kubectl -n mlrun get deployment -l app.kubernetes.io/component=jupyter-notebook \
147- -o jsonpath='{.items[0].spec.template.metadata.annotations}' | jq .
148- ```
142+ # Check that the namespace has the instrumentation annotation (enables auto-instrumentation for all Python pods)
143+ kubectl get namespace mlrun -o jsonpath='{.metadata.annotations}' | jq .
149144
150- You should see annotations like:
151- ``` json
152- {
153- "instrumentation.opentelemetry.io/inject-python": "my-mlrun-otel-instrumentation",
154- "prometheus.io/port": "8889",
155- "prometheus.io/scrape": "true",
156- "sidecar.opentelemetry.io/inject": "my-mlrun-otel-collector"
157- }
145+ # Check pod labels: Jupyter, SeaweedFS, and Nuclio function pods should carry mlrun.io/otel=true
146+ kubectl -n mlrun get pods --show-labels | grep mlrun.io/otel
158147```
159148
160149### Installing MLRun-ce on minikube
@@ -185,7 +174,7 @@ Override those [in the normal methods](https://helm.sh/docs/chart_template_guide
185174### Configuring OpenTelemetry (Observability)
186175
187176MLRun CE includes the OpenTelemetry Operator for collecting metrics and traces from your ML workloads.
188- The operator runs in **sidecar mode**, automatically injecting collector containers into annotated pods.
177+ The operator runs one collector **Deployment** per namespace. Instrumented pods send OTLP metrics to the collector, which exposes them for Prometheus to scrape.
189178
190179> **Note:** OpenTelemetry is **disabled by default**. See below for how to enable it.
191180
@@ -212,10 +201,10 @@ kubectl label namespace <your-namespace> opentelemetry.io/inject=enabled
212201#### Default Configuration
213202
214203By default, OpenTelemetry is **disabled**. When enabled, it provides:
215- - Namespace labeling for OTEL operator webhook targeting
216- - Sidecar collector injection for instrumented pods
217- - Python auto-instrumentation for Jupyter notebooks
218- - Prometheus metrics export on port 8889
204+ - A single OTel Collector Deployment per namespace (OTLP receiver → Prometheus exporter on port 8889)
205+ - Namespace-level Python auto-instrumentation (all Python pods in the namespace are instrumented automatically)
206+ - `mlrun.io/otel: "true"` label on Jupyter, SeaweedFS, and Nuclio function pods
207+ - Prometheus scrapes the collector pod (not individual pods)
219208
220209#### Enabling OpenTelemetry
221210
@@ -228,7 +217,6 @@ helm --namespace mlrun install my-mlrun \
228217 --set opentelemetry-operator.enabled=true \
229218 --set opentelemetry.namespaceLabel.enabled=true \
230219 --set opentelemetry.collector.enabled=true \
231- --set opentelemetry.collector.scrapeMode=otel \
232220 --set opentelemetry.instrumentation.enabled=true \
233221 mlrun/mlrun-ce
234222```
@@ -240,7 +228,6 @@ helm --namespace mlrun upgrade my-mlrun \
240228 --set opentelemetry-operator.enabled=true \
241229 --set opentelemetry.namespaceLabel.enabled=true \
242230 --set opentelemetry.collector.enabled=true \
243- --set opentelemetry.collector.scrapeMode=otel \
244231 --set opentelemetry.instrumentation.enabled=true \
245232 mlrun/mlrun-ce
246233```
@@ -253,13 +240,12 @@ helm --namespace mlrun upgrade my-mlrun \
253240 --set opentelemetry.collector.enabled=false \
254241 --set opentelemetry.instrumentation.enabled=false \
255242 --set opentelemetry.namespaceLabel.enabled=false \
256- --set opentelemetry.collector.scrapeMode=direct \
257243 mlrun/mlrun-ce
258244```
259245
260246#### Custom Resource Limits
261247
262- Configure collector sidecar resources:
248+ Configure collector resources:
263249
264250``` bash
265251helm --namespace mlrun install my-mlrun \
@@ -282,63 +268,23 @@ helm --namespace mlrun install my-mlrun \
282268
283269#### Adding OpenTelemetry to Custom Workloads
284270
285- To instrument your own deployments with the OTEL sidecar and Python auto-instrumentation:
286-
287- 1. Ensure your namespace has the OpenTelemetry label:
288- ``` bash
289- kubectl label namespace <your-namespace> opentelemetry.io/inject=enabled
290- ```
291-
292- 2. Add these annotations to your pod spec:
293- ``` yaml
294- metadata:
295-   annotations:
296-     sidecar.opentelemetry.io/inject: "<release-name>-otel-collector"
297-     instrumentation.opentelemetry.io/inject-python: "<release-name>-otel-instrumentation"
298-     prometheus.io/scrape: "true"
299-     prometheus.io/scrape-mode: "otel"
300-     prometheus.io/port: "8889"
301- ```
302-
303- #### Preventing Prometheus/OTEL Metric Overlap
304-
305- To prevent duplicate metrics when using both Prometheus direct scraping and OpenTelemetry,
306- MLRun CE uses a **scrape-mode** annotation system:
307-
308- | Scrape Mode | Description | Use Case |
309- |-------------|-------------|----------|
310- | `direct` | Direct Prometheus scraping only | **Default** - When OTEL is disabled |
311- | `otel` | Metrics collected via OTEL sidecar only | **Recommended when OTEL enabled** |
312- | `both` | Both OTEL and direct scraping | Debugging/transition only |
313-
314- > **Note:** The default scrape mode is `direct`. When enabling OpenTelemetry, you must set
315- > `--set opentelemetry.collector.scrapeMode=otel` to collect metrics via the OTEL sidecar.
316-
317- **How it works:**
318- - OTEL-collected metrics have the `mlrun_otel_` prefix and `metrics_source=otel_collector` label
319- - Direct-scraped metrics have `metrics_source=direct_scrape` label
320- - Prometheus scrape configs filter based on `prometheus.io/scrape-mode` annotation
271+ Python instrumentation is applied **namespace-wide**: any Python pod in the MLRun namespace is automatically instrumented when OTel is enabled. No per-pod annotations are required.
321272
322- **Configure scrape mode when enabling OTEL:**
273+ For pods in other namespaces, annotate the namespace directly:
323274``` bash
324- helm --namespace mlrun install my-mlrun \
325- --set opentelemetry-operator.enabled=true \
326- --set opentelemetry.collector.enabled=true \
327- --set opentelemetry.collector.scrapeMode=otel \
328- --set opentelemetry.instrumentation.enabled=true \
329- mlrun/mlrun-ce
275+ kubectl annotate namespace <your-namespace> \
276+ instrumentation.opentelemetry.io/inject-python=<release-name>-otel-instrumentation
330277```
331278
332- **Query metrics by source in Prometheus:**
333- ``` promql
334- # OTEL-collected metrics only
335- {metrics_source="otel_collector"}
336-
337- # Direct-scraped metrics only
338- {metrics_source="direct_scrape"}
279+ The `mlrun.io/otel: "true"` label is applied to: **Jupyter**, **SeaweedFS** (master, volume, filer, s3, admin), and **Nuclio function pods** (via `functionDefaults.metadata.labels`). This label is used for Prometheus metric filtering and enrichment.
339280
340- # OTEL metrics use prefix
281+ **Query OTEL-collected metrics in Prometheus:**
282+ ``` promql
283+ # OTEL metrics use the mlrun_otel_ prefix
341284mlrun_otel_http_server_duration_seconds_bucket{...}
285+
286+ # Filter by source
287+ {metrics_source="otel_collector"}
342288```
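
As a quick sanity check outside Prometheus, the collector's exporter output can also be inspected directly. The sketch below is an illustration only, not something the chart ships: `count_otel_metrics` is a hypothetical helper name, `<collector-pod>` is a placeholder for whichever pod runs the collector in your install, and the `/metrics` path on port 8889 is an assumption about how the exporter is served. The helper counts sample lines that carry the `mlrun_otel_` prefix in Prometheus text-format input read from stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with MLRun CE): count metric sample lines
# that start with the mlrun_otel_ prefix in Prometheus text-format input.
count_otel_metrics() {
  # grep -c prints 0 but exits non-zero when nothing matches,
  # so the exit status is deliberately swallowed.
  grep -c '^mlrun_otel_' || true
}

# Example usage (assumes a port-forward to the collector's exporter port 8889;
# <collector-pod> is a placeholder, and /metrics is an assumed path):
#   kubectl -n mlrun port-forward pod/<collector-pod> 8889:8889 &
#   curl -s http://localhost:8889/metrics | count_otel_metrics
```

A result of `0` would suggest that no instrumented pod has sent metrics to the collector yet, or that the prefix is configured differently in your deployment.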
343289
344290#### Split Installation (Admin/Non-Admin)