
Written by Funs Janssen
Software Consultant
I’m Funs Janssen. I build software and write about the decisions around it—architecture, development practices, AI tooling, and the business impact behind technical choices. This blog is a collection of practical notes from real projects: what scales, what breaks, and what’s usually glossed over in blog-friendly examples.
Modern product teams ship across web apps, mobile apps, and APIs. Users do not care which layer broke. They just feel “the app is slow” or “checkout failed”.
That’s why one observability standard across every surface is no longer a nice-to-have. It’s a reliability requirement.
If your browser telemetry speaks one language, your iOS/Android telemetry speaks another, and your backend logs are “best effort”, you end up with the same outcome every time: long incident calls, lots of guessing, and unclear ownership.
This article shows a practical route to implementing OpenTelemetry observability for web and mobile apps end-to-end. We’ll cover traces, metrics, logs, correlation IDs, and how to make them consistent across platforms.
Then we’ll turn that telemetry into outcomes: dashboards, alerts, incident runbooks, and SLAs that actually mean something. You’ll walk away with a rollout plan you can ship in phases, without drowning your team in noisy data or surprise bills.
Introduction
If you run a web app, a mobile app, and a backend API, you already have a distributed system. Even if it’s “just one product”.
And distributed systems fail in distributed ways: a DNS hiccup, a bad mobile release, a slow database query, a flaky third-party API, or a queue backlog that only shows up under peak load.
The problem is not that teams lack data. It’s that they lack consistent data.
OpenTelemetry gives you a vendor-neutral way to ship traces, metrics, and logs with a shared context, using a common protocol (OTLP) and a common set of semantic conventions. That consistency is what enables faster debugging, fewer “can’t reproduce” incidents, and cleaner conversations with stakeholders about uptime, latency, and support.
In other words, OpenTelemetry observability for web and mobile apps is not about adding more dashboards. It’s about building a shared reliability story across frontend, mobile, and backend so you can define realistic SLAs, spot regressions faster, and reduce MTTR when something breaks.
Quick Takeaways
Here’s the “do this first” checklist for OpenTelemetry observability across web and mobile apps that works for most teams:
Target architecture
- Web app + mobile apps + backend services emit telemetry.
- Everything exports via OTLP to an OpenTelemetry Collector.
- The Collector forwards to your observability backend (Grafana, Datadog, New Relic, Honeycomb, Elastic, Azure Monitor, and so on).
This “apps → Collector → backend” setup makes governance possible without blocking delivery. It also gives you one place to enforce PII scrubbing and cost controls.
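For reference, a minimal Collector configuration for this topology might look like the sketch below (the exporter and endpoint are placeholders; swap in your backend’s):

```yaml
receivers:
  otlp:
    protocols:
      grpc:   # apps export over gRPC (default port 4317) ...
      http:   # ... or HTTP (default port 4318)

processors:
  batch:      # batch telemetry before export to reduce overhead

exporters:
  otlphttp:
    endpoint: https://otel.your-backend.example.com   # placeholder

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```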
Minimum signals to ship first (highest ROI)
- Traces (backend first, then frontends)
- Logs with trace correlation (so errors jump to traces)
- Add metrics once you know what you want to alert on
Dashboards/alerts that pay off fastest
- API: request rate, error rate, latency (p95/p99), dependency latency
- Mobile: crash-free sessions, network error rate, cold start time
- Web: navigation/load performance, frontend error rate, backend correlation rate
If you do nothing else, do this: make every incident answerable with one trace ID path and one consistent set of service/environment attributes.
Build a consistent observability model (the “contract”)
Before you instrument everything, define an observability contract.
Think of it like an API schema, but for telemetry: naming, attributes, context propagation, and correlation rules. Without that, your data will technically exist, but it won’t be usable.
This is the core of OpenTelemetry observability across web and mobile: every layer tells the same story, using the same identifiers.
Required conventions: service/resource attributes & environments
You want dashboards that work across services and environments, without being rebuilt every time you add a new component.
Start by standardizing a short list of resource attributes, based on OpenTelemetry semantic conventions:
- service.name: stable logical service identifier (required) (opentelemetry.io)
- service.version: release version (recommended) (opentelemetry.io)
- deployment.environment (or the newer naming your org standardizes on): preview, staging, prod (opentelemetry.io)
- Optional but useful:
  - service.namespace (team or product grouping) (opentelemetry.io)
  - cloud.region, k8s.cluster.name, etc. (if you run cloud/Kubernetes)
For mobile, add a few device/app attributes consistently:
- app.platform: ios / android
- app.version: store version (not build number only)
- device.model, os.version (careful: these can increase cardinality)
- network.type: wifi / cellular (if available)
Keep the list short. High-cardinality attributes (like full device IDs, emails, or free-form strings) will either explode costs or get blocked by your backend.
A simple naming rule that prevents weeks of pain later:
- Service names are stable (do not include environment in service.name)
- Environments are attributes (deployment.environment)
- Versions are attributes (service.version)
That way, “API latency in prod vs staging” is a filter, not a different dashboard.
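As a concrete sketch with the OpenTelemetry JS Node SDK (the service name, namespace, and environment variables here are placeholders, and the Resource API differs slightly between SDK versions):

```ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { Resource } from '@opentelemetry/resources';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  resource: new Resource({
    'service.name': 'checkout-api',                        // stable, no environment suffix
    'service.version': process.env.APP_VERSION ?? '0.0.0', // set by your pipeline
    'deployment.environment': process.env.DEPLOY_ENV ?? 'preview',
    'service.namespace': 'storefront',                     // optional team/product grouping
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4318/v1/traces',           // your Collector, not the vendor
  }),
});

sdk.start();
```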
Correlation IDs + W3C Trace Context end-to-end
Distributed tracing correlation lives or dies on propagation.
OpenTelemetry uses the W3C Trace Context standard: traceparent and tracestate. (w3.org) This is what lets web, mobile, gateways, and services join the same trace.
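For reference, a traceparent header is four dash-separated fields: version, trace ID, parent span ID, and trace flags. The example below is the one from the W3C spec (the trailing 01 means “sampled”):

```text
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```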
Here’s the practical strategy most teams should use:
- Use W3C trace context for system-to-system correlation (automatic where possible).
- Keep a separate, user-facing correlation ID for support and UX workflows.
Why keep a correlation ID if you already have traceparent?
- You might not sample every trace.
- You might not want to expose trace IDs to customers.
- You want one ID that appears in:
- app UI error screens
- support tickets
- backend logs
- incident chat threads
Your propagation checklist:
- Browser app injects trace context into API requests (where supported).
- Mobile networking layer injects trace context into API requests.
- API gateway preserves (or restarts) trace context intentionally at trust boundaries.
- Backend services propagate context across:
- HTTP calls
- message queues
- background jobs
Logs and metrics should link back to traces via:
- trace_id / span_id fields in logs
- exemplar support in metrics, if your backend supports it (optional)
If you want a concrete ASP.NET approach for correlation beyond tracing headers, see my guide on Correlation IDs in ASP.NET for end-to-end tracing.
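If you’re on Node rather than ASP.NET, a minimal sketch of the dual-ID pattern could look like this (assuming Express; the x-correlation-id header and the correlation.id attribute are conventions to standardize on, not part of any spec):

```ts
import { randomUUID } from 'node:crypto';
import { trace } from '@opentelemetry/api';
import type { Request, Response, NextFunction } from 'express';

export function correlationMiddleware(req: Request, res: Response, next: NextFunction) {
  // Reuse the caller's correlation ID or mint a support-friendly one.
  const correlationId = req.header('x-correlation-id') ?? randomUUID();
  res.setHeader('x-correlation-id', correlationId);

  // Link the user-facing ID to the current trace, if one is active.
  const span = trace.getActiveSpan();
  span?.setAttribute('correlation.id', correlationId);

  // Emit both IDs in logs so errors jump straight to traces.
  const ctx = span?.spanContext();
  console.log(JSON.stringify({
    msg: 'request.start',
    correlation_id: correlationId,
    trace_id: ctx?.traceId,
    span_id: ctx?.spanId,
    path: req.path,
  }));

  next();
}
```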
Instrumentation blueprint
Instrumentation is where most teams either:
- over-instrument and drown in noise, or
- under-instrument and still can’t debug incidents
A reliable plan for OpenTelemetry observability across web and mobile apps is: start small, expand safely.
Ship the minimum “spine” first:
- backend request traces
- correlated logs
- frontend/mobile network traces into the backend
Then expand to:
- DB spans
- messaging spans
- mobile performance signals
- browser navigation timing and errors
- custom business spans (only when you know what questions you need to answer)
Web app (browser/RUM) + backend traces
Browser instrumentation is powerful, but it has real constraints.
OpenTelemetry’s browser client instrumentation is still described as experimental and mostly unspecified, so treat it as an evolving surface. (opentelemetry.io) That doesn’t mean “don’t use it”. It means “be deliberate”.
What to capture first in the web app:
- document load / navigation spans (initial page load)
- fetch/XHR spans for API calls
- frontend errors (uncaught exceptions, failed resource loads)
Your #1 goal:
- join frontend work to backend traces for the same user action
That requires:
- injecting traceparent headers on API calls (when allowed)
- ensuring the backend accepts and continues the trace context
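A browser setup sketch using the OTel JS web SDK (constructor options vary by SDK version; the Collector path and API origin regex are placeholders):

```ts
import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch';
import { DocumentLoadInstrumentation } from '@opentelemetry/instrumentation-document-load';

const provider = new WebTracerProvider({
  spanProcessors: [
    new BatchSpanProcessor(new OTLPTraceExporter({ url: '/otel/v1/traces' })),
  ],
});
provider.register(); // installs W3C trace-context propagation by default

registerInstrumentations({
  instrumentations: [
    new DocumentLoadInstrumentation(), // initial page load spans
    new FetchInstrumentation({
      // Only inject traceparent on origins whose CORS policy allows it.
      propagateTraceHeaderCorsUrls: [/^https:\/\/api\.example\.com/],
    }),
  ],
});
```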
Backend traces: what to instrument first
- HTTP server instrumentation (ASP.NET Core, Node, etc.)
- HTTP client instrumentation (outbound calls)
- database instrumentation (SQL client, ORM)
- messaging instrumentation (queues, event streams)
If you’re on .NET, OpenTelemetry zero-code (automatic) instrumentation is a great first step to get baseline traces and metrics without touching application code. (opentelemetry.io) Use it to prove value quickly, then add manual spans where it matters (checkout, login, search, sync jobs).
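On Node, a comparable “baseline first, manual spans where it matters” setup might look like this sketch (getNodeAutoInstrumentations wires up the HTTP, database, and messaging libraries it recognizes; the checkout span and its attribute are hypothetical):

```ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { trace } from '@opentelemetry/api';

// Baseline: automatic HTTP server/client, DB, and messaging spans.
const sdk = new NodeSDK({
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();

// Later: manual spans only where they answer a real question.
const tracer = trace.getTracer('checkout');

export async function submitCheckout(cartId: string): Promise<void> {
  await tracer.startActiveSpan('checkout.submit', async (span) => {
    try {
      span.setAttribute('checkout.cart_id', cartId); // hypothetical attribute
      // ... business logic ...
    } finally {
      span.end();
    }
  });
}
```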
Noise control tips for web + backend:
- Avoid capturing full URLs with query strings.
- Do not create spans for every tiny UI event.
- Use sampling early, even if it’s a simple head-based percentage.
Mobile app essentials
Mobile observability has different “golden signals” than backend services. Users feel:
- slow startup
- janky scrolling
- network errors
- crashes
For OpenTelemetry mobile observability on iOS and Android, focus on signals that explain those outcomes:
- Cold start time (and warm start if relevant)
- Network request latency and errors
- Crashes and ANRs (Android “Application Not Responding”)
- Slow frames / rendering issues
- Session-level context (session ID, app version, device class)
OpenTelemetry Android provides a client-apps SDK with a configuration DSL, including HTTP export to an OTLP endpoint and session settings. (opentelemetry.io) That makes it a solid base for a consistent OTLP Collector architecture across web and mobile.
Mobile-specific implementation details that matter in real life:
- Session IDs: generate a random session identifier and attach it as an attribute. Do not use advertising IDs or emails.
- Offline buffering: mobile networks drop. Your SDK should buffer and retry within sane limits.
- On-device redaction: scrub obvious sensitive fields before export whenever possible.
- Export via OTLP: ideally to your Collector endpoint (not directly to the vendor).
A simple, non-PII correlation pattern:
- session.id (random UUID)
- user.state (anonymous/authenticated, not the user ID)
- install.channel (app store, enterprise, testflight)
- release.version (same semantic version you show the user)
Then ensure your backend logs include:
- correlation ID (support-friendly)
- trace ID (debug-friendly)
- environment + version (release-friendly)
That’s how end-to-end correlation across RUM and distributed tracing becomes useful instead of theoretical.
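Concretely, a backend log line that carries all three might look like this (field names are conventions to standardize on, not a required schema):

```json
{
  "level": "error",
  "msg": "payment provider timeout",
  "correlation_id": "9f1c2a4e-7b0d-4c2e-9a51-3f8e2d6c1a90",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "deployment.environment": "prod",
  "service.version": "2.14.0"
}
```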
Collector governance: sampling, PII scrubbing, and cost controls
The OpenTelemetry Collector is not just plumbing.
It’s your governance layer. It’s where you enforce:
- what data is allowed to leave your boundary
- what gets dropped for cost control
- how attributes are standardized so dashboards work
OpenTelemetry explicitly positions the Collector as a place to transform and govern telemetry for data quality, cost, and security. (opentelemetry.io)
Sampling strategy (head vs tail) that matches your risk & budget
Sampling is not a “later” problem. If you wait, you’ll either:
- get surprised by bills, or
- panic-sample during an incident and lose the data you needed
Two common approaches:
Head-based sampling
- Decision is made at the start of the trace (SDK side).
- Easy to implement, predictable cost.
- Best for: early-stage teams, high traffic, “good enough” debugging.
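Head-based sampling is a few lines in the SDK. A Node sketch (ParentBased keeps child spans consistent with the root decision; the 10% ratio is only an example):

```ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  // Sample ~10% of new traces at the root; children inherit the decision,
  // so a trace is never half-recorded.
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.1),
  }),
});

sdk.start();
```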
Tail-based sampling
- Decision is made after seeing the whole trace (Collector side).
- Lets you keep traces that are:
- errors
- high latency
- specific endpoints or customers (careful with PII)
- Best for: scale-ups, incident-heavy systems, high-value transactions.
OpenTelemetry’s own guidance for tail sampling uses the tail sampling processor in the Collector and shows policies like “keep errors” plus a probabilistic remainder. (opentelemetry.io) A practical rollout:
- Start with head-based sampling (5% to 20% depending on traffic).
- Add tail-based rules for:
  - status_code == ERROR
  - latency over a threshold (p95 tail)
  - critical routes like /checkout or /login
- Keep 100% of traces in preview/staging if your volume allows, because that’s where you want maximum debug signal.
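In Collector terms, those tail-based rules map onto the tail sampling processor roughly like this (a sketch; a trace is kept if any policy matches, and the threshold and percentage are examples to tune):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s        # buffer a trace this long before deciding
    policies:
      - name: keep-errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: keep-slow
        type: latency
        latency: { threshold_ms: 2000 }   # align with your latency SLO tail
      - name: sample-remainder
        type: probabilistic
        probabilistic: { sampling_percentage: 10 }
```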
Key interaction with incident investigations:
- If you only sample randomly, you might miss the one failing trace.
- Tail sampling reduces that risk by preserving “interesting” traces.
- But tail sampling costs Collector memory and CPU, so size it intentionally.
PII and sensitive data handling (scrub before it leaves your boundary)
If you take one thing seriously in your observability rollout, make it this:
Do not ship secrets or PII in telemetry.
The W3C Trace Context spec is explicit that vendors must not include personally identifiable information in tracestate. (w3.org) That’s your baseline mindset for all telemetry attributes too.
What not to capture:
- auth tokens (Bearer tokens, session cookies, refresh tokens)
- emails, phone numbers, addresses
- full URLs with query params (often contain PII)
- request/response bodies (almost always contain sensitive data)
- raw user IDs if they are directly identifiable
Preferred patterns:
- Allowlists over denylists for high-risk attributes.
- Hashing for stable correlation without revealing the value.
- Masking for partial visibility (last 4 digits), if truly required.
Collector-side enforcement:
- Use Collector processors to delete, redact, or transform attributes. OpenTelemetry documents transforming telemetry with processors like filter, attributes, and transform. (opentelemetry.io)
- Many vendors also provide guidance on using Collector processors to handle sensitive information (delete, redact, hash). (docs.honeycomb.io)
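A sketch of that enforcement with the attributes processor (the keys are examples; derive your own list from your telemetry data policy):

```yaml
processors:
  attributes/scrub:
    actions:
      - key: http.request.header.authorization
        action: delete    # never export auth material
      - key: user.email
        action: delete
      - key: user.id
        action: hash      # stable correlation without the raw value
```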
A good governance workflow:
- Define a “telemetry data policy” (one pager).
- Add automated checks in code review for logging/telemetry changes.
- Enforce redaction rules in the Collector so mistakes don’t leak.
That’s how you scale observability across teams without relying on everyone being perfect.
Operationalize: dashboards, alerts, runbooks, and SLAs
Telemetry is only valuable if it changes operations.
The goal is not “more graphs”. The goal is:
- faster detection
- faster diagnosis
- faster mitigation
- and clear reporting against SLAs
This is where dashboards and alerts tied to SLOs and SLIs matter.
Dashboards & alert set that supports SLOs
Dashboards should be built around what users experience, not what’s easy to chart.
Start with the classic “golden signals” mindset:
- latency
- traffic
- errors
- saturation
Then adapt per surface.
API (backend) dashboards
- Request rate by route
- Error rate (4xx vs 5xx)
- Latency p50/p95/p99 by route
- Dependency latency (DB, cache, third-party APIs)
- Queue backlog / consumer lag (if async exists)
Web app dashboards
- Navigation timing (initial load, route changes)
- Frontend error rate (JS exceptions)
- API error rate as seen by the browser
- Correlation coverage: % of frontend requests that successfully join backend traces
Mobile dashboards
- Crash-free sessions (by app version)
- Cold start time (p50/p95)
- Network error rate and latency (by endpoint, by network type)
- Slow frames / rendering problems (by device class, by app version)
Alerting principles:
- Page on user-visible symptoms, not “something changed”.
- Avoid alerts that don’t have a clear action.
- Prefer SLO-based alerts where possible.
Google’s SRE material strongly emphasizes alerting tied to SLOs and incident response practices, not random thresholds. (sre.google) Even if you don’t implement full error budgets on day one, align alerts to:
- “Checkout failure rate exceeded X for Y minutes”
- “p95 latency exceeded X for Y minutes”
- “Crash-free sessions dropped below X% after release”
Also, do not forget uptime and dependency basics. Pair tracing with health endpoints and synthetic checks. If you’re in .NET, my guide on .NET API health checks for dependency and uptime signals is a practical complement to your OpenTelemetry dashboards.
Incident runbooks + CI/CD integration (Azure DevOps & per-PR envs)
Observability gets real when your alerts link to actions.
A simple incident runbook template that works well:
- Symptom: what the user sees (and the SLO impacted)
- Scope: which environment, which version, which region
- First queries:
- dashboard link
- trace search (error traces, slow traces)
- log query with correlation ID
- Mitigation options:
- rollback
- feature flag off
- scale up/down
- disable a dependency path
- Validation:
- confirm metrics recovered
- confirm error budget burn stopped
- Post-incident:
- root cause summary
- action items (owner + date)
Two CI/CD practices make this rollout dramatically easier:
1) Release annotations
- Add deployment markers (version, commit hash, environment) so graphs show “what changed”.
- Standardize service.version on every deployment so you can filter and compare. (opentelemetry.io)
If you run Azure DevOps, build this into your pipelines so it’s automatic and consistent. My article on Azure DevOps CI/CD foundations for consistent release + ops workflows lays out a lean structure small teams can maintain.
2) Per pull request environments
- Preview environments let you validate:
- trace propagation
- sampling rules
- PII scrubbing
- dashboards and alerts (at least smoke tests)
- before the change hits staging or prod
That is especially useful for instrumentation work, because observability changes are easy to ship “working” but still wrong. Here’s my guide on per pull request environments to validate telemetry + releases safely.
Bring it all together with SLAs and maintenance:
- Your SLA is only defendable if you can measure the SLI.
- Your maintenance contract is easier to justify when you can report:
- uptime
- latency
- incident counts
- MTTR
- top regressions by release
Observability is how you turn “we think it’s stable” into a measurable, auditable reliability story.
Conclusion
A consistent rollout of OpenTelemetry observability across web and mobile apps is not about boiling the ocean.
It’s a phased system you can ship safely:
- Contract first: standardize naming (service.name, service.version), environments, and trace context propagation.
- Instrumentation next: start with backend traces and correlated logs, then connect web and mobile network spans into the same trace story.
- Governance via Collector: add sampling, attribute normalization, and PII scrubbing before telemetry becomes unmanageable.
- Operationalize: build dashboards and alerts tied to SLOs, then write runbooks that link symptoms to traces, logs, and mitigations.
When you do this well, the wins show up fast:
- fewer regressions escaping into production
- faster incident triage because every team is looking at the same “source of truth”
- clearer SLA reporting because your measurements are consistent across platforms
- better product decisions because you can see performance and reliability per release
If you want help building an observability baseline that fits your stack, FJAN IT can implement the contract, set up the Collector, instrument your apps, and deliver dashboards, alerts, and runbooks that your team actually uses. That’s the difference between “we have telemetry” and “we run reliable software”.
Feedback
What stack are you running right now (.NET/Node, Azure/AWS, React/Next.js, native iOS/Android, Flutter, React Native)? And what’s your biggest reliability pain point today: slow incidents, noisy alerts, missing mobile visibility, or unclear SLAs?
If you want, share a few details and I’ll suggest a baseline observability rollout for your web and mobile apps. FJAN IT also offers an observability baseline review where we map your current telemetry to a concrete instrumentation + Collector + dashboard plan.
References
- W3C. Trace Context (traceparent, tracestate) Specification. https://www.w3.org/TR/trace-context/
- OpenTelemetry. Browser Getting Started (JS) and limitations. https://opentelemetry.io/docs/languages/js/getting-started/browser/
- OpenTelemetry. .NET zero-code (automatic) instrumentation. https://opentelemetry.io/docs/zero-code/dotnet/
- OpenTelemetry. Android client apps instrumentation and OTLP export configuration. https://opentelemetry.io/docs/platforms/client-apps/android/
- OpenTelemetry. Transforming telemetry in the Collector (processors, filtering, attributes, transform). https://opentelemetry.io/docs/collector/transforming-telemetry/
- OpenTelemetry Blog. Tail sampling with OpenTelemetry Collector (policies and guidance). https://opentelemetry.io/blog/2022/tail-sampling/
- OpenTelemetry. Service and resource semantic conventions (service.name, service.version). https://opentelemetry.io/docs/specs/semconv/resource/service/
- Google. SRE Workbook (SLOs, alerting, incident response practices). https://sre.google/workbook/table-of-contents/
- Honeycomb. Handling sensitive information with the OpenTelemetry Collector (redaction, hashing patterns). https://docs.honeycomb.io/send-data/opentelemetry/collector/handle-sensitive-information/