
Written by Funs Janssen
Software Consultant
I’m Funs Janssen. I build software and write about the decisions around it—architecture, development practices, AI tooling, and the business impact behind technical choices. This blog is a collection of practical notes from real projects: what scales, what breaks, and what’s usually glossed over in blog-friendly examples.
Modern product teams ship across web apps, mobile apps, and APIs. Users do not care which layer broke. They just feel “the app is slow” or “checkout failed”.
That’s why one observability standard across every surface is no longer a nice-to-have. It’s a reliability requirement.
If your browser telemetry speaks one language, your iOS/Android telemetry speaks another, and your backend logs are “best effort”, you end up with the same outcome every time: long incident calls, lots of guessing, and unclear ownership.
This article shows a practical route to implementing OpenTelemetry observability for web and mobile apps end-to-end. We’ll cover traces, metrics, logs, correlation IDs, and how to make them consistent across platforms.
Then we’ll turn that telemetry into outcomes: dashboards, alerts, incident runbooks, and SLAs that actually mean something. You’ll walk away with a rollout plan you can ship in phases, without drowning your team in noisy data or surprise bills.
Introduction
If you run a web app, a mobile app, and a backend API, you already have a distributed system. Even if it’s “just one product”.
And distributed systems fail in distributed ways: a DNS hiccup, a bad mobile release, a slow database query, a flaky third-party API, or a queue backlog that only shows up under peak load.
The problem is not that teams lack data. It’s that they lack consistent data.
OpenTelemetry gives you a vendor-neutral way to ship traces, metrics, and logs with a shared context, using a common protocol (OTLP) and a common set of semantic conventions. That consistency is what enables faster debugging, fewer “can’t reproduce” incidents, and cleaner conversations with stakeholders about uptime, latency, and support.
In other words, OpenTelemetry observability for web and mobile apps is not about adding more dashboards. It’s about building a shared reliability story across frontend, mobile, and backend so you can define realistic SLAs, spot regressions faster, and reduce MTTR when something breaks.
Quick Takeaways
Here’s the “do this first” checklist for OpenTelemetry observability across web and mobile apps that works for most teams:
Target architecture
- Web app + mobile apps + backend services emit telemetry.
- Everything exports via OTLP to an OpenTelemetry Collector.
- The Collector forwards to your observability backend (Grafana, Datadog, New Relic, Honeycomb, Elastic, Azure Monitor, and so on).
This “apps → Collector → backend” setup makes governance possible without blocking delivery. It also gives you one place to enforce PII scrubbing and cost controls.
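For reference, a minimal Collector configuration for this topology might look like the sketch below (the exporter and endpoint are placeholders; swap in your backend’s):

```yaml
receivers:
  otlp:
    protocols:
      grpc:   # apps export over gRPC (default port 4317) ...
      http:   # ... or HTTP (default port 4318)

processors:
  batch:      # batch telemetry before export to reduce overhead

exporters:
  otlphttp:
    endpoint: https://otel.your-backend.example.com   # placeholder

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```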
Minimum signals to ship first (highest ROI)
- Traces (backend first, then frontends)
- Logs with trace correlation (so errors jump to traces)
- Add metrics once you know what you want to alert on
Dashboards/alerts that pay off fastest
- API: request rate, error rate, latency (p95/p99), dependency latency
- Mobile: crash-free sessions, network error rate, cold start time
- Web: navigation/load performance, frontend error rate, backend correlation rate
If you do nothing else, do this: make every incident answerable with one trace ID path and one consistent set of service/environment attributes.
Build a consistent observability model (the “contract”)
Before you instrument everything, define an observability contract.
Think of it like an API schema, but for telemetry: naming, attributes, context propagation, and correlation rules. Without that, your data will technically exist, but it won’t be usable.
This is the core of OpenTelemetry observability across web and mobile: every layer tells the same story, using the same identifiers.
Required conventions: service/resource attributes & environments
You want dashboards that work across services and environments, without being rebuilt every time you add a new component.
Start by standardizing a short list of resource attributes, based on OpenTelemetry semantic conventions:
- service.name: stable logical service identifier (required) (opentelemetry.io)
- service.version: release version (recommended) (opentelemetry.io)
- deployment.environment (or the newer naming your org standardizes on): preview, staging, prod (opentelemetry.io)
- Optional but useful:
  - service.namespace (team or product grouping) (opentelemetry.io)
  - cloud.region, k8s.cluster.name, etc. (if you run cloud/Kubernetes)
For mobile, add a few device/app attributes consistently:
- app.platform: ios / android
- app.version: store version (not build number only)
- device.model, os.version (careful: these can increase cardinality)
- network.type: wifi / cellular (if available)
Keep the list short. High-cardinality attributes (like full device IDs, emails, or free-form strings) will either explode costs or get blocked by your backend.
A simple naming rule that prevents weeks of pain later:
- Service names are stable (do not include environment in service.name)
- Environments are attributes (deployment.environment)
- Versions are attributes (service.version)
That way, “API latency in prod vs staging” is a filter, not a different dashboard.
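As a concrete sketch with the OpenTelemetry JS Node SDK (the service name, namespace, and environment variables here are placeholders, and the Resource API differs slightly between SDK versions):

```ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { Resource } from '@opentelemetry/resources';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  resource: new Resource({
    'service.name': 'checkout-api',                        // stable, no environment suffix
    'service.version': process.env.APP_VERSION ?? '0.0.0', // set by your pipeline
    'deployment.environment': process.env.DEPLOY_ENV ?? 'preview',
    'service.namespace': 'storefront',                     // optional team/product grouping
  }),
  traceExporter: new OTLPTraceExporter({
    url: 'http://otel-collector:4318/v1/traces',           // your Collector, not the vendor
  }),
});

sdk.start();
```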
Correlation IDs + W3C Trace Context end-to-end
Distributed tracing correlation lives or dies on propagation.
OpenTelemetry uses the W3C Trace Context standard: traceparent and tracestate. (w3.org) This is what lets web, mobile, gateways, and services join the same trace.
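For reference, a traceparent header is four dash-separated fields: version, trace ID, parent span ID, and trace flags. The example below is the one from the W3C spec (the trailing 01 means “sampled”):

```text
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```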
Here’s the practical strategy most teams should use:
- Use W3C trace context for system-to-system correlation (automatic where possible).
- Keep a separate, user-facing correlation ID for support and UX workflows.
Why keep a correlation ID if you already have traceparent?
- You might not sample every trace.
- You might not want to expose trace IDs to customers.
- You want one ID that appears in:
- app UI error screens
- support tickets
- backend logs
- incident chat threads
Your propagation checklist:
- Browser app injects trace context into API requests (where supported).
- Mobile networking layer injects trace context into API requests.
- API gateway preserves (or restarts) trace context intentionally at trust boundaries.
- Backend services propagate context across:
- HTTP calls
- message queues
- background jobs
Logs and metrics should link back to traces via:
- trace_id / span_id fields in logs
- exemplar support in metrics, if your backend supports it (optional)
If you want a concrete ASP.NET approach for correlation beyond tracing headers, see my guide on Correlation IDs in ASP.NET for end-to-end tracing.
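If you’re on Node rather than ASP.NET, a minimal sketch of the dual-ID pattern could look like this (assuming Express; the x-correlation-id header and the correlation.id attribute are conventions to standardize on, not part of any spec):

```ts
import { randomUUID } from 'node:crypto';
import { trace } from '@opentelemetry/api';
import type { Request, Response, NextFunction } from 'express';

export function correlationMiddleware(req: Request, res: Response, next: NextFunction) {
  // Reuse the caller's correlation ID or mint a support-friendly one.
  const correlationId = req.header('x-correlation-id') ?? randomUUID();
  res.setHeader('x-correlation-id', correlationId);

  // Link the user-facing ID to the current trace, if one is active.
  const span = trace.getActiveSpan();
  span?.setAttribute('correlation.id', correlationId);

  // Emit both IDs in logs so errors jump straight to traces.
  const ctx = span?.spanContext();
  console.log(JSON.stringify({
    msg: 'request.start',
    correlation_id: correlationId,
    trace_id: ctx?.traceId,
    span_id: ctx?.spanId,
    path: req.path,
  }));

  next();
}
```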
Instrumentation blueprint
Instrumentation is where most teams either:
- over-instrument and drown in noise, or
- under-instrument and still can’t debug incidents
A reliable plan for OpenTelemetry observability across web and mobile apps is: start small, expand safely.
Ship the minimum “spine” first:
- backend request traces
- correlated logs
- frontend/mobile network traces into the backend
Then expand to:
- DB spans
- messaging spans
- mobile performance signals
- browser navigation timing and errors
- custom business spans (only when you know what questions you need to answer)
Web app (browser/RUM) + backend traces
Browser instrumentation is powerful, but it has real constraints.
OpenTelemetry’s browser client instrumentation is still described as experimental and mostly unspecified, so treat it as an evolving surface. (opentelemetry.io) That doesn’t mean “don’t use it”. It means “be deliberate”.
What to capture first in the web app:
- document load / navigation spans (initial page load)
- fetch/XHR spans for API calls
- frontend errors (uncaught exceptions, failed resource loads)
Your #1 goal:
- join frontend work to backend traces for the same user action
That requires:
- injecting traceparent headers on API calls (when allowed)
- ensuring the backend accepts and continues the trace context
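A browser setup sketch using the OTel JS web SDK (constructor options vary by SDK version; the Collector path and API origin regex are placeholders):

```ts
import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch';
import { DocumentLoadInstrumentation } from '@opentelemetry/instrumentation-document-load';

const provider = new WebTracerProvider({
  spanProcessors: [
    new BatchSpanProcessor(new OTLPTraceExporter({ url: '/otel/v1/traces' })),
  ],
});
provider.register(); // installs W3C trace-context propagation by default

registerInstrumentations({
  instrumentations: [
    new DocumentLoadInstrumentation(), // initial page load spans
    new FetchInstrumentation({
      // Only inject traceparent on origins whose CORS policy allows it.
      propagateTraceHeaderCorsUrls: [/^https:\/\/api\.example\.com/],
    }),
  ],
});
```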
Backend traces: what to instrument first
- HTTP server instrumentation (ASP.NET Core, Node, etc.)
- HTTP client instrumentation (outbound calls)
- database instrumentation (SQL client, ORM)
- messaging instrumentation (queues, event streams)
If you’re on .NET, OpenTelemetry zero-code (automatic) instrumentation is a great first step to get baseline traces and metrics without touching application code. (opentelemetry.io) Use it to prove value quickly, then add manual spans where it matters (checkout, login, search, sync jobs).
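On Node, a comparable “baseline first, manual spans where it matters” setup might look like this sketch (getNodeAutoInstrumentations wires up the HTTP, database, and messaging libraries it recognizes; the checkout span and its attribute are hypothetical):

```ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { trace } from '@opentelemetry/api';

// Baseline: automatic HTTP server/client, DB, and messaging spans.
const sdk = new NodeSDK({
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();

// Later: manual spans only where they answer a real question.
const tracer = trace.getTracer('checkout');

export async function submitCheckout(cartId: string): Promise<void> {
  await tracer.startActiveSpan('checkout.submit', async (span) => {
    try {
      span.setAttribute('checkout.cart_id', cartId); // hypothetical attribute
      // ... business logic ...
    } finally {
      span.end();
    }
  });
}
```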
Noise control tips for web + backend:
- Avoid capturing full URLs with query strings.
- Do not create spans for every tiny UI event.
- Use sampling early, even if it’s a simple head-based percentage.
Mobile app essentials
Mobile observability has different “golden signals” than backend services. Users feel:
- slow startup
- janky scrolling
- network errors
- crashes
For OpenTelemetry mobile observability on iOS and Android, focus on signals that explain those outcomes:
- Cold start time (and warm start if relevant)
- Network request latency and errors
- Crashes and ANRs (Android “Application Not Responding”)
- Slow frames / rendering issues
- Session-level context (session ID, app version, device class)
OpenTelemetry Android provides a client-apps SDK with a configuration DSL, including HTTP export to an OTLP endpoint and session settings. (opentelemetry.io) That makes it a solid base for a consistent OTLP Collector architecture across web and mobile.
Mobile-specific implementation details that matter in real life:
- Session IDs: generate a random session identifier and attach it as an attribute. Do not use advertising IDs or emails.
- Offline buffering: mobile networks drop. Your SDK should buffer and retry within sane limits.
- On-device redaction: scrub obvious sensitive fields before export whenever possible.
- Export via OTLP: ideally to your Collector endpoint (not directly to the vendor).
A simple, non-PII correlation pattern:
- session.id (random UUID)
- user.state (anonymous/authenticated, not the user ID)
- install.channel (app store, enterprise, testflight)
- release.version (same semantic version you show the user)
Then ensure your backend logs include:
- correlation ID (support-friendly)
- trace ID (debug-friendly)
- environment + version (release-friendly)
That’s how end-to-end correlation across RUM and distributed tracing becomes useful instead of theoretical.
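Concretely, a backend log line that carries all three might look like this (field names are conventions to standardize on, not a required schema):

```json
{
  "level": "error",
  "msg": "payment provider timeout",
  "correlation_id": "9f1c2a4e-7b0d-4c2e-9a51-3f8e2d6c1a90",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7",
  "deployment.environment": "prod",
  "service.version": "2.14.0"
}
```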
Collector governance: sampling, PII scrubbing, and cost controls
The OpenTelemetry Collector is not just plumbing.
It’s your governance layer. It’s where you enforce:
- what data is allowed to leave your boundary
- what gets dropped for cost control
- how attributes are standardized so dashboards work
OpenTelemetry explicitly positions the Collector as a place to transform and govern telemetry for data quality, cost, and security. (opentelemetry.io)
Sampling strategy (head vs tail) that matches your risk & budget
Sampling is not a “later” problem. If you wait, you’ll either:
- get surprised by bills, or
- panic-sample during an incident and lose the data you needed
Two common approaches:
Head-based sampling
- Decision is made at the start of the trace (SDK side).
- Easy to implement, predictable cost.
- Best for: early-stage teams, high traffic, “good enough” debugging.
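Head-based sampling is a few lines in the SDK. A Node sketch (ParentBased keeps child spans consistent with the root decision; the 10% ratio is only an example):

```ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  // Sample ~10% of new traces at the root; children inherit the decision,
  // so a trace is never half-recorded.
  sampler: new ParentBasedSampler({
    root: new TraceIdRatioBasedSampler(0.1),
  }),
});

sdk.start();
```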
Tail-based sampling
- Decision is made after seeing the whole trace (Collector side).
- Lets you keep traces that are:
- errors
- high latency
- specific endpoints or customers (careful with PII)
- Best for: scale-ups, incident-heavy systems, high-value transactions.
OpenTelemetry’s own guidance for tail sampling uses the tail sampling processor in the Collector and shows policies like “keep errors” plus a probabilistic remainder. (opentelemetry.io) A practical rollout:
- Start with head-based sampling (5% to 20% depending on traffic).
- Add tail-based rules for:
  - status_code == ERROR
  - latency over a threshold (p95 tail)
  - critical routes like /checkout or /login
- Keep 100% of traces in preview/staging if your volume allows, because that’s where you want maximum debug signal.
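In Collector terms, those tail-based rules map onto the tail sampling processor roughly like this (a sketch; a trace is kept if any policy matches, and the threshold and percentage are examples to tune):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s        # buffer a trace this long before deciding
    policies:
      - name: keep-errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: keep-slow
        type: latency
        latency: { threshold_ms: 2000 }   # align with your latency SLO tail
      - name: sample-remainder
        type: probabilistic
        probabilistic: { sampling_percentage: 10 }
```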
Key interaction with incident investigations:
- If you only sample randomly, you might miss the one failing trace.
- Tail sampling reduces that risk by preserving “interesting” traces.
- But tail sampling costs Collector memory and CPU, so size it intentionally.
PII and sensitive data handling (scrub before it leaves your boundary)
If you take one thing seriously in your observability rollout, make it this:
Do not ship secrets or PII in telemetry.
The W3C Trace Context spec is explicit that vendors must not include personally identifiable information in tracestate. (w3.org) That’s your baseline mindset for all telemetry attributes too.
What not to capture:
- auth tokens (Bearer tokens, session cookies, refresh tokens)
- emails, phone numbers, addresses
- full URLs with query params (often contain PII)
- request/response bodies (almost always contain sensitive data)
- raw user IDs if they are directly identifiable
Preferred patterns:
- Allowlists over denylists for high-risk attributes.
- Hashing for stable correlation without revealing the value.
- Masking for partial visibility (last 4 digits), if truly required.
Collector-side enforcement:
- Use Collector processors to delete, redact, or transform attributes. OpenTelemetry documents transforming telemetry with processors like filter, attributes, and transform. (opentelemetry.io)
- Many vendors also provide guidance on using Collector processors to handle sensitive information (delete, redact, hash). (docs.honeycomb.io)
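A sketch of that enforcement with the attributes processor (the keys are examples; derive your own list from your telemetry data policy):

```yaml
processors:
  attributes/scrub:
    actions:
      - key: http.request.header.authorization
        action: delete    # never export auth material
      - key: user.email
        action: delete
      - key: user.id
        action: hash      # stable correlation without the raw value
```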
A good governance workflow:
- Define a “telemetry data policy” (one pager).
- Add automated checks in code review for logging/telemetry changes.
- Enforce redaction rules in the Collector so mistakes don’t leak.
That’s how you scale observability across teams without relying on everyone being perfect.
Operationalize: dashboards, alerts, runbooks, and SLAs
Telemetry is only valuable if it changes operations.
The goal is not “more graphs”. The goal is:
- faster detection
- faster diagnosis
- faster mitigation
- and clear reporting against SLAs
This is where dashboards and alerts tied to SLOs and SLIs matter.
Dashboards & alert set that supports SLOs
Dashboards should be built around what users experience, not what’s easy to chart.
Start with the classic “golden signals” mindset:
- latency
- traffic
- errors
- saturation
Then adapt per surface.
API (backend) dashboards
- Request rate by route
- Error rate (4xx vs 5xx)
- Latency p50/p95/p99 by route
- Dependency latency (DB, cache, third-party APIs)
- Queue backlog / consumer lag (if async exists)
Web app dashboards
- Navigation timing (initial load, route changes)
- Frontend error rate (JS exceptions)
- API error rate as seen by the browser
- Correlation coverage: % of frontend requests that successfully join backend traces
Mobile dashboards
- Crash-free sessions (by app version)
- Cold start time (p50/p95)
- Network error rate and latency (by endpoint, by network type)
- Slow frames / rendering problems (by device class, by app version)
Alerting principles:
- Page on user-visible symptoms, not “something changed”.
- Avoid alerts that don’t have a clear action.
- Prefer SLO-based alerts where possible.
Google’s SRE material strongly emphasizes alerting tied to SLOs and incident response practices, not random thresholds. (sre.google) Even if you don’t implement full error budgets on day one, align alerts to:
- “Checkout failure rate exceeded X for Y minutes”
- “p95 latency exceeded X for Y minutes”
- “Crash-free sessions dropped below X% after release”
Also, do not forget uptime and dependency basics. Pair tracing with health endpoints and synthetic checks. If you’re in .NET, my guide on .NET API health checks for dependency and uptime signals is a practical complement to your OpenTelemetry dashboards.
Incident runbooks + CI/CD integration (Azure DevOps & per-PR envs)
Observability gets real when your alerts link to actions.
A simple incident runbook template that works well:
- Symptom: what the user sees (and the SLO impacted)
- Scope: which environment, which version, which region
- First queries:
- dashboard link
- trace search (error traces, slow traces)
- log query with correlation ID
- Mitigation options:
- rollback
- feature flag off
- scale up/down
- disable a dependency path
- Validation:
- confirm metrics recovered
- confirm error budget burn stopped
- Post-incident:
- root cause summary
- action items (owner + date)
Two CI/CD practices make this rollout dramatically easier:
1) Release annotations
- Add deployment markers (version, commit hash, environment) so graphs show “what changed”.
- Standardize service.version on every deployment so you can filter and compare. (opentelemetry.io)
If you run Azure DevOps, build this into your pipelines so it’s automatic and consistent. My article on Azure DevOps CI/CD foundations for consistent release + ops workflows lays out a lean structure small teams can maintain.
2) Per pull request environments
- Preview environments let you validate:
- trace propagation
- sampling rules
- PII scrubbing
- dashboards and alerts (at least smoke tests)
- before the change hits staging or prod
That is especially useful for instrumentation work, because observability changes are easy to ship “working” but still wrong. Here’s my guide on per pull request environments to validate telemetry + releases safely.
Bring it all together with SLAs and maintenance:
- Your SLA is only defendable if you can measure the SLI.
- Your maintenance contract is easier to justify when you can report:
- uptime
- latency
- incident counts
- MTTR
- top regressions by release
Observability is how you turn “we think it’s stable” into a measurable, auditable reliability story.
Conclusion
A consistent rollout of OpenTelemetry observability across web and mobile apps is not about boiling the ocean.
It’s a phased system you can ship safely:
- Contract first: standardize naming (service.name, service.version), environments, and trace context propagation.
- Instrumentation next: start with backend traces and correlated logs, then connect web and mobile network spans into the same trace story.
- Governance via Collector: add sampling, attribute normalization, and PII scrubbing before telemetry becomes unmanageable.
- Operationalize: build dashboards and alerts tied to SLOs, then write runbooks that link symptoms to traces, logs, and mitigations.
When you do this well, the wins show up fast:
- fewer regressions escaping into production
- faster incident triage because every team is looking at the same “source of truth”
- clearer SLA reporting because your measurements are consistent across platforms
- better product decisions because you can see performance and reliability per release
If you want help building an observability baseline that fits your stack, FJAN IT can implement the contract, set up the Collector, instrument your apps, and deliver dashboards, alerts, and runbooks that your team actually uses. That’s the difference between “we have telemetry” and “we run reliable software”.
Feedback
What stack are you running right now (.NET/Node, Azure/AWS, React/Next.js, native iOS/Android, Flutter, React Native)? And what’s your biggest reliability pain point today: slow incidents, noisy alerts, missing mobile visibility, or unclear SLAs?
If you want, share a few details and I’ll suggest a baseline observability rollout for your web and mobile apps. FJAN IT also offers an observability baseline review where we map your current telemetry to a concrete instrumentation + Collector + dashboard plan.
References
- W3C. Trace Context (traceparent, tracestate) Specification. https://www.w3.org/TR/trace-context/
- OpenTelemetry. Browser Getting Started (JS) and limitations. https://opentelemetry.io/docs/languages/js/getting-started/browser/
- OpenTelemetry. .NET zero-code (automatic) instrumentation. https://opentelemetry.io/docs/zero-code/dotnet/
- OpenTelemetry. Android client apps instrumentation and OTLP export configuration. https://opentelemetry.io/docs/platforms/client-apps/android/
- OpenTelemetry. Transforming telemetry in the Collector (processors, filtering, attributes, transform). https://opentelemetry.io/docs/collector/transforming-telemetry/
- OpenTelemetry Blog. Tail sampling with OpenTelemetry Collector (policies and guidance). https://opentelemetry.io/blog/2022/tail-sampling/
- OpenTelemetry. Service and resource semantic conventions (service.name, service.version). https://opentelemetry.io/docs/specs/semconv/resource/service/
- Google. SRE Workbook (SLOs, alerting, incident response practices). https://sre.google/workbook/table-of-contents/
- Honeycomb. Handling sensitive information with the OpenTelemetry Collector (redaction, hashing patterns). https://docs.honeycomb.io/send-data/opentelemetry/collector/handle-sensitive-information/