Ollama on-premise setup for businesses: offline AI

Offline Secure AI for Business

Written by Funs Janssen

Software Consultant

I’m Funs Janssen. I build software and write about the decisions around it—architecture, development practices, AI tooling, and the business impact behind technical choices. This blog is a collection of practical notes from real projects: what scales, what breaks, and what’s usually glossed over in blog-friendly examples.

Running LLMs in-house is having a moment, and not just because it is cool.

For many organizations, the driver is data control: you want the productivity boost of an internal AI assistant, but you do not want prompts, code, tickets, or documents leaving your network. You also want something your security team can actually sign off on: no public exposure, least privilege everywhere, and auditable usage.

This guide is a practical, step-by-step walkthrough for an Ollama on-premise setup for businesses (Open WebUI + Azure DevOps offline AI). We’ll cover how to deploy Ollama on an internal server, add Open WebUI as a secure internal chat interface, and connect an Azure DevOps offline workflow using ClearSpecs AI’s offline version via an OpenAI-compatible endpoint. We’ll also dig into the real stuff that makes or breaks production rollouts: sizing GPU/CPU, network isolation (including air-gapped patterns), SSO/VPN access control, model governance, backup and update strategy, and common pitfalls like Docker networking, CORS/origin settings, and auditability gaps.

If you want private, offline-capable AI that behaves like enterprise software (not a lab demo), you’re in the right place.

Quick Takeaways

  • A minimum viable secure Ollama on-premise setup for businesses (Open WebUI + Azure DevOps offline AI) is: private network/VPN, reverse proxy + HTTPS, SSO (OIDC) or trusted-header auth, persistent volumes, explicit offline mode, and tested backups.
  • Pick single-host when you want the fastest path and can tolerate a single patch domain.
  • Pick a VM/cluster when you need clean separation (UI vs. model runtime), GPU pools, or easier maintenance windows.
  • GPU VRAM is the limiter. If VRAM is tight, concurrency suffers first.
  • “Offline” is not one switch. You need network egress controls plus app-level offline behavior. Open WebUI’s offline mode helps, but it is not a firewall. (docs.openwebui.com)
  • Before rollout, test: restore from backup, SSO redirects behind proxy, WebUI to Ollama connectivity in Docker, CORS origins, and log retention.

Reference architecture + sizing for an offline LLM stack

A solid Ollama self-hosted enterprise deployment starts with a simple idea: keep every hop inside your control, and make the “happy path” the secure one.

Here’s the architecture diagram in words:

  • Users connect via VPN or internal network
  • SSO + reverse proxy (Nginx/Traefik/IIS ARR) enforces HTTPS, OIDC, IP allowlists
  • Open WebUI (internal chat UI) talks only to internal model endpoints
  • Ollama (model runtime) serves models to WebUI and internal tooling
  • Optional: OpenAI-compatible gateway endpoint (for tooling like ClearSpecs AI offline) provides a /v1/chat/completions-style API
  • Logs/metrics: centralized collection for auditability
  • Backups: config + volumes + “model registry” artifacts

If you’re setting policy in parallel, bookmark your governance basics early: approved model list, prompt/tool restrictions, data classification, and retention. That’s the difference between “a chatbot” and a controlled internal system. This is where an internal guide like ethical AI governance and oversight helps you define what “allowed” means before users invent their own rules.

Single host vs. VM/cluster

Single host appliance (fastest path):

  • WebUI + Ollama on one server (Docker Compose or system services)
  • Fewer moving parts, fewer networks to debug
  • Patch domain is shared, so maintenance affects everything at once

VM/cluster pattern (cleaner boundaries):

  • Separate VM(s): WebUI nodes, Ollama GPU node(s), optional gateway node
  • Easier patching and scaling, cleaner firewall rules between tiers
  • Storage needs more thought (models, WebUI data, backups)

Sizing guidance

For private LLM GPU sizing for small business, think in constraints:

  • GPU VRAM: usually the hard limit on which model class and quantization you can run at acceptable speed.
  • CPU-only: acceptable for low concurrency or smaller models, but expect slower responses and poor peak-hour experience.
  • RAM: leave headroom for containers, caching, and OS. “It runs” is not the same as “it stays stable under load.”
  • SSD/NVMe storage: plan for model files + versioning. Models add up fast when you keep multiple “known good” versions.
  • LAN performance: low latency matters. Keep WebUI and Ollama close inside the same LAN segment when possible.
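
To make that concrete, here is a back-of-the-envelope VRAM estimate. It is an approximation only (real usage varies with context length, KV cache, and concurrency), and the headroom multiplier is a hypothetical rule of thumb:

```bash
# Rough VRAM estimate: weights_gb ≈ params_billion * bits_per_weight / 8,
# then add headroom for KV cache, runtime overhead, and parallel requests.
params_b=7      # e.g. a 7B-class model
bits=4          # e.g. Q4 quantization
headroom=1.5    # hypothetical multiplier for KV cache + overhead
echo "scale=1; $params_b * $bits / 8 * $headroom" | bc
# => ~5.2 GB: a 7B Q4 model fits an 8 GB GPU at low concurrency,
#    but long contexts or parallel users will eat the remaining margin.
```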

Isolation approaches (real offline)

You typically see three tiers:

  1. Fully air-gapped (no egress) - Best for strict environments, but updates become a controlled import process.
  2. Egress-denied (proxy allowlist) - Default-deny outbound, explicitly allow only what is required for patching.
  3. Offline mode at the app level - Useful, but it does not replace network controls. Open WebUI’s offline mode is designed to reduce failures without internet, not to enforce isolation by itself. (docs.openwebui.com)
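
For tier 2, the core move is default-deny outbound with an explicit allowlist. Here is a minimal iptables sketch; the internal range and patch-proxy address are placeholders for your environment:

```bash
# Default-deny outbound; allow loopback, established flows, internal nets,
# and one explicit patch proxy. All addresses here are hypothetical.
iptables -P OUTPUT DROP
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -d 10.0.0.0/8 -j ACCEPT                    # internal LAN
iptables -A OUTPUT -p tcp -d 10.1.2.3 --dport 3128 -j ACCEPT  # patch proxy
# Everything else, including direct model downloads, is dropped.
```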

Access control options

A secure baseline:

  • VPN-only access, internal DNS, no public exposure
  • Reverse proxy with OIDC/SSO (Open WebUI supports SSO patterns, including proxy-delegated flows) (docs.openwebui.com)
  • Disable public signup in WebUI, keep admin bootstrap controlled
  • Tight service-to-service boundaries (WebUI can reach Ollama, users cannot)

For auditability, plan for centralized observability and audit-friendly telemetry so you can answer: who used it, when, and for what category of work.

Step-by-step: deploy Ollama + Open WebUI + Azure DevOps offline

This section is the “do it for real” part of the Ollama on-premise setup for businesses (Open WebUI + Azure DevOps offline AI): provisioning, Docker/Compose, secure binding, then Azure DevOps integration via ClearSpecs AI offline.

Step 1: Server prep

  1. Pick a baseline OS you can patch consistently (Linux LTS is common).
  2. Create a dedicated internal hostname, for example ai-chat.infra.local.
  3. Set host firewall to default deny inbound, then open only what you must (typically 443 to the reverse proxy).
  4. Decide your trust boundary:
  • Preferred: Ollama reachable only from the WebUI network, not directly from user subnets (unless tooling like ClearSpecs AI needs a directly reachable OpenAI-compatible endpoint; see Step 6).
  • Keep management access (SSH/RDP) limited to admin networks.
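
A minimal host-firewall sketch for point 3, using ufw; the admin subnet is a placeholder:

```bash
# Default deny inbound, then open only HTTPS plus admin SSH.
ufw default deny incoming
ufw default allow outgoing                           # tighten per your egress tier
ufw allow from 10.0.9.0/24 to any port 22 proto tcp  # admin subnet only
ufw allow 443/tcp                                    # reverse proxy
ufw enable
ufw status verbose
```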

Step 2: Ollama install (service/container)

Run Ollama with:

  • Persistent model volume (so reboots do not wipe models)
  • A safe bind address strategy

Two key notes:

  • By default, keep Ollama bound to localhost. Only open it up if you have a strong reason and compensating network controls.
  • Ollama exposes its bind config via OLLAMA_HOST (for example 0.0.0.0:11434 when you intentionally want network access). (docs.ollama.com)

Verification:

  • Run a local health check from the same host.
  • Pull models in a controlled way, and document versions/hashes as the starting point for your on-prem model registry and governance.
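
A minimal sketch covering both notes and the verification, assuming Docker and the upstream image defaults (pin an explicit image tag in production; the model name is an example):

```bash
# Run Ollama with a persistent model volume, published on localhost only.
# Add --gpus=all on NVIDIA hosts (requires the nvidia-container-toolkit).
docker run -d --name ollama \
  --restart unless-stopped \
  -v ollama:/root/.ollama \
  -p 127.0.0.1:11434:11434 \
  ollama/ollama

# Health check from the same host.
curl -s http://127.0.0.1:11434/api/version

# Controlled pull, then record name/size/digest for your model registry.
docker exec ollama ollama pull llama3.1:8b
docker exec ollama ollama list
```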

Step 3: Open WebUI install (Docker Compose)

This is where most “open webui ollama docker compose example” setups go wrong: Docker networking and base URLs.

Recommended actions:

  • Deploy Open WebUI with Docker, using persistent storage for its data directory. (github.com)
  • Set OLLAMA_BASE_URL to the Docker-reachable Ollama address (service name on the Compose network, not localhost).
  • Disable signup, set a controlled admin bootstrap, and verify session security.

Also, check Open WebUI’s environment configuration reference for origin and deployment variables. (docs.openwebui.com)
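
A minimal Compose sketch along those lines. The variable names follow the Open WebUI docs, but verify them against your version's configuration reference; the secret is a placeholder:

```bash
cat > docker-compose.yml <<'EOF'
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama
    # No ports: entry, so only containers on this network can reach it.

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434   # service name, not localhost
      - ENABLE_SIGNUP=false                   # no public signup
      - WEBUI_SECRET_KEY=change-me            # placeholder: use a real secret
    volumes:
      - open-webui:/app/backend/data          # persistent data directory
    ports:
      - "127.0.0.1:3000:8080"                 # localhost only; proxy fronts this

volumes:
  ollama:
  open-webui:
EOF
docker compose up -d
```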

Step 4: Secure access path (reverse proxy + SSO)

Put Open WebUI behind a reverse proxy:

  • Enforce HTTPS
  • Prefer OIDC SSO (Entra ID is common) or a trusted-header approach when you already have an authenticating proxy (docs.openwebui.com)
  • Validate redirect/callback URLs carefully (test with your actual internal hostname)
  • Add IP allowlists where practical (especially admin routes)

Security warning: publishing ports via Docker can bypass “obvious firewall rules” in surprising ways, because Docker manipulates iptables. Binding published ports to localhost is a common mitigation. (cheatsheetseries.owasp.org)
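
A minimal nginx sketch for that proxy tier; the hostname and certificate paths are placeholders, and the forwarded plus WebSocket headers matter for SSO and streaming chat (see pitfall 4 below):

```bash
cat > /etc/nginx/conf.d/ai-chat.conf <<'EOF'
server {
    listen 443 ssl;
    server_name ai-chat.infra.local;

    ssl_certificate     /etc/nginx/certs/ai-chat.crt;  # placeholder paths
    ssl_certificate_key /etc/nginx/certs/ai-chat.key;

    # allow 10.0.0.0/8; deny all;  # optional IP allowlist

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host              $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Host  $host;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        # WebSocket support for the chat UI:
        proxy_http_version 1.1;
        proxy_set_header Upgrade    $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
EOF
nginx -t && systemctl reload nginx
```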

Step 5: Offline/air-gapped hardening

For an air-gapped LLM deployment guide, treat offline as a checklist:

  • Enable Open WebUI offline mode (to reduce errors when internet is unavailable). (docs.openwebui.com)
  • Confirm no external calls at the network layer (packet capture or proxy logs).
  • Pre-load required model files and any needed UI assets during your controlled update window.
  • Document the import process so “offline” remains true after upgrades.
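
A sketch of what that looks like in practice. Check the exact variable names against your Open WebUI version's environment reference; the interface and CIDR are placeholders:

```bash
# App-level offline behavior, set in the open-webui service environment
# (verify names against the Open WebUI environment configuration docs):
#   OFFLINE_MODE=true     # reduce internet-dependent behavior
#   HF_HUB_OFFLINE=1      # stop Hugging Face lookups for embedded models

# Network-level verification: watch for anything leaving your internal range
# while you exercise the UI. Any packet shown is an egress leak to chase down.
tcpdump -ni eth0 'not (src net 10.0.0.0/8 and dst net 10.0.0.0/8)'
```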

Step 6: ClearSpecs AI offline workflow

To make the ClearSpecs AI for Azure DevOps extension work without cloud AI, you need an internal OpenAI-compatible endpoint on-prem.

ClearSpecs AI’s offline configuration supports a locally reachable, OpenAI-compatible base URL and explicitly notes:

  • requests go only to the URL you configure (subject to your network rules)
  • the endpoint must provide CORS headers allowing your Azure DevOps web origin (docs.clearspecs.ai)

Implementation flow:

  1. Stand up your internal OpenAI-compatible endpoint (often a small gateway service in the same VLAN).
  2. Configure ClearSpecs AI offline to point to that base URL.
  3. Start with a non-sensitive project and run tests end-to-end: work items, repos, PR descriptions, and spec generation.
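
Two quick smoke tests for that flow, assuming Ollama's built-in OpenAI-compatible API (or your gateway) behind a placeholder hostname:

```bash
# 1) The endpoint answers /v1/chat/completions with a model you host.
curl -s http://ai-gateway.infra.local:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "ping"}]}'

# 2) CORS preflight succeeds for your Azure DevOps web origin (placeholder).
curl -s -o /dev/null -D - -X OPTIONS \
  -H "Origin: https://devops.infra.local" \
  -H "Access-Control-Request-Method: POST" \
  http://ai-gateway.infra.local:11434/v1/chat/completions
# Look for an Access-Control-Allow-Origin header matching the DevOps origin.
```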

If you want the concrete ClearSpecs steps, follow the internal guide to install ClearSpecs AI on Azure DevOps Server. For broader tooling context, see the DevOps tools overview (ClearSpecs AI and more).

Common pitfalls + fixes

1) “WebUI can’t see Ollama” (Docker networking)

  • Symptom: Open WebUI points at http://localhost:11434 and fails.
  • Fix: use the Docker service name or host.docker.internal (depending on your topology). Never assume localhost inside a container means the host.

2) Incorrect OLLAMA_BASE_URL

  • Fix: set it to the address WebUI can route to, and verify with curl from inside the WebUI container.
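
A quick way to verify both fixes from inside the Compose network. The service and network names are assumptions based on the Compose sketch in Step 3:

```bash
# From inside the Open WebUI container (the image may not ship curl):
docker compose exec open-webui sh -c 'curl -s http://ollama:11434/api/version'

# Fallback: test from a throwaway container on the same Compose network
# (replace myproject_default with your actual network name).
docker run --rm --network myproject_default curlimages/curl \
  -s http://ollama:11434/api/version
# A JSON version response means OLLAMA_BASE_URL=http://ollama:11434 routes.
```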

3) CORS/origin issues behind reverse proxy

  • Symptom: “not an accepted origin” errors.
  • Fix: add your internal URLs to CORS_ALLOW_ORIGIN as documented, including every hostname users will hit (VPN DNS, short name, FQDN). (docs.openwebui.com)

4) OAuth/SSO breaks due to missing proxy headers

  • Fix: ensure proxy forwards expected headers (X-Forwarded-Proto, X-Forwarded-Host) and review Open WebUI SSO troubleshooting guidance. (docs.openwebui.com)

5) Auditability gaps

  • Symptom: no central logs, no retention policy, no traceability.
  • Fix: log at reverse proxy, WebUI app, and gateway. Define retention and access reviews. Without this, compliance sign-off is fragile.

Ops checklist (use this before production)

  • Backups (see the volume backup/restore sketch below this checklist)
    • WebUI persistent volume + config
    • Ollama models path + config
    • Gateway configs (if used)
  • Restore test
    • Do a full restore into a test VM and validate logins + model availability
  • Monitoring
    • GPU/CPU/RAM saturation, disk usage, error rate, response latency
  • Patching cadence
    • Monthly security updates, plus urgent CVE response workflow
  • Model promotion
    • dev > staging > prod, with a documented “approved model list”
  • Rollout plan
    • pilot group, training, acceptable use, incident response playbook
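
A minimal backup/restore sketch for the named volumes from the Compose setup; volume names and the restore filename are assumptions, and services are stopped first for consistency:

```bash
# Back up named volumes as dated tarballs.
docker compose stop
docker run --rm -v open-webui:/data -v "$PWD":/backup alpine \
  tar czf "/backup/open-webui-$(date +%F).tgz" -C /data .
docker run --rm -v ollama:/data -v "$PWD":/backup alpine \
  tar czf "/backup/ollama-$(date +%F).tgz" -C /data .
docker compose start

# Restore into a fresh volume on the test VM, then validate logins + models.
docker run --rm -v open-webui:/data -v "$PWD":/backup alpine \
  tar xzf /backup/open-webui-2025-01-01.tgz -C /data   # example filename
```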

Conclusion

A well-built Ollama on-premise setup for businesses (Open WebUI + Azure DevOps offline AI) is less about “installing a model” and more about treating internal AI like any other production platform: secure access, controlled networks, clear governance, and operational discipline.

If you want the shortest secure-by-default path, start with a single host deployment: Ollama + Open WebUI, reachable only through VPN and a reverse proxy enforcing HTTPS and SSO. Keep Ollama bound to localhost or a private interface, use persistent volumes, and verify that Docker port publishing is not accidentally exposing services. (cheatsheetseries.owasp.org) If you need stronger separation, easier patch windows, or GPU pools, move to a VM/cluster pattern with explicit network segments: UI tier, model runtime tier, optional OpenAI-compatible gateway, and centralized logs. That structure makes it easier to prove compliance, scale safely, and keep your “offline” claims true when the environment evolves.

Your next step should be a checklist-driven pilot: pick a small user group, define the approved model list, test CORS and SSO behind the proxy, and run a full restore test. Once that is solid, you can roll it out broadly with confidence instead of hoping nobody clicks the wrong link or pastes the wrong secret.

Feedback

If you’re planning an Ollama on-premise setup for businesses (Open WebUI + Azure DevOps offline AI), what’s your biggest blocker right now: GPU sizing, SSO, auditability, or true offline operations?

If you want to see the Azure DevOps workflow in action, request the ClearSpecs AI offline demo and test it against a safe internal pilot. And if you’d like hands-on help, FJAN IT can implement secure internal AI automations, dashboards around usage/audit logs, and a rollout plan your security team can approve.
