Send a prompt to OpenAI or Anthropic and you have already lost control of it. For a hospital, a law firm, a defence contractor, or a financial institution, that is not a theoretical risk — it is a compliance failure. GDPR, HIPAA, data sovereignty requirements, and internal security policies all point to the same conclusion: the data cannot leave your network.
The standard answer is to build your own AI infrastructure. Spin up GPU servers, deploy vLLM or something similar, manage CUDA dependencies, handle scaling, maintain the stack. It works, but it takes months and a dedicated team to do properly.
There is a faster path.
What “Air-Gapped AI” Actually Means in Practice
Air-gapped deployment means the model runs entirely on infrastructure you control, with no outbound connections to external APIs or cloud providers. Prompts, completions, and any data passed through the model stay inside your network perimeter.
For regulated industries, this matters for several distinct reasons:
-
GDPR: Personal data processed by a third-party cloud provider triggers data processing agreements, cross-border transfer restrictions, and breach notification obligations. Local inference sidesteps all of this.
-
HIPAA: Protected health information cannot be sent to external servers without a Business Associate Agreement. Running inference on-premise eliminates that requirement entirely.
-
Data sovereignty: UK, EU, and sector-specific regulations increasingly require that sensitive data be processed within defined geographic or organisational boundaries.
-
IP protection: Your prompts often contain proprietary information. Air-gapped deployment means that information never reaches an external server where it could be logged, used for training, or subpoenaed.
The compliance case is straightforward. The infrastructure challenge is where most organisations get stuck.
Why Building It Yourself Is Harder Than It Looks
The self-hosted AI infrastructure market has matured. Tools like vLLM offer high-throughput inference through PagedAttention, and LocalAI provides broad OpenAI-compatible model support with 44,000 GitHub stars. Both are capable.
But both require substantial DevOps work to run in production. vLLM is known for CUDA out-of-memory errors under load and a steep learning curve. LocalAI puts model deployment, GPU allocation, and scaling entirely on your team. There is no automatic fallback if local resources run short. When something breaks at 2am, your engineers own it.
For a security or compliance team trying to get AI deployed inside a regulated environment, this creates a real dilemma. You need air-gapped deployment, but you may not have the infrastructure engineering capacity to build and maintain a production-grade inference stack from scratch.
How Locai Handles Air-Gapped Enterprise Deployment
Locai routes model inference to end-user devices rather than cloud APIs. For enterprise deployments, that means the model runs on hardware inside your network, prompts never leave your infrastructure, and you get managed orchestration without building the stack yourself.
The key architectural points:
Data never leaves the device. When inference runs locally, the prompt is processed on the machine running the model. No outbound API call, no cloud round-trip, no external server receiving your data. GDPR and HIPAA compliance follows from the architecture, not from configuration.
Automatic fallback when needed. If a device lacks sufficient compute, Locai routes the request to cloud infrastructure automatically. For fully air-gapped deployments where cloud fallback is not permitted, the system runs in on-premise mode with no external dependencies.
OpenAI-compatible API. If your team is already calling OpenAI or Anthropic APIs internally, migration is a single line of code change. No rewriting the application layer, no retraining developers on a new interface.
Model Registry. Models are stored in the cloud or on a local registry and deployed to devices. Once deployed, the model runs locally with unlimited local storage. The registry handles distribution; the device handles inference.
Who This Is Built For
Locai’s enterprise use case targets three types of organisations specifically:
Healthcare providers and health tech companies processing patient data under HIPAA or NHS data governance requirements. The model runs on devices inside the clinical network. Patient data does not leave.
Financial services and fintech subject to FCA, PRA, or sector-specific data handling rules. Inference on internal hardware means transaction data, customer records, and financial models stay inside the compliance perimeter.
Legal, defence, and government organisations where data sovereignty is non-negotiable. Air-gapped deployment with no external dependencies means the system operates independently of cloud uptime and outside network access.
If your CISO has blocked cloud AI adoption, or your compliance team cannot sign off on sending data to a third-party API, this architecture is designed for exactly that constraint.
Pricing for Enterprise Deployments
Locai offers three tiers. The Starter plan at £35 per month covers 15 end nodes, 50GB model registry, and 100GB egress, with a 30-day free trial and no credit card required. Pay-as-you-go rates are £5 per device per month, £0.05 per GB per month for registry storage, and £0.14 per GB for data egress.
Enterprise pricing is custom. For organisations with specific air-gapped requirements, on-premise infrastructure, or large node counts, the enterprise tier covers those needs with custom terms.
The pricing model is flat-rate infrastructure rather than per-token billing. Your inference costs do not scale with usage volume — which matters when you are running AI across a large user base or processing high query volumes internally.
What Cloud-Only Alternatives Cannot Offer
Together AI, Fireworks AI, and RunPod are capable cloud inference providers. None of them are air-gapped options. Data leaves your infrastructure on every request. For regulated industries, that is not a configuration problem you can work around — it is a fundamental architectural mismatch.
Ollama reached 52 million monthly downloads in Q1 2026, which confirms that demand for local inference is real and growing. But Ollama is a developer tool for running models on a single machine. No cloud fallback, no infrastructure orchestration, no enterprise management layer. It is not designed for deploying AI across an organisation.
Locai sits in the gap between those two camps: managed infrastructure that runs on your hardware, with the orchestration and fallback logic handled for you.
Getting Started
If your organisation needs air-gapped AI deployment and you want to avoid building the inference stack from scratch, the fastest path is to start with the free Developer tier or the Starter trial. Documentation is at docs.locai.co.uk. The Discord community is active for technical questions.
For enterprise deployments with specific compliance requirements, the enterprise tier handles custom infrastructure needs. Reach the team at locai.co.uk to discuss your setup.
The compliance requirement does not have to mean months of internal engineering work.
FAQs
What does air-gapped AI deployment mean for enterprise?
Air-gapped AI deployment means the model runs entirely on infrastructure you control, with no data sent to external cloud APIs. Prompts, completions, and any processed data stay within your network perimeter. This satisfies GDPR, HIPAA, and data sovereignty requirements by design.
Can Locai run in a fully offline environment with no internet access?
Yes. For regulated deployments where cloud fallback is not permitted, Locai operates in on-premise mode with no external dependencies. The model runs locally on your hardware without any outbound connections.
How long does it take to deploy an air-gapped AI model with Locai?
A basic deployment using the CLI takes minutes. Migrating an existing OpenAI API integration requires a single line of code change.
Does Locai satisfy HIPAA compliance requirements for healthcare AI?
Because inference runs on devices inside your network and data never reaches an external server, Locai’s architecture avoids the data transmission that triggers most HIPAA obligations. You should confirm specific requirements with your legal and compliance team, but the architecture is designed to keep protected health information on-premise.
What happens if a device does not have enough compute to run the model?
Locai automatically routes the request to cloud infrastructure when a device lacks sufficient compute. For fully air-gapped deployments, this fallback can be disabled so all inference stays on-premise.
How does Locai compare to vLLM or LocalAI for enterprise air-gapped deployments?
vLLM and LocalAI require your team to manage GPU allocation, model deployment, and scaling. There is no built-in cloud fallback or infrastructure orchestration. Locai provides managed hybrid routing with automatic fallback and a simpler deployment model, which reduces the DevOps burden for teams without dedicated AI infrastructure engineers.
What is the pricing for enterprise air-gapped AI deployment with Locai?
The Starter plan is £35 per month with a 30-day free trial. Pay-as-you-go rates are £5 per device per month. Enterprise pricing is custom and covers organisations with specific on-premise or air-gapped requirements. Details are available at locai.co.uk.

