Open-weights models are forcing a repricing of inference
Closed labs still lead on the frontier benchmarks. The gap that matters commercially — between the best model and a usable model — is closing fast.
For most of the AI era so far, the pricing of large-language-model inference has been set by a small number of frontier labs that controlled both the best weights and the deployment substrate. That market structure is dissolving. The cause is not a single competitor but the cumulative effect of capable open-weights releases and a rapidly maturing third-party hosting market.
The Llama family from Meta, the Mistral and Mixtral lines, the DeepSeek releases, and the Qwen family from Alibaba have collectively closed most of the practical capability gap on the workloads that account for the majority of enterprise AI spend: summarisation, classification, structured extraction, code completion, retrieval-augmented question answering. The frontier still belongs to the closed labs, but the workloads where the frontier is required are a smaller fraction of total inference than the headlines suggest.
Hosting is now a commodity
The second leg of the change is on the supply side. Inference providers — Together, Fireworks, Groq, Cerebras, the major hyperscalers — now compete on price-per-million-tokens for the same open-weights models. The result is the dynamic that was missing in the closed-model market: genuine multi-vendor commodity pricing for an essentially fungible output. Artificial Analysis and similar benchmarking services publish daily prices that show the spread between providers narrowing month-on-month.
The pricing implication is clear in the numbers. The cost of generating a million tokens from a top-tier open-weights model has fallen by roughly an order of magnitude over the last eighteen months on a like-for-like capability basis. The cost of generating a million tokens from a closed frontier model has fallen far less. That spread now defines the buy-decision for the majority of enterprise workloads: the closed model is justified where it is needed; the open model is the default where it suffices.
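The buy-decision arithmetic is simple enough to sketch. All prices below are hypothetical placeholders chosen only to illustrate an order-of-magnitude spread, not quotes from any provider:

```python
# Illustrative buy-decision arithmetic. CLOSED_FRONTIER and
# OPEN_COMMODITY are hypothetical prices, not real provider quotes.

def monthly_cost(tokens_millions: float, price_per_million: float) -> float:
    """Cost of a workload at a given per-million-token price (USD)."""
    return tokens_millions * price_per_million

CLOSED_FRONTIER = 15.00   # assumed closed frontier price per million tokens
OPEN_COMMODITY = 0.90     # assumed commodity-hosted open-weights price

workload_tokens = 500.0   # millions of tokens per month

closed = monthly_cost(workload_tokens, CLOSED_FRONTIER)
open_ = monthly_cost(workload_tokens, OPEN_COMMODITY)
print(f"closed: ${closed:,.0f}/mo, open: ${open_:,.0f}/mo, "
      f"spread: {closed / open_:.1f}x")
```

At any spread of this shape, the question for each workload stops being "which model is best" and becomes "does this workload justify the multiple".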
For the closed labs the response has been to push hard on capabilities that the open ecosystem cannot easily replicate: long-context reasoning, agentic orchestration, multimodal grounding, and tightly integrated tool use. Anthropic's extended thinking and computer-use modes and OpenAI's continued investment in agentic primitives are visible expressions of this strategy. The frontier is moving away from raw next-token quality and toward capabilities that are harder to clone in a model card.
For enterprise buyers the strategic posture is now portfolio-shaped. A serious AI architecture in 2026 typically combines a closed frontier model for the workloads that justify it, an open-weights model hosted at commodity prices for the long tail, and increasingly an internally fine-tuned open-weights model where the training data is sensitive or domain-specific. The all-in bet on a single vendor that characterised many 2023 deployments is rare in current procurement.
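The portfolio posture reduces, in practice, to a routing rule per workload. A minimal sketch, with hypothetical tier names and an assumed precedence (data sensitivity first, frontier need second, commodity default last):

```python
# Minimal sketch of portfolio routing. Tier names and the precedence
# order are illustrative assumptions, not a production policy.

from dataclasses import dataclass

@dataclass
class Workload:
    task: str             # e.g. "summarisation", "agentic orchestration"
    sensitive_data: bool  # training/inference data that cannot leave the org
    needs_frontier: bool  # long-context reasoning, agentic work, etc.

def route(w: Workload) -> str:
    """Pick a model tier for a workload under the portfolio posture."""
    if w.sensitive_data:
        return "internal-finetuned-open-weights"  # hypothetical tier name
    if w.needs_frontier:
        return "closed-frontier"                  # hypothetical tier name
    return "commodity-hosted-open-weights"        # the long-tail default

# The long tail defaults to the commodity-hosted open model.
print(route(Workload("classification", sensitive_data=False, needs_frontier=False)))
```

The design point is the default branch: the closed model is an opt-in for workloads that earn it, not the baseline.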
The larger question is whether the closed labs can sustain the spread long enough to recoup the capital expenditure that the frontier demands. They have argued yes, and their revenue figures support a partial yes. But the gravitational pull of a competitive open ecosystem is the same in AI as it has been in databases, operating systems, and web servers. It does not have to win to reshape the market. It only has to be good enough.