Within five days of each other, Anthropic launched Opus fast mode and OpenAI shipped Codex-Spark. Same thesis, different silicon. Anthropic squeezes 2.5x more tokens per second out of Opus 4.6 through inference optimisation. OpenAI distills GPT-5.3-Codex into a smaller model and runs it on Cerebras wafer-scale hardware at over a thousand tokens per second. Both are research previews. Both are gated to developers. Both cost more than their standard counterparts.

The timing isn't coincidence. Coding agents are the first workload where latency translates directly into revenue. A developer staring at a terminal while an agent loops through forty tool calls doesn't care about cost per token — they care about wall-clock minutes. Anthropic charges six times the standard rate for fast mode. OpenAI hasn't published Spark pricing yet, but the Cerebras partnership wasn't cheap. These aren't loss leaders. They're premium tiers aimed at the one audience willing to pay for speed right now.
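
To put rough numbers on that, here's a back-of-envelope sketch. The forty tool calls, the 2.5x speedup, and the 6x price premium come from above; the per-call token count and the baseline serving speed are numbers I've made up purely for illustration.

```python
# Back-of-envelope maths for the forty-call agent loop described above.
# Only the 2.5x speedup and the 6x price multiplier come from the announcements;
# the per-call token count and baseline serving speed are assumptions.

TOOL_CALLS = 40            # the agent loop from the paragraph above
TOKENS_PER_CALL = 1_500    # assumed average output tokens per agent step
BASE_TOKENS_PER_SEC = 60   # assumed standard serving speed
SPEEDUP = 2.5              # Anthropic's stated fast-mode multiplier
PRICE_MULTIPLIER = 6       # fast mode's price premium over the standard rate


def session_minutes(tokens_per_sec: float) -> float:
    """Total generation time for the whole loop, in minutes."""
    return TOOL_CALLS * TOKENS_PER_CALL / tokens_per_sec / 60


standard = session_minutes(BASE_TOKENS_PER_SEC)
fast = session_minutes(BASE_TOKENS_PER_SEC * SPEEDUP)

print(f"standard serving: {standard:.1f} minutes of generation")
print(f"fast mode:        {fast:.1f} minutes of generation")
print(f"time saved:       {standard - fast:.1f} minutes, at {PRICE_MULTIPLIER}x the token price")
```

Under those assumptions the loop drops from roughly seventeen minutes of generation to under seven, which is exactly the trade the premium pricing is betting on.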

What interests me is the constraint both companies are accepting. Fast mode is Opus with the same weights, just served differently. Codex-Spark is a distilled, smaller model — OpenAI admits the full Codex produces better creative output. Neither approach is free. You either pay for dedicated inference capacity or you trade quality for velocity. There's no trick that makes frontier intelligence and sub-second latency coexist cheaply.

The question everyone keeps asking, whether these tiers will become generally available, misframes the situation. The technology already works; the bottleneck is economics. Anthropic can't hand fast mode to every Claude consumer when it prices that compute at six times the standard rate, not without raising subscription prices or eating the margin. OpenAI can't run every ChatGPT conversation through Cerebras wafer-scale engines. The hardware doesn't exist in sufficient quantity, and OpenAI's own announcement says it's ramping datacenter capacity before any broader rollout.
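
To see why that maths doesn't work at consumer scale, here's a deliberately crude margin sketch. Every figure in it is invented except the sixfold multiplier, which I'm using as a rough proxy for the extra serving cost; that proxy is itself an assumption.

```python
# Toy margin arithmetic for a flat-rate subscription, with made-up numbers.
# The only figure carried over from above is the sixfold multiplier, used
# here as a rough stand-in for the extra cost of fast serving.
SUBSCRIPTION_PRICE = 20.0     # hypothetical monthly price, USD
STANDARD_SERVING_COST = 8.0   # hypothetical monthly compute cost per subscriber
FAST_MULTIPLIER = 6           # stand-in for the premium tier's cost multiple

standard_margin = SUBSCRIPTION_PRICE - STANDARD_SERVING_COST
all_fast_margin = SUBSCRIPTION_PRICE - STANDARD_SERVING_COST * FAST_MULTIPLIER

print(f"margin at standard serving:      ${standard_margin:+.2f}")
print(f"margin if every request is fast: ${all_fast_margin:+.2f}")
```

Swap in whatever numbers you like; unless the real cost multiple is far smaller than the price premium suggests, a flat subscription flips from comfortably positive to deeply negative.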

So the honest answer is: speed tiers will generalise, but slowly, and probably not in the form people expect. I'd bet on tiered pricing spreading to both companies' consumer products (a fast toggle in Claude.ai, a "turbo" option in ChatGPT) before the end of the year. But it'll cost extra. The idea that baseline inference gets dramatically faster for free requires either a hardware miracle or margins that neither company can sustain.

The deeper pattern is what I wrote about last month. Speed is becoming the axis of competition because capability gains have slowed enough that users notice latency before they notice intelligence improvements. When both labs ship speed products in the same week, that tells you where the demand signal is loudest. Not smarter. Faster.
