Three days after Opus 4.6 dropped, Anthropic opened a waitlist for fast mode — a research preview that claims up to 2.5x the output speed, measured in tokens per second. Same weights, same capabilities. They're not shipping a distilled model; they're running the real thing with faster inference.

The pricing reflects that. $30/$150 per million tokens, six times the standard Opus rate. Past 200K input tokens it jumps to $60/$225. That kind of premium only makes sense if you're burning through agentic loops where latency compounds at every tool call.
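As a rough sketch, the tiered pricing works out like this. One assumption to flag: the code applies the higher rate to the whole request once input crosses 200K tokens, which matches Anthropic's usual long-context tiering but isn't confirmed for this preview.

```python
def fast_mode_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated fast-mode cost in dollars for one request.

    Rates from the preview: $30/$150 per million tokens, rising to
    $60/$225 past 200K input tokens. Assumption: the higher tier
    covers the entire request, not just the tokens past 200K.
    """
    if input_tokens > 200_000:
        in_rate, out_rate = 60, 225
    else:
        in_rate, out_rate = 30, 150
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 100K-in / 10K-out call lands in the base tier; tripling the
# input pushes the whole request into the long-context tier.
print(fast_mode_cost(100_000, 10_000))  # 4.5
print(fast_mode_cost(300_000, 10_000))  # 20.25
```

At those numbers, a single long-context agentic session gets expensive fast, which is why the premium only pencils out when wall-clock time is the constraint.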

Which is exactly the use case. Claude Code already has a /fast toggle wired in. An agent calling itself forty times to refactor a module doesn't care much about per-token cost — it cares about wall-clock time. Shaving even a second off each round-trip adds up when you're watching a terminal.

One caveat buried in the docs: the speed gains apply to output tokens per second, not time to first token. The thinking pause stays the same. You just get the answer faster once it starts talking.
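The distinction matters for the agent-loop math above. A toy model, with every number hypothetical (500 output tokens per call, a 2 s thinking pause, an assumed 60 tok/s baseline scaled by the advertised 2.5x), shows where the savings come from and where they don't:

```python
TTFT_S = 2.0          # time to first token: unchanged by fast mode
TOKENS_PER_CALL = 500  # hypothetical output per round-trip
CALLS = 40             # the forty-call refactoring loop

def loop_seconds(tps: float) -> float:
    # Each round-trip pays the fixed thinking pause plus generation
    # time; fast mode only shrinks the second term.
    return CALLS * (TTFT_S + TOKENS_PER_CALL / tps)

standard = loop_seconds(60)        # assumed baseline tokens/sec
fast = loop_seconds(60 * 2.5)      # 2.5x applies to generation only

print(f"standard: {standard:.0f} s, fast: {fast:.0f} s")
```

Under these made-up numbers the loop drops from roughly seven minutes to about three and a half, but the fixed 80 seconds of thinking pauses is untouched; the shorter the outputs, the more that fixed cost dominates.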

The beta gating — waitlist plus a dedicated header — suggests capacity is still tight. Scaling whatever inference trick powers this to Opus levels isn't a small engineering problem.
