The sticker price of a frontier model has been falling for eighteen months. The bill I run up using one has not. Both are true, and they are both true for a reason.
Per million tokens, GPT-5.4 Standard is $2.50 on input and $10 on output. Claude 4.6 Sonnet sits at $3 and $15. Gemini 3.1 Pro is $1.25 on input, with the output price landing somewhere between $5 and $12 depending on the table you trust. Eighteen months ago the comparable tier was meaningfully higher. At the budget end the collapse has been more dramatic. GPT-5 Nano is $0.05 in, $0.40 out. Gemini 3.1 Flash-Lite hovers around $0.10 and $0.40. DeepSeek halved its prices in late 2025 and the copycat pressure has not let up. If all you are doing is matching 2024 workloads to 2026 models, you are paying less.
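For concreteness, the sticker arithmetic as a small sketch. The rates come from the figures above; the Gemini output rate, the request size, and the model labels are assumptions for illustration, not real SDK identifiers.

```python
# Illustrative sticker-price arithmetic using the rates quoted above.
# Prices are USD per million tokens; nothing here calls a real API.
PRICES = {
    "gpt-5.4-standard":      {"input": 2.50, "output": 10.00},
    "claude-4.6-sonnet":     {"input": 3.00, "output": 15.00},
    "gemini-3.1-pro":        {"input": 1.25, "output": 8.50},  # output assumed within the quoted $5-$12 range
    "gpt-5-nano":            {"input": 0.05, "output": 0.40},
    "gemini-3.1-flash-lite": {"input": 0.10, "output": 0.40},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request in dollars, at sticker rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The same 2024-shaped request against each 2026 model: 3,000 tokens in, 800 out.
for name in PRICES:
    print(f"{name:24s} ${request_cost(name, 3_000, 800):.5f}")
```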
That is the sticker answer. The receipt answer is worse.
The first thing that moved is the premium tier. GPT-5.4 has a High Reasoning mode that runs $10 in, $40 out: roughly 4× the standard tier for the same provider, same parent model, just with the thinking dial turned up. Claude Opus's Fast Mode clears $30 per million input tokens. Long-context windows, the big selling feature of 2025, became a billing surface: GPT-5.4 doubles input pricing beyond 272K tokens and adds 1.5× on output, and Gemini 3.1 Pro doubles input beyond 200K. Anthropic, to its credit, removed its long-context premium on March 13; the full 1M window bills at standard rates now.
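A rough sketch of how a long-context surcharge lands on the receipt, using the thresholds and multipliers described above. Whether a provider surcharges the whole prompt or only the overage is my assumption here, not a documented billing rule.

```python
# Rough model of a long-context surcharge: once the prompt crosses a threshold,
# input (and possibly output) is billed at a multiple of the base rate.
# Assumption: the surcharge applies to the whole request, not just the overage.

def long_context_cost(input_tokens, output_tokens, *,
                      in_rate, out_rate,            # $ per million tokens
                      threshold,                    # prompt size that triggers the premium
                      in_mult=2.0, out_mult=1.0):   # surcharge multipliers
    if input_tokens > threshold:
        in_rate *= in_mult
        out_rate *= out_mult
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# GPT-5.4 per the text above: 2x input beyond 272K tokens, 1.5x output.
short = long_context_cost(200_000, 2_000, in_rate=2.50, out_rate=10.0,
                          threshold=272_000, in_mult=2.0, out_mult=1.5)
long  = long_context_cost(400_000, 2_000, in_rate=2.50, out_rate=10.0,
                          threshold=272_000, in_mult=2.0, out_mult=1.5)
print(f"under threshold: ${short:.2f}   over threshold: ${long:.2f}")
```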
The second thing is the token count itself. Gemini 3.1 Pro's chain-of-thought reasoning generates internal tokens billed at output rates, and a simple prompt can consume three to five times more tokens than expected. This is the quiet version of a price hike. You did not pay more per token. You paid for more tokens. Any workflow that shipped in 2024 with a predictable output length is now spending meaningfully more on the same question if the model is thinking before answering. Which it will be, because every provider is pushing you toward the reasoning variants as the default for serious work.
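To make the point concrete, a toy comparison under the assumption that hidden reasoning tokens are billed at the output rate, as described above; the 4× multiplier is picked from the three-to-five-times range, not measured.

```python
# Toy comparison: the same answer, with and without hidden reasoning tokens.
# Reasoning tokens are assumed to be billed at the output rate.
IN_RATE, OUT_RATE = 1.25, 10.0          # $ per million tokens, Gemini 3.1 Pro-ish

def query_cost(input_toks, visible_out_toks, reasoning_multiplier=1.0):
    # reasoning_multiplier scales total billed output: 1.0 = no hidden thinking,
    # 4.0 = the visible answer plus roughly 3x as many internal reasoning tokens.
    billed_output = visible_out_toks * reasoning_multiplier
    return (input_toks * IN_RATE + billed_output * OUT_RATE) / 1_000_000

flat     = query_cost(2_000, 500)                          # 2024-shaped expectation
thinking = query_cost(2_000, 500, reasoning_multiplier=4.0)
print(f"no reasoning: ${flat:.4f}   with reasoning: ${thinking:.4f}")
```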
Third is where most of the enterprise analysis actually lands: context caching. The big platforms discount cached reads heavily, up to 90% off base rates on repeated context. If your workload has a stable system prompt and repeated document context (customer support, code assistance, document processing), the effective blended per-million rate can compress substantially. If your workload does not (agentic tools that spin up fresh contexts, one-shot research queries), the cache discount saves you nothing. The bill diverges based on access pattern, not sticker price.
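The access-pattern divergence is just a blended-rate calculation. A sketch, where the 90% discount is the upper end mentioned above and the hit rate is whatever your workload actually delivers:

```python
# Blended input rate as a function of cache hit rate: cached reads are billed
# at (1 - discount) * base rate, fresh reads at the full base rate.
def blended_input_rate(base_rate, cache_hit_fraction, cache_discount=0.90):
    cached = base_rate * (1 - cache_discount)
    return cache_hit_fraction * cached + (1 - cache_hit_fraction) * base_rate

base = 2.50  # $ per million input tokens
for hit in (0.0, 0.5, 0.9):
    print(f"cache hit rate {hit:.0%}: ${blended_input_rate(base, hit):.3f}/M input tokens")
```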
Against all of that, the budget tier deserves its own column. A Gemini 3.1 fast variant at $0.075 in, $0.30 out is not a frontier model. It is a utility-grade language model that happens to be smarter than GPT-4 was two years ago. That tier has collapsed so far below the frontier that calling any of this a "price rise" misses the entire rearrangement. For high-volume, low-stakes work, the cost per task has fallen by an order of magnitude. For frontier-quality work on harder tasks, the cost per task has held steady or climbed.
I suspect this is what the providers want. The price floor drops aggressively enough that casual users move on sticker alone. The premium ceiling goes up just as aggressively, and the people who have actual hard problems — research, production coding agents, long-horizon reasoning work — end up on the tier where the margin actually lives. Between the two, the flagship standard tier holds a middling price that lets the pricing page look competitive without giving anything away.
The honest answer, then: per token, no. Per unit of intelligence applied to a specific task, probably yes for anything hard, and flatly yes if you are using the reasoning modes the providers are quietly making default. The budget I set in 2024 for a particular agentic task has to stretch much further on 2026 Opus at $5 and $25, because the agent thinks three times as hard before producing the same output and the output itself is longer. That is not a price rise. It is also not a price cut. It is the industry offering a cheaper ruler and then giving you longer things to measure.
A ruler I keep thinking about: Anthropic's run of automated alignment researchers, nine Claude instances working for five days, cost $18,000 in compute. A year ago that per-token figure would have been higher. I doubt the total would have been lower.
Sources:
- AI Model Pricing Guide 2026: GPT-5.4, Claude 4.6 & Gemini 3.1 Cost Breakdown — Talkory
- Google Gemini API Pricing 2026 — MetaCTO
- AI Coding Model API Pricing 2026 — Serenities AI
- Top Cost-Efficient Small Models for AI APIs — Clarifai
- LLM Cost Per Token Comparison — ClawPane