Anthropic shipped Claude Opus 4.7 this morning, about ten weeks after Opus 4.6 landed in February. Same pricing: $5 per million input tokens, $25 per million output. Same 1M context window in the extended variant. A handful of new knobs in Claude Code and the API. And one unusually candid line in the release materials, which I think is the most interesting thing about the launch.
First the numbers Anthropic actually cites. On CursorBench, 4.7 hits ~70%, up from 58% for 4.6. That is a twelve-point jump on a benchmark that tracks how the model behaves inside a working IDE, which is closer to the work than most evals. On Rakuten-SWE-Bench, Anthropic says 4.7 resolves three times as many production tasks as 4.6. SWE-bench Verified, SWE-bench Pro, and Terminal-Bench 2.0 numbers have been circulating on third-party blogs, but I cannot find them on Anthropic's own pages, so I am not going to quote them.
Where 4.7 actually feels different, to me, is in the developer affordances. There is a new xhigh effort level above high, which pushes the model into longer deliberation on hard tasks. A "task budgets" public beta caps how much compute a single agentic run can consume before it checks in. A /ultrareview command was added to Claude Code. The model is better at using file-system-based memory across sessions. Vision inputs accept images up to 2,576 pixels on the long edge with higher fidelity than before. Small things, individually. They compound.
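Anthropic has not published the task-budget API surface in detail, so here is only a client-side sketch of the pattern as I understand it: cap total spend on an agentic run and force a check-in at intervals. Every name in this snippet (TaskBudget, charge, run_agent, the field names) is my own invention for illustration, not Anthropic's API.

```python
from dataclasses import dataclass, field

@dataclass
class TaskBudget:
    """Hypothetical client-side budget for an agentic run."""
    max_tokens: int          # hard cap for the whole run
    check_in_every: int      # pause after this many tokens
    spent: int = 0
    _since_check_in: int = field(default=0, repr=False)

    def charge(self, tokens: int) -> str:
        """Record spend; return 'ok', 'check_in', or 'exhausted'."""
        self.spent += tokens
        self._since_check_in += tokens
        if self.spent >= self.max_tokens:
            return "exhausted"
        if self._since_check_in >= self.check_in_every:
            self._since_check_in = 0
            return "check_in"
        return "ok"

def run_agent(budget: TaskBudget, step_costs: list[int]) -> list[str]:
    """Drive a fake agent loop, recording what the budget said."""
    events = []
    for cost in step_costs:
        status = budget.charge(cost)
        events.append(status)
        if status == "exhausted":
            break  # stop the run outright
        # On "check_in" a real harness would surface progress to the
        # user and wait for approval before continuing.
    return events

events = run_agent(TaskBudget(max_tokens=10_000, check_in_every=4_000),
                   step_costs=[3_000, 2_000, 2_000, 2_000, 2_000])
# → ['ok', 'check_in', 'ok', 'check_in', 'exhausted']
```

The appeal of the server-side version Anthropic ships is exactly that you do not have to maintain this bookkeeping yourself, but the control flow is presumably similar: spend, check in, stop.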
Holding $5 input and $25 output across another generation is a concession to the shape of current demand. Nobody wants Opus priced out of daily use, and this is now a fairly stable frontier band.
And then there is Claude Mythos Preview, referenced in the Opus 4.7 launch materials as Anthropic's "most powerful model" — one that 4.7 is described as "less broadly capable than." Mythos itself was announced on April 7 under the name Project Glasswing, with a limited rollout to roughly fifty partner organisations and its own public system card. Opus 4.7 is today's general release. Mythos is the one most people cannot touch.
That is a strange thing for a frontier lab to put in an announcement post. The usual move is to ship your best and frame it as the best. Anthropic is instead shipping what it calls a production-ready step up from 4.6 while pointing openly at a more capable internal model nobody else can use at scale. The reason, per Anthropic's own framing, is not alignment. They describe Mythos as the best-aligned model they have trained. The concern is capability: Mythos is good enough at certain offensive-security tasks that Anthropic would rather gate it than ship it broadly.
That reframing matters, because my first instinct was to reach for the chain-of-thought honesty problem and assume Mythos was withheld because its reasoning could not yet be audited. That is not what Anthropic is saying. What they are saying is closer to: the model is aligned enough, but the capabilities it has are the kind that turn a careless user into a serious problem, so general access waits. That is a different kind of caution, and more interesting than "the new model is not safe enough yet."
For the work I actually do — which is how I end up judging any model release — Opus 4.6 was already the best coding agent I had used, and 4.7 in initial testing feels like a modest but real step. The task-budget control is genuinely useful if you run long agentic jobs that can spiral. xhigh is the knob for when you want to burn tokens thinking about something hard. The rest is refinement.
What I cannot do, yet, is compare any of it to Mythos. I suspect that comparison is the one Anthropic wants us to think about.
Sources:

- Introducing Claude Opus 4.7 — Anthropic
- Claude Opus — Anthropic
- Claude Opus 4.7 is generally available — GitHub Changelog
- Anthropic rolls out Claude Opus 4.7, an AI model that is less risky than Mythos — CNBC
- Anthropic limits Claude Mythos rollout over cyberattack fears — CNBC