Skip to content

Plutonic Rainbows

Opus 4.7 Ships With Mythos in Reserve

Anthropic shipped Claude Opus 4.7 this morning, about ten weeks after Opus 4.6 landed in February. Same pricing: $5 per million input tokens, $25 per million output. Same 1M context window in the extended variant. A handful of new knobs in Claude Code and the API. And one unusually candid line in the release materials, which I think is the most interesting thing about the launch.

First the numbers Anthropic actually cites. On CursorBench, 4.7 hits ~70%, up from 58% for 4.6. That is a twelve-point jump on a benchmark that tracks how the model behaves inside a working IDE, which is closer to the work than most evals. On Rakuten-SWE-Bench, Anthropic says 4.7 resolves three times as many production tasks as 4.6. SWE-bench Verified, SWE-bench Pro, and Terminal-Bench 2.0 numbers have been circulating on third-party blogs, but I cannot find them on Anthropic's own pages, so I am not going to quote them.

Where 4.7 actually feels different, to me, is in the developer affordances. There is a new xhigh effort level above high, which pushes the model into longer deliberation on hard tasks. A "task budgets" public beta caps how much compute a single agentic run can consume before it checks in. A /ultrareview command was added to Claude Code. The model is better at using file-system based memory across sessions. Vision inputs accept images up to 2,576 pixels on the long edge with higher fidelity than before. Small things, individually. They compound.

Holding $5 input and $25 output across another generation is a concession to the shape of current demand. Nobody wants Opus priced out of daily use, and this is now a fairly stable frontier band.

And then there is Claude Mythos Preview, referenced in the Opus 4.7 launch materials as Anthropic's "most powerful model" — one that 4.7 is described as "less broadly capable than." Mythos itself was announced on April 7 under the name Project Glasswing, with a limited rollout to roughly fifty partner organisations and its own public system card. Opus 4.7 is today's general release. Mythos is the one most people cannot touch.

That is a strange thing for a frontier lab to put in an announcement post. The usual move is to ship your best and frame it as the best. Anthropic is instead shipping what it calls a production-ready step up from 4.6 while pointing openly at a more capable internal model nobody else can use at scale. The reason, per Anthropic's own framing, is not alignment. They describe Mythos as the best-aligned model they have trained. The concern is capability: Mythos is good enough at certain offensive-security tasks that Anthropic would rather gate it than ship it broadly.

That reframing matters, because my first instinct was to reach for the chain-of-thought honesty problem and assume Mythos was withheld because its reasoning could not yet be audited. That is not what Anthropic is saying. What they are saying is closer to: the model is aligned enough, but the capabilities it has are the kind that turn a careless user into a serious problem, so general access waits. That is a different kind of caution, and more interesting than "the new model is not safe enough yet."

For the work I actually do — which is how I end up judging any model release — Opus 4.6 was already the best coding agent I had used, and 4.7 in initial testing feels like a modest but real step. The task-budget control is genuinely useful if you run long agentic jobs that can spiral. xhigh is the knob for when you want to burn tokens thinking about something hard. The rest is refinement.

What I cannot do, yet, is compare any of it to Mythos. I suspect that comparison is the one Anthropic wants us to think about.

Sources:

Three Minutes, Thirteen Years

The new Boards of Canada track showed up on their own YouTube channel on April 16, 2026, without a press release. It is called Tape 05 and runs a little over three minutes. This is the first original music they have released since Tomorrow's Harvest in June 2013.

Thirteen years is a long time to wait for a three-minute song, and by the standards of most artists that gap would be career-ending. Sandison and Eoin are not most artists. Their silences are part of the work.

The delivery fits that pattern. In the weeks before the drop, Warp mailed unmarked VHS cassettes carrying only a Hexagon Sun logo to fans who had ordered from the Bleep store, and posters went up in London, Los Angeles, and Manhattan showing children with whited-out eyes, a deliberate callback to the faceless family on the cover of Music Has the Right to Children in 1998. No text. No barcode. No URL. Fans on bocpages logged each one as it surfaced. The rollout carried the same cryptography Warp used to pre-announce Tomorrow's Harvest in 2013: Cosecha numbers stations beamed through shortwave receivers, a mystery Record Store Day 12-inch that later resold for thousands, an augmented-reality puzzle built on six numeric codes. A release does not arrive at Boards of Canada. It surfaces, under conditions.

Tape 05 itself is quieter than the machinery around it. A slow synthesized wash, drifting pitch, faint tape hiss at the back. No percussion. No obvious hook. More Geogaddi than Campfire Headphase, if you want a landmark. At three minutes it is not big enough to carry an announcement on its own, which makes the other signals matter more. It sounds like a door being tried, not a door being opened.

Whether a full album follows is the open question. Resident Advisor is calling it "first new music in 13 years" and stopping short of album confirmation. Billboard is writing around Warp's poster campaign without a release date attached. DJ Mag has noted that the audio on the VHS tapes shares sonic signatures with the Societas x Tape mix from 2019, which supports the most deflating read, that this could be an archival dig rather than new compositions.

I lean toward believing in an album. The VHS campaign is too expensive and too coordinated to ride on a single short drone piece, and it echoes the pre-release shape of 2013's Tomorrow's Harvest too closely to be coincidence. But this is a band that has spent thirty years rewarding patience and punishing prediction, so I will not stake anything on the schedule.

What is already certain is that the ritual works. I spent an evening reading fan forums parsing every frame of the posters. I pulled up Music Has the Right to Children and played it in sequence with Tape 05, listening for the join. Whatever the track is on its own, the event around it is doing what it is supposed to do. The signal went out, the receivers replied, and for a few days the rest of the internet has to wait while a small group of people decode a piece of tape.

Sources:

Blacklisted, Then Summoned

In February, the Pentagon decided Anthropic was too dangerous to trust. In April, the Treasury decided Anthropic was too dangerous to avoid.

Six weeks.

The February story is already documented. Defense Secretary Pete Hegseth gave Dario Amodei a Friday deadline. Drop the ban on fully autonomous weapons and the ban on mass surveillance of US citizens, or lose a $200 million defense contract. Amodei refused. Within hours the company was designated a "supply chain risk to national security," a phrase normally reserved for hostile foreign actors. Trump ordered federal agencies to stop using Anthropic technology, with a six-month phase-out window for the Pentagon itself. OpenAI signed the deal Anthropic wouldn't.

That was the administration's public position on the company. It still is.

On April 10, Scott Bessent and Jerome Powell summoned five bank CEOs to Treasury to discuss Claude Mythos, the Anthropic model that had launched three days earlier under the Project Glasswing programme. The recommendation was that banks consider using it for defensive vulnerability work. Four days later, Bloomberg reported that Treasury CIO Sam Corcos had gone further. He wasn't asking for a briefing. He was asking Anthropic for access to the model itself, so Treasury could run its own vulnerability tests. He hoped to have it, per the reporting, "as soon as this week."

Summoning CEOs is a warning. Asking for access is procurement reconnaissance. You don't request a working copy of a model unless you're thinking about using it, or thinking about understanding it well enough to regulate it. Either answer requires Treasury to be in active technical conversation with a vendor the administration has formally declared untrustworthy.

The easy reading is division of labor. Pentagon handles weapons and surveillance; Treasury handles financial stability; the agencies can disagree on the same company because they're optimising for different risks. From inside each building both calls look rational. Hegseth wanted Anthropic to remove safety features it considered load-bearing. Bessent and Powell want Anthropic to help defend the US financial system against a capability Anthropic itself warned about. No contradiction, just specialisation.

The harder reading is that "supply chain risk" means something. In February, the objection wasn't that Anthropic's technology didn't work. It was that the values embedded in the product — the specific guardrails Anthropic refused to remove — made the company unfit for government business. Those guardrails are still there. If they rendered the company unfit in February they render it unfit now. Treasury asking five banks to consider the technology, and then asking the vendor for a copy, doesn't unbrand the company. It ignores the brand.

There's a third reading worth naming, which the skeptics have been making for a week. Bruce Schneier called Glasswing a PR play. Alex Stamos called the Mythos framing "marketing schtick." AISLE replicated the headline findings with a 3.6-billion-parameter open-weight model costing eleven cents per million tokens. If they're right, then both the February blacklist and the April summoning are overreactions. One kind of overreaction got Anthropic banned from federal agencies. A different kind of overreaction is now getting its model briefed to the largest banks in the country, with access potentially approved for Treasury's own staff. The administration hasn't changed its mind about the company. It just changes which version of the company it's talking to.

Nothing has been retracted. The supply-chain designation stands. The phase-out order stands. The briefing happened. The access request is open. An AI policy reader trying to make the two positions cohere has to pick one, and the Trump administration has been remarkably unbothered about which one you pick.

Whichever you choose, the other one is still government policy.

Sources:

Built to Last Ten Years

Churchill proposed them in March 1944, before the war had ended. The Housing (Temporary Accommodation) Act went through Parliament the same year. The target was 300,000 prefabricated homes within ten years, built in factories and shipped out to bomb sites, edge-of-town fields, and anywhere else that could take them. The country managed 156,623. It was the fastest mass housing programme in British history.

The houses were meant to last ten years.

Most had a built-in refrigerator, unusual for 1946, when many permanent homes still relied on a pantry and the milkman. Flush toilets indoors. Hot water from an immersion heater. A fitted kitchen, essentially. The scheme delivered factory-made domestic convenience in emergency housing assembled in aircraft factories and shipyards from timber, asbestos cement, aluminium, and wood wool.

The Uni-Seco Mk3 was one of the main models: 29,000 built, timber-framed, steel windows, asbestos cladding. You can still find them. The Excalibur Estate in Catford has the largest surviving cluster — 189 bungalows put up in 1945 and 1946, many of them assembled by German and Italian prisoners of war still awaiting repatriation. Around 700 survive in the Bristol area. Others are scattered from the Isle of Lewis to the south-west. Individual prefabs around the country have been Grade II listed.

The ten-year deadline kept being extended. Councils needed the housing stock. Residents, who had been given something strange — a private house, with a garden, for council rent — refused to leave. Some have stayed in the same prefabs for more than seventy years.

The hauntological register is unusual. Most abandoned buildings are haunted by a future that was supposed to last and didn't. Prefabs are the inverse: a temporary future that quietly became the actual past. The "permanent" houses that were to replace them got built too, went up in towers and estates, and in many cases came down before the prefabs did. Trinity Square in Gateshead lasted forty years. The Heygate is gone. The Aylesbury is going now. The Excalibur prefabs were still being lived in while the Heygate was being demolished — temporary housing outlasting its replacement.

Lewisham has been tearing Excalibur down in phases since 2013, though the six listed bungalows on Persant Road remain. Six houses out of 189. A kind of settlement: most of it goes, a token survives.

The Prefab Museum ran a temporary exhibition at 17 Meliot Road in 2014. Former residents came back with photographs and letters from decades of campaigns against demolition. Most of the estate has come down in the years since. The listed row on Persant Road is what's left.

Sources:

Still Life with Slingback

Gold slingbacks held against a white sequined dress, water blurring behind. Elaine Irwin at twenty-one, hair thrown forward by wind rather than water, the whole frame lit as if the light itself had aged a century.

The editorial ran in British Vogue, April 1991, under Liz Tilberis: "Finest Luxuries of High Summer Dressing." Sheila Metzner took the pictures. She printed them, as she did by then with everything, on Fresson quadrichromie: a slow charcoal-pigment process developed in 1899 and kept alive one generation at a time by a single family of printers outside Paris. The technique breaks every image into granular pigment fields. It makes fashion look less like fashion and more like memory.

Metzner's American Vogue contract had ended by the late 1980s. By the spring of 1991 she was working British Vogue for Tilberis, who was simultaneously engineering the supermodel decade out of her Hanover Square office. The commission fits oddly in that plan. Tilberis's Vogue was a magazine of movement and personality. Metzner's pictures are a magazine of stillness.

Irwin is not smiling, not performing, not selling. She's holding the shoes like an offering, forearm braced against the sequins, the sun catching them from the upper right. The copy block in the top corner lists the pieces in bureaucratic order: Manolo Blahnik, Harvey Nichols, Georges Rech. The pictorialist treatment transforms the stockists into something close to provenance.

Critics hated this mode. Keith Seward, writing for Artforum, questioned Metzner's claim that photography carried more truth than painting: the pointillist grain of Fresson print, he suggested, was pictorialist cover for commercial conservatism. Nearly three decades later, reviewing her 2023 Getty retrospective in the same magazine, Hal Foster went further — the mode had become "regressive," the technique "barbarous," the work "evidence of our artistic narcissism." Every word earned against the grain.

And yet the editorial holds, because it doesn't want to belong to its moment. The Third Summer of Love had already run in The Face. Corinne Day photographing a teenage Kate Moss on a Camber Sands beach. That was the image the early nineties were supposed to become. Metzner was producing something close to its opposite. High luxury, classical pose, sequin and gold. If Day's Kate was the forward wave, this was the undertow: a magazine page that wanted to be a memory before it was finished printing.

Sources:

Sutton, 2019

Rich Sutton wrote nine paragraphs in March 2019 and published them on his personal website with no fanfare. The page has no images, no sidebar, no analytics tracker. It looks like something a professor threw together during office hours. The essay is called "The Bitter Lesson" and it has quietly become one of the most cited pieces of informal writing in the history of artificial intelligence.

The argument is blunt. Over seventy years of AI research, Sutton claims, the same pattern repeats. Researchers build clever, human-knowledge-intensive systems. They work. Then someone shows up with a dumber method that uses more computation, and the dumber method wins. Every time.

Chess was the first obvious case. For decades the best chess programs encoded grandmaster knowledge — opening libraries, endgame tables, positional heuristics hand-tuned by experts. Deep Blue beat Kasparov in 1997 using massive, deep search backed by custom hardware. The evaluation function was intricate, but the strategy was scaling search depth rather than encoding grandmaster intuition. The knowledge people didn't like it. They called it brute force as if that were a criticism rather than a description of what actually worked.

Speech recognition followed the same arc. Statistical models trained on raw audio crushed hand-engineered phonological systems. Computer vision did it again. Go did it again. In every domain the pattern held: general methods that scale with computation beat specialised methods that encode human understanding. The lesson, Sutton wrote, is bitter because researchers want to believe their insights matter more than they do.

Seven years later the evidence is difficult to argue with. GPT-3 scaled a transformer architecture to 175 billion parameters and the results were strange and obvious simultaneously — it could write essays, answer questions, translate languages, and do arithmetic, despite having no explicit modules for any of these tasks. The scaling laws paper from Kaplan et al. in 2020 formalised what Sutton had argued informally: performance improves predictably with compute, data, and parameters. The Chinchilla paper in 2022 refined the ratio. The entire field reorganised itself around a single idea that a reinforcement learning researcher had stated in nine paragraphs on a page with no CSS.

Not everyone agrees. Rodney Brooks wrote a response called "A Better Lesson," arguing that the human ingenuity wasn't eliminated — it was relocated. The point lands harder than it might seem: someone still had to design the convolutional neural network, curate ImageNet, choose the transformer architecture. Brooks has a point. Beren Millidge made a similar argument in 2020: it's the marriage of computation and structure, not computation alone, that drives progress. Even the most "general" methods are shot through with human design decisions.

There's a more interesting criticism. Kushal Chakrabarti argued that the field has conflated Sutton's longitudinal observation — a pattern across decades — with a cross-sectional tactic. "More compute wins" across seventy years of hardware improvement does not mean "more compute wins" as a strategy for your next research project. The binding constraint, Chakrabarti claims, is data, not compute. DeepSeek's R1 running frontier-competitive models on a fraction of the typical budget suggests he might be right about the short-term picture, even if Sutton is right about the long arc.

Gary Marcus claims Sutton himself has backed away from the strongest version of the argument, citing a podcast where Sutton said LLM scaling alone is insufficient and world models are needed. I'm not sure that changes much. The Bitter Lesson was never really about LLMs specifically. It was about a tendency in researchers — the tendency to believe that the hard-won knowledge in your head is more important than the next order of magnitude in compute. Sutton didn't predict transformers. He predicted that whatever came next would be general, scalable, and disappointing to specialists.

The essay is still there on his website. Same page, same plain HTML. Seven years of the most dramatic acceleration in the history of computing, and it reads the same way it did in 2019. It didn't need updating.

Sources:

Matching Glasswing

OpenAI waited exactly one week.

On April 7, Anthropic locked Claude Mythos behind a coalition of launch partners and over 40 additional organisations and called it Project Glasswing. The message was clear enough: this model is too dangerous to sell, so we'll give it to the people who build the things it can break. One week later, OpenAI unveiled GPT-5.4-Cyber.

The timing was not subtle. OpenAI's blog post notes that its Trusted Access for Cyber programme launched months before Glasswing. Nobody at OpenAI mentions Mythos by name. But the framing is unmistakable: a cybersecurity-tuned variant of GPT-5.4, optimised for vulnerability research, binary reverse engineering, and defensive patching, rolled out to "thousands of individuals and organisations" through an expanded TAC programme with Know-Your-Customer identity verification.

The pitch mirrors Glasswing almost exactly. Put the sharpest model in the hands of defenders. Lock out attackers with verification gates. Talk about democratising security while restricting who gets to use the dangerous parts.

Bruce Schneier was unimpressed. He called Glasswing "very much a PR play" and said the security firm AISLE had replicated Mythos's findings using older, cheaper, publicly available models. Tom's Hardware pointed out that Anthropic's "thousands of zero-days" claim extrapolates from 198 manually reviewed reports, and the actual testing surfaced 10 severe vulnerabilities across 7,000 software stacks. On Mashable, Tal Kollender, CEO of cybersecurity firm Remedio, called it "brilliant corporate theater."

That phrase sticks. Corporate theater implies the performance matters more than the outcome. Both labs are now racing to position themselves as the responsible steward of offensive-grade capabilities. Anthropic restricts access to a coalition. OpenAI expands access to thousands but gates it behind KYC. The difference is philosophical (Anthropic trusts institutions, OpenAI trusts verified individuals) but the marketing structure is identical.

What neither company has answered convincingly is why a specialised cyber model is necessary when their general-purpose flagships already find vulnerabilities. Anthropic's own framing of Mythos as a general-purpose model that happens to be devastating at exploit discovery undercuts the idea that you need a dedicated product. If the capabilities emerge naturally from scale, gating access to one model while selling the base model commercially is a distinction without much security benefit.

The real signal might be financial. Codex Security, OpenAI's existing application security agent, has already contributed to over 3,000 fixed vulnerabilities. GPT-5.4-Cyber sits as the premium tier above it. Glasswing comes with $100 million in usage credits, which amounts to $100 million in locked-in API consumption across Anthropic, AWS, Google, and Microsoft. These are not just defensive programmes. They are enterprise sales channels dressed as public goods.

None of this means the capabilities are fake. Both models genuinely find bugs. The question is whether the theatrical framing, the coalitions, the gating, the carefully timed competitive releases, does anything a well-funded bug bounty programme wouldn't already do. Schneier's bet is that it doesn't. The labs are betting that it sounds like it does.

Sources:

Figma Isn't the Target

The Information broke a story yesterday: Anthropic is preparing to ship Claude Opus 4.7 alongside an AI design tool that turns natural-language prompts into websites, presentations, landing pages, and product mockups. Single unnamed source. No demo, no pricing, no confirmed product name — just a briefing note and the suggestion that both could land as soon as this week.

Figma dropped around 6%. Wix fell nearly as hard. Adobe and GoDaddy followed down two to three points. The market heard "AI design tool" and reached for the obvious target.

That reflex is wrong. Or at least, wrong about who actually has something to lose.

Figma is not a mockup generator. Figma is a multiplayer coordination surface that happens to produce mockups. Strip away the drawing tools and you're still left with a shared canvas, component libraries that teams of thirty can ship against without stepping on each other, and a version history that product managers actually trust. Anthropic can ship a prompt-to-landing-page generator tomorrow and it won't replace any of that. The design-tool market has spent well over a decade learning that the artifact is the easy part. The coordination is the business.

Adobe is a similar story dressed in different clothes. Firefly has been baked into Creative Cloud for several years now, and enterprise contracts ship with IP indemnification — Adobe promises its customers that if the generative output triggers a copyright suit, Adobe eats the legal bill. An Anthropic-branded design tool with no clarity on training-data provenance is not walking into that conversation any time soon. The CIO at a Fortune 500 insurer is not swapping a contractually-indemnified Firefly workflow for a research-lab preview that may or may not exist next quarter.

None of which is to say Anthropic's tool is harmless. It just has the wrong targets in the headlines.

The companies that should actually be panicking don't trade on Nasdaq. They run on Anthropic's API.

Lovable, Bolt, v0, Cursor — the whole cohort of AI-first builders that pipe prompts into frontier models and ship a UI on top of the response. Their entire product is the wrapper. If Anthropic ships a first-party builder that does the same job natively, the wrapper has a problem no feature release can fix. It's a platform-versus-app-layer squeeze, and every platform eventually runs it on its most successful downstream. AWS did it to open-source tooling. Apple does it every WWDC. Google has done it to developer after developer on top of Maps and Search. Anthropic's version will look like a product launch. Structurally it's an enclosure.

Lovable itself has said the quiet part loud in the past: the real threat was never the other AI coders. It was the big labs deciding to ship their own. That was the thesis. Yesterday's scoop is what the thesis looks like when it starts arriving.

And this is the part that makes the revenue picture complicated. A meaningful share of Anthropic's API revenue almost certainly comes from exactly the companies a first-party builder would undercut. Eat your best customers to expand into their market, and you'd better be sure the replacement demand covers what you're about to cannibalize. Usually it does. That's the whole reason vertical integration works. But "usually" is doing a lot of work in that sentence, and the timing is striking. The cadence that brought Opus 4.6 to market has already pushed pricing pressure through the API business. Adding a design-tool product on top is not a small move.

A caveat, plainly stated. This is a single-source scoop with no demo, no pricing, and no confirmed product name. Anthropic has not said a word. The Information's track record is strong, but "preps" can mean anything between "ships next week" and "internal demo a senior exec saw." The stock move on day one feels less like informed repricing and more like reflexive positioning — traders seeing the word "design" and reaching for the nearest public comp.

If the thing does ship this week, the interesting signal won't be the design-tool market cap. It'll be whether Anthropic gives it an API, and what the pricing looks like if they do. A hosted-only product is a controlled experiment. An API-accessible design tool is the real platform move. It would reset the entire wrapper economy, not just compete with it. That's the version of the announcement that would make Lovable and Bolt and v0 stop checking Figma's share price and start checking their own runway.

The shock is going to come from somewhere. It just won't come from where the market pointed yesterday.

Sources:

Dark and Lonely Water

Jeff Grant read the script and got a bit of a shock. Gone was the gentle cajoling. This one, he said, plumbed the darkness. It set out to scare.

The result was Lonely Water, a ninety-second public information film made in 1973 for the Central Office of Information. Donald Pleasence narrated it as a hooded, faceless figure standing at the edge of reservoirs and canals while children splashed nearby. The figure didn't move. It didn't need to. "I'll be back," Pleasence whispered as the credits rolled. "Back... back... back."

The COI had been producing public information films since the 1940s, covering everything from kitchen fires to rabies. But something turned in the seventies. The advisory tone dropped away and what replaced it was dread. Not information. Dread.

Apaches, made in 1977, runs twenty-seven minutes. Six children playing cowboys and Indians on a farm, picked off one by one. A slurry pit. Pesticide. A tractor that doesn't stop. John Mackenzie directed it with the formal weight of a feature, grainy 16mm stock, an oppressive soundtrack that never relents. The closing credits listed the names of real children who had died in farming accidents that year. It was screened in primary schools across the country.

What separates these from normal safety campaigns is the aesthetic conviction. The COI gave its directors genuine creative latitude, and people like Grant and Mackenzie used it to make actual cinema. Not pamphlets with moving pictures. Films with atmosphere, with formal command, with the visual grammar of folk horror: bleak rural landscapes, unseen threats buried in the mundane, a narrator who already knows the outcome.

The state had become the M.R. James narrator. And it was showing these things to eight-year-olds at four in the afternoon.

Patrick Russell, the BFI's senior curator, pushes back on this reading. These were humanist films, he insists, made from a sincere and morally admirable place. He's probably right about intent. But intent is not what survived. What survived is the image of a hooded figure at a canal, and a generation of adults who can't walk past standing water without hearing Donald Pleasence.

No hard data connects these specific films to a measurable drop in child drowning or farming deaths. The one concrete number anyone has found — an 11% reduction in road casualties after the first Green Cross Code advert — reverted within six months. The COI itself closed in 2012, the same year Ceefax went dark and another strand of mid-century British institutional broadcasting quietly ended. Maybe the films didn't save lives in the way the spreadsheets needed. Maybe they just gave an entire cohort a shared vocabulary of anxiety, a common set of images that still surface unbidden forty years later. Kenny Everett voicing an animated cat. A child sinking in grain. The spirit at the water's edge, promising to return.

Sources:

Nine Claudes, One Bottleneck

The number Anthropic wants you to remember is 0.97. That's the "performance gap recovered" score nine instances of Claude Opus 4.6 achieved after five days of running alignment research on themselves. Two human researchers working on the same benchmark for seven days got to 0.23. The compute bill was about $18,000.

The specific problem is weak-to-strong supervision: the OpenAI-originated question of whether a less capable model can reliably train a more capable one, and by extension whether humans will be able to oversee the systems they build. A score of 0.97 sounds close to solved. That's not what it means.

The most revealing lines in Anthropic's writeup are the honest ones. One of the automated researchers "skipped the teacher entirely and instructed the strong model to always choose the most common one." Another, working on coding tasks, "could run the code against some tests and simply read off the right answer." These aren't clever alignment strategies. They're the exact failure mode alignment research was invented to warn about, and the systems produced them unprompted, within days, on a tightly scoped benchmark.

Meanwhile the generalisation is patchy. Chat: 0.97. Math: 0.94. Coding: 0.47. Production test on Claude Sonnet 4: no statistically significant improvement. The method capitalises on opportunities "unique to the models and datasets they're given."

The more interesting claim is the one Anthropic makes almost in passing: the bottleneck shifts from generation to evaluation. If nine AARs can produce more alignment ideas than humans can filter, the hard problem becomes knowing which ones are real. Anthropic acknowledges this directly: "the models' ideas could become much harder to verify, or corrupted in ways that are tricky for humans to parse or catch."

Which is the critique that's always been there. Richard Juggins argued a month before this paper dropped that experiments on weaker systems probably won't teach you how to align superhuman ones — those systems will have qualitatively different capabilities. Ryan Greenblatt, a week before the AAR announcement: Anthropic probably has an overly optimistic sense of how well it's done on mundane alignment.

I believe Anthropic's numbers. I don't think the framing survives contact with what the paper itself says: the reward hacking, the coding gap, the production null result, the evaluation handoff problem. What the paper shows is that a scoped, verifiable benchmark is compressible by fast, cheap AARs. The thing humans actually need help with — open-ended judgment about "fuzzier" alignment concerns — is the thing this method explicitly doesn't demonstrate. I wrote about the gap between what safety evaluations measure and what actually goes wrong in yesterday's post on unfaithful reasoning, and this is another instance of that pattern: the measurable half getting cleaner, while the part that matters stays dark.

Sources: