One Percent Patched

On the Firefox exploit benchmark, Claude Mythos Preview produced 181 working exploits. Opus 4.6 managed two. Anthropic published those numbers yesterday alongside a 244-page system card and the announcement that it would not release the model to the public.

The March leak described a model with dramatically higher scores on coding, reasoning, and cybersecurity. I expected the official numbers to confirm that framing. They don't. They blow past it. Mythos was tested against roughly a thousand open-source repositories across seven thousand entry points and found zero-days in every major operating system and every major web browser. Some had been sitting in production code for decades.

A 27-year-old signed integer overflow in OpenBSD's TCP selective acknowledgment handling allows a remote attacker to crash any host from anywhere on the internet. In FreeBSD's NFS authentication layer, Mythos found a 17-year-old stack buffer overflow (CVE-2026-4747) and autonomously constructed a six-packet ROP chain to write an SSH key into root's authorized_keys. FFmpeg's H.264 codec has a flaw that automated fuzzing tools encountered five million times over sixteen years without flagging it.

The historical arc puts those numbers in context. DARPA's Cyber Grand Challenge in 2016 ran automated tools against purpose-built binaries. Google's Project Zero Big Sleep found one SQLite vulnerability in 2024 that 150 CPU-hours of fuzzing had missed. Last year's AIxCC competition found 18 zero-days across 54 million lines of code. The progression from five hundred bugs to thousands is not linear.

Instead of a general release, Anthropic launched Project Glasswing: a coalition of twelve companies including Apple, Microsoft, Google, AWS, CrowdStrike, and Palo Alto Networks, committed to using Mythos for defensive cybersecurity. Roughly fifty organisations total. Anthropic put up to $100 million in usage credits behind it and donated $4 million to the Linux Foundation and Apache Software Foundation.

Picus Security called it "the Glasswing Paradox": the thing that can break everything is also the thing that fixes everything. Anthropic's own disclosure puts a number on it. Fewer than one percent of Mythos-discovered vulnerabilities had been patched at announcement. Discovery is outrunning repair.

Weeks before the official announcement, Linux kernel maintainer Greg Kroah-Hartman described something shifting: "Something happened a month ago, and the world switched" from low-quality AI-generated vulnerability reports to genuine findings. Daniel Stenberg, who created curl, went from shutting down his bug bounty over AI noise to spending hours a day triaging legitimate ones.

Simon Willison called the restriction "necessary" while noting that saying a model is too dangerous to release is a great way to build buzz. The GPT-2 comparison is inevitable. But GPT-2's predicted harms never materialised, and 181 Firefox exploits did. Jack Clark, who co-founded Anthropic and now heads its public benefit division, has framed the core tension before: AI good at finding vulnerabilities for defense can easily be repurposed for offense.

Glasswing partners can access Mythos at $25 per million input tokens and $125 per million output, through Claude API, Bedrock, Vertex, and Microsoft Foundry. The broader situation is a timing problem. Defenders work at calendar speed. Attacks happen at machine speed.

Sources:

Project Glasswing — Anthropic
Assessing Claude Mythos Preview's Cybersecurity Capabilities — Anthropic Red Team
Project Glasswing sounds necessary to me — Simon Willison
The Glasswing Paradox — Picus Security
Anthropic is giving some firms early access to Claude Mythos — Fortune

Plutonic Rainbows

One Percent Patched

Recent Entries