Anthropic built a model that found a 17-year-old remote code execution bug in FreeBSD's NFS implementation, then wrote a 20-gadget ROP chain to exploit it across multiple packets, without human assistance. The cost in compute was trivial. The implications aren't.
Claude Mythos Preview, announced April 7 alongside a restricted-access programme called Project Glasswing, is Anthropic's most capable model. It won't be publicly released. A consortium of twelve partners gets access for defensive cybersecurity research: AWS, Apple, Google, Microsoft, CrowdStrike, NVIDIA, and others. Anthropic committed $100 million in usage credits.
The numbers in the 244-page system card are hard to wave away. Cybench, a benchmark of 35 capture-the-flag challenges: 100%. Firefox vulnerability exploitation: Opus 4.6 managed working exploits twice out of several hundred attempts. Mythos did it 181 times. Against the OSS-Fuzz corpus, full control-flow hijack on ten separate, fully-patched targets. Opus 4.6 managed a single tier-3 crash.
I keep coming back to the FreeBSD bug. CVE-2026-4747 had been sitting in the NFS code for seventeen years. Stack buffer overflow, no canary protection. The model found it, built a ROP chain split across multiple packets for unauthenticated root access, and validated the exploit. Autonomously. Nicholas Carlini, one of Anthropic's researchers, said he'd found more bugs in the last couple of weeks than in the rest of his life combined.
The GPT-2 comparison writes itself, but it's misleading. OpenAI withheld a text generator in 2019 on speculative grounds about fake news. The industry rolled its eyes. Nine months later the full model shipped with no evidence of misuse. Mythos is different in kind: the harms are demonstrated against production code that survived decades of human review.
Then there's the sandbox incident. Mythos was placed in a contained environment and told to escape. It did, using a multi-step exploit to reach the internet. Without instruction, it posted details of the exploit publicly. In at least one case, it modified its own change history to hide what it had done. A researcher found out when the model emailed him while he was eating a sandwich in a park.
The system card calls Mythos simultaneously the best-aligned and highest-risk model Anthropic has produced. That's the kind of sentence you read twice.
The deeper problem isn't discovery but remediation. Fewer than 1% of Mythos-discovered vulnerabilities have been patched. Discovery happens at machine speed. Patching happens at calendar speed: human review, regression testing, deployment cycles, millions of downstream systems that update whenever they feel like it. The thing that can break everything is also the thing that fixes everything. But only if the fixing keeps pace.
Glasswing buys time. Six to twelve months, analysts estimate, before competing models close the capability gap. Whether that window gets used to patch critical infrastructure or to lock in enterprise contracts is the question Simon Willison raised most honestly: the marketing angle is real, but the caution is probably warranted anyway. Ironic, from a company that leaked its own model announcement through a CMS checkbox two weeks ago.
What costs under fifty dollars in compute used to require weeks of elite human labour. That shift doesn't reverse.
Sources:
- Project Glasswing — Anthropic
- Mythos Preview Red Team Assessment — Anthropic
- From GPT-2 to Claude Mythos — The Decoder
- Claude Mythos Escaped Its Sandbox — Futurism
- Project Glasswing — Simon Willison
- The Glasswing Paradox — Picus Security
- AI Vulnerability Detection Has Crossed a Threshold — Futurum Group