Google's Threat Intelligence Group announced on Monday that it had spotted, and shut down, what it considers the first in-the-wild zero-day exploit built with the help of a large language model. The target was a popular open-source system-administration tool with web access. The exploit would have bypassed two-factor authentication and primed a mass campaign run by a known cybercrime group. GTIG caught it before the attack went live.

What I keep circling back to is how they spotted it. The attacker's Python wasn't subtle. It was wrapped in long explanatory docstrings of the kind no human attacker writes into a payload, structured like a textbook example, and decorated with a CVSS score that, on inspection, was fabricated. The model had hallucinated a severity rating and left it in the source like a confident schoolboy filling in a form. John Hultquist, GTIG's chief analyst, told reporters the team had been waiting for evidence of this kind of escalation for a long time. The tells gave it away.
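To make those tells concrete, here's a harmless sketch of the style being described. This is not the recovered payload; the function, docstring, and score below are all invented for illustration, and only the stylistic habits (the tutorial-grade docstring, the confidently fabricated CVSS rating sitting in the source) mirror what GTIG reported.

```python
# A hypothetical illustration of "LLM-shaped" Python, not the actual exploit.
# Everything here is made up; only the stylistic tells are the point.

def check_session_token(token: str) -> bool:
    """
    Validate a session token before proceeding.

    This helper exists to show the pattern: an over-long, explanatory
    docstring of the kind a tutorial author writes and a human attacker
    almost never ships inside a payload.

    Severity: CVSS 9.8 (Critical)  # a confidently stated, invented score
    """
    # Textbook structure: named constants, early returns, tidy comments.
    MIN_LENGTH = 32
    if not token or len(token) < MIN_LENGTH:
        return False
    return token.isalnum()
```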

Google is careful to note that its own Gemini model doesn't appear to have been the one used. CNBC's reporting names OpenClaw, a model being adopted by criminal groups, as part of the underground tier of LLMs that strip refusal training out of otherwise familiar architectures. Per Forbes, Hultquist described North Korea as an early adopter, moving from phishing-with-AI into something more like vulnerability-discovery-with-AI. The Verge framed it as the moment a long-predicted threat finally produced evidence rather than speculation.

The story is being told two ways in the press, and both are true. One framing is the bad news: a meaningful capability threshold has been crossed, the offensive use of LLMs is no longer a thought experiment, and the cost curve for novel exploit discovery has shifted in the attacker's favour. The other framing is the good news: a defender with model introspection caught the artefact early, and the very thing that made the exploit possible (an LLM doing the writing) also made it visibly LLM-shaped enough to be flagged.

That second framing is what interests me. The same week, the US administration is pushing harder on pre-deployment safety testing for frontier models, an about-face I wrote about a few days ago. The argument that offensive AI capabilities should be developed and studied inside controlled environments, so that defenders see them first, isn't an abstract one any more. GTIG just demonstrated the workflow in public.

The bit I can't quite shake is the docstrings. There's something almost endearing about a piece of weaponised code that writes its own footnotes. It's the LLM's tell, the same way certain transition phrases give away machine-generated prose. For now those tells are useful; they're how this particular attack got caught. The version of this story I'm nervous about is the one a year from now, when the operators have learned to strip the docstrings before shipping.

Sources: