First AI-Generated Zero-Day Spotted: 2FA Bypass Hits Open-Source Admin Tool

Google Threat Intelligence Group (GTIG) announced on May 11, 2026, that it has identified the first AI-developed zero-day exploit used in an active criminal campaign. The attack targeted a 2FA bypass in a popular open-source web administration tool, exploiting a semantic logic flaw against accounts for which the attackers already held valid credentials. The discovery confirms that threat actors are already employing AI models to accelerate the weaponization of vulnerabilities, threatening to render traditional reactive patching cycles obsolete.
- The exploit is a Python script featuring educational docstrings, a hallucinated CVSS score, and terminal output formatted with ANSI color codes, elements GTIG attributes to LLM output.
- The vulnerability is a semantic logic flaw stemming from a hard-coded trust assumption in a web administration tool, requiring valid credentials for the 2FA bypass.
- Google collaborated with the vendor for a coordinated responsible disclosure, disrupting the campaign before mass exploitation and securing a patch release.
- There is no evidence that Gemini was used to generate the exploit, and the specific identity of the threat actor remains unknown to researchers.
Anatomy of an LLM-Authored Exploit
According to the GTIG report, the exploit was implemented in a Python script bearing unmistakable hallmarks of artificial origin. The code contains didactic, tutorial-style docstrings, a hallucinated CVSS score generated by the model, and terminal output formatted with ANSI color codes typical of textbook-style LLM output. These are not merely harmless stylistic choices; for researchers, they represent the signature of a large language model used to write and debug the payload.
The code does not appear to be the work of a human expert in offensive security. While forensic analysis strongly suggests LLM involvement, GTIG has not identified the specific model used, though it has explicitly ruled out Gemini.
The presence of a hallucinated CVSS score is particularly revealing: the model assigned an arbitrary severity rating without calculating it against standard metrics, reproducing a pattern common in outputs from LLMs trained on mixed datasets. Furthermore, the use of ANSI color codes to format script output reflects an aesthetic focus typical of instructional or demonstration code, rather than traditional offensive tools designed for stealth.
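To make these "tells" concrete, here is a hypothetical, entirely harmless snippet that reproduces the stylistic patterns GTIG describes. It performs no attack; the function name, host, and score are invented for illustration. Nothing here comes from the actual exploit, which has not been published.

```python
# Hypothetical illustration of the stylistic "tells" described above.
# This code only prints colored text; the docstring, CVSS figure, and
# ANSI formatting mimic the aesthetics typical of LLM-generated scripts.

RED = "\033[91m"    # ANSI escape codes for colored terminal output,
GREEN = "\033[92m"  # a presentation flourish rare in stealth-oriented tooling
RESET = "\033[0m"

def check_target(host: str) -> str:
    """
    Check whether the target host is reachable.

    This verbose, tutorial-style docstring is exactly the kind an LLM
    tends to emit even in single-use scripts, where a human operator
    would write nothing at all.

    CVSS: 9.8 (Critical)  # an invented score, not derived from CVSS metrics
    """
    status = "reachable" if host else "unreachable"
    color = GREEN if host else RED
    return f"{color}[{status}]{RESET} {host or '<none>'}"

print(check_target("admin.example.com"))
```

A hand-written offensive tool would typically avoid all three features: docstrings add forensic surface, an inline CVSS score serves no operational purpose, and colored output signals demonstration code rather than something built for stealth.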
The Semantic Logic Flaw Behind the 2FA Bypass
The exploited vulnerability was not a classic buffer overflow or memory error, but rather a semantic logic flaw originating from a hard-coded trust assumption within the web administration tool. An attacker already in possession of valid credentials could circumvent the second authentication factor by exploiting this architectural oversight in the login flow.
The issue did not reside in a vulnerable library or an outdated dependency, but in the design of the authentication flow itself, where an implicit trust condition created a semantic opening. This class of bug is notoriously difficult to detect via conventional fuzzing or static analysis because the code behaves exactly as written; it is the business logic that is defective.
In other words, the system implicitly trusted a condition that an LLM was able to identify and channel into a functional script. The semantic nature of the flaw makes it especially insidious: traditional static analysis tools and automated scanners often miss application logic errors because the code is syntactically correct but semantically wrong.
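Since neither the tool nor its code has been disclosed, the following sketch illustrates only the vulnerability *class*: a login flow whose credential check is correct, but whose business logic embeds a hard-coded trust assumption that skips the second factor. The `client_type` field, user records, and values are all hypothetical.

```python
# Hypothetical sketch of a semantic logic flaw in a 2FA flow.
# The actual affected tool and its code have not been disclosed;
# this only illustrates the bug class described in the article.

def login(username, password, otp, client_type, users):
    user = users.get(username)
    if user is None or user["password"] != password:
        return False  # the credential check behaves exactly as written

    # Hard-coded trust assumption: "internal" clients were presumed to
    # sit behind a VPN, so the design skips the second factor for them.
    # The code is syntactically correct; the business logic is wrong,
    # because client_type is attacker-controlled request data.
    if client_type == "internal":
        return True  # 2FA silently bypassed with valid credentials alone

    return otp == user["otp"]

users = {"admin": {"password": "hunter2", "otp": "123456"}}
# An attacker holding valid credentials but no OTP:
print(login("admin", "hunter2", otp=None, client_type="internal", users=users))  # → True
```

A fuzzer or static analyzer sees nothing anomalous here: no memory error, no tainted sink, every branch well-typed. Only reasoning about what the `internal` branch *means* reveals that authentication strength depends on a value the client controls.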
Campaign Neutralized Before Mass Exploitation
Google worked with the software vendor on a responsible disclosure that led to a coordinated patch release before the vulnerability could be weaponized on a large scale. The campaign was detected and neutralized while still in a limited phase, with no evidence of mass compromise or automated distribution of the script in the wild.
This early intervention prevented a significant logic flaw from becoming a widespread access vector for critical administration panels. Neither the name of the affected open-source tool nor the exact identity of the threat actor has been disclosed by researchers.
The decision to withhold the tool's name reflects a containment strategy: preventing other threat actors, who might be more sophisticated and lack obvious LLM "tells," from replicating the attack on unpatched versions. The absence of a public CVE at the time of discovery suggests private handling of the vulnerability between the vendor and Google.
"AI can review the underlying logic, context, and flow of code at scale to discover vulnerabilities. It can also be used to build working exploits which are a significant hurdle."
— John Hultquist, chief analyst at GTIG
AI Weaponization: Compressing the Threat Timeline
John Hultquist, GTIG’s chief analyst, warned that AI models can review the underlying logic, context, and flow of code at scale to discover vulnerabilities. Ryan Dewhurst of watchTowr noted: "AI is already accelerating vulnerability discovery, reducing the effort needed to identify, validate, and weaponize flaws [...] We're not heading toward compressed timelines; we've been watching the timelines compress for years."
This is no longer a distant horizon: exploitation timelines have been compressing for years, and this case confirms that the theoretical phase has ended. The immediate risk is not an autonomous AI hacking systems alone, but the massive asymmetry of scale between a model that can analyze millions of lines of code in hours and a human team that may not even know where to begin their search.
Strengthening Defensive Posture
The GTIG discovery demands concrete action from vendors, security teams, and system administrators. The risk is tangible: a semantic logic flaw in a web admin tool can compromise an entire authentication perimeter.
- Audit Authentication Logic: Security teams should map every hard-coded trust assumption in internal open-source tools, verifying whether the 2FA flow contains semantic shortcuts that valid credentials alone can bypass.
- Reinforce MFA with FIDO2/WebAuthn: Where technically feasible, replace or augment traditional OTPs with cryptographic standards resistant to replay and client-side logic manipulation.
- Inspect Source Code for LLM Anomalies: Monitor corporate repositories and open-source dependencies for patterns such as didactic docstrings, inline CVSS scores, or conspicuously ANSI-formatted terminal output that may indicate machine-generated payloads.
- Immediate Application of Coordinated Patches: Vendors and administrators must prioritize updates for the affected tool as soon as they become available, while monitoring access logs for 2FA bypass attempts using valid credentials.
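The repository inspection step above can be partially automated. The sketch below is a minimal heuristic scanner, not GTIG's method: the pattern names, regexes, and the 200-character docstring threshold are all illustrative assumptions to be tuned for a given codebase, and hits are signals for human review, not verdicts.

```python
# Minimal heuristic sketch (assumed patterns, not GTIG's methodology)
# for flagging Python files that carry the LLM "tells" described in
# the article: didactic docstrings, inline CVSS scores, ANSI escapes.

import re
from pathlib import Path

PATTERNS = {
    # Literal "\033[91m" in source, or a raw ESC byte in the file.
    "ansi_escape": re.compile(r"\\033\[\d+m|\x1b\[\d+m"),
    # Inline severity claims such as "CVSS: 9.8".
    "cvss_score": re.compile(r"CVSS[:\s]*\d+\.\d", re.IGNORECASE),
    # Unusually long triple-quoted docstrings (threshold is a guess).
    "didactic_docstring": re.compile(r'"""[\s\S]{200,}?"""'),
}

def flag_llm_tells(source: str) -> list[str]:
    """Return the names of suspicious patterns found in a source string."""
    return [name for name, rx in PATTERNS.items() if rx.search(source)]

def scan_repo(root: str) -> dict[str, list[str]]:
    """Scan all .py files under root; map each flagged file to its hits."""
    hits = {}
    for path in Path(root).rglob("*.py"):
        found = flag_llm_tells(path.read_text(errors="ignore"))
        if found:
            hits[str(path)] = found
    return hits
```

Run against a repository root, this yields a shortlist of files worth a manual look; any one pattern alone is weak evidence, but a script that combines all three matches the profile the GTIG report describes.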
The true alarm is not the technical complexity of the script—which remains somewhat unpolished and recognizable—but the speed at which an entire attack chain, from discovery to weaponization, can be compressed by an AI model. Human auditors can no longer compete on scale, and semantic logic flaws are the most dangerous territory because they are often invisible to traditional automated testing. If the open-source community does not respond with proactive audits and more robust authentication architectures, the next AI-generated zero-day may not leave such obvious traces. The line between academic proof-of-concept and criminal weapon has thinned dangerously.
Frequently Asked Questions
- Why does GTIG attribute the exploit to an LLM rather than a human coder?
- The attribution is based on stylistic patterns in the Python code: didactic docstrings, a hallucinated CVSS score, and terminal output formatted with ANSI color codes unusual for traditional malware. These elements indicate the payload was generated or heavily assisted by a language model, though the specific model has not been identified.
- Did the attack allow access without valid credentials?
- No. The vulnerability is a semantic logic flaw that requires valid credentials to be exploited; the script enables a bypass of the second authentication factor, not arbitrary authentication.
- Has the name of the open-source tool been made public?
- No. For reasons of responsible disclosure, Google and the vendor have not disclosed the name of the affected software. The patch was released in coordination to protect users before potential mass exploitation occurred.
Sources
- https://thehackernews.com/2026/05/hackers-used-ai-to-develop-first-known.html
- https://www.cybersecuritydive.com/news/ai-working-zero-day-exploit-GTIG/819848/
Information verified against cited sources and current at the time of publication.