AI Unearths 300 WordPress Zero-Days for $20 Each: The Human Triage Crisis

A high-efficiency AI pipeline has discovered over 300 critical zero-day vulnerabilities in WordPress plugins at an estimated cost of $20 per bug, shifting the…

An AI-driven pipeline developed by researchers from TrendAI and CHT Security has identified over 300 critical zero-day vulnerabilities within the WordPress ecosystem during a 72-hour scan. The estimated cost for each discovery was roughly $20. The study, presented at Ekoparty Miami on May 22, 2026, highlights a shift in the cybersecurity landscape from technical feasibility to operational sustainability: manual verification of each finding takes between 30 and 60 minutes, rendering current disclosure models mathematically impossible when faced with hundreds of machine-generated reports.

The announcement comes as major vulnerability management frameworks, including the Zero Day Initiative (ZDI) and CVE registries, report critical backlogs. Steven Yu, a threat research engineer at TrendAI, warned that "motivated actors with a credit card" can now replicate these large-scale discovery campaigns with ease.

Key Takeaways

The AgentForge pipeline, developed in 72 hours, discovered over 300 critical zero-days in WordPress plugins using approximately 222 million tokens across 95 tasks.
The $20 per-vulnerability cost is largely a byproduct of the WordPress ecosystem's often lower code quality and is not necessarily applicable to hardened enterprise-grade codebases.
Automated dynamic verification successfully filtered out over 80% of false positives before they reached the disclosure queue, yet human triage remains the primary bottleneck.
The AI agent autonomously orchestrated a downgrade attack chain without human prompting or pre-taught patterns, successfully linking version rollbacks to exploitable flaws.

How the Pipeline Beat the Price of Coffee

The system, dubbed AgentForge, integrates PHP static code analysis, automated Docker provisioning, and dynamic verification via Chrome DevTools MCP. The orchestration consumed roughly 222 million tokens to generate over 300 validated vulnerability reports, averaging a cost of $20 per finding. Steven Yu clarified the findings: "This doesn't mean you can easily find a vulnerability in any WordPress site for just $20"—the figure is heavily influenced by the typical maintenance level of the plugin ecosystem.

The researchers targeted WordPress because its ecosystem contains over one million plugins, many of which are maintained by individual volunteers without security budgets. This lack of systematic code review lowers the economic barrier for automated discovery, though the same efficiency may not apply to enterprise software with hardened development lifecycles.

The core economic driver is the token-to-vulnerability ratio. The pipeline generates candidate vulnerabilities in bulk, followed by a dynamic verification phase—actual execution in a containerized environment—that discards more than 80% of false positives before a human ever sees the report. Only surviving findings enter the manual triage queue, where each item occupies a researcher for 30 to 60 minutes.

"We are already in a state where any motivated attacker with a credit card can execute this." — Steven Yu, TrendAI

Autonomous Attack Chains: AI-Led Downgrades

Reported findings included pre-authentication RCE, SQL injection, privilege escalation, and SSRF. The most significant demonstration of the agent's autonomy involved a downgrade attack: the AI located a vulnerability allowing a plugin to be rolled back to a previous version, recognized that the older version contained exploitable flaws, and chained them together without manual instructions or pre-taught patterns.

This specific chain was not programmed as a goal. The agent identified the rollback as a useful primitive, mapped the target version as vulnerable, and constructed the exploit path independently. This ability to recombine known primitives into unforeseen sequences shifts the threat from simple bug hunting to the generation of novel attack chains.

One plugin affected by a pre-auth RCE had over 1,000 GitHub stars, indicating a non-trivial user base. While researchers responsibly disclosed all findings prior to publication, the remediation time for these bugs has not been quantified—a lack of data that complicates the assessment of real-world risk to users.

Breaking the Arithmetic of Triage

The structural bottleneck is a matter of mathematics. Three hundred vulnerabilities, requiring 30–60 minutes of manual verification each, demand between 150 and 300 hours of work from highly qualified experts. A single 72-hour AI campaign generates more backlog than an entire team can process in weeks. Furthermore, these campaigns are replicable: they require no proprietary infrastructure or access to zero-day markets—only a credit card and workflow knowledge.

Yu highlighted the systemic consequences: "Organizations such as ZDI and NIST are currently struggling with massive backlogs due to the explosion of AI-assisted vulnerability reports." Responsible disclosure, a pillar of coordinated security since 2000, assumes that reporting capacity and triage capacity remain roughly in equilibrium. AI has permanently shattered this symmetry.

The $20 per vulnerability estimate does not include the cost of human triage. Adding an hour of specialized labor increases the price tag by orders of magnitude, making the current disclosure model economically unsustainable for white-hat actors. Conversely, black-hat actors who bypass disclosure operate at marginal costs effectively near the $20 mark.

Strategic Implications

Re-evaluate Third-Party Plugin Risk: The WordPress ecosystem is now a primary target for low-cost AI scanning. Security teams must identify which plugins are actively maintained and which are abandonware with exposed attack surfaces.
Isolate WordPress Systems: The probability of undisclosed zero-days in popular plugins has risen structurally. Network segmentation and the principle of least privilege are essential to reduce the blast radius of a plugin-based compromise.
Anticipate Disclosure Delays: The triage backlog means coordinated disclosure will take longer. Patches may arrive well after a vulnerability is known to malicious actors.
Transition to AI-Driven Triage: Yu suggested we must "fight AI magic with AI magic." Disclosure organizations and security vendors must invest in automated verification systems to trim the human queue, or the coordination model faces collapse.

Beyond Discovery: The Remediation Gap

The TrendAI/CHT Security research is not a technological record to be celebrated or feared in isolation. It marks a point of no return for a growing asymmetry: vulnerability discovery has become a batchable process with commodity costs, while verification, disclosure, and remediation remain serial, human-centric, and slow. While WordPress is the initial case study due to its vast surface area and heterogeneous code quality, the dynamic is applicable to any codebase with similar traits.

The entry barrier for automated discovery has fallen, and with it, the barrier for mass exploitation by actors who ignore disclosure. What remains is the final, overburdened human link in the chain. The question is no longer whether the next zero-day will be found, but whether anyone will have the time to verify and patch it before it is weaponized.

Yu concluded his presentation with a technological ultimatum: "Both white-hat and black-hat actors are already implementing these types of actions at scale." The race is no longer about who can discover the most bugs, but who can process and remediate them the fastest.

FAQ

Does the $20 cost include manual verification?: No. The $20 figure refers strictly to AI token consumption and scanning infrastructure. The 30–60 minutes of manual verification required per vulnerability represents a significant additional cost and is the primary bottleneck in the process.
Are the 300+ discovered zero-days already being exploited?: The researchers performed responsible disclosure before publishing their findings, but it is unclear how many have been patched or what the average remediation time is. The lack of confirmed exploitation in the wild does not rule out the possibility that malicious actors have independently discovered the same flaws.
Would this technique work on systems other than WordPress?: Steven Yu specifically noted that the success rate was tied to the variability of WordPress code quality. Enterprise-grade codebases with systematic reviews, hardened development cycles, and dedicated security budgets would likely not yield vulnerabilities at the same rate or cost.

Sources

Information has been verified against cited sources and is current at the time of publication.