BadBone: Dormant AI Backdoor Evades Six Major Security Defenses


A research team has unveiled BadBone, an attack that embeds dormant backdoors into pre-trained AI backbone models. The poisoned model passes standard security checks with an attack success rate of 0.10%, identical to a clean model, but triggers malicious behavior at a success rate of about 99% after customization via prompt learning. The research, reported on June 2, 2026, by Help Net Security, shows that six existing defenses largely fail to detect the threat at the point of download and verification.

Key Takeaways

The BadBone backdoor requires two simultaneous conditions to activate: model customization via prompt learning and the presence of a specific input trigger.
Without customization, the attack success rate is 0.10%, making the model indistinguishable from a clean one during pre-acquisition audits.
Six existing defenses (Neural Cleanse, ABS, MNTD, NAD, CLP, D-BR) fail to detect the backdoor in most tested configurations.
Attackers do not require the victim's data; a surrogate dataset with similar content is sufficient to prepare the attack.

The Co-activation Mechanism: Two Conditions, One Window of Exposure

The research paper defines this mechanism as "prompt-and-trigger co-activation." The backdoor remains dormant in the backbone model until two conditions are met: customization for a downstream task via prompt learning and the insertion of a trigger into the input. This temporal separation is the core of the attack. Existing defenses test the model in its downloaded state, prior to customization. In that state, the backdoor is inert.

Experimental tests document this dual behavior. On triggered images, the poisoned model without customization classifies exactly like a clean model, with a measured attack success rate of 0.10%. Once customization is complete and the trigger is present, the rate jumps to about 99%. This transition is not gradual; it is a binary switch that activates only at the moment of production deployment, not during verification.

The name of the attack highlights the target. As the source reports: "The name points at the target. Corrupt the skeleton, and systems built on top of it carry the flaw." The backbone model serves as the skeleton; every system built upon it inherits the invisible corruption until the moment of co-activation.
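The co-activation condition can be illustrated with a toy sketch. This is not the paper's code: the stand-in classifier `toy_predict`, the `Sample` type, and the target class `TARGET_LABEL` are hypothetical names introduced here, and the toy returns idealized rates of 0 and 1 rather than the reported 0.10% and about 99%. It only shows how attack success rate is computed and why it stays near zero while the model is not customized.

```python
from dataclasses import dataclass

TARGET_LABEL = 7  # hypothetical attacker-chosen class, for illustration only


@dataclass(frozen=True)
class Sample:
    true_label: int
    has_trigger: bool


def toy_predict(sample: Sample, customized: bool) -> int:
    """Stand-in for a poisoned backbone (an assumption, not a real model).

    The prediction flips to TARGET_LABEL only when the model has been
    customized AND the input carries the trigger.
    """
    if customized and sample.has_trigger:
        return TARGET_LABEL
    return sample.true_label


def attack_success_rate(samples: list[Sample], customized: bool) -> float:
    """Fraction of triggered, non-target samples classified as TARGET_LABEL."""
    triggered = [s for s in samples
                 if s.has_trigger and s.true_label != TARGET_LABEL]
    if not triggered:
        return 0.0
    hits = sum(toy_predict(s, customized) == TARGET_LABEL for s in triggered)
    return hits / len(triggered)


# 1000 triggered inputs whose true labels (0..4) never equal the target.
samples = [Sample(true_label=i % 5, has_trigger=True) for i in range(1000)]

dormant_asr = attack_success_rate(samples, customized=False)
active_asr = attack_success_rate(samples, customized=True)
print(f"dormant ASR: {dormant_asr:.2%}, active ASR: {active_asr:.2%}")
```

A scanner that only ever sees the `customized=False` state measures the first number, which is why the dormant model looks clean.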

Six Defenses Tested, Six Structural Failures

Researchers tested six existing defenses against BadBone on ResNet and BiT-M-RN50 models. The results are systematically negative for standard detection methods. Neural Cleanse and ABS classified all six poisoned models as clean, resulting in zero detections across six attempts. MNTD detected larger BiT-M-RN50 models with high probability but missed the majority of ResNet models. NAD failed to produce effective results in the tested configurations. CLP suppresses the backdoor, but only at the cost of degrading model utility, making the defense unacceptable for production environments. D-BR leaves the backdoor in place, failing its stated objective.

The pattern is consistent: current defenses assume a backdoor is always triggerable regardless of the model's state. BadBone violates this premise. Because the backdoor is only activatable after customization, pre-customization tests provide no information regarding the presence of the threat.

Supply Chain Risk: Models as Uninspectable Software Packages

This research places AI models within the software supply chain but identifies a distinctive property: model weights cannot be inspected like source code. While organizations already track risks in open-source packages and dependencies, a downloaded model is "a set of weights that resists inspection and tracing," according to the primary source.

The attack does not require access to victim data. A surrogate dataset with similar content is sufficient to prepare the backdoor. This lowers the operational barrier: an attacker can poison a public model on a platform like Hugging Face without knowing who will download it or for what task they will customize it.

Trend Micro has independently documented this supply chain risk. Out of over one million models on Hugging Face, a JFrog study found 400 containing malicious code. The platform offers open-source models with "minimal vetting," according to the same source. Trend Micro proposes treating AI models as software, viewing behavior as an attack surface and building defenses that go beyond static code scanning.

Separate research from Anthropic supports the feasibility of poisoning at scale. It shows that 250 malicious documents are sufficient to inject backdoors into LLMs ranging from 600 million to 13 billion parameters, with success rates independent of model size. In the researchers' words, "poisoning attacks require a near-constant number of documents regardless of model and training data size." The implication is that scaling up models and training data does not, by itself, raise the cost for an attacker.

Why It Matters

The source does not specify corrective measures that can be implemented today. The defenses proposed in the research, prompt-agnostic checks and cross-task anomaly analysis, remain future research directions rather than available tools. No infrastructural overlaps have emerged linking BadBone to active real-world campaigns; the research is explicitly a laboratory demonstration.

The research code has been released publicly under an MIT license with a responsible-use statement. This tension is significant: scientific reproducibility requires publication, but ease of replication increases the risk of offensive adaptation. The source does not document whether the responsible-use statement has any binding effect or enforcement mechanism.

The source also does not specify whether the BadBone mechanism extends to LLMs beyond the tested vision models, or whether variants exist for customization methods other than prompt learning. The effectiveness of the proposed defenses is not quantified; the paper describes the conceptual architecture rather than the experimental results of an implementation.

The Security Verification Time Trap

"A passing grade on these checks comes from the dormant state of the model. The user runs the scan, gets a clean result, customizes the model, deploys it, and the result that looked reassuring covered the period before activation."

This quote from the primary source defines the structural problem. Security is verified at the wrong time. The model passes tests, receives a "passing grade," is customized for a corporate task, and is put into production. Only then does the backdoor become activatable. The timeline of due diligence and the timeline of exposure do not overlap.

The current security paradigm for AI models assumes that static weight verification is sufficient. BadBone shows that model behavior is a dynamic attack surface, not a static one. The state change induced by customization is the vehicle for activation, not an incidental condition.

The consequences for companies downloading and customizing open-source models are concrete. Current pre-acquisition due diligence fails to detect the threat when it is most relevant. The risk manifests after the investment in training and deployment, when the cost of a rollback is at its peak.
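The mismatch between the two timelines can be laid out as a toy model. The stage names, the `backdoor_can_fire` rule, and the worst-case assumption that a triggered input could arrive at any stage are all illustrative assumptions, not taken from the paper. The sketch is not a defense; it only makes explicit that the scan happens at a stage where the co-activation rule cannot be satisfied.

```python
# Toy timeline, not the paper's code: stage names and the rule below are
# illustrative assumptions based on the article's description.

def backdoor_can_fire(customized: bool, trigger_present: bool) -> bool:
    """Co-activation rule: both conditions must hold at the same time."""
    return customized and trigger_present


# (stage name, has the model been customized by this stage?)
STAGES = [
    ("download", False),
    ("pre-acquisition scan", False),
    ("customization complete", True),
    ("production", True),
]
SCAN_STAGE = "pre-acquisition scan"

# Worst case for the defender: a triggered input could arrive at any stage.
exposure = {
    name: backdoor_can_fire(customized, trigger_present=True)
    for name, customized in STAGES
}

scan_saw_exposure = exposure[SCAN_STAGE]
exposed_stages = [name for name, exposed in exposure.items() if exposed]

print(f"scan stage exposed: {scan_saw_exposure}")
print(f"stages where the backdoor can fire: {exposed_stages}")
```

In this toy model the scan stage is never among the exposed stages, which is the sense in which a clean scan result covers only the period before activation.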

Information has been verified against the cited sources and is current as of the time of publication.
