Burnyard: Local Malware Analysis Beats Cloud on…

Ohio State University's Burnyard project challenges VirusTotal and Sophos Intelix with user-space emulation on local hardware, delivering faster analysis times — 22.41 seconds on Windows, 5.47 on Linux — but no one has verified whether its verdicts match the cloud platforms it benchmarks against.

On June 26, 2026, a team from Ohio State University unveiled Burnyard, a malware analysis system that runs suspicious samples in user-space emulation on local hardware, so nothing is uploaded to a public platform. It averages 22.41 seconds per sample on Windows and 5.47 seconds on Linux, faster than the times recorded for VirusTotal and Sophos Intelix. The caveat is significant: no one has verified whether Burnyard's verdicts align with those of the cloud platforms it measures itself against.

Key Takeaways

Burnyard emulates instructions one by one without a hypervisor, intercepting system calls and Windows APIs via a custom hook framework on commodity hardware with no network connectivity
Benchmarks on a Dell Optiplex Micro 3050 (Intel 7th-gen i5, 16 GB RAM) show average analysis times about 30% shorter on Windows and 66% shorter on Linux than VirusTotal, and 87–93% shorter than Sophos Intelix
The classifier assigns labels across 43 malware families plus a benign class, and recall varies by family: high on families with ample samples (Adware.Neoreklami, WannaCry, CobaltStrike), low on families with few (QNAPCrypt with 10 samples, salty with 15, REvil with 21, RemcosRAT with 22)
Emulation is vulnerable to environment detection: malware that checks clocks, missing APIs, or structural anomalies can deactivate, producing partial traces that mask real behavior

The Architecture That Eliminates the Hypervisor

Burnyard replaces the traditional sandbox's hypervisor stack with pure user-space emulation. In the source's words, the emulation layer "operates at the instruction level and avoids the hypervisor stack that a sandbox depends on." The system intercepts system calls and Windows APIs through a custom hook framework and writes CSV traces that record each call's decoded parameters and return value.
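The idea of a hook that records each intercepted call as a CSV row can be sketched in a few lines. This is an illustration only, not Burnyard's hook framework: the column names, the `make_tracer` helper, and the two stub handlers are all assumptions made for the example, and no real emulation takes place.

```python
import csv
import io
import json
from functools import wraps


def make_tracer(sink):
    """Return a decorator factory that logs every hooked call as one CSV row.

    Hypothetical layout: sequence number, API name, JSON-encoded parameters,
    return value. The real trace format is not described in the source.
    """
    writer = csv.writer(sink)
    writer.writerow(["seq", "api", "params", "return_value"])
    counter = {"n": 0}

    def hook(api_name):
        def decorator(handler):
            @wraps(handler)
            def wrapper(**params):
                result = handler(**params)
                counter["n"] += 1
                writer.writerow(
                    [counter["n"], api_name, json.dumps(params, sort_keys=True), result]
                )
                return result
            return wrapper
        return decorator

    return hook


trace = io.StringIO()
hook = make_tracer(trace)


@hook("CreateFileW")
def fake_create_file(path, access):
    # Hypothetical stub: pretend the open succeeded and return a fake handle.
    return 0x44


@hook("WriteFile")
def fake_write_file(handle, data):
    # Hypothetical stub: report the number of bytes "written".
    return len(data)


handle = fake_create_file(path="C:\\Users\\demo\\note.txt", access="GENERIC_WRITE")
fake_write_file(handle=handle, data="hello")
print(trace.getvalue())
```

Each stub stands in for an API the emulator would have to implement itself, which is also why missing stubs matter: a call with no handler leaves no row in the trace.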

A supervised classifier assigns each sample to one of 44 classes: 43 malware families and one benign class. A transformer-based model adds a natural-language description of the observed behavior. Burnyard handles Windows, Linux, and Mach-O binaries across multiple CPU architectures. It requires no separate operating system installation for the sample: it supplies a root filesystem with libraries, directories, and registry stubs, and runs entirely on commodity hardware without network connectivity.

The Speed Benchmark: Numbers and Their Limit

Evaluation on 100 samples per operating system, run on a Dell Optiplex Micro 3050 with a 7th-generation Intel i5 and 16 GB of RAM, produced the following average times according to Help Net Security:

For Windows samples, Burnyard averaged 22.41 seconds versus 32.36 seconds for VirusTotal and 182.88 seconds for Sophos Intelix. For Linux samples, the system recorded 5.47 seconds versus 16.27 for VirusTotal and 80.85 for Intelix.
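The percentages quoted for these timings can be reproduced with simple arithmetic. The sketch below uses only the averages reported above; the function names are chosen for this example. It also shows that "30% faster" in the article's sense is a reduction in elapsed time, which is a different figure from the speedup ratio.

```python
# Average analysis times in seconds, as reported in the article.
times = {
    "Windows": {"Burnyard": 22.41, "VirusTotal": 32.36, "Sophos Intelix": 182.88},
    "Linux": {"Burnyard": 5.47, "VirusTotal": 16.27, "Sophos Intelix": 80.85},
}


def time_reduction_pct(local, cloud):
    """Percent less elapsed time the local run took relative to the cloud run."""
    return (cloud - local) / cloud * 100


def speedup_ratio(local, cloud):
    """How many times longer the cloud run took than the local run."""
    return cloud / local


for os_name, row in times.items():
    for platform in ("VirusTotal", "Sophos Intelix"):
        reduction = time_reduction_pct(row["Burnyard"], row[platform])
        ratio = speedup_ratio(row["Burnyard"], row[platform])
        print(f"{os_name} vs {platform}: {reduction:.1f}% less time, {ratio:.2f}x")
```

On Windows against VirusTotal, for example, the reduction works out to roughly 31% while the ratio is roughly 1.4x, so the two ways of stating the result should not be mixed.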

These numbers carry an explicit warning from the source: the comparisons are not like-for-like. VirusTotal aggregates over 70 static analysis engines and Sophos Intelix spins up dedicated sandboxes with resource provisioning, while Burnyard runs a single user-space emulation, so the timings measure different workloads. More importantly, accuracy was never compared: "Nobody checked Burnyard's verdicts against the ones VirusTotal and Intelix hand back, so we still do not know if all three agree on what a given file is."
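The missing check is easy to state in code. The following is a minimal sketch of a pairwise agreement measurement, assuming each platform's output has already been normalized to a single family label per sample; that normalization is itself an assumption, and the four verdicts below are invented placeholders, not results from any of the three systems.

```python
from itertools import combinations

# Placeholder verdicts, invented purely to exercise the functions below.
verdicts = {
    "sample_a": {"Burnyard": "WannaCry", "VirusTotal": "WannaCry", "Sophos Intelix": "WannaCry"},
    "sample_b": {"Burnyard": "benign", "VirusTotal": "CobaltStrike", "Sophos Intelix": "CobaltStrike"},
    "sample_c": {"Burnyard": "REvil", "VirusTotal": "REvil", "Sophos Intelix": "benign"},
    "sample_d": {"Burnyard": "benign", "VirusTotal": "benign", "Sophos Intelix": "benign"},
}


def pairwise_agreement(verdicts):
    """Fraction of samples on which each pair of platforms returns the same label."""
    platforms = sorted(next(iter(verdicts.values())))
    rates = {}
    for a, b in combinations(platforms, 2):
        matches = sum(1 for v in verdicts.values() if v[a] == v[b])
        rates[(a, b)] = matches / len(verdicts)
    return rates


def unanimous_rate(verdicts):
    """Fraction of samples on which all platforms return the same label."""
    agreed = sum(1 for v in verdicts.values() if len(set(v.values())) == 1)
    return agreed / len(verdicts)


for pair, rate in pairwise_agreement(verdicts).items():
    print(pair, rate)
print("unanimous:", unanimous_rate(verdicts))
```

Agreement alone would not show which platform is right when they differ; that would additionally require ground-truth labels for the samples.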

"Once a sample reaches a public repository, the person who wrote it can locate it there. Skilled operators watch these platforms for the hashes of their own tools, and a match tells them their campaign has been detected." — Help Net Security

The Privacy Problem: Samples That Alert the Attacker

The stakes motivating the local architecture are documented in the quote above. When a sample is uploaded to public repositories like VirusTotal, its hash becomes searchable. Skilled operators monitor these platforms to detect discovery of their campaigns, with operational consequences: infrastructure rotation, indicator cycling, abandonment of compromised access vectors.
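The identifier in question is an ordinary cryptographic digest of the file's bytes. As a small illustration, assuming SHA-256 as the hash (public platforms commonly index several), the sketch below computes it locally with Python's standard `hashlib`; the helper name and the harmless temporary file are made up for the example.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large samples never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstrate on a harmless temporary file rather than a real sample.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

demo_hash = sha256_of_file(demo_path)
os.remove(demo_path)
print(demo_hash)
```

Because the digest depends only on the file's contents, anyone holding the same file, including its author, can compute the same value and search for it.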

The source explicitly cites "air-gapped sites, government labs, and privacy-sensitive shops" as contexts that need "a way to study malware that keeps the file on a local disk and the whole setup in a closet." Burnyard answers this structural requirement, not a generic preference for local computing.

The Emulator's Gaps: When Malware Realizes It's Being Watched

User-space emulation introduces detection vulnerabilities that traditional hypervisor sandboxes mitigate through higher environmental fidelity. The source describes the mechanism precisely: "A careful piece of malware can sense when it is running inside a stripped-down environment. It watches the clock, it probes for API calls that should exist, and when something feels off, it goes quiet."

The phenomenon is not hypothetical. Burnyard's coverage of system calls and APIs is incomplete, so a binary can stall and leave a trace that represents "a partial picture of what the program actually does." The source reports that the authors flag this limit themselves: "incomplete coverage of system and API calls can keep a binary from finishing."

Classifier recall reflects a second limit, an imbalance in the data. Families with large training datasets (Adware.Neoreklami, GCleaner, WannaCry, Socks5Systemz, CobaltStrike) achieve high recall. Families with few samples (QNAPCrypt with 10, salty with 15, REvil with 21, RemcosRAT with 22) show lower recall. The transformer model for behavioral description does not solve the root problem: if the trace is partial, the description is partial.
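Per-family recall is the share of a family's true samples that the classifier labels correctly. The sketch below computes it from two label lists; the labels are invented placeholders and say nothing about Burnyard's actual results. It returns the sample count alongside each recall value, because a figure computed from a handful of samples is coarse.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: (recall, support)} where support is the true-sample count."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: (hits[label] / n, n) for label, n in support.items()}


# Placeholder labels, invented for illustration only.
y_true = ["WannaCry"] * 8 + ["REvil"] * 2
y_pred = ["WannaCry"] * 7 + ["REvil"] + ["REvil", "WannaCry"]

for label, (recall, n) in per_class_recall(y_true, y_pred).items():
    print(f"{label}: recall={recall:.3f} on {n} samples")
```

With only 10 samples of a family in an evaluation set, each missed sample would move that family's recall by 10 percentage points. The article does not say how the reported counts are split between training and evaluation, so this is a general observation rather than a reading of Burnyard's numbers.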

Why It Matters

The source does not specify Burnyard's release status: public code availability, license, and independent reproducibility are not documented. False positive and false negative rates are not quantified, nor is resistance to anti-emulation techniques beyond those mentioned. Scalability to production volumes (thousands or millions of samples per day) has not been tested.

Nor does the source document specific mitigations for users of existing cloud platforms or operational recommendations for moving to local solutions. It does not establish whether local analysis is preferable to cloud analysis in absolute terms: it documents a trade-off, not a hierarchy.

The question the project raises for the industry concerns benchmarking itself. Using speed as the primary metric, without verifying verdict accuracy, produces comparisons that are technically correct but operationally incomplete. Burnyard demonstrates that user-space emulation is feasible on commodity hardware with competitive performance; it does not demonstrate that its verdicts are correct as often as those of the cloud platforms.

For analysts in sensitive environments such as government, defense, healthcare, and critical infrastructure, the privacy-versus-efficacy dilemma remains open. Burnyard offers a concrete architectural alternative, but the decisive test, a check of its verdicts, has not yet been run.

Sources

Information is based on the cited source and current as of publication.
