Ollama Flaws Expose Local LLM Memory and Enable Windows Malware Persistence
Three critical CVEs in Ollama allow unauthenticated remote attackers to leak LLM process memory via crafted GGUF files and achieve persistence on Windows syste…

On May 18, 2026, details emerged of three serious vulnerabilities in Ollama, the widely used open-source tool for running large language models locally. A crafted GGUF file lets unauthenticated remote attackers read the server process's memory, while two flaws in the Windows updater enable persistent code execution. Together, these attack vectors put more than 300,000 servers at risk, many of them publicly reachable and unauthenticated, effectively turning the AI model from an asset into an exfiltration vector.
- CVE-2026-7482 (CVSS 9.1): A heap out-of-bounds read in Ollama’s GGUF loader (prior to version 0.17.1) triggered by tensor offsets and sizes exceeding actual buffer lengths.
- The exfiltration of API keys, environment variables, and active conversations is performed via the /api/push endpoint to an attacker-controlled registry.
- CVE-2026-42248 and CVE-2026-42249 (CVSS 7.7 each) affect the Ollama Windows updater in versions 0.12.10 through 0.17.5.
- A combination of path traversal and a lack of digital signature verification allows attackers to write arbitrary executables to the user's Startup folder, ensuring execution at the next login.
The GGUF Loader and the Heap Out-of-Bounds Memory Leak
The most impactful vulnerability, tracked as CVE-2026-7482, affects all Ollama versions prior to 0.17.1. The flaw resides in the parser for the GGUF format—the standard binary container for model weights—specifically within the WriteTo() routine found in fs/ggml/gguf.go and server/quantization.go. When a GGUF file contains tensor offsets and sizes that exceed the allocated buffer, the server reads past the heap boundaries, triggering an out-of-bounds read that exposes adjacent process memory.
The severity is underscored by a CVSS score of 9.1, reflecting the ease of exploitation and the lack of authentication requirements. According to CVE.org, the server performs reads beyond the allocated heap buffer during the quantization phase, enabling the silent dumping of sensitive information without causing crashes or obvious log anomalies. Researchers at Cyera have dubbed this attack chain "Bleeding Llama," highlighting how the model format itself has become a dangerous entry point.
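The missing check can be illustrated with a minimal sketch. This is not Ollama's code (which is written in Go); the function names tensor_in_bounds and read_tensor are hypothetical, and Python's own slicing would never read past a buffer, so the example only shows the validation a loader is expected to perform before trusting offsets and sizes taken from a file.

```python
def tensor_in_bounds(offset: int, size: int, buffer_len: int) -> bool:
    """Return True only if the range [offset, offset + size) fits in the buffer."""
    if offset < 0 or size < 0:
        return False
    # Python integers do not overflow; a fixed-width language would also
    # need to guard the addition itself against wrap-around.
    return offset + size <= buffer_len


def read_tensor(buf: bytes, offset: int, size: int) -> bytes:
    """Hypothetical reader that rejects file-declared ranges outside the buffer."""
    if not tensor_in_bounds(offset, size, len(buf)):
        raise ValueError(
            f"tensor range {offset}+{size} exceeds buffer of {len(buf)} bytes"
        )
    return buf[offset:offset + size]
```

A file declaring a 4-byte tensor at offset 2 inside a 4-byte buffer would be rejected here instead of being served from whatever memory follows the buffer.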
From Memory to Registry: How /api/push Transports Stolen Data
Data leaked from memory does not remain dormant on the server. An attacker can package leaked information into a model artifact and upload it to an external registry using the /api/push endpoint. This mechanism converts a memory read into an active exfiltration channel, where API keys, proprietary code, system prompts, and user conversations leave the infrastructure disguised as legitimate model weights.
This risk is particularly insidious due to the nature of the traffic: pushing a model to an external registry is less likely to trigger alarms than a classic C2 (Command & Control) connection, blending into the normal operational flow of downloading and redistributing weights.
Cyera researcher Dor Attias underscored the scale of the risk: “An attacker can learn basically anything about the organization from your AI inference — API keys, proprietary code, customer contracts, and much more.” The lack of default authentication on many internet-exposed installations greatly widens the attack surface, turning the server into a passive collection point for anyone capable of uploading a crafted GGUF file.
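One way to reason about this channel is an egress allowlist for push destinations, applied for example in a reverse proxy or a log review script. The sketch below rests on assumptions: it treats the first path component of a model name as a registry host when it looks like one (a Docker-style naming convention, not Ollama's actual name parser), and the allowlist entry registry.internal.example is a placeholder.

```python
from typing import Optional, Set

# Hypothetical allowlist of registries that pushes may target.
ALLOWED_REGISTRIES: Set[str] = {"registry.internal.example"}


def registry_host(model_name: str) -> Optional[str]:
    """Extract an explicit registry host from a model name, if one is present."""
    name = model_name.strip()
    if "://" in name:
        name = name.split("://", 1)[1]  # drop a scheme such as http://
    first, sep, _rest = name.partition("/")
    # Assumption: a first component containing "." or ":" (or "localhost")
    # names a host; otherwise the name targets the default registry.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first.lower()
    return None


def push_allowed(model_name: str,
                 allowed: Set[str] = ALLOWED_REGISTRIES,
                 allow_default: bool = False) -> bool:
    """Allow a push only to allowlisted hosts; the default registry is opt-in."""
    host = registry_host(model_name)
    if host is None:
        return allow_default
    return host in allowed
```

Under these assumptions, a push naming attacker.example.net/ns/leak:latest would be refused, while registry.internal.example/team/model:7b would pass.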
The Windows Updater and Guaranteed Persistence in the Startup Folder
"The path traversal writes attacker-chosen executables into the Windows Startup folder. The missing signature verification keeps them there: the post-write cleanup that would remove unsigned files on a working updater is a no-op on Windows. On the next login, Windows runs whatever was left behind." Bartłomiej Dmitruk, co-founder of Striga
Parallel to the loader flaw, Ollama's Windows client suffers from two concurrent vulnerabilities tracked as CVE-2026-42248 and CVE-2026-42249, both carrying a CVSS score of 7.7. Coordinated disclosure was managed by CERT Polska. These flaws impact the automatic updater in versions 0.12.10 through 0.17.5. A path traversal vulnerability allows arbitrary executables to be written to directories outside the intended path, while a failure to verify digital signatures prevents the post-update cleanup mechanism from removing unsigned binaries.
The practical consequence is long-term compromise. As explained by Striga co-founder Bartłomiej Dmitruk, the path traversal deposits attacker-chosen executables into the user’s Startup folder. The absence of signature verification halts automatic cleanup, turning the updater into a vehicle for persistence. Upon the next Windows login, the system silently executes whatever was left in that directory without requiring further victim interaction.
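The path traversal half of the problem comes down to a containment check that the updater evidently did not enforce. The following is a sketch only, not the updater's code: safe_update_path and the staging directory in the usage note are hypothetical, and ntpath is used so that Windows path rules apply on any platform.

```python
import ntpath


def safe_update_path(staging_dir: str, filename: str) -> str:
    """Resolve filename under staging_dir, refusing anything that escapes it."""
    base = ntpath.normpath(staging_dir)
    target = ntpath.normpath(ntpath.join(base, filename))
    try:
        common = ntpath.commonpath([base, target])
    except ValueError:
        # Raised when the paths sit on different drives or mix
        # absolute and relative forms.
        raise ValueError(f"refusing path outside staging dir: {filename!r}")
    escaped = ntpath.normcase(common) != ntpath.normcase(base)
    is_dir_itself = ntpath.normcase(target) == ntpath.normcase(base)
    if escaped or is_dir_itself:
        raise ValueError(f"refusing path outside staging dir: {filename!r}")
    return target
```

With a staging directory such as C:\Users\me\AppData\Local\Ollama\updates (an illustrative path), a name like OllamaSetup.exe resolves normally, while a name built from ..\ segments that ends in the Startup folder raises ValueError before anything is written.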
300,000+ Exposed Servers: Local AI as an Attack Surface
The massive adoption of Ollama—boasting over 171,000 stars and 16,100 forks on GitHub—has created a new infrastructure category that experts call the "IoT of languages." Inference servers scattered across clouds and offices, often deployed by developers for rapid testing and left in production without authentication, expose HTTP endpoints to the internet. In this context, the GGUF parser vulnerability is not a niche bug; it is a direct gateway to the memory of processes handling sensitive corporate data.
Cyera estimates that over 300,000 servers globally are potentially impacted, a figure that remains independently unverified and should be approached with caution. Furthermore, active exploitation in the wild has not been confirmed for CVE-2026-7482, and there is currently no confirmed release date for patches regarding the Windows vulnerabilities. These caveats do not diminish the structural risk: the combination of a fragile binary parser, public unauthenticated endpoints, and update mechanisms lacking signature verification creates a concrete threat profile for enterprises.
Recommended Mitigation
- Update Ollama to version 0.17.1 or later to resolve CVE-2026-7482 in the GGUF loader. This is currently the only fix available for any of the three CVEs.
- Isolate instances from the public internet by removing external endpoint exposure, specifically disabling access to /api/push where it is not strictly required.
- Enable authentication and network segmentation to prevent unauthorized model uploads to inference servers, thereby reducing the initial attack surface.
- Inspect the Startup folder on Windows clients for unsigned executables and monitor for patch releases for versions 0.12.10-0.17.5, as a definitive fix date for CVE-2026-42248 and CVE-2026-42249 has not been announced.
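For inventory purposes, the version ranges cited in this article can be turned into a small triage helper. This is a sketch under stated assumptions: the function names are hypothetical, version strings are assumed to be plain dotted numbers with an optional leading "v" or a "-suffix", and the ranges are exactly those reported above.

```python
from typing import List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as 'v0.17.1' or '0.17.1-rc0' into (0, 17, 1)."""
    core = version.strip().lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def affected_cves(version: str, on_windows: bool) -> List[str]:
    """List the CVEs from this article that apply to a given install."""
    v = parse_version(version)
    hits: List[str] = []
    if v < (0, 17, 1):  # GGUF loader flaw, fixed in 0.17.1
        hits.append("CVE-2026-7482")
    if on_windows and (0, 12, 10) <= v <= (0, 17, 5):  # Windows updater flaws
        hits.extend(["CVE-2026-42248", "CVE-2026-42249"])
    return hits
```

Note that a Windows install on 0.17.1 through 0.17.5 is clear of the loader flaw but still inside the updater range.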
This incident confirms that local AI infrastructure is not a secure island. When the model itself becomes untrusted input, every perimeter assumption fails. For companies that have pushed Ollama into production without authentication, the bill is arriving in the form of leaked memory and guaranteed persistence.
Frequently Asked Questions
How can a GGUF model file leak server memory?
GGUF files contain metadata describing tensor offsets and sizes. If these values exceed the actual length of the allocated buffer, the quantization routine in server/quantization.go (WriteTo()) continues reading beyond the heap boundaries, spilling contiguous bytes that may include API keys, environment variables, and conversation fragments.
What is the specific risk for Ollama users on Windows?
Versions 0.12.10-0.17.5 are vulnerable to path traversal and signature verification failures within the updater. An attacker can write arbitrary executables to the user's Startup folder; upon the next login, Windows executes them automatically, ensuring persistence without requiring the user to manually run a new file.
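A first-pass check of the Startup folder can be scripted. The sketch below only lists files with commonly executable extensions; it does not verify signatures, the extension set is an assumption rather than an exhaustive list, and the function names are hypothetical. The per-user Startup folder is located under %APPDATA% at Microsoft\Windows\Start Menu\Programs\Startup.

```python
import os
from pathlib import Path
from typing import List

# Assumed set of extensions worth reviewing; adjust to local policy.
EXEC_SUFFIXES = {".exe", ".bat", ".cmd", ".com", ".scr", ".ps1", ".vbs", ".js", ".lnk"}


def default_startup_dir() -> Path:
    """Per-user Startup folder, derived from the APPDATA environment variable."""
    appdata = os.environ.get("APPDATA", "")
    return Path(appdata) / "Microsoft" / "Windows" / "Start Menu" / "Programs" / "Startup"


def list_startup_candidates(folder: Path) -> List[Path]:
    """Return files in the folder whose extension suggests they can execute."""
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in EXEC_SUFFIXES
    )
```

Each file returned by list_startup_candidates(default_startup_dir()) can then be checked for a valid signature with a separate tool, for example PowerShell's Get-AuthenticodeSignature.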
Why does a lack of authentication make this more dangerous?
Many Ollama installations are exposed directly to the internet for ease of access. Without default authentication, anyone can upload a crafted GGUF model to trigger the leak chain or interact with the compromised updater, transforming the server into a passive source for data exfiltration.
Information has been verified against cited sources and is current as of the time of publication.
Sources
- https://www.wired.it/article/nella-profondita-oceano-pacifico-faglia-genera-terremoti-puntualita-sconcertante-ora-sappiamo-perche/
- https://www.schneier.com/blog/archives/2026/05/copy-fail-linux-vulnerability.html
- https://thehackernews.com/2026/05/ollama-out-of-bounds-read-vulnerability.html
- https://krebsonsecurity.com/2026/05/patch-tuesday-may-2026-edition/
- https://therecord.media/cisa-orders-all-federal-agencies-to-patch-cisco-sd-wan-bug