Bleeding Llama: Ollama GGUF Loader Bug Exposes Process Memory Across 300,000 Servers
A critical unauthenticated heap out-of-bounds read vulnerability in Ollama (CVE-2026-7482) allows attackers to exfiltrate sensitive process memory, including A…

May 10, 2026. The Cyera research team has disclosed CVE-2026-7482, a vulnerability with a CVSS score of 9.1 in the open-source Ollama framework. The flaw allows an unauthenticated remote attacker to leak the entire memory heap of the process. Dubbed "Bleeding Llama," the vulnerability resides in the GGUF format loader and leverages Go’s unsafe package to read beyond allocated buffer boundaries.
With an estimated 300,000 servers exposed to the internet, the risk exceeds simple information disclosure; it represents a total compromise of the operational context for organizations hosting AI models on-premise or in accessible cloud environments.
- The vulnerability is a heap out-of-bounds read (CVE-2026-7482, CVSS 9.1) located in Ollama's GGUF model loader, specifically within the WriteTo() function.
- Exploitation requires uploading a specially crafted GGUF file to the /api/create endpoint and exfiltrating data via /api/push to an attacker-controlled registry.
- Leaked data may include environment variables, API keys, system prompts, and active user conversations held in the process memory.
- Ollama has patched the vulnerability in version 0.17.1; it remains unclear how many instances were compromised prior to the release.
GGUF Loader Mechanism and the Exploit Path
According to technical details published by Cyera, the vulnerability exists in the WriteTo() function of Ollama’s quantization path, which handles GGUF model loading. When the server processes a user-supplied file, the code calls ggml.ConvertToF32 using an element count derived from the tensor shape declared in the GGUF file. However, it fails to verify that this count matches the actual size of the allocated buffer. Consequently, the process accesses bytes beyond the buffer limit, pulling arbitrary fragments of the server's memory space into the response.
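The missing validation can be sketched as follows. This is a simplified illustration, not Ollama's actual source: the function name, element-size handling, and the stand-in conversion loop are all assumptions made for clarity. The essential point is that the element count comes from the attacker-controlled GGUF header and must be checked against the buffer that was actually allocated before any conversion takes place.

```go
package main

import (
	"errors"
	"fmt"
)

// convertToF32 is a simplified stand-in for the conversion step in the
// quantization path. elementCount is derived from the tensor shape
// declared in the (untrusted) GGUF header; buf is the buffer actually
// read from the file. The vulnerable pattern trusts elementCount; the
// patched behavior is to validate it against len(buf) first.
func convertToF32(buf []byte, elementCount, elemSize int) ([]float32, error) {
	if elementCount < 0 || elemSize <= 0 || elementCount > len(buf)/elemSize {
		// Reject a shape that claims more data than the buffer holds,
		// instead of reading past the allocation.
		return nil, errors.New("tensor shape exceeds buffer size")
	}
	out := make([]float32, elementCount)
	for i := range out {
		out[i] = float32(buf[i*elemSize]) // simplified dequantization stand-in
	}
	return out, nil
}

func main() {
	buf := make([]byte, 16)
	// A header declaring 1024 four-byte elements against a 16-byte buffer
	// is exactly the mismatch the fix must catch.
	if _, err := convertToF32(buf, 1024, 4); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Without the guard clause, the loop would index far past len(buf), which is the shape of the out-of-bounds read Cyera describes.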
The attack does not require authentication. An attacker can upload a crafted GGUF file via the network-exposed /api/create endpoint and subsequently "push" it to a remote registry under their control via /api/push. This process dumps the out-of-bounds memory data into the response. Because Ollama’s REST API does not natively enforce an authentication layer, the attack surface is broad and immediately accessible for any internet-facing instance.
How Go’s Unsafe Package Amplifies the Flaw
The architectural choice enabling this exploit is the use of Go’s unsafe package. This package provides developers with direct memory management access, bypassing the language's inherent safety guarantees. As Cyera noted, Ollama utilizes unsafe in critical areas of tensor loading, precisely where memory boundary validation is most likely to fail.
The combination of tensor shape calculations derived from unsanitized user input and low-level access primitives removes the barriers between the loaded model and the process heap. In this environment, a parsing error becomes a transparent window into sensitive memory, where API keys and secrets often reside in plain text during inference.
Data at Risk: From API Keys to Proprietary Conversations
The leaked memory contains more than just service metadata; it holds high-value information including environment variables, API keys, system prompts, and active user conversations occurring at the time of the attack. Cyera researcher Dor Attias emphasized that compromising the inference process exposes an organization's operational core.
"An attacker can learn basically anything about the organization from your AI inference — API keys, proprietary code, customer contracts, and much more" — Dor Attias, Cyera security researcher
In enterprise settings where Ollama is deployed to isolate proprietary data, the ability to extract customer contracts, source code, and cloud credentials via a single altered model file constitutes a structural breach. Persistent malware or privilege escalation is unnecessary; direct heap access is sufficient to compromise compliance, trade secrets, and the reliability of self-hosted AI infrastructure.
Mitigation and Response
Organizations running Ollama on internet-accessible servers should prioritize four immediate actions. First, update to version 0.17.1, which contains the fix for the primary vulnerability. Second, verify that the /api/create and /api/push endpoints are not publicly exposed without a firewall, corporate VPN, or additional perimeter authentication, as the native API lacks built-in credentials.
Third, security teams should rotate API keys, secrets, and credentials stored in environment variables for any Ollama instances that were exposed during the risk period, regardless of whether the patch has been applied. Fourth, administrators should analyze access logs for unauthorized GGUF file uploads or suspicious connections to the /api/push endpoint, which may indicate prior exfiltration attempts.
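For the fourth action, the triage can be as simple as flagging any access-log line that touches the two endpoints in the exploit chain. The log format below is illustrative; adapt the matching to whatever your reverse proxy or gateway actually emits.

```go
package main

import (
	"fmt"
	"strings"
)

// isSuspicious flags log lines hitting the endpoints used in the
// Bleeding Llama exploit chain. In a hardened deployment, neither
// /api/create nor /api/push should ever appear from external sources.
func isSuspicious(line string) bool {
	return strings.Contains(line, "POST /api/create") ||
		strings.Contains(line, "POST /api/push")
}

func main() {
	logs := []string{
		`10.0.0.5 - "POST /api/generate" 200`,
		`203.0.113.7 - "POST /api/create" 200`,
		`203.0.113.7 - "POST /api/push" 200`,
	}
	for _, l := range logs {
		if isSuspicious(l) {
			fmt.Println("review:", l)
		}
	}
}
```

Any hit from an external address during the exposure window should trigger the credential rotation described in the third action, since exfiltration leaves no other reliable trace.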
Frequently Asked Questions
Can this vulnerability be exploited without victim interaction?
Yes. The attack is remote and unauthenticated. An attacker simply sends a crafted GGUF file to the /api/create endpoint and retrieves leaked data by pushing it to a registry under their control via /api/push.
Does the 0.17.1 patch fully resolve the data leakage risk?
Version 0.17.1 addresses CVE-2026-7482. However, it is not yet confirmed if the fix introduces regressions or if alternative exploit scenarios exist. Organizations are strongly advised to rotate potentially exposed secrets as a precaution.
Why is this classified as an out-of-bounds read rather than remote code execution (RCE)?
CVE-2026-7482 allows for arbitrary reading of process memory (information disclosure) but does not provide a mechanism to inject or execute arbitrary code on the server. The impact is focused on data exfiltration rather than system takeover.
Cyera’s discovery underscores that self-hosted AI infrastructure is not a guaranteed safe haven. A single modified model file can transform a local server into an open data source. The issue stems from a lack of tensor shape validation in a critical path involving memory management. For organizations that have moved LLMs on-premise for privacy, this serves as a reminder: the security of the model loader is the security of the data itself.
Sources
- https://thehackernews.com/2026/05/ollama-out-of-bounds-read-vulnerability.html
- https://www.cyera.com/research/bleeding-llama-critical-unauthenticated-memory-leak-in-ollama
- https://www.reconbee.com/ollama-out-of-bounds-read-vulnerability-allows-remote-process-memory-leak/
- https://thomasharris6.wordpress.com/2026/05/10/ollama-out-of-bounds-read-vulnerability-allows-remote-process-memory-leak/
- http://blog.fisa.pro/2026/05/bleeding-llama-critical-ollama.html
Information has been verified against cited sources and is current at the time of publication.