Ollama Vulnerability: CVE-2026-7482 Risks Memory Exposure for 300,000 AI Servers

A critical heap out-of-bounds read vulnerability in Ollama (CVE-2026-7482) allows for memory leakage via GGUF files, putting API keys and private conversations…

On May 18, 2026, researchers disclosed a severe vulnerability in Ollama, one of the most widely used open-source frameworks for running local large language models. Identified as CVE-2026-7482 with a CVSS score of 9.1, the flaw is a heap out-of-bounds read located within the GGUF model loader.

Technical analysis indicates that an unauthenticated remote attacker can exploit this weakness to read the entire memory of the server process. The leak occurs when a specially crafted, malicious GGUF file is uploaded to the /api/create endpoint. The scope of the risk is significant, as it enables the exfiltration of sensitive data, including API keys, environment variables, and real-time user interactions.

Beyond this primary flaw, the disclosure raised further security concerns regarding the framework. Two additional vulnerabilities in the Windows update mechanism were identified (CVE-2026-42248 and CVE-2026-42249). While these carry a lower CVSS score of 7.7, they remain unpatched and could allow for silent, persistent remote code execution (RCE).

Key Vulnerability Insights

Critical Flaw: CVE-2026-7482 is a heap out-of-bounds read affecting Ollama versions prior to 0.17.1 during GGUF file loading.
Attack Vector: Uploading a manipulated model to the /api/create endpoint triggers a read beyond the allocated heap buffer.
Global Exposure: Security firm Cyera estimates that approximately 300,000 Ollama servers are potentially exposed due to the lack of native authentication in the REST APIs.
Windows Risks: CVE-2026-42248 and CVE-2026-42249 allow for persistent RCE via the updater. Coordinated disclosure with CERT Polska began in late January 2026, but the flaws remain unpatched.

Technical Analysis: Exploiting the GGUF Loader for Memory Leaks

The core of the issue lies in how Ollama handles GGUF files, a binary format optimized for quantized models. The vulnerability is located in the server/quantization.go component, specifically within the WriteTo() function. While processing the uploaded model, the server reads the tensors described in the file in order to quantize them.

An attacker can manipulate tensor offsets and dimensions within a GGUF file. When Ollama processes this file via the /api/create endpoint, the server does not check those declared values against the size of the buffer it actually allocated. As a result, the process reads data positioned beyond the intended heap buffer, exposing bytes that belong to other parts of the running process.
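The kind of check that is missing can be sketched in a few lines. The example below is a minimal illustration in Python, not Ollama's actual fix (Ollama itself is written in Go). The function name tensor_fits and its parameters are hypothetical, and it assumes a fixed element size per tensor, which simplifies how quantized GGUF types are really laid out.

```python
from math import prod


def tensor_fits(offset: int, dims: list[int], elem_size: int, data_len: int) -> bool:
    """Return True only if the tensor's declared byte range lies inside the data region.

    offset, dims and elem_size come from the untrusted file header;
    data_len is the number of bytes actually available to read.
    """
    # Reject nonsensical header values before doing any arithmetic.
    if offset < 0 or elem_size <= 0 or data_len < 0:
        return False
    # This sketch rejects empty or non-positive dimension lists outright.
    if not dims or any(d <= 0 for d in dims):
        return False
    # Python integers do not overflow; a fixed-width language would need
    # an explicit overflow check on this multiplication.
    size = prod(dims) * elem_size
    # Compare against the remaining space rather than computing offset + size,
    # the overflow-safe form when ported to fixed-width integers.
    return offset <= data_len and size <= data_len - offset
```

A loader following this pattern would refuse the file as soon as one tensor fails the check, rather than reading whatever lies past the end of the buffer.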

According to the CVE.org database and evidence from Cyera researchers, the defect allows an attacker to read data adjacent to the buffer. Because Ollama handles multiple requests and its configuration within a single process address space, one malicious file can expose data belonging to other users or to the server itself.

“An attacker can learn virtually anything about an organization from its AI inference: API keys, proprietary code, customer contracts, and much more.”
— Dor Attias, Security Researcher at Cyera

From Memory Leak to Exfiltration: API Endpoint Abuse

CVE-2026-7482 is not limited to passive memory reading; it can be integrated into a more sophisticated attack chain. Once data is read beyond buffer limits via the /api/create endpoint, an attacker can leverage a second legitimate endpoint: /api/push.

Under normal circumstances, /api/push is used to upload models to remote registries. However, if configured to point to an attacker-controlled registry, this endpoint can be used to exfiltrate the leaked sensitive data. Combining these two API calls transforms a memory management error into a potent tool for active, difficult-to-detect data theft.

The risk is compounded by the fact that Ollama’s REST APIs lack built-in authentication by default. If a server is network-accessible without external protections (such as a reverse proxy or firewall), any party can send the requests necessary to trigger the leak. This open configuration is the primary driver behind the estimate of 300,000 vulnerable instances worldwide.
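As an illustration of the external protection mentioned above, the following Python sketch shows the sort of bearer-token check a fronting proxy or gateway might apply before forwarding a request to Ollama. It is not part of Ollama; the function name is_authorized, the PROTECTED_PREFIX constant, and the token-in-Authorization-header scheme are assumptions made for this example.

```python
import hmac

# Assumption for this sketch: every path under /api/ requires a token.
PROTECTED_PREFIX = "/api/"


def is_authorized(path: str, headers: dict[str, str], expected_token: str) -> bool:
    """Allow a request to an /api/ path only if it carries the expected bearer token.

    headers is assumed to use the canonical "Authorization" key; a real
    proxy would normalize header names before calling this.
    """
    if not path.startswith(PROTECTED_PREFIX):
        return True  # non-API paths are out of scope for this sketch
    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```

With a gate like this in front of the server, unauthenticated calls to /api/create or /api/push would be rejected before they reach the vulnerable code path.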

Residual Vulnerabilities: Persistent Risk on Windows Systems

While a patch has been issued for the GGUF flaw, the situation remains critical for Windows users. Two distinct vulnerabilities in the updater, CVE-2026-42248 and CVE-2026-42249, allow an attacker to achieve remote code execution (RCE) with persistence on the victim’s system.

Identified by researchers at Striga, these issues affect versions 0.12.10 through 0.22.0. By utilizing path traversal techniques and exploiting the lack of digital signature verification on updates, an attacker can force the system to write arbitrary files to the user's Startup folder. Upon the next login or reboot, the malicious code executes automatically with the current user's privileges.
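The path traversal half of this chain comes down to trusting a file name supplied by the update source. A minimal Python sketch of the containment check that defeats it is shown below. The helper name safe_join is hypothetical, and this is not the Ollama updater's code; it only illustrates resolving the final path and confirming it stays inside the intended directory.

```python
import os


def safe_join(base_dir: str, untrusted_name: str) -> str:
    """Join untrusted_name onto base_dir, refusing any result outside base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." components and resolves symlinks.
    candidate = os.path.realpath(os.path.join(base, untrusted_name))
    # commonpath equals base only when candidate is base or lies beneath it.
    if candidate == base or os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"refusing to write outside {base!r}: {untrusted_name!r}")
    return candidate
```

A name such as "../../Startup/evil.exe" resolves to a location outside the download directory and is rejected. Containment alone does not address the second weakness described above, the absence of signature verification on the update itself.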

Bartłomiej Dmitruk, co-founder of Striga, noted that this attack chain results in silent and persistent execution. Despite coordinated disclosure with CERT Polska beginning in late January 2026, no official patches for these Windows flaws were available at the time of publication, leaving thousands of workstations exposed.

Mitigation and Response

Addressing these vulnerabilities requires a multi-layered security approach that extends beyond simple software updates. The following actions are recommended to mitigate the risks:

Immediate Update: Install Ollama version 0.17.1 or higher. This fixes the critical CVE-2026-7482 flaw in the GGUF loader and prevents memory leaks via manipulated models.
Implement Authentication: Since Ollama does not offer native API authentication, servers must be placed behind a reverse proxy (such as Nginx or Apache) requiring strong credentials or access tokens for all /api requests.
Rotate Secrets: If an Ollama server was exposed to the internet or unsecured networks, all API keys, environment variables, and secrets managed by the framework should be considered compromised and rotated immediately.
Monitor Windows Environments: In the absence of an updater patch (affected versions are 0.12.10 through 0.22.0), administrators should actively monitor Startup folders and Ollama update processes, while limiting the privileges of the user running the application.
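Monitoring a Startup folder can be as simple as comparing periodic snapshots. The Python sketch below is one possible approach under that assumption; the function names snapshot and diff_snapshots are hypothetical, and a production deployment would more likely rely on an endpoint detection tool or the operating system's file auditing.

```python
import hashlib
from pathlib import Path


def snapshot(folder: str) -> dict[str, str]:
    """Map each file under folder (by relative path) to the SHA-256 of its contents."""
    result: dict[str, str] = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[str(path.relative_to(folder))] = digest
    return result


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report files that were added, removed, or changed between two snapshots."""
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(name for name in shared if before[name] != after[name]),
    }
```

On Windows, the per-user Startup folder is typically found under %APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup. Any entry reported as added or changed shortly after an Ollama update check would merit investigation.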

Local Inference Demands a Modern Security Posture

The discovery of CVE-2026-7482 represents a turning point in how the security of local AI models is perceived. The assumption that running an LLM "locally" is inherently safer than using cloud services is challenged by the complexity of input formats like GGUF. A binary format is not merely a passive container for neural weights; it is untrusted input that the server must parse, and malformed metadata can steer the parser into reading memory it should never touch.

This vulnerability demonstrates that AI frameworks must mature rapidly, adopting enterprise-grade security standards such as mandatory authentication and rigorous memory management. Furthermore, the delay in Windows patching highlights the need for organizations to implement defense-in-depth strategies rather than relying solely on vendor response times.

In conclusion, protecting AI infrastructure requires granular API access control and constant validation of loaded artifacts. Without these safeguards, the flexibility of local models may inadvertently provide an open door to the heart of corporate data.

The information in this article has been verified through technical analyses by Cyera and Striga, and reports from CERT Polska available at the time of publication.

