"Bleeding Llama": Critical Ollama Vulnerability Exposes Over 300,000 AI Servers to Memory Leaks
What Is Ollama?
Ollama is a popular open-source framework that enables developers and organizations to run AI language models locally rather than relying on cloud services. The project has over 171,000 stars on GitHub, making it one of the most widely adopted tools in the self-hosted AI space.
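For readers unfamiliar with how Ollama is typically used, the following minimal Go sketch queries a local instance through its REST API on the default port 11434. The model name and prompt are illustrative assumptions, and it presumes a model has already been pulled.

```go
// Minimal sketch: querying a local Ollama instance over its REST API.
// Assumes Ollama is listening on the default port 11434 and that a model
// (here "llama3", as an example) has already been pulled.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3",
		"prompt": "Summarize what GGUF files are used for.",
		"stream": false, // return one JSON object instead of a token stream
	})

	resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Response)
}
```

Note that this request succeeds without any credentials, which is relevant to the exposure discussed below: the API has no built-in authentication.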
How the Vulnerability Works
The flaw is an out-of-bounds heap read in Ollama's GGUF model loader, specifically in a function called WriteTo() within the model creation pipeline. GGUF is the standard file format used to store and load large language models locally.
The attack is straightforward: an attacker sends a specially crafted GGUF file to an exposed Ollama server, with the tensor dimensions set to a deliberately oversized value. When Ollama processes the file through its /api/create endpoint during model creation, the server reads beyond the boundaries of its allocated memory buffer, leaking whatever happens to reside in the surrounding heap memory.
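To make the bug class concrete, here is a hypothetical Go sketch (not Ollama's actual WriteTo() code) of how a size field read from a model file can trigger an out-of-bounds read when it is not validated against the real buffer length, along with the straightforward check that prevents it.

```go
// Hypothetical sketch of the bug class (not Ollama's actual code): a size
// field taken from the file is used to read tensor data without checking it
// against the real buffer length. In pure Go an out-of-range slice panics,
// but code that precomputes offsets, uses unsafe pointer arithmetic, or
// hands sizes to native libraries can instead read adjacent heap memory,
// which is the out-of-bounds read described above.
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readTensor returns the tensor payload that follows an 8-byte size field.
func readTensor(file []byte) ([]byte, error) {
	if len(file) < 8 {
		return nil, errors.New("file too short for size field")
	}
	claimed := binary.LittleEndian.Uint64(file[:8]) // attacker-controlled

	// Vulnerable pattern (do NOT do this): trust `claimed` directly, e.g.
	//   return file[8 : 8+claimed], nil
	// An oversized value walks off the end of the buffer.

	// Safe pattern: validate the claimed size against what is actually there.
	if claimed > uint64(len(file)-8) {
		return nil, fmt.Errorf("claimed tensor size %d exceeds remaining %d bytes", claimed, len(file)-8)
	}
	return file[8 : 8+claimed], nil
}

func main() {
	// A tiny "model file" that claims a huge tensor size, as a crafted GGUF would.
	crafted := make([]byte, 16)
	binary.LittleEndian.PutUint64(crafted, 1<<40)
	if _, err := readTensor(crafted); err != nil {
		fmt.Println("rejected:", err)
	}
}
```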
The exploitation chain unfolds in three steps. First, the attacker uploads the malicious GGUF file. Second, they trigger model creation via the /api/create endpoint, activating the out-of-bounds read. Third, they use Ollama's /api/push endpoint to exfiltrate the leaked memory contents to an attacker-controlled model registry.
What Data Is at Risk?
The data exposed through this vulnerability is particularly sensitive, since Ollama sits at the core of many organizations' AI infrastructure. Leaked memory can contain environment variables, API keys, system prompts, and the full conversation data of users currently interacting with the server.
As Cyera security researcher Dor Attias noted, attackers can essentially learn everything about an organization's AI inference pipeline, including proprietary code, customer contracts, and authentication credentials. In environments where Ollama is connected to agentic tools like Claude Code, the impact is even broader, since all tool outputs and intermediate results flow through the server and may end up exposed.
The Fix
The vulnerability was addressed in Ollama version 0.17.1. All users running earlier versions should update immediately. Additionally, organizations are strongly advised to place Ollama instances behind a firewall or authentication proxy, since the REST API does not include built-in authentication. Publicly exposing an Ollama instance to the internet without any access controls is particularly dangerous given this flaw.
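As a quick self-check, the hedged Go sketch below queries an instance's /api/version endpoint and flags anything older than the patched 0.17.1. The host URL and the simple dot-separated version comparison are illustrative assumptions; adapt them to your deployment.

```go
// Defensive sketch: query a local or self-hosted Ollama instance's
// /api/version endpoint and flag anything older than the patched 0.17.1.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strconv"
	"strings"
)

// olderThan reports whether version a is older than b, comparing
// dot-separated numeric fields (good enough for x.y.z release strings).
func olderThan(a, b string) bool {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		ai, _ := strconv.Atoi(as[i])
		bi, _ := strconv.Atoi(bs[i])
		if ai != bi {
			return ai < bi
		}
	}
	return len(as) < len(bs)
}

func main() {
	resp, err := http.Get("http://localhost:11434/api/version")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var v struct {
		Version string `json:"version"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&v); err != nil {
		panic(err)
	}

	if olderThan(v.Version, "0.17.1") {
		fmt.Printf("Ollama %s is vulnerable: update to 0.17.1 or later\n", v.Version)
	} else {
		fmt.Printf("Ollama %s is at or above the patched release\n", v.Version)
	}
}
```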
Bonus: Two Unpatched Windows Flaws Enable Persistent Code Execution
Separately, researchers at Striga disclosed two additional vulnerabilities in Ollama's Windows auto-update mechanism that, when chained together, allow an attacker to achieve persistent code execution on victim machines at every login. These flaws, tracked as CVE-2026-42248 and CVE-2026-42249 and both scored 7.7, remain unpatched as of the time of writing, following the expiration of a 90-day responsible disclosure period.
The first flaw is a missing signature verification check: the Windows updater installs update binaries without verifying their authenticity, unlike the macOS version. The second is a path traversal vulnerability that allows an attacker who controls an update server to redirect where the installer is written on disk, including into the Windows Startup folder.
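The path traversal half of the chain is a well-understood bug class. The following hypothetical Go sketch (not the actual Ollama updater code) shows how a server-supplied filename can be confined to the intended download directory so that ".." sequences cannot redirect the write elsewhere, such as into the Startup folder.

```go
// Hypothetical sketch of the path traversal class described above (not the
// actual Ollama updater code): a filename supplied by an update server must
// be confined to the intended download directory, otherwise ".." sequences
// can redirect the write, e.g. into the Windows Startup folder.
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin resolves name inside baseDir and rejects anything that escapes it.
func safeJoin(baseDir, name string) (string, error) {
	dest := filepath.Join(baseDir, name) // Join also cleans the path
	rel, err := filepath.Rel(baseDir, dest)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", errors.New("update filename escapes the download directory")
	}
	return dest, nil
}

func main() {
	base := filepath.Join("ollama", "updates")

	// A benign filename stays inside the update directory.
	fmt.Println(safeJoin(base, "OllamaSetup.exe"))

	// A traversal payload (think of one aimed at the Startup folder) is rejected.
	evil := filepath.Join("..", "..", "Startup", "payload.exe")
	fmt.Println(safeJoin(base, evil))
}
```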
Combined, these two flaws allow an attacker to silently drop a malicious executable into the Startup folder, which is then executed automatically on every subsequent user login without triggering any signature check. Realistic payloads include reverse shells, credential stealers, and droppers for additional malware.
Affected versions span Ollama for Windows 0.12.10 through 0.17.5 for the path traversal, and up to 0.22.0 for the signature verification issue. As no patch is currently available, users are advised to disable automatic updates in Ollama for Windows and manually remove any Ollama shortcut from the Windows Startup folder as a temporary mitigation.
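For Windows users applying that temporary mitigation, a small helper like the hedged Go sketch below can list the per-user Startup folder and highlight Ollama-related entries worth reviewing. The folder path used is the standard per-user location and may differ in customized environments.

```go
// Mitigation helper sketch (Windows): list the current user's Startup folder
// and flag any Ollama-related entries that may have been planted there.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	appData := os.Getenv("APPDATA") // e.g. C:\Users\<name>\AppData\Roaming
	if appData == "" {
		fmt.Println("APPDATA is not set; run this on the affected Windows account")
		return
	}
	startup := filepath.Join(appData, "Microsoft", "Windows", "Start Menu", "Programs", "Startup")

	entries, err := os.ReadDir(startup)
	if err != nil {
		fmt.Println("could not read Startup folder:", err)
		return
	}
	for _, e := range entries {
		flag := ""
		if strings.Contains(strings.ToLower(e.Name()), "ollama") {
			flag = "  <-- review and remove until a patch ships"
		}
		fmt.Println(filepath.Join(startup, e.Name()) + flag)
	}
}
```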