
Security researchers from Trail of Bits have revealed a novel technique that exploits image downscaling to perform stealthy prompt injection attacks on production AI systems, enabling data exfiltration from platforms like Google’s Gemini CLI, Vertex AI Studio, Genspark, and even Google Assistant.
The vulnerability stems from how AI systems process high-resolution images: they downscale them before passing them to large language models (LLMs), which lets attackers hide malicious instructions that remain invisible to the user at full resolution.
Trail of Bits researchers found that by targeting specific image resampling algorithms, such as bicubic, bilinear, and nearest-neighbor interpolation, malicious prompt injections can be embedded in images in a way that only becomes effective when those images are scaled down. This creates a dangerous mismatch between what the user sees and what the model actually interprets.
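To illustrate the mismatch, here is a minimal sketch using Pillow (an assumption for illustration; production pipelines may instead use OpenCV, TensorFlow, or PyTorch resizers) showing that the same high-resolution image yields different low-resolution pixels depending on the resampling filter, which is exactly the gap a crafted payload targets:

```python
# Minimal sketch (assumes a Pillow-style resampler): the same high-resolution
# image produces different downscaled pixels depending on the filter used.
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
# A noisy 512x512 grayscale "cover" image stands in for an attacker-crafted upload.
hi_res = Image.fromarray(rng.integers(0, 256, (512, 512), dtype=np.uint8), mode="L")

filters = {
    "nearest": Image.Resampling.NEAREST,
    "bilinear": Image.Resampling.BILINEAR,
    "bicubic": Image.Resampling.BICUBIC,
}

downscaled = {
    name: np.asarray(hi_res.resize((128, 128), resample=f), dtype=np.int16)
    for name, f in filters.items()
}

# The filters disagree on many pixels, so a payload tuned for one algorithm
# stays invisible (or garbled) under the others.
for a in downscaled:
    for b in downscaled:
        if a < b:
            diff = np.abs(downscaled[a] - downscaled[b])
            print(f"{a} vs {b}: mean per-pixel difference = {diff.mean():.1f}")
```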
The team demonstrated successful attacks on several Google-backed systems. Most notably, on the Gemini CLI, they leveraged a default configuration in the Zapier MCP server, which auto-approves tool calls due to a trust=True setting in settings.json. A user uploading a benign-looking image unknowingly triggered an exfiltration of calendar data via Zapier without ever seeing the downscaled, malicious variant of the image that the model processed.
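The article does not reproduce the exact Gemini CLI settings schema, but as a purely hypothetical sketch, a check along the following lines could flag MCP server entries marked as trusted and therefore exempt from per-call confirmation (the file location and field names below are assumptions for illustration):

```python
# Hypothetical sketch: scan a Gemini-CLI-style settings.json for MCP servers
# marked as trusted (auto-approved tool calls). The path and schema are
# assumptions for illustration, not a documented format.
import json
from pathlib import Path

settings_path = Path.home() / ".gemini" / "settings.json"  # assumed location
settings = json.loads(settings_path.read_text())

for name, server in settings.get("mcpServers", {}).items():
    if server.get("trust") is True:
        print(f"warning: MCP server '{name}' is trusted -- its tool calls "
              f"will run without per-call confirmation")
```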

Other affected platforms include Vertex AI Studio, whose front-end UI displays only the high-resolution image and hides the altered downscaled version the model sees, as well as Google Assistant on Android phones, Genspark, and Gemini’s web and API interfaces (the latter via the llm CLI).

Trail of Bits highlights that this is not an isolated design flaw but a systemic issue across multi-modal, agentic AI platforms. These systems, which allow LLMs to autonomously take actions via tools or APIs, have historically been susceptible to prompt injection attacks, especially when lacking secure default settings or rigorous user confirmation protocols.
To enable and test these attacks, Trail of Bits developed Anamorpher, an open-source tool that helps craft adversarial images targeting specific downscaling implementations. Anamorpher uses visual test suites, including checkerboards, Moiré patterns, and slanted edges, to fingerprint the image scaling behavior of libraries like Pillow, OpenCV, TensorFlow, and PyTorch. This fingerprinting enables attackers (and defenders) to tailor payloads that reliably trigger when images are downscaled via known algorithms.
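Anamorpher’s actual test suite is not reproduced here, but the fingerprinting idea can be sketched as follows, assuming Pillow as one candidate backend: downscale a known high-frequency pattern (a checkerboard) through each candidate filter and match the unknown pipeline’s output against those references.

```python
# Sketch of fingerprinting by test pattern (assumes Pillow as the candidate
# backend; the real tool also targets OpenCV, TensorFlow, and PyTorch resizers).
import numpy as np
from PIL import Image

def checkerboard(size=512, cell=4):
    """High-frequency test pattern whose aliasing differs per resampler."""
    y, x = np.mgrid[0:size, 0:size]
    return Image.fromarray((((x // cell + y // cell) % 2) * 255).astype(np.uint8))

CANDIDATES = {
    "nearest": Image.Resampling.NEAREST,
    "bilinear": Image.Resampling.BILINEAR,
    "bicubic": Image.Resampling.BICUBIC,
    "lanczos": Image.Resampling.LANCZOS,
}

def fingerprint(observed: np.ndarray, out_size=(128, 128)) -> str:
    """Return the candidate filter whose checkerboard output best matches
    the downscaled image observed coming out of the unknown pipeline."""
    pattern = checkerboard()
    refs = {name: np.asarray(pattern.resize(out_size, resample=f), dtype=np.float64)
            for name, f in CANDIDATES.items()}
    errors = {name: np.mean((ref - observed) ** 2) for name, ref in refs.items()}
    return min(errors, key=errors.get)

# Example: pretend the unknown pipeline happens to use bicubic downscaling.
unknown_output = np.asarray(
    checkerboard().resize((128, 128), resample=Image.Resampling.BICUBIC),
    dtype=np.float64)
print(fingerprint(unknown_output))  # expected: "bicubic"
```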
Technically, the attack takes advantage of how resampling works. For instance, bicubic interpolation considers a 4×4 grid of pixels and uses polynomial weighting to determine the final pixel value. By carefully adjusting high-resolution pixels, especially in dark regions of an image, the researchers manipulated brightness (luma) and color to reveal hidden text or payloads only after scaling. Anamorpher supports reverse-engineering this process and generating adversarial examples with optimized visual contrast and hidden commands.
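To make the polynomial weighting concrete, here is a small sketch of the Keys cubic convolution kernel with the common a = -0.5 parameter (an assumption; the constant and kernel support vary by implementation, and many libraries widen the kernel when downscaling). Applied separably along both axes, this is the 4×4 neighborhood of source pixels the researchers perturb.

```python
# 1-D cubic convolution kernel (Keys, a = -0.5 is a common choice; the exact
# constant is implementation-specific). Separable 2-D bicubic applies this
# along each axis, so 4 x 4 = 16 source pixels feed every output pixel.
def cubic_kernel(x: float, a: float = -0.5) -> float:
    x = abs(x)
    if x < 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

# An output sample falling at fractional position t between source pixels p1
# and p2 uses the four neighbors p0..p3 with these weights:
t = 0.25
weights = [cubic_kernel(t + 1), cubic_kernel(t), cubic_kernel(1 - t), cubic_kernel(2 - t)]
print([round(w, 4) for w in weights], "sum =", round(sum(weights), 4))  # weights sum to 1

# Nudging one source pixel shifts the interpolated output in proportion to its
# weight -- the lever used to brighten hidden text in dark regions after scaling.
p = [10, 12, 11, 13]               # dark neighborhood in the high-res image
out = sum(w * v for w, v in zip(weights, p))
p_tweaked = [10, 12 + 40, 11, 13]  # a bump that is easy to miss in a dark area
out_tweaked = sum(w * v for w, v in zip(weights, p_tweaked))
print(round(out, 2), "->", round(out_tweaked, 2))
```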
To defend against this emerging threat, Trail of Bits recommends the following:
- Avoid image downscaling entirely: Limit upload dimensions instead of transforming inputs.
- Display processed inputs to users: Always preview what the model will actually see, especially in CLI and API tools (see the sketch after this list).
- Enforce user confirmation for sensitive actions: Prevent automatic tool calls from being initiated by embedded prompts, even if they appear inside images.
- Adopt secure-by-design patterns: Agentic systems must require explicit user oversight and minimize tool invocation scope.
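For the second recommendation, one way to approximate that preview on the client side (a sketch that assumes the service’s target dimensions and resampling filter are known or have been fingerprinted; both values below are placeholders) is to downscale the image locally the same way before sending it:

```python
# Sketch: preview roughly what the model will see before uploading. The target
# size and filter are assumed/fingerprinted placeholders, not documented values.
from PIL import Image

TARGET_SIZE = (768, 768)               # assumed model-side dimensions
FILTER = Image.Resampling.BICUBIC      # assumed model-side resampler

def preview_model_view(path: str) -> None:
    """Save a copy of the downscaled image for human review before upload."""
    img = Image.open(path).convert("RGB")
    downscaled = img.resize(TARGET_SIZE, resample=FILTER)
    downscaled.save(path + ".model_view.png")
    print(f"Review {path}.model_view.png before approving any tool calls.")

preview_model_view("upload.png")  # hypothetical file name
```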
Trail of Bits also warns that mobile and edge devices, which often enforce strict image sizing and use less robust resampling techniques, may be even more susceptible.
Anamorpher is currently in beta and available for public testing, with ongoing development planned to support broader use cases and fingerprinting techniques.