Researchers Detail How a Prompt Injection Attack Bypassed Apple’s AI Protections

Publication Date: 10.04.2026

Rate this article:

4.8/5 ( 56 votes )

Table of Contents:

A resolved issue allowed researchers to bypass Apple’s restrictions and force the LLM on the device to perform actions under the attacker’s control. Here’s how they did it.

Apple Strengthened Its Defenses Against This Attack

Two blog posts published today on the RSAC blog (1, 2) (AppleInsider via), detail how researchers combined two attack strategies to force Apple’s model on the device to execute instructions under the attacker’s control.

Interestingly, the researchers successfully carried out this exploitation without being 100% certain about how Apple handled part of the input and output filtering process of its local model, as Apple does not disclose the full details of its models' inner workings for security reasons.

Still, the researchers indicate that they have a pretty good idea of what is happening under the hood.

According to them, the most likely scenario is that after a user sends a prompt to Apple’s model on the device via an API call, an input filter ensures that the request does not contain unsafe content.

If this is the case, the API fails. Otherwise, the request is passed to the model on the actual device, and that model responds to an output filter that checks whether its output contains unsafe content; this can cause the API to fail or pass, depending on what it finds.

Image: Rotten Apples: Technical Details of RSAC’s Successful Apple AI Prompt Injection Attack

How They Did It

With this in mind, the researchers found that they could combine two exploitation techniques to ensure that Apple’s model ignored its core security directives while also convincing the input and output filters to pass malicious content.

First, they reversed the malicious string, then used the Unicode RIGHT-TO-LEFT OVERRIDE character to ensure it displayed correctly on the user’s screen, while keeping it reversed in the raw input and output that the filters would examine.

Next, the researchers embedded the reversed malicious string into a second attack method called Neural Exec, which is a complex way to override the model’s instructions with new instructions that an attacker might want to execute.

Image: Do You Have a Bad Apple in Your Pocket? We Used Prompt Injection to Take Over Apple AI

As a result, the Unicode attack managed to bypass the input and output filters, while Neural Exec caused Apple’s model to behave maliciously.

To evaluate the effectiveness of the attack, we prepare three different pools to generate appropriate input prompts:

System prompts: A series of system prompts/tasks (e.g., “Make the given text conform to American English spelling and punctuation rules”).

Malicious strings: Manually crafted strings designed to be offensive or harmful (i.e., the outputs we want to force the model to produce).

Innocuous inputs: Paragraphs taken from random Wikipedia articles, used to simulate non-threatening, innocent-looking inputs (e.g., in the context of indirect prompt injection via RAG or similar systems).

During the evaluation, we randomly sample one item from each pool, create a complete prompt, generate a payload (see below), inject it, and test whether the attack was successful by running the model on Apple’s device.

In their tests, the attackers achieved a 76% success rate over 100 random prompts.

They reported the attack to Apple in October 2025, and the company "strengthened the affected systems against this attack, and these protections were implemented in iOS 26.4 and macOS 26.4."

To read the full report, follow this link, which also contains a link to the technical aspects of the attack.

Worth Checking on Amazon

David Pogue – 'Apple: The First 50 Years'
MacBook Neo
Logitech MX Master 4
AirPods Pro 3
AirTag (2nd Generation) – 4 Pack
Apple Watch Series 11
Wireless CarPlay Adapter

Comments

(5 Comments)

EY

Efsun Yıldırım

It is really interesting to understand how such attacks occur. Apple's security measures have always been strong, but this situation makes me think.
MÇ

Metehan Çelik

It is great that researchers can conduct such a deep analysis. However, sharing such information may pose security risks.
ZS

Zeynep Sönmez

The breach of Apple's security systems worries me. What will happen to the security of user data?
KY

Kübra Yalçın

After reading this article, my trust in Apple has been shaken. Are other tech giants facing similar issues?
YK

Yusuf Korkmaz

The details of the attacks are very interesting, but there is also the possibility that such information could be used by malicious individuals. Caution is needed!