Apple Intelligence AI Guardrails Bypassed in New Attack
"The first is Neural Execs, a known prompt injection attack that uses 'gibberish' inputs to trick the AI into executing arbitrary, attacker-defined tasks. These inputs act as universal triggers that do not need to be remade for different payloads."
"The second method, used by the RSAC researchers to bypass input and output filters, is Unicode manipulation. By writing malicious output text backward and using the Unicode right-to-left-override function, they were able to bypass content restrictions."
"Essentially, we encoded the malicious/offensive English-language output text by writing it backwards and using our Unicode hack to force the LLM to render it correctly."
"Combining the two methods can allow attackers to force the local Apple Intelligence LLM to produce offensive content or, more critically, manipulate private data and functionality within third-party applications."
Researchers presenting at RSAC have discovered methods to bypass the safety protocols of Apple Intelligence, Apple's AI system that integrates generative AI with personal context on Apple devices. They combined two techniques: Neural Execs, a prompt injection attack that uses gibberish trigger inputs to make the model execute arbitrary, attacker-defined tasks, and Unicode manipulation, which writes malicious output text backward and uses the right-to-left-override control character so it renders correctly while evading content filters. Together, these techniques let attackers force the local Apple Intelligence LLM to produce offensive content or, more seriously, manipulate private data and functionality inside third-party applications.
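The Unicode trick described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the function name and example string are not from the researchers' work): the text is stored reversed with a U+202E RIGHT-TO-LEFT OVERRIDE prepended, so a bidi-aware renderer displays it in the original reading order while naive substring filters see only the reversed bytes.

```python
RLO = "\u202E"  # U+202E RIGHT-TO-LEFT OVERRIDE control character

def encode_rtl(text: str) -> str:
    """Reverse the text and prepend the RLO character.

    A renderer that honors Unicode bidi controls will display the
    result visually in the original left-to-right order, even though
    the underlying string stores the characters reversed.
    """
    return RLO + text[::-1]

encoded = encode_rtl("blocked phrase")
# A filter scanning for the literal string "blocked phrase" will not
# match, because the stored characters are reversed.
assert "blocked phrase" not in encoded
assert encoded[1:] == "blocked phrase"[::-1]
```

This shows only the rendering-versus-storage mismatch that makes such filters brittle; defending against it generally means normalizing or stripping bidi control characters before applying content checks.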
Read at SecurityWeek