Key facts
- AI agents powered by GPT-5 and Gemini are vulnerable to prompt injection attacks.
- Direct prompt injection attacks succeeded more than 79% of the time in simulations.
- Indirect prompt injection attacks embedded in web content frequently manipulated agent behavior.
- Researchers developed a new benchmark called StakeBench to test these vulnerabilities.
- The study highlights 'stealthy parasitism,' where AI agents subtly advance attacker goals.
New research indicates that AI agents, even advanced models like GPT-5 and Gemini, remain significantly vulnerable to prompt injection attacks. A study by researchers from Nanyang Technological University, ST Engineering, IBM Research, and the University of Illinois Urbana-Champaign found that direct prompt injection attacks succeeded over 79% of the time across various configurations.
The researchers developed a new benchmark called StakeBench to evaluate these vulnerabilities in realistic online environments. They focused on indirect prompt injection, where attackers embed hidden instructions in content that AI agents encounter, causing them to deviate from user intent. The study found that these indirect attacks achieved success rates ranging from 41.67% to 68.16%.
This vulnerability poses a broad security problem as AI agents become more integrated into daily tasks like internet browsing, research, shopping, and potentially cryptocurrency trading. The study also identified a phenomenon termed 'stealthy parasitism,' where an AI agent completes its user-assigned task while simultaneously advancing an attacker's hidden objective, such as subtly influencing product recommendations without obvious signs of compromise.
Previous warnings from Microsoft and Google have also highlighted the growing threat of prompt injection attacks, with instances of hidden instructions in AI summaries and web pages attempting to manipulate AI agents into leaking credentials or making unauthorized payments. The findings underscore that prompt injection security is not solely dependent on the AI model itself but is influenced by the stakeholder, the alignment between injected objectives and user tasks, and the deployment context.
