A novel cyber-attack method dubbed ConfusedPilot, which targets Retrieval-Augmented Generation (RAG) based AI systems like Microsoft 365 Copilot, has been identified by researchers at the University of Texas at Austin's SPARK Lab.
The team, led by Professor Mohit Tiwari, CEO of Symmetry Systems, uncovered how attackers could manipulate AI-generated responses by introducing malicious content into documents the AI references.
This could lead to misinformation and flawed decision-making across organizations.
With 65% of Fortune 500 companies adopting or planning to implement RAG-based systems, the potential for widespread disruption is significant.
The ConfusedPilot attack method requires only basic access to a target's environment and can persist even after the malicious content is removed.
The researchers also showed that the attack could bypass existing AI security measures, raising concerns across industries.
How ConfusedPilot Works
- Data Environment Poisoning: An attacker adds specially crafted content to documents indexed by the AI system
- Document Retrieval: When a query is made, the AI references the tainted document
- AI Misinterpretation: The AI uses the malicious content as instructions, potentially disregarding legitimate information, generating misinformation or falsely attributing its response to credible sources
- Persistence: Even after removing the malicious document, the corrupted information may linger in the system
The attack is especially concerning for large enterprises using RAG-based AI systems, which often rely on multiple user data sources.
This increases the risk of attack since the AI can be manipulated using seemingly innocuous documents added by insiders or external partners.
"One of the biggest risks to business leaders is making decisions based on inaccurate, draft or incomplete data, which can lead to missed opportunities, lost revenue and reputational damage," explained Stephen Kowski, field CTO at SlashNext.
"The ConfusedPilot attack highlights this risk by demonstrating how RAG systems can be manipulated by malicious or misleading content in documents not originally presented to the RAG system, causing AI-generated responses to be compromised."
Read more on enterprise AI security: Tech Professionals Highlight Critical AI Security Skills Gap
Mitigation Strategies
To defend against ConfusedPilot, the researchers recommend:
- Data Access Controls: Limiting who can upload or modify documents referenced by AI systems
- Data Audits: Regular checks to ensure the integrity of stored data
- Data Segmentation: Isolating sensitive information to prevent the spread of compromised data
- AI Security Tools: Using tools that monitor AI outputs for anomalies
- Human Oversight: Ensuring human review of AI-generated content before making critical decisions
"To successfully integrate AI-enabled security tools and automation, organizations should start by evaluating the effectiveness of these tools in their specific contexts," explained Amit Zimerman, co-founder and chief product officer at Oasis Security.
"Rather than being influenced by marketing claims, teams need to test tools against real-world data to ensure they provide actionable insights and surface previously unseen threats."