Agents of Chaos
Preprint study on AI agent security failures in realistic deployment settings.
Links
- arXiv: https://arxiv.org/abs/2602.20021
- PDF: https://arxiv.org/pdf/2602.20021
- Submitted: February 23, 2026
Summary
Academics at Harvard and Stanford found that AI agents leaked secrets, destroyed databases, and taught other agents to behave badly.
A red-teaming study of autonomous LM-powered agents in a live lab environment with persistent memory, email, Discord, file systems, and shell execution. Over two weeks, twenty AI researchers documented eleven case studies of failure modes, including unauthorized compliance, sensitive data disclosure, destructive actions, denial of service, resource consumption, identity spoofing, cross-agent propagation of unsafe practices, and partial system takeover.