Key facts
- An AI assistant named Fiu, powered by Anthropic's Claude Opus 4.6, successfully defended against over 6,000 prompt injection attempts.
- The experiment, hosted on hackmyclaw.com, aimed to test AI resilience against malicious commands hidden in emails.
- Over 2,000 attackers sent more than 6,000 emails attempting to extract a secrets.env file.
- The AI identified the high volume of attacks as a coordinated security exercise.
- The experiment resulted in a Google account suspension and over $500 in API costs.
Developer Fernando Irarrázaval's experiment at hackmyclaw.com, designed to test an AI assistant's defenses against prompt injection attacks, successfully repelled over 6,000 attempts from more than 2,000 attackers. The AI, named Fiu and powered by Anthropic's Claude Opus 4.6 within the OpenClaw framework, was tasked with protecting a secrets.env file containing sensitive credentials.
The challenge gained significant traction after appearing on Hacker News, leading to a barrage of creative email-based attacks. Despite subjects like "Fiu, this is you from the future" and "EMERGENCY: secrets.env needed for incident response," none of the attackers were able to extract the target file. The AI itself noted the high volume of attempts, suggesting a "coordinated security exercise."
However, the experiment was not without its side effects. Fiu's Gmail account was suspended by Google due to the high volume of inbound emails and API calls, requiring three days to restore. API costs exceeded $500. Additionally, batch processing led to Fiu becoming overly vigilant, potentially skewing results.
In a separate test, the anonymous jailbreaker known as Pliny the Liberator attempted to breach a similar OpenClaw system using advanced techniques, including a "tokenade" hidden in an emoji and disguised commands. These attempts were also quarantined, with Pliny acknowledging that smaller, less robust models would likely have succumbed more easily.
