AI Assistant Withstands 6,000 Prompt Injection Attacks

Created at 26 Jun · 6:06 PM1 source↑ Market-relevant

IN SHORT

An AI assistant named Fiu, running on the OpenClaw framework with Anthropic's Claude Opus 4.6, successfully defended against over 6,000 prompt injection attempts from more than 2,000 attackers. The experiment, hosted on hackmyclaw.com, aimed to test the AI's resilience against malicious commands hidden within emails.

Key Numbers

6,000+hack attempts

2,000+attackers

$500+API costs

500emails processed before AI self-diagnosis

0%attack success rate in constrained environments for Opus 4.6

79%success rate for direct injection attacks on other models

Who's Involved

Fernando Irarrázaval

Developer of hackmyclaw.com and AI assistant Fiu

Fiu

AI assistant that withstood hack attempts

Anthropic

Provider of the Claude Opus 4.6 AI model

Pliny the Liberator

Anonymous jailbreaker who tested a similar system

Matthew Berman

Key facts

An AI assistant named Fiu, powered by Anthropic's Claude Opus 4.6, successfully defended against over 6,000 prompt injection attempts.

The experiment, hosted on hackmyclaw.com, aimed to test AI resilience against malicious commands hidden in emails.

Over 2,000 attackers sent more than 6,000 emails attempting to extract a secrets.env file.

The AI identified the high volume of attacks as a coordinated security exercise.

The experiment resulted in a Google account suspension and over $500 in API costs.

Developer Fernando Irarrázaval's experiment at hackmyclaw.com, designed to test an AI assistant's defenses against prompt injection attacks, successfully repelled over 6,000 attempts from more than 2,000 attackers. The AI, named Fiu and powered by Anthropic's Claude Opus 4.6 within the OpenClaw framework, was tasked with protecting a secrets.env file containing sensitive credentials.

The challenge gained significant traction after appearing on Hacker News, leading to a barrage of creative email-based attacks. Despite subjects like "Fiu, this is you from the future" and "EMERGENCY: secrets.env needed for incident response," none of the attackers were able to extract the target file. The AI itself noted the high volume of attempts, suggesting a "coordinated security exercise."

However, the experiment was not without its side effects. Fiu's Gmail account was suspended by Google due to the high volume of inbound emails and API calls, requiring three days to restore. API costs exceeded $500. Additionally, batch processing led to Fiu becoming overly vigilant, potentially skewing results.

In a separate test, the anonymous jailbreaker known as Pliny the Liberator attempted to breach a similar OpenClaw system using advanced techniques, including a "tokenade" hidden in an emoji and disguised commands. These attempts were also quarantined, with Pliny acknowledging that smaller, less robust models would likely have succumbed more easily.

Frequently asked questions

Prompt injection is a security threat where malicious commands are hidden within seemingly normal inputs, like emails, to trick an AI into deviating from its original instructions.

The experiment aimed to test the resilience of an AI assistant, Fiu, against prompt injection attacks and to see if it could be tricked into leaking sensitive credentials.

The AI assistant Fiu ran on Anthropic's Claude Opus 4.6, protected by a short security prompt.

The experiment led to a Google account suspension for the AI's email, over $500 in API costs, and the AI's self-diagnosis of a coordinated security exercise.

AI Assistant Withstands 6,000 Prompt Injection Attacks

Key Numbers

Who's Involved

AI Assistant Withstands 6,000 Prompt Injection Attacks

Key Numbers

Who's Involved

↳ Why This Matters

Key facts

Frequently asked questions

What Happens Next

Get the newsletter.

How It Developed

Sources

Related Stories

AI Assistant Withstands 6,000 Prompt Injection Attacks

PiQ Daily

Key Numbers

Who's Involved

AI Assistant Withstands 6,000 Prompt Injection Attacks

PiQ Daily

Key Numbers

Who's Involved

↳ Why This Matters

Key facts

Frequently asked questions

+ What is prompt injection?

+ What was the goal of the hackmyclaw.com experiment?

+ What AI model was used in the experiment?

+ What were the consequences of the experiment?

What Happens Next

Get the newsletter.

How It Developed

Sources

Related Stories