Anthropic's newly released AI model, Fable, intended for cybersecurity applications, has drawn criticism from researchers due to its stringent safety guardrails. Cybersecurity professionals report that Fable frequently rejects prompts that are only tangentially related to cybersecurity or even biology, hindering its practical use for security tasks.
Valentina “Chompie” Palmiotti, a security researcher at IBM X-Force, stated that Fable rejects any request that could be "tangentially cyber related," including innocuous tasks like reading a blog post. When its safety measures are triggered, Fable displays a notice saying that the user's message was flagged for cybersecurity or biology topics. These restrictions are Anthropic's attempt to mitigate the risk of the model being used to develop malware or compromise software, a concern that also led to similar restrictions on its more powerful model, Mythos.
Cybersecurity veteran Matt Suiche noted that Fable's keyword-based system can misinterpret requests for secure coding as cybersecurity work and downgrade them: when Fable's guardrails are hit, it defaults to Claude Opus 4.8. Suiche acknowledged that these early-stage guardrails are understandable and likely to evolve, suggesting it's better to err on the side of caution during initial releases. Another researcher shared on X that even code reviews trigger Fable's restrictions.
Anthropic has not yet responded to requests for comment. For cybersecurity professionals seeking fewer limitations, Anthropic offers a Cyber Verification Program, similar to OpenAI's Trusted Access for Cyber program, which is open to approved users through an application process.