Key facts
- Claude Fable 5's performance has not degraded since its July 1 reinstatement.
- A new safety classifier is rerouting many coding and debugging tasks to Claude Opus 4.8.
- BridgeBench AI's scores dropped significantly because rerouted tasks were scored as zero.
- Arena.AI's blind human-preference votes showed Fable 5's performance remained largely consistent.
- Developers working in security-adjacent areas are most affected by the classifier's over-aggressiveness.
Reports that Anthropic's Claude Fable 5 model degraded significantly after its July 1 reinstatement appear to stem largely from an overzealous safety classifier rather than a decline in the model's capabilities. Benchmarks such as BridgeBench AI showed drastic score drops in coding and debugging tasks, but these results were skewed: the new classifier rerouted many prompts to Claude Opus 4.8, and BridgeBench scored those fallbacks as zero.
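To see why scoring fallbacks as zero can pull an aggregate down even when answer quality is unchanged, consider the following minimal sketch. It is not BridgeBench's actual scoring code; the function name `mean_score`, the per-task score of 0.9, and the 40% reroute rate are all hypothetical values chosen only for illustration.

```python
from typing import List, Tuple

# Each result is (quality_score, was_rerouted). All values are hypothetical.
Result = Tuple[float, bool]


def mean_score(results: List[Result], zero_rerouted: bool) -> float:
    """Average task score under one of two assumed scoring policies.

    zero_rerouted=True  -> rerouted tasks count as 0 (the policy the article
                           attributes to BridgeBench).
    zero_rerouted=False -> rerouted tasks are left out of the average.
    """
    if zero_rerouted:
        scores = [0.0 if rerouted else score for score, rerouted in results]
    else:
        scores = [score for score, rerouted in results if not rerouted]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


# Hypothetical run: 10 tasks, identical quality, 4 of them rerouted.
results = [(0.9, False)] * 6 + [(0.9, True)] * 4

penalized = mean_score(results, zero_rerouted=True)
answered_only = mean_score(results, zero_rerouted=False)

print(f"rerouted counted as zero: {penalized:.2f}")
print(f"rerouted excluded:        {answered_only:.2f}")
```

Under these made-up inputs, the zero-scoring policy reports a much lower average than the policy that ignores rerouted tasks, although every answered task has the same quality. The size of the gap depends entirely on the reroute rate, which the article does not quantify.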
Conversely, Arena.AI's blind human-preference tests, which rely on perceived quality rather than infrastructure routing, indicated that Fable 5's performance remained largely stable, with some categories even showing slight improvements. Users engaged in creative writing, document analysis, and expert text queries are unlikely to notice a difference.
However, developers, particularly those in security-adjacent fields whose work involves terms like 'vulnerability' or 'exploit,' are frequently hitting the classifier's fallback mechanism. Anthropic has acknowledged that the new classifiers are prone to false positives and says they will be refined over time, but it has not provided a timeline for these improvements. The aggressive classifier was implemented to address a reported jailbreak technique that allowed Fable 5 to identify and demonstrate software vulnerabilities, which was deemed a national security threat.
