Key facts
- Anthropic launched Claude Sonnet 5, a midsize AI model with enhanced agentic capabilities.
- The new model is designed to perform tasks like planning, using tools, and running autonomously at a lower cost than larger models.
- Sonnet 5 is priced at $2 per million input tokens and $10 per million output tokens until August 31, after which the price changes.
- Performance benchmarks show Sonnet 5 nearing Opus 4.8's capabilities in areas like coding and knowledge work, while being more cost-effective.
- The model demonstrates improved safety features, including a lower rate of undesirable behaviors and better refusal of malicious requests.
Anthropic has released Claude Sonnet 5, positioning it as a more powerful, more agentic version of its midsize language model. The company said the new model can plan, use tools such as browsers and terminals, and operate autonomously, capabilities that previously required larger and more expensive models.
The release follows a trend among competitors such as OpenAI and Google, which have also emphasized agentic features in their recent models. Sonnet 5 is priced at $2 per million input tokens and $10 per million output tokens until August 31, after which the rates become $3 and $10, respectively. At those rates, Sonnet 5 is cheaper than Anthropic's Opus 4.8, OpenAI's GPT-5.5, and Google's Gemini 3.1 Pro, though still more expensive than Gemini 3.5 Flash.
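To make the per-token pricing concrete, the short sketch below estimates a bill under the two sets of rates quoted above. The rates are taken from the article as reported; the workload (50 million input tokens and 10 million output tokens) and the names `Rate` and `cost_usd` are hypothetical and chosen only for illustration, not drawn from Anthropic's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Rates as quoted in the article (not independently verified).
INTRO_RATE = Rate(input_per_million=2.0, output_per_million=10.0)  # until August 31
LATER_RATE = Rate(input_per_million=3.0, output_per_million=10.0)  # after August 31


def cost_usd(rate: Rate, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    total = (input_tokens * rate.input_per_million
             + output_tokens * rate.output_per_million)
    return total / 1_000_000


# Hypothetical monthly workload, assumed for illustration only.
INPUT_TOKENS = 50_000_000
OUTPUT_TOKENS = 10_000_000

intro_cost = cost_usd(INTRO_RATE, INPUT_TOKENS, OUTPUT_TOKENS)
later_cost = cost_usd(LATER_RATE, INPUT_TOKENS, OUTPUT_TOKENS)

print(f"Introductory rates: ${intro_cost:.2f}")
print(f"Later rates:        ${later_cost:.2f}")
```

Under these assumptions, the same workload costs $200 at the introductory rates and $250 afterward. The whole difference comes from the input side, so input-heavy workloads would see the larger change.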
Anthropic claims Sonnet 5 shows significant improvements over its predecessor, Sonnet 4.6, in agentic performance, including reasoning, tool use, coding, and knowledge work. On an agentic coding benchmark, Sonnet 5 scored 63.2%, compared with 69.2% for Opus 4.8 and 58.1% for Sonnet 4.6. While Opus 4.8 remains the choice for tasks requiring the highest accuracy, Sonnet 5 offers developers a more cost-effective option with substantially higher quality than was previously available.
Testers have noted Sonnet 5's ability to complete complex, end-to-end tasks autonomously, such as updating Salesforce accounts and sending announcements, without stalling. The model also demonstrates enhanced safety features, with lower rates of undesirable behaviors, deception, and hallucinations than Sonnet 4.6. However, it is not on par with Opus 4.8 or Claude Mythos Preview for handling dangerous cybersecurity tasks.
