HomeEverything
Equities & FundsCrypto & Digital AssetsAI & TechnologyBusiness & CorporateUS Politics & PolicyGeopolitics & Global RiskMacro, Rates & FXCommodities & EnergyEuropean Politics & MarketsAsia-PacificReal Estate & Property
← All Stories

Ornith-1.0: Open-Source LLMs Built for AI Coding Agents

Created at 29 Jun · 9:30 PM1 source↑ Market-relevant
IN SHORT

DeepReinforce has released Ornith-1.0, a family of open-source large language models specifically designed for AI coding agents. The models, available in various sizes up to 397 billion parameters, demonstrate strong performance on coding benchmarks, outperforming some larger models in specific agentic tasks.

✉Newsletter

PiQ Daily

Pick your topics. Get only what matters, on your cadence.

Key Numbers

9BOrnith-1.0 parameter size
31BOrnith-1.0 parameter size
35BOrnith-1.0 MoE parameter size
397BOrnith-1.0 flagship MoE parameter size
69.49B model SWE-bench Verified score
52.0Gemma 4-31B SWE-bench Verified score
82.4397B model SWE-bench Verified score
80.8Claude Opus 4.7 SWE-bench Verified score
80.6DeepSeek-V4-Pro SWE-bench Verified score
77.5397B model Terminal Bench 2.1 score
70.3Claude Opus 4.7 Terminal Bench 2.1 score
62.2397B model SWE-bench Pro score

Who's Involved

DeepReinforce
AI research lab that released Ornith-1.0
Ornith
Family of open-source LLMs for agentic coding
Google
Developer of Gemma models
Claude Opus 4.7
Anthropic's flagship model
DeepSeek-V4-Pro
A competing AI model
Ornith-1.0: Open-Source LLMs Built for AI Coding Agents

↳ Why This Matters

Ornith-1.0 represents a significant advancement in open-source AI coding agents, offering specialized tools for developers building autonomous coding infrastructure. Its strong performance on agentic coding benchmarks suggests a shift towards more capable, unsupervised AI in software development workflows.

Key facts

  • DeepReinforce released Ornith-1.0, a family of open-source LLMs for AI coding agents.
  • Models range from 9 billion to 397 billion parameters and are MIT licensed.
  • The 397B model achieved 82.4 on SWE-bench Verified, outperforming Claude Opus 4.7.
  • The 9B model scored 69.4 on SWE-bench Verified, outperforming Gemma 4-31B.
  • Ornith-1.0 is optimized for agentic coding tasks and may underperform on general conversations.

DeepReinforce has released Ornith-1.0, a new family of open-source large language models specifically engineered for AI coding agents. Available in four sizes, ranging from 9 billion to a flagship 397 billion parameters, these models are designed to operate within real terminal and repository environments, performing tasks autonomously without constant human guidance.

The Ornith models are built with an 'agentic' approach, meaning they are trained to take actions and complete multi-step coding tasks, such as fixing bugs and refining code, by developing their own strategies. This contrasts with traditional conversational AI models. DeepReinforce emphasizes that Ornith-1.0 is not intended for general-purpose AI conversations or tasks like document summarization, as its performance may be suboptimal outside of developer pipelines.

Performance metrics highlight Ornith-1.0's capabilities. The 397 billion parameter model achieved a score of 82.4 on the SWE-bench Verified benchmark, surpassing notable models like Claude Opus 4.7 (80.8) and DeepSeek-V4-Pro (80.6). On the Terminal Bench 2.1, the 397B model scored 77.5, compared to Claude Opus 4.7's 70.3. Even the smaller 9 billion parameter model demonstrated strong performance, scoring 69.4 on SWE-bench Verified, which is competitive with larger models like Qwen 3.5-35B and significantly higher than Google's Gemma 4-31B (52.0).

DeepReinforce has implemented defenses against reward hacking, a potential issue with self-improving models. These include immutable environments, deterministic monitors, and a frozen judge model to ensure the AI's actions are genuine and not exploiting the training process. While Ornith-1.0 shows impressive results on coding-specific benchmarks, the company notes that Anthropic's latest flagship, Claude Opus 4.8, scores higher, and the primary competitive advantage lies within the open-source category for comparable parameter counts on agentic coding tasks.

Frequently asked questions

Ornith-1.0 is a family of open-source large language models developed by DeepReinforce, specifically designed for AI coding agents.

Ornith-1.0 is optimized for 'agentic' tasks, meaning it can autonomously perform multi-step coding operations within repositories and terminals, rather than focusing on general conversational abilities.

The models are available in 9 billion, 31 billion, 35 billion mixture of experts (MoE), and 397 billion MoE parameter sizes.

SWE-bench Verified is a benchmark that tests an AI's ability to fix real bugs from open-source GitHub repositories without seeing the test suite, scoring the percentage of issues resolved.

What Happens Next

01Further evaluation of Ornith-1.0 models on various coding benchmarks.
02Adoption of Ornith-1.0 by developers building agentic infrastructure.

Get the newsletter.

Pick the topics you actually care about. We'll email when there's news worth your time, on the cadence you choose. Cancel any time from your account.

Cadence

How It Developed

DeepReinforce released Ornith-1.0, a family of open-source coding models.
Ornith-1.0 models are available in four sizes: 9B, 31B, 35B MoE, and 397B MoE.
The models are licensed under MIT with no regional restrictions.
Ornith-1.0 is designed for agentic coding tasks, operating in terminal and repository environments.
The 397B model achieved 82.4 on SWE-bench Verified, surpassing Claude Opus 4.7 and DeepSeek-V4-Pro.
The 9B model scored 69.4 on SWE-bench Verified, outperforming Gemma 4-31B.
Ornith-1.0 models may underperform on non-coding tasks.
The models employ a novel reinforcement learning approach where the strategy for approaching tasks co-evolves with the policy.

Sources

T1
Ornith Is the Open-Source Coding Model Built for Agents, Not HumansDecrypt

Related Stories

Tavant Launches Agentic AI Platform for Software Engineering
29 Jun · 7:15 PM
Cursor launches mobile app for AI coding agents
29 Jun · 5:20 PM
Chamath Palihapitiya raises $135M for AI coding startup 8090 Labs, takes CEO role
29 Jun · 9:25 PM
Meta AI research chief: AI agents providing economic value are the next frontier
29 Jun · 3:05 AM
China's Qihoo 360 Founder Unveils 'Tulong Feng' AI Vulnerability Agent
29 Jun · 8:20 PM