Ornith-1.0: Open-Source LLMs Built for AI Coding Agents

Created at 29 Jun · 9:30 PM1 source↑ Market-relevant

IN SHORT

DeepReinforce has released Ornith-1.0, a family of open-source large language models specifically designed for AI coding agents. The models, available in various sizes up to 397 billion parameters, demonstrate strong performance on coding benchmarks, outperforming some larger models in specific agentic tasks.

Key Numbers

9BOrnith-1.0 parameter size

31BOrnith-1.0 parameter size

35BOrnith-1.0 MoE parameter size

397BOrnith-1.0 flagship MoE parameter size

69.49B model SWE-bench Verified score

52.0Gemma 4-31B SWE-bench Verified score

82.4397B model SWE-bench Verified score

80.8Claude Opus 4.7 SWE-bench Verified score

80.6DeepSeek-V4-Pro SWE-bench Verified score

77.5397B model Terminal Bench 2.1 score

70.3Claude Opus 4.7 Terminal Bench 2.1 score

62.2397B model SWE-bench Pro score

Who's Involved

DeepReinforce

AI research lab that released Ornith-1.0

Ornith

Family of open-source LLMs for agentic coding

Google

Developer of Gemma models

Claude Opus 4.7

Anthropic's flagship model

DeepSeek-V4-Pro

A competing AI model

Ornith-1.0: Open-Source LLMs Built for AI Coding Agents

↳ Why This Matters

Ornith-1.0 represents a significant advancement in open-source AI coding agents, offering specialized tools for developers building autonomous coding infrastructure. Its strong performance on agentic coding benchmarks suggests a shift towards more capable, unsupervised AI in software development workflows.

Key facts

DeepReinforce released Ornith-1.0, a family of open-source LLMs for AI coding agents.
Models range from 9 billion to 397 billion parameters and are MIT licensed.
The 397B model achieved 82.4 on SWE-bench Verified, outperforming Claude Opus 4.7.
The 9B model scored 69.4 on SWE-bench Verified, outperforming Gemma 4-31B.
Ornith-1.0 is optimized for agentic coding tasks and may underperform on general conversations.

DeepReinforce has released Ornith-1.0, a new family of open-source large language models specifically engineered for AI coding agents. Available in four sizes, ranging from 9 billion to a flagship 397 billion parameters, these models are designed to operate within real terminal and repository environments, performing tasks autonomously without constant human guidance.

The Ornith models are built with an 'agentic' approach, meaning they are trained to take actions and complete multi-step coding tasks, such as fixing bugs and refining code, by developing their own strategies. This contrasts with traditional conversational AI models. DeepReinforce emphasizes that Ornith-1.0 is not intended for general-purpose AI conversations or tasks like document summarization, as its performance may be suboptimal outside of developer pipelines.

Performance metrics highlight Ornith-1.0's capabilities. The 397 billion parameter model achieved a score of 82.4 on the SWE-bench Verified benchmark, surpassing notable models like Claude Opus 4.7 (80.8) and DeepSeek-V4-Pro (80.6). On the Terminal Bench 2.1, the 397B model scored 77.5, compared to Claude Opus 4.7's 70.3. Even the smaller 9 billion parameter model demonstrated strong performance, scoring 69.4 on SWE-bench Verified, which is competitive with larger models like Qwen 3.5-35B and significantly higher than Google's Gemma 4-31B (52.0).

DeepReinforce has implemented defenses against reward hacking, a potential issue with self-improving models. These include immutable environments, deterministic monitors, and a frozen judge model to ensure the AI's actions are genuine and not exploiting the training process. While Ornith-1.0 shows impressive results on coding-specific benchmarks, the company notes that Anthropic's latest flagship, Claude Opus 4.8, scores higher, and the primary competitive advantage lies within the open-source category for comparable parameter counts on agentic coding tasks.

Frequently asked questions

Ornith-1.0 is a family of open-source large language models developed by DeepReinforce, specifically designed for AI coding agents.

Ornith-1.0 is optimized for 'agentic' tasks, meaning it can autonomously perform multi-step coding operations within repositories and terminals, rather than focusing on general conversational abilities.

The models are available in 9 billion, 31 billion, 35 billion mixture of experts (MoE), and 397 billion MoE parameter sizes.

SWE-bench Verified is a benchmark that tests an AI's ability to fix real bugs from open-source GitHub repositories without seeing the test suite, scoring the percentage of issues resolved.

What Happens Next

01Further evaluation of Ornith-1.0 models on various coding benchmarks.

02Adoption of Ornith-1.0 by developers building agentic infrastructure.

Get the newsletter.

Pick the topics you actually care about. We'll email when there's news worth your time, on the cadence you choose. Cancel any time from your account.

Cadence

How It Developed

DeepReinforce released Ornith-1.0, a family of open-source coding models.

Ornith-1.0 models are available in four sizes: 9B, 31B, 35B MoE, and 397B MoE.

The models are licensed under MIT with no regional restrictions.

Ornith-1.0 is designed for agentic coding tasks, operating in terminal and repository environments.

The 397B model achieved 82.4 on SWE-bench Verified, surpassing Claude Opus 4.7 and DeepSeek-V4-Pro.

The 9B model scored 69.4 on SWE-bench Verified, outperforming Gemma 4-31B.

Ornith-1.0 models may underperform on non-coding tasks.

The models employ a novel reinforcement learning approach where the strategy for approaching tasks co-evolves with the policy.

Sources

Ornith Is the Open-Source Coding Model Built for Agents, Not HumansDecrypt

Ornith-1.0: Open-Source LLMs Built for AI Coding Agents

Created at 29 Jun · 9:30 PM1 source↑ Market-relevant

IN SHORT

Key Numbers

9BOrnith-1.0 parameter size

31BOrnith-1.0 parameter size

35BOrnith-1.0 MoE parameter size

397BOrnith-1.0 flagship MoE parameter size

69.49B model SWE-bench Verified score

52.0Gemma 4-31B SWE-bench Verified score

82.4397B model SWE-bench Verified score

80.8Claude Opus 4.7 SWE-bench Verified score

80.6DeepSeek-V4-Pro SWE-bench Verified score

77.5397B model Terminal Bench 2.1 score

70.3Claude Opus 4.7 Terminal Bench 2.1 score

62.2397B model SWE-bench Pro score

Who's Involved

DeepReinforce

AI research lab that released Ornith-1.0

Ornith

Family of open-source LLMs for agentic coding

Google

Developer of Gemma models

Claude Opus 4.7

Anthropic's flagship model

DeepSeek-V4-Pro

A competing AI model

↳ Why This Matters

Key facts

DeepReinforce released Ornith-1.0, a family of open-source LLMs for AI coding agents.
Models range from 9 billion to 397 billion parameters and are MIT licensed.
The 397B model achieved 82.4 on SWE-bench Verified, outperforming Claude Opus 4.7.
The 9B model scored 69.4 on SWE-bench Verified, outperforming Gemma 4-31B.
Ornith-1.0 is optimized for agentic coding tasks and may underperform on general conversations.

Frequently asked questions

Ornith-1.0 is a family of open-source large language models developed by DeepReinforce, specifically designed for AI coding agents.

The models are available in 9 billion, 31 billion, 35 billion mixture of experts (MoE), and 397 billion MoE parameter sizes.

SWE-bench Verified is a benchmark that tests an AI's ability to fix real bugs from open-source GitHub repositories without seeing the test suite, scoring the percentage of issues resolved.

What Happens Next

01Further evaluation of Ornith-1.0 models on various coding benchmarks.

02Adoption of Ornith-1.0 by developers building agentic infrastructure.

Get the newsletter.

Pick the topics you actually care about. We'll email when there's news worth your time, on the cadence you choose. Cancel any time from your account.

Cadence

How It Developed

DeepReinforce released Ornith-1.0, a family of open-source coding models.

Ornith-1.0 models are available in four sizes: 9B, 31B, 35B MoE, and 397B MoE.

The models are licensed under MIT with no regional restrictions.

Ornith-1.0 is designed for agentic coding tasks, operating in terminal and repository environments.

The 397B model achieved 82.4 on SWE-bench Verified, surpassing Claude Opus 4.7 and DeepSeek-V4-Pro.

The 9B model scored 69.4 on SWE-bench Verified, outperforming Gemma 4-31B.

Ornith-1.0 models may underperform on non-coding tasks.

The models employ a novel reinforcement learning approach where the strategy for approaching tasks co-evolves with the policy.

Sources

Ornith Is the Open-Source Coding Model Built for Agents, Not HumansDecrypt

Ornith-1.0: Open-Source LLMs Built for AI Coding Agents

Key Numbers

Who's Involved

↳ Why This Matters

Key facts

Frequently asked questions

What Happens Next

Get the newsletter.

How It Developed

Sources

Related Stories

Ornith-1.0: Open-Source LLMs Built for AI Coding Agents

Key Numbers

Who's Involved

↳ Why This Matters

Key facts

Frequently asked questions

What Happens Next

Get the newsletter.

How It Developed

Sources

Related Stories

Ornith-1.0: Open-Source LLMs Built for AI Coding Agents

PiQ Daily

Key Numbers

Who's Involved

↳ Why This Matters

Key facts

Frequently asked questions

+ What is Ornith-1.0?

+ What makes Ornith-1.0 different from other AI models?

+ What are the different sizes of Ornith-1.0 models?

+ What is SWE-bench Verified?

What Happens Next

Get the newsletter.

How It Developed

Sources

Related Stories

Ornith-1.0: Open-Source LLMs Built for AI Coding Agents

PiQ Daily

Key Numbers

Who's Involved

↳ Why This Matters

Key facts

Frequently asked questions

+ What is Ornith-1.0?

+ What makes Ornith-1.0 different from other AI models?

+ What are the different sizes of Ornith-1.0 models?

+ What is SWE-bench Verified?

What Happens Next

Get the newsletter.

How It Developed

Sources

Related Stories