AI token costs soar, forcing companies to seek cheaper models

Created at 16 Jun · 12:46 AM · 1 source · Market-relevant

IN SHORT

Indian companies are grappling with soaring AI costs, driven by high token consumption for advanced language models. Businesses are exploring open-source alternatives and specialized startups to optimize usage and reduce expenses, with some achieving significant savings.

Key Numbers

400 billion tokens consumed monthly by a large enterprise

$78,000 monthly AI expense for 400 billion tokens

About $1 million annual AI expense at 400 billion tokens per month

80% of tasks solvable by models consuming fewer tokens

$125 billion estimated global inference market value in 2025

50-70% savings in generative AI costs using model routing

3x reduction in tokens seen by Nurix AI using Pipeshift's tech

$25 per million output tokens for Claude Opus 4.8

$5 per million output tokens for Claude Haiku 3.5

$30 per million output tokens for ChatGPT 5.5

$15 per million output tokens for ChatGPT 5.4

$180 per million output tokens for Pro versions of advanced models

$1.50 per million output tokens for GPT-3.5

$0.28-$0.87 per million output tokens for DeepSeek's latest model

Why This Matters

The escalating costs of AI token consumption are a critical concern for businesses, impacting profitability and the feasibility of widespread AI adoption. The development of cost-optimization strategies and the rise of specialized inference solutions are crucial for enabling sustainable AI integration across industries.

Key facts

Companies are facing significant AI costs due to high token consumption, with one enterprise spending $78,000 monthly.

The primary drivers of these costs are unoptimized AI workloads and the use of advanced, token-hungry models.

Businesses are turning to open-source models, smaller language models, and specialized startups for cost optimization.

Startups offer solutions like inference optimization and model routing, claiming savings of 50-70%.

The global inference market is conservatively estimated at about $125 billion in 2025.

Enterprises are finding it challenging to navigate market hype and identify effective AI solutions.

The rising cost of artificial intelligence, particularly the tokens consumed by large language models, is pushing enterprises toward more economical options. Advanced models are expensive to run at scale: one large enterprise reportedly spends $78,000 per month on 400 billion tokens, or roughly $1 million a year. Bills of this size are prompting companies to optimize their AI workloads and explore alternative models.
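The arithmetic behind these figures can be checked with a short sketch. The monthly spend and token volume are the numbers reported above; the function names are illustrative, and the per-million-token rate is simply derived from those two figures rather than quoted by any vendor.

```python
# Figures reported in the article for one large enterprise.
MONTHLY_TOKENS = 400_000_000_000  # 400 billion tokens per month
MONTHLY_COST_USD = 78_000         # reported monthly AI spend


def annual_cost(monthly_cost_usd: float) -> float:
    """Annualize a monthly bill, assuming spend stays flat all year."""
    return monthly_cost_usd * 12


def rate_per_million_tokens(cost_usd: float, tokens: int) -> float:
    """Effective blended price in USD per million tokens."""
    return cost_usd / (tokens / 1_000_000)


if __name__ == "__main__":
    yearly = annual_cost(MONTHLY_COST_USD)
    rate = rate_per_million_tokens(MONTHLY_COST_USD, MONTHLY_TOKENS)
    print(f"Annual cost: ${yearly:,.0f}")
    print(f"Effective rate: ${rate:.3f} per million tokens")
```

Twelve months at $78,000 comes to $936,000, which is where the "roughly $1 million" figure comes from.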

Industry executives say unoptimized AI usage and reliance on the latest, most token-hungry models are the main causes of these soaring costs. In response, companies are increasingly choosing open-source models and smaller language models that are less resource-intensive. Startups such as Pipeshift and Divyam.ai have emerged to meet this demand, offering inference optimization, GPU orchestration, and intelligent model routing.

These specialized firms help enterprises manage their AI ecosystems by directing each query to the most appropriate model instead of sending everything to expensive frontier models. Experts suggest that up to 80% of tasks can be handled effectively by models that consume fewer tokens, which points to strong demand for matching model size to the task. This has fueled the growth of the inference market, which is conservatively estimated at about $125 billion globally in 2025.
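As a rough illustration of what model routing means, the sketch below sends simple queries to a smaller model and reserves the frontier model for queries that look like they need reasoning. Everything here is an assumption made for illustration: the tier names, the keyword list, and the word-count threshold are hypothetical and do not describe how Pipeshift, Divyam.ai, or any other vendor routes queries in practice.

```python
# Hypothetical model tiers; the names are placeholders, not real products.
SMALL_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"

# Hypothetical hints that a query needs multi-step reasoning.
REASONING_HINTS = ("analyze", "prove", "plan", "compare", "step by step")


def route(query: str, max_simple_words: int = 50) -> str:
    """Pick a model tier with a crude heuristic (illustrative only).

    A query goes to the frontier model if it contains a reasoning hint
    or is longer than max_simple_words; otherwise it goes to the small model.
    """
    text = query.lower()
    needs_reasoning = any(hint in text for hint in REASONING_HINTS)
    is_long = len(text.split()) > max_simple_words
    return FRONTIER_MODEL if (needs_reasoning or is_long) else SMALL_MODEL


if __name__ == "__main__":
    for q in ("What is my booking status?",
              "Analyze this contract and plan a response."):
        print(f"{q!r} -> {route(q)}")
```

A production router would more likely use a trained classifier or quality feedback than keywords; the point is only that most traffic never needs to reach the most expensive tier.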

The premium for model sophistication shows up in the pricing of leading AI models. Claude Opus 4.8 and ChatGPT 5.5 are priced at $25 and $30 per million output tokens respectively, well above their lower-tier siblings and far above DeepSeek's latest model at $0.28 to $0.87. Divyam.ai claims savings of 50% to 70% for its clients through model routing and inferencing strategies. Pipeshift, in partnership with GPU provider Neysa Networks, is deploying open-source models to reduce latency and costs, with one client, Nurix AI, reporting a threefold reduction in token usage.
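To see how routing could translate into savings of this order, the sketch below blends two of the output-token prices listed under Key Numbers. It assumes, purely for illustration, that 80% of output tokens can move from Claude Opus 4.8 to Claude Haiku 3.5 and that token volume per task is unchanged; it is not a description of how any vendor arrives at its claimed savings.

```python
# Output-token prices quoted in the article, in USD per million tokens.
PRICE_PER_MILLION = {
    "Claude Opus 4.8": 25.0,
    "Claude Haiku 3.5": 5.0,
    "ChatGPT 5.5": 30.0,
    "ChatGPT 5.4": 15.0,
}


def blended_price(frontier: str, cheaper: str, share_to_cheaper: float) -> float:
    """Average price per million output tokens after routing a share of traffic."""
    if not 0.0 <= share_to_cheaper <= 1.0:
        raise ValueError("share_to_cheaper must be between 0 and 1")
    return (share_to_cheaper * PRICE_PER_MILLION[cheaper]
            + (1.0 - share_to_cheaper) * PRICE_PER_MILLION[frontier])


def savings_fraction(frontier: str, cheaper: str, share_to_cheaper: float) -> float:
    """Savings relative to sending every token to the frontier model."""
    baseline = PRICE_PER_MILLION[frontier]
    return 1.0 - blended_price(frontier, cheaper, share_to_cheaper) / baseline


if __name__ == "__main__":
    # Assumed split: 80% of output tokens go to the cheaper model.
    saved = savings_fraction("Claude Opus 4.8", "Claude Haiku 3.5", 0.8)
    print(f"Estimated savings: {saved:.0%}")
```

Under these assumptions the blended price is about $9 per million output tokens against $25, a saving of about 64%, which sits inside the 50% to 70% range quoted above.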

Beyond model selection, some companies are exploring alternative hardware and model architectures. MakeMyTrip, for example, is focusing on small language models that can run on CPUs for use cases requiring low latency but not complex reasoning. However, navigating the AI market presents challenges, as enterprises struggle to discern genuine solutions from market hype and 'AI washing.' There is a recognized need to educate businesses, particularly in markets like India, about the viability of open-source models for many applications, as opposed to solely relying on frontier labs.

Frequently asked questions

What is driving the high costs of AI?
High AI costs are primarily driven by the consumption of tokens by advanced language models and unoptimized AI workloads.

What are companies doing to reduce AI costs?
Companies are opting for open-source models, smaller language models, and specialized startups offering inference optimization and model routing solutions.

What is the estimated size of the global inference market?
The global inference market is conservatively estimated to be worth about $125 billion in 2025.

How much cost savings can be achieved through optimization?
Some firms report achieving 50-70% savings in generative AI costs through model routing and inferencing.

PiQ Daily
