SMEs: How to Face the Token Shortage?

With the explosion of AI, especially the widespread deployment of AI agents in businesses, tokens are becoming increasingly scarce. (DirectIndustry)

by Camille Rustici

April 23, 2026 | Reading time: 10 mins | Updated on April 23, 2026

With the explosion of AI, especially the widespread deployment of AI agents in businesses, tokens are becoming increasingly scarce. Much like gasoline at service stations during the current Strait of Hormuz blockade, their availability is under strain. And just as with gasoline, the risks for businesses that depend on them are real. How can they adapt and prepare for shortages?

Help! Tokens Are Running Out on Planet AI

The Wall Street Journal recently reported that tokens are in short supply. Last month, Anthropic, the company behind Claude, began rationing the number of tokens available to users of Claude Code during peak hours on weekdays. Claude’s availability dropped to 98.32% (compared to its usual 99.99%). That may seem like a minor detail, but for businesses relying on it, the impact is significant.
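To see why that gap is more than a detail, the two availability figures can be converted into downtime. The sketch below is illustrative only: the 30-day month and the function name downtime_hours are assumptions of this example, not figures or methods from the Wall Street Journal report.

```python
# Convert an availability percentage into the downtime it implies.
# Assumption for illustration: a 30-day month (720 hours).
HOURS_PER_MONTH = 30 * 24


def downtime_hours(availability_pct: float, period_hours: float = HOURS_PER_MONTH) -> float:
    """Return the hours of unavailability implied by an availability percentage."""
    return (1 - availability_pct / 100) * period_hours


for pct in (99.99, 98.32):
    hours = downtime_hours(pct)
    print(f"{pct}% availability -> {hours:.2f} h ({hours * 60:.0f} min) of downtime per month")
```

Under that assumption, 99.99% availability corresponds to roughly 4 minutes of downtime per month, while 98.32% corresponds to about 12 hours.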

A similar scenario unfolded at OpenAI, which had to shut down its video-generation tool, Sora, to conserve resources for other applications.

The result? Outages, throttled services that respond poorly—or not at all—and frustrated users. Complaints have flooded social media, with some reporting they hit their token limits on Claude Code in just 45 minutes.

What Is a Token, Again?

A token is the basic unit of text that large language models (LLMs) read and generate, and the unit in which AI usage is metered. A token can be a word, part of a word, a punctuation mark, or even a space. The longer the text, the more tokens a chatbot needs to handle it.

With AI usage skyrocketing, demand for tokens is soaring, especially as AI agents, which write text, generate code, and automate workflows without human intervention, operate nonstop. Token consumption isn’t just growing; it’s relentless, like cars needing a constant flow of gasoline to keep moving.

According to the Wall Street Journal, OpenAI’s API processed 6 billion tokens per minute last October. By last month, that number had surged to 15 billion.
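To put those two figures in perspective, here is a quick back-of-the-envelope calculation. The daily total assumes the per-minute rate holds constant over 24 hours, which is a simplification made for this example and not a claim from the report.

```python
# Figures quoted above (tokens per minute processed by OpenAI's API).
tokens_per_min_october = 6_000_000_000
tokens_per_min_last_month = 15_000_000_000

# Growth factor between the two measurements.
growth_factor = tokens_per_min_last_month / tokens_per_min_october

# Simplifying assumption: the rate stays constant for a full day.
tokens_per_day = tokens_per_min_last_month * 60 * 24

print(f"Growth: x{growth_factor:.1f}")
print(f"Roughly {tokens_per_day / 1e12:.1f} trillion tokens per day at the latest rate")
```

In other words, volume grew two and a half times over the period, and at the latest rate the API would handle on the order of 21.6 trillion tokens in a day.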

Data Centers, Chips, Electricity

Token demand is rising, but why should we fear a shortage? While tokens exist in cyberspace, their scarcity stems from very real physical constraints tied to AI infrastructure: the electronic chips that power computations, the data centers housing the essential GPU servers, and the electricity that keeps the entire ecosystem running.

Jules Herduin, a young consultant who helps SMEs deploy AI strategies, sees high-performance chips as a double-edged sword:

“Today, we’re etching chips at 2-nanometer scales, offering extraordinary capabilities. But this comes with physical limits, especially in data centers. Reading TSMC’s reports, their biggest challenge is generating enough energy and deploying data centers fast enough to meet the demand for computing power.”

It’s clear that AI’s hardware infrastructure isn’t keeping pace with its rapid deployment, whether in terms of users, applications, or features. Data centers can’t be built in a month. Semiconductors are facing shortages. And in the U.S., available energy for 2026 has already been allocated. Small modular reactors (SMRs) could help generate the missing electricity, but they’re not yet operational. Elon Musk’s idea of sending data centers into space remains just that, an idea. A potential deadlock looms, which explains the Wall Street Journal’s alarm.

The Sector’s Response: Price Hikes

To address the looming shortage, LLM operators like Anthropic and OpenAI are rationing access. (We can also expect free chatbot versions to degrade in quality.) Meanwhile, GPU rental prices are soaring. Two months ago, leasing an Nvidia Blackwell chip cost $2.75. Today, it’s $4.08, a nearly 50% increase.
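The size of that jump can be checked from the $2.75 and $4.08 figures. A minimal sketch, in which the helper name percent_change is introduced purely for illustration:

```python
def percent_change(old: float, new: float) -> float:
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100


price_two_months_ago = 2.75  # USD, as quoted above
price_today = 4.08           # USD, as quoted above

increase = percent_change(price_two_months_ago, price_today)
print(f"Increase: {increase:.1f}%")
```

The result is about 48.4%, which is consistent with the "nearly 50%" quoted.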

Cloud providers are following suit. CoreWeave raised its prices by 20% and now requires minimum three-year commitments.

What Risks Do SMEs Face?

For SMEs relying on AI agents for 24/7 customer service or automated quoting, an outage means degraded service, or none at all, and potential damage to their reputation. There’s also the fear of unexpected, skyrocketing costs, much like today’s volatile oil prices.

How to Respond?

Frugal Models

How can businesses mitigate these risks or prepare for them? Jules compares the token shortage to today’s oil crisis:

“With the situation in the Middle East, gasoline is scarce, prices are rising, and people are naturally cutting back on unnecessary trips. The same will happen with AI. People will rationalize their token usage.”

He steers the SMEs he advises toward lighter models, like those developed by France’s Mistral:

“I personally use Mistral Small 4 a lot. It’s excellent for businesses. It’s an extremely frugal model, inexpensive in terms of both tokens and energy. And it covers 90 to 95% of SMEs’ needs.”

For a 10-person SME, costs for such a setup range from €50 to €150 per month, depending on infrastructure and query volume. By comparison, a solution like Copilot would cost around €300 to €400 per month.

Local AI

The question of AI frugality is gaining traction, driven by the energy costs of computation and efficiency challenges. Developers are now designing more specialized, customizable, and lightweight architectures that can run on less powerful machines.

A few months ago, we interviewed Olivier Debeugny, CEO of Dragon LLM, during the launch of their Dragon architecture (the startup was acquired by OVH last month). He explained that their goal was to offer a frugal alternative to dominant American Transformers, tailored for businesses, especially SMEs:

“The idea is to provide an architecture that lets companies run LLMs and generative AI locally, on their own servers, not necessarily GPUs, but standard machines with regular CPUs. This solves both energy and budget constraints for businesses adopting generative AI. Take banks, for example: they often tell us they need a model to analyze and respond to SWIFT messages. They don’t need a model that knows the entire history of Uruguay for that!”

During the same interview, Debeugny mentioned working on a 7-billion-parameter model (with only 1 billion active), capable of running on a smartphone in airplane mode.

Dual Sourcing

Before these frugal models become widely accessible, Jules advocates another solution for the SMEs he works with: dual sourcing.

“If you’re using OpenAI, Claude, or Microsoft, all American companies, the demand is so high that the system sometimes cuts out temporarily. If you depend on just one of these tools, you’re taking a risk.”

But in IT, he notes, you can switch AI providers.

“Imagine you’re using Claude by default, and tomorrow at 11 a.m., the service goes down. With a fallback (or failover) system, you automatically switch to another provider, let’s say Mistral. For SMEs, a properly installed AI system with redundancy means that if Provider A fails, you switch to Provider B.”

The goal is to avoid dependence on a single provider by diversifying sources to ensure service continuity. This approach, called dual sourcing, is already widely adopted in industry to diversify supply chains in a world marked by cascading crises.

“Businesses understand they shouldn’t rely on a single supplier for raw materials. The same goes for AI.”
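The fallback logic Jules describes can be sketched in a few lines. This is a hedged illustration, not his actual setup: the providers are modeled as plain Python functions, and provider_a, provider_b, ask_with_failover and AllProvidersFailed are hypothetical names standing in for real API clients and error handling.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


# A provider is modeled as a (name, callable) pair. The callable takes a
# prompt and returns a reply, or raises an exception on outage or throttling.
Provider = tuple[str, Callable[[str], str]]


def ask_with_failover(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try providers in order; return (name, reply) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-ins for real API clients.
def provider_a(prompt: str) -> str:
    raise TimeoutError("service unavailable")  # simulates the default provider going down


def provider_b(prompt: str) -> str:
    return f"(reply from B) {prompt}"


name, reply = ask_with_failover(
    "Draft a quote for 20 units",
    [("A", provider_a), ("B", provider_b)],
)
print(name, reply)
```

Here Provider A fails and the request is served by Provider B without the caller noticing. The order of the list encodes the preference: the default provider comes first, the backup second, and a third could be appended without changing the function.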

For his clients, Jules deploys open-source tools like Open WebUI, an interface that connects to multiple APIs, including those of Mistral and Scaleway. The latter offers Chinese models such as Qwen (developed by Alibaba Cloud).

“These are Chinese models hosted in Europe. No data leaves for China, which is crucial for sovereignty. Ideally, I’d like to add a third provider, but options are limited if we want to stay in France.”

A Chance to Rethink AI?

Just as some see the Strait of Hormuz blockade as an opportunity to accelerate the energy transition, the token shortage might be our chance to avoid the “AI-for-everything” trap. Before deploying AI systematically, every business should ask: Is it truly relevant? Just as you wouldn’t drive a car to cross the street, there’s no need for AI where a simpler, less resource-intensive solution would suffice.