AI at the Crossroads (Part 2): Rising AI Costs and the Push to the Edge
This is Part 2 of my AI at the Crossroads series. If you haven’t read Part 1, you can find it here: AI at the Crossroads (Part 1): Why the Edge Matters More Than Ever.
The Glow of the Cloud is Fading
In Part 1, I looked back at how the cloud turned from a scrappy side project for big tech into the backbone of modern AI. I traced how it lowered barriers, opened doors, and enabled almost anyone to access serious computing power. But good things tend to get complicated, and lately, I’ve been feeling the complications firsthand.
Last weekend, I was building an app using Lovable and Cursor. What started as a simple side project quickly became expensive. Every prompt, whether fixing a build error introduced by the AI or testing a new feature, burned tokens. By Sunday night, I had blown through my entire monthly allocation of credits. It changes your mindset. Instead of trying ideas freely, you start running mental cost-benefit checks before you even type. I’ve seen others online doing the same, holding back on experiments because credits run out before ideas do.
When the Rules Change Overnight
The launch of GPT-5 was hyped as a significant step forward. Instead, it turned into a real-time case study in the risks of renting AI in the cloud. Many users logged in to find their “legacy” models gone without warning. Workflows that had been running smoothly broke overnight and had to be reworked for the new model, while demand spikes slowed responses and features people relied on simply disappeared.
That is the reality of cloud AI. You are not just renting compute and services. You are also renting the rules of engagement. When your provider changes those rules, you have to adapt on their timeline, not yours.
The Economics No One Can Ignore
Zooming out from weekend projects, this is the same problem companies are hitting at scale. Token-based pricing, data transfer fees, inference charges — they add up fast. Many SaaS AI companies are serving their products at a loss. Enterprises rolling out AI assistants to thousands of employees watch budgets spiral. IDC’s latest survey found that more than half of organizations went over their cloud AI budgets last year. Gartner has warned that by 2026, 40 percent of large enterprises will have explicit AI budget caps in place because of cost overruns.
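To make that arithmetic concrete, here is a back-of-the-envelope sketch of how per-token pricing compounds across an enterprise rollout. Every figure below (prices, request volume, token counts) is an illustrative assumption, not any provider’s actual rate card.

```python
# Back-of-the-envelope estimate of monthly cloud-inference spend.
# All rates and usage figures are illustrative assumptions.

PRICE_PER_1K_INPUT_TOKENS = 0.003   # assumed USD per 1K input tokens
PRICE_PER_1K_OUTPUT_TOKENS = 0.015  # assumed USD per 1K output tokens

def monthly_cost(employees, requests_per_day, in_tokens, out_tokens, workdays=22):
    """Estimate monthly spend for an org-wide AI assistant rollout."""
    requests = employees * requests_per_day * workdays
    input_cost = requests * in_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
    output_cost = requests * out_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
    return input_cost + output_cost

# 5,000 employees, 20 requests a day, ~1,500 tokens in / 500 out per request
cost = monthly_cost(5_000, 20, 1_500, 500)
print(f"${cost:,.0f} per month")  # roughly $26,400 under these assumptions
```

Tweak any one input — double the output length, add retries for failed generations — and the bill moves by thousands of dollars a month, which is exactly why finance teams are starting to cap these budgets.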
The early promise of the cloud was that it was cheaper and more flexible than owning hardware. For AI workloads today, that is no longer guaranteed.
The Physical and Regulatory Limits
Cost is not the only constraint. Power and water demands for data centers are now political issues. GPU shortages delay projects. Some cities and regions have pushed back, slowing or pausing hyperscale builds over their impact on surrounding communities. Laws governing where you can store and process data are tightening, particularly for industries such as healthcare and finance.
As Kate Crawford has argued in her interviews and her book Atlas of AI, the rise of large cloud-based AI systems demands more water, energy, and land, posing a new strain on public resources. Moving some of that intelligence to the edge, she says, might not only shrink those environmental footprints but also increase data sovereignty for communities.
When Centralization Bites Back
Centralized AI worked fine when models were smaller and user expectations looser. Today, industries with strict latency or privacy requirements, such as healthcare, manufacturing, and legal services, are running into hard limits. A hospital AI assistant that needs to pull patient data from the cloud can grind to a halt during a network outage. A factory vision system that depends on a cloud model can miss defects if latency spikes. In legal services, a model used to review case documents or generate filings could be delayed or disrupted if a provider changes pricing, retires a model, or experiences downtime in a critical jurisdiction.
A single surprise, whether it is a price change, a retired model, or a regional outage, can ripple through these systems in seconds. When mission-critical AI services live entirely in a third party’s environment, your uptime and performance are bound to their infrastructure, their priorities, their timelines, and their security incidents — not your own.
The Case for the Edge
Edge AI is not a perfect fit for every workload. You won’t be training GPT-4 on your iPad anytime soon. But for day-to-day inference, fine-tuned models, and latency-sensitive applications, local compute is becoming not just viable, but often the better choice. On-device AI for software development is already proving itself. Developers are running open-weight models like Qwen3 Coder Flash alongside free tools like VSCode, coding entirely offline with no internet connection, no cloud storage, and no monthly subscription.
You do not need a cloud API, ongoing fees, or even a persistent connection to get meaningful value from AI in everyday work. Tasks like writing and debugging code, summarizing documents, or running specialized assistants are already practical on local hardware. The limitations for training truly massive models are real, but for most users and an increasing share of enterprise and creative workloads, on-device tools are already powerful and ready. As hardware accelerates and more models are optimized for local execution, that trend will only continue.
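A quick way to sanity-check whether a model can run on your own machine is to estimate its weight footprint from parameter count and quantization level. This is a rough heuristic, not a benchmark: the flat overhead multiplier standing in for KV cache and runtime buffers is my assumption, and real usage varies by runtime and context length.

```python
def model_memory_gb(params_billions, bits_per_weight, overhead=1.2):
    """Rough RAM/VRAM needed to load a model's weights.

    overhead: assumed flat multiplier covering KV cache and runtime buffers.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A ~30B-parameter model at 4-bit quantization vs. full 16-bit precision
print(f"30B @ 4-bit: {model_memory_gb(30, 4):.1f} GB")   # ~18 GB
print(f"30B @ fp16:  {model_memory_gb(30, 16):.1f} GB")  # ~72 GB
```

Under these assumptions, the 4-bit build fits on a 32 GB laptop while the fp16 build does not — which is precisely why quantized models are what is driving the on-device shift.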
Looking Ahead
The hard-tech era we are stepping into is not just about faster chips. It is about control — over your data, your costs, and your ability to run what you need, when you need it.
The open question is whether the efficiency and productivity gains of cloud AI will eventually outweigh its operational costs. When will revenue start growing faster than AI spending? That answer will determine how much of the future stays in the cloud and how much moves to the edge.
In Part 3, I will delve into how edge AI is being implemented, what is working, what still needs refinement, and where the hype lines up with reality.