AI Models in the Enterprise: Five Hard Truths

Enterprises aren’t debating whether they’ll use AI models anymore; they’re wrestling with how to do it without blowing up budgets, governance, or credibility. The posts I published this week on AI models (evolution, economics, size vs. quality, specialization vs. generalization, and multi-agent systems) can be stitched into one clear narrative: value shows up only when model choices align with operating realities, namely data, infra, risk, and economics.

1) From “predictive” to “agentic” — usage is up, material impact is uneven

Adoption has surged. McKinsey’s latest survey shows 78% of organizations now use AI, and 71% report using generative AI in at least one function, up from 65% just months earlier. Use is concentrated in marketing & sales, product development, service operations, and software engineering. Yet most firms still don’t see enterprise-level EBIT impact: benefits remain localized to functions, and oversight of model outputs varies widely. Source: McKinsey & Company

Implication: If you’re still running isolated pilots, you’re competing against peers who are wiring models into workflows end-to-end (with monitoring, escalation, and cost controls). “Model strategy” is fast becoming “operating model” strategy.

2) The economics: spending is real; price/perf is shifting under your feet

IDC estimates organizations spent $235B on AI in 2024, rising to $630B+ by 2028; GenAI’s share is set to nearly double from ~17% to 32% over that horizon. That’s not hype; that’s budget gravity. Source: IDC Blog

At the same time, providers are attacking the unit economics. AWS claims Trainium2 delivers 30–40% better price-performance than current GPU-based instances, and even its CEO underscores that inference will dominate AI costs—so bringing inference prices down is existential for scale. Source: Amazon Web Services

Implication: Your model plan is your cost plan. Treat “build vs. fine-tune vs. rent” as a financial decision with sensitivity analyses for workload mix (training vs. inference), latency targets, and guardrail overhead—because those will decide the winners as much as accuracy does.
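As a rough illustration of that sensitivity analysis, here is a minimal sketch in Python. Every number in it is a made-up placeholder rather than a benchmark or a vendor price, and the names (`Option`, `total_cost`, `cheapest`) are hypothetical. It separates one-time training spend from recurring inference spend, adds guardrail overhead as a fraction of inference cost, and shows how the cheapest option can flip as request volume grows.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One sourcing option; every figure below is an illustrative placeholder."""
    name: str
    upfront_usd: float          # one-time training or fine-tuning spend
    usd_per_1k_requests: float  # steady-state inference cost
    guardrail_overhead: float   # extra inference cost from safety checks, as a fraction


def total_cost(opt: Option, monthly_requests: float, months: int,
               monthly_growth: float = 0.0) -> float:
    """Upfront spend plus inference cost over a horizon with compounding traffic growth."""
    cost = opt.upfront_usd
    requests = monthly_requests
    for _ in range(months):
        cost += requests / 1000 * opt.usd_per_1k_requests * (1 + opt.guardrail_overhead)
        requests *= 1 + monthly_growth
    return cost


# Assumed inputs: replace with your own quotes and measurements.
OPTIONS = [
    Option("rent", upfront_usd=0, usd_per_1k_requests=4.00, guardrail_overhead=0.10),
    Option("fine-tune", upfront_usd=50_000, usd_per_1k_requests=1.50, guardrail_overhead=0.10),
    Option("build", upfront_usd=2_000_000, usd_per_1k_requests=0.60, guardrail_overhead=0.20),
]


def cheapest(monthly_requests: float, months: int = 24, monthly_growth: float = 0.0) -> str:
    """Name of the lowest-cost option for a given workload over the horizon."""
    return min(OPTIONS, key=lambda o: total_cost(o, monthly_requests, months, monthly_growth)).name


if __name__ == "__main__":
    for volume in (100_000, 1_000_000, 10_000_000, 100_000_000):
        print(f"{volume:>11,} requests/month -> {cheapest(volume)}")
```

With these placeholder inputs the answer moves from renting to fine-tuning to building as volume rises, which is the point of the exercise: the decision depends on workload mix, not on the model alone. Latency targets are left out of this sketch and would need their own constraint.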

3) Size isn’t strategy: smaller models + better data are eating “bigger is better”

There’s a reason small language models (SLMs) are getting attention. Microsoft reports that its Phi-3 family outperforms models of the same size and the next size up on a range of tasks, while enabling on-device and low-latency scenarios that are impractical with giant LLMs. Source: Microsoft Azure

Macro trends back this up: Stanford’s AI Index reports that inference costs for GPT-3.5-level performance fell ~280× (Nov 2022 → Oct 2024), with hardware costs dropping ~30%/yr and energy efficiency improving ~40%/yr. Open-weight models are also closing the gap with closed models. Source: Stanford HAI
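To make the 280× figure easier to plan around, the short calculation below converts it into an implied monthly rate. It assumes a constant month-over-month decline across the 23 months from Nov 2022 to Oct 2024, which the report does not claim, so treat the result as a back-of-envelope number; `projected_cost` is a hypothetical helper for illustration.

```python
# Figure quoted above: ~280x cheaper between Nov 2022 and Oct 2024.
TOTAL_DROP = 280.0
MONTHS = 23  # Nov 2022 -> Oct 2024

# Assumption: a constant month-over-month rate (not stated in the source).
monthly_factor = TOTAL_DROP ** (1 / MONTHS)
monthly_decline = 1 - 1 / monthly_factor


def projected_cost(cost_today: float, months_ahead: int) -> float:
    """Extrapolate a unit cost forward at the implied rate (illustrative only)."""
    return cost_today / monthly_factor ** months_ahead


print(f"implied decline: {monthly_decline:.1%} per month")  # about 21.7%
```

A decline of roughly a fifth per month is a reminder that any cost assumption in a business case goes stale quickly, though past rates are no guarantee of future ones.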

Implication: Chasing parameter counts is indulgent. Competitive advantage comes from fit-for-purpose models with disciplined data curation, retrieval strategies, and MLOps—not from vanity scale.

4) General vs. specialized: stop arguing, build a portfolio

General-purpose frontier models are great for breadth, rapid prototyping, and cross-domain tasks. But domain-tuned or specialized models routinely win on accuracy, compliance, and latency for mission-critical workflows. Even Microsoft notes trade-offs—small models won’t retain as much factual knowledge as larger models, so pair them with retrieval and controls. Source: Microsoft Azure

Implication: Manage models like products. Standardize an AI portfolio:

a general model (for breadth and rapid iteration),
a small/efficient model (for cost-sensitive and on-device use), and
one or more specialized models (for regulated/high-risk workflows).

Governance, observability, and cost allocation should be uniform across the portfolio.
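One way to picture the portfolio described above is as a routing rule that sends each task to a tier. The sketch below is a minimal Python example under stated assumptions: the `Task` fields, the 0.5 complexity cutoff, and the rule order (risk first, then cost and latency) are all hypothetical choices for illustration, not a recommended policy.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    GENERAL = "general"
    SMALL = "small"
    SPECIALIZED = "specialized"


@dataclass(frozen=True)
class Task:
    description: str
    regulated: bool                 # touches a regulated or high-risk workflow
    complexity: float               # 0.0 (trivial) to 1.0 (open-ended); scoring is up to you
    latency_sensitive: bool = False


def route(task: Task, complexity_cutoff: float = 0.5) -> Tier:
    """Pick a portfolio tier: risk is checked first, then cost and latency."""
    if task.regulated:
        return Tier.SPECIALIZED
    if task.latency_sensitive or task.complexity < complexity_cutoff:
        return Tier.SMALL
    return Tier.GENERAL


if __name__ == "__main__":
    tasks = [
        Task("summarize a claims file", regulated=True, complexity=0.4),
        Task("autocomplete a form field", regulated=False, complexity=0.1, latency_sensitive=True),
        Task("draft a market analysis", regulated=False, complexity=0.8),
    ]
    for t in tasks:
        print(f"{t.description} -> {route(t).value}")
```

Keeping the rule in one function makes it easy to log every routing decision, which is what uniform observability and cost allocation across the portfolio would need.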

5) Beyond single models: the system is the strategy (agents, orchestration, and reality)

The next step isn’t “a bigger model”; it’s systems of models. Anthropic documents the promise and pain of multi-agent setups: they can outperform single-agent systems on research-type tasks, but coordination, evaluation, and token burn are real challenges (multi-agent runs can consume ~15× the tokens of a chat interaction). Success depends on orchestration patterns, evals, and cost discipline, not magic. Source: Anthropic

Implication: If you move to agents, do it eyes-open:

Start simple (workflows before agents).
Instrument everything (token budgets, tool calls, failure modes).
Gate expensive loops with human-in-the-loop review and clear stop conditions. Source: Anthropic
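The three points above can be sketched as a small control loop. This is a hedged Python illustration, not any vendor’s API: `step` stands in for one agent iteration, `approve` stands in for a human checkpoint, and doubling the budget on approval is an arbitrary assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RunLog:
    tokens_used: int = 0
    steps: int = 0
    stop_reason: str = ""
    events: List[Tuple[int, int]] = field(default_factory=list)  # (step number, tokens spent)


def run_with_budget(step: Callable[[int], Tuple[int, bool]],
                    token_budget: int,
                    max_steps: int,
                    approve: Callable[[RunLog], bool]) -> RunLog:
    """Run `step` until it finishes, hits max_steps, or a reviewer refuses more budget.

    step(i) returns (tokens_spent, done) for iteration i.
    approve(log) returns True if a human agrees to extend the budget.
    """
    log = RunLog()
    while True:
        if log.steps >= max_steps:
            log.stop_reason = "max_steps"
            break
        if log.tokens_used >= token_budget:
            if not approve(log):
                log.stop_reason = "budget_denied"
                break
            token_budget *= 2  # assumption: each approval doubles the budget
        tokens, done = step(log.steps)
        log.tokens_used += tokens
        log.steps += 1
        log.events.append((log.steps, tokens))
        if done:
            log.stop_reason = "done"
            break
    return log


if __name__ == "__main__":
    # Fake agent: 1,000 tokens per step, finishes on its tenth step.
    fake_step = lambda i: (1000, i == 9)
    log = run_with_budget(fake_step, token_budget=3000, max_steps=20, approve=lambda _: False)
    print(log.stop_reason, log.steps, log.tokens_used)
```

Every exit path records a `stop_reason`, so runs that ended on budget or step limits can be counted separately from runs that actually finished.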

How to operationalize this (a pragmatic checklist)

Tie model choices to unit economics. Build a cost model that separates training vs. inference, and simulates traffic growth, latency budgets, and guardrail overhead. Revisit quarterly as cloud price/perf shifts (e.g., Trainium2).
Adopt a model portfolio. Standardize APIs, eval harnesses, and governance across general, small, and specialized models; route tasks by complexity and risk rather than by hype.
Invest in data quality over sheer volume. Expect higher returns from retrieval, fine-tuning on curated corpora, and prompt/eval pipelines than from model size alone. The macro cost curves (280× inference drop) reward efficiency.
Measure value where it actually lands. Don’t wait for enterprise-level EBIT to move before you scale. Track functional KPIs (cycle time, resolution rate, defects) and expand from proven footholds. McKinsey’s data indicates that value appears first in functions.
Pilot agents with discipline. Start with orchestrated workflows; introduce autonomy only when evals show a gain that justifies token costs. Instrument reliability, not just accuracy.

Bottom line

AI models are becoming a business capability, not a lab curiosity. Spending will keep climbing, but the price/performance frontier is moving fast—and the winners will be those who translate model choices into operating leverage: faster cycles, safer workflows, and defensible cost structures. If your AI roadmap still centers on a single model decision, you’re solving the wrong problem. Architect the system—the portfolio, the infra, the governance, and the economics—and the model debate will take care of itself.

Sources for further reading

McKinsey (2025): The state of AI: How organizations are rewiring to capture value (adoption up to 71% using genAI; value concentrated in functions). McKinsey & Company
IDC (2024): Global AI & GenAI Spending (AI $235B → $630B+ by 2028; GenAI share rising to 32%). IDC Blog
AWS (2024–2025): Trainium2 price/perf claims and CEO letter (30–40% better price-performance; inference as dominant cost). Amazon Web Services; About Amazon
Microsoft (2024): Introducing Phi-3 (SLMs outperform peers of same/next size; on-device scenarios). Microsoft Azure
Stanford HAI (2025): AI Index Report (280× drop in inference cost for GPT-3.5-level performance; efficiency trends; open-weight progress). Stanford HAI
Anthropic (2024–2025): Building effective agents and How we built our multi-agent research system (when agents help, and why orchestration/evals matter; token cost realities).

Originally posted on LinkedIn by Rakesh Patni
