How semantic caching reduces LLM API calls

Bill Doerrfeld | May 5, 2025



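Semantic caching is like typical caching, but for AI: rather than requiring an exact match, it reuses a stored response when a new prompt is close enough in meaning to an earlier one. That could eliminate many redundant API calls to LLMs, reducing costs and improving performance.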


My latest for The New Stack explores semantic caching: what it is, how it works, and what the benefits are. According to the sources, semantic caching is poised to become standard practice for optimizing how applications work with AI, reducing latency and lowering bills as costs increase.
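
To make the idea concrete, here is a minimal sketch of a semantic cache in Python. It is an illustration under stated assumptions, not the approach described in the article: `embed` is a toy bag-of-words stand-in for a real embedding model, `fake_llm` stands in for a paid LLM API call, and the `SemanticCache` class and its 0.8 similarity threshold are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model (assumption): word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache: reuse a response when a prompt is similar enough."""

    def __init__(self, llm, threshold=0.8):
        self.llm = llm              # function that performs the real API call
        self.threshold = threshold  # minimum similarity that counts as a hit
        self.entries = []           # list of (vector, response) pairs
        self.api_calls = 0          # how many times the LLM was really called

    def ask(self, prompt):
        vec = embed(prompt)
        # Find the most similar previously seen prompt, if there is one.
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]  # cache hit: no API call made
        # Cache miss: call the LLM and remember the answer.
        self.api_calls += 1
        response = self.llm(prompt)
        self.entries.append((vec, response))
        return response


def fake_llm(prompt):
    """Placeholder for a real LLM API call (assumption)."""
    return f"answer to: {prompt}"


cache = SemanticCache(fake_llm)
first = cache.ask("What is semantic caching?")       # miss: calls the LLM
second = cache.ask("Semantic caching, what is it?")  # hit: reuses the answer
third = cache.ask("How do I bake bread?")            # miss: unrelated prompt
print(cache.api_calls)
```

In this sketch the reworded second prompt is served from the cache, so only two of the three requests reach the model. A real deployment would likely swap the word-count vectors for model-generated embeddings and a vector store, and would need to tune the threshold: too low and users get answers to questions they did not ask, too high and the cache rarely hits.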


Featured image credit: Donald Wu


Read: What Is Semantic Caching?