How semantic caching reduces LLM API calls
Bill Doerrfeld | May 5, 2025
My latest for The New Stack explores semantic caching, an emerging strategy to optimize agentic AI.
Semantic caching is like typical caching, but for AI: instead of requiring an exact match, it reuses a stored response when a new prompt is semantically similar to one already answered. That can eliminate a lot of redundant API calls to LLMs, reducing costs and improving performance.
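To make the idea concrete, here is a rough sketch of what a semantic cache lookup might look like in Python. The `embed` and `call_llm` helpers, the in-memory cache, and the 0.92 similarity threshold are illustrative assumptions rather than code from the article; the point is simply that a new prompt is embedded, compared against previously answered prompts, and only sent to the LLM on a cache miss.

```python
import numpy as np

# Hypothetical helpers: swap in your embedding model and LLM client.
def embed(text: str) -> np.ndarray:
    """Return a vector embedding for `text` (e.g. from an embedding API)."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Make the actual (expensive) LLM API call."""
    raise NotImplementedError

# In-memory semantic cache: list of (embedding, response) pairs.
_cache: list[tuple[np.ndarray, str]] = []
SIMILARITY_THRESHOLD = 0.92  # assumed cutoff; tune per use case

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cached_completion(prompt: str) -> str:
    query_vec = embed(prompt)
    # Cache hit: a semantically similar prompt was answered before.
    for vec, response in _cache:
        if cosine(query_vec, vec) >= SIMILARITY_THRESHOLD:
            return response  # no LLM call needed
    # Cache miss: call the LLM and remember the result.
    response = call_llm(prompt)
    _cache.append((query_vec, response))
    return response
```

In practice the linear scan would be replaced by a vector database or approximate nearest-neighbor index, but the control flow stays the same.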
The piece covers what semantic caching is, how it works, and what the benefits are. According to the sources, semantic caching is poised to become more of a standard practice for optimizing how applications behave with AI, reducing latency and lowering costs as usage increases.
Featured image credit: Donald Wu