How semantic caching reduces LLM API calls

Bill Doerrfeld | May 5, 2025

My latest for The New Stack explores semantic caching, an emerging strategy to optimize agentic AI.


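Semantic caching is like traditional caching, but for AI. Instead of matching requests by an exact key, it matches prompts by meaning, so a question that has already been answered can be served from the cache even when it is phrased differently. This could eliminate many redundant API calls to LLMs, reducing costs and improving performance.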
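To make the idea concrete, here is a minimal, self-contained sketch of a semantic cache. It is not the approach described in the article or used by any particular product. It rests on several assumptions: the bag-of-words `embed` function is a hypothetical stand-in for a real embedding model, `fake_llm` stands in for an actual LLM API call, and the 0.8 similarity threshold is an arbitrary illustrative value. The cache embeds each prompt, compares it to previously answered prompts by cosine similarity, and only calls the LLM when nothing is similar enough.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Caches LLM responses and reuses them for sufficiently similar prompts."""

    def __init__(self, llm: Callable[[str], str], threshold: float = 0.8):
        self.llm = llm                  # function that performs the (costly) LLM call
        self.threshold = threshold      # minimum similarity to count as a cache hit
        self.entries: list[tuple[Counter, str]] = []
        self.llm_calls = 0              # how many requests actually reached the LLM

    def lookup(self, prompt: str) -> Optional[str]:
        """Return the response for the most similar cached prompt, if close enough."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def ask(self, prompt: str) -> str:
        """Serve from the cache on a hit; otherwise call the LLM and store the result."""
        cached = self.lookup(prompt)
        if cached is not None:
            return cached
        self.llm_calls += 1
        response = self.llm(prompt)
        self.entries.append((embed(prompt), response))
        return response


def fake_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM API call."""
    return f"answer to: {prompt}"


cache = SemanticCache(fake_llm, threshold=0.8)
first = cache.ask("What is the capital of France?")
second = cache.ask("What's the capital of France?")  # similar wording: served from cache
third = cache.ask("How do I bake bread?")             # unrelated: goes to the LLM
print(cache.llm_calls)  # 2
```

The threshold is the key design choice: set it too low and the cache may return an answer to a different question, set it too high and it behaves like an exact-match cache. A bag-of-words vector only captures word overlap, which is why a real deployment would swap in a proper embedding model for `embed`.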


My latest for The New Stack explores semantic caching: what it is, how it works, and what the benefits are. According to my sources, semantic caching is poised to become a more standard practice for optimizing how applications interact with AI, reducing latency and helping keep costs in check as they rise.


Featured image credit: Donald Wu


Read: What Is Semantic Caching?