How semantic caching reduces LLM API calls

Bill Doerrfeld | May 5, 2025

My latest for The New Stack explores semantic caching, an emerging strategy to optimize agentic AI.


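Semantic caching works like conventional caching, but it matches requests by meaning rather than by exact text. When a new prompt is similar enough to one that has already been answered, the stored response is returned instead of calling the LLM again. That could eliminate a lot of redundant API calls to LLMs, reducing costs and improving performance.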


The article covers what semantic caching is, how it works, and what the benefits are. According to the sources, semantic caching is poised to become a standard practice for optimizing how applications interact with AI, reducing latency and keeping spending in check as costs increase.
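To make the idea concrete, here is a minimal sketch of a semantic cache in Python. It is an illustration under stated assumptions, not the approach described in the article: the `embed` function is a toy bag-of-words stand-in for a real embedding model (so it only catches word overlap, not true paraphrases), and `SemanticCache`, `get_or_call`, and the `0.8` threshold are hypothetical names and values chosen for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    A real system would call an embedding model here (assumption).
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache that reuses responses for similar prompts."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum similarity to count as a hit
        self.entries = []           # list of (embedding, response) pairs

    def lookup(self, prompt):
        """Return the best cached response above the threshold, else None."""
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def get_or_call(self, prompt, call_llm):
        """Return (response, was_cache_hit); call the LLM only on a miss."""
        cached = self.lookup(prompt)
        if cached is not None:
            return cached, True
        response = call_llm(prompt)
        self.entries.append((embed(prompt), response))
        return response, False
```

In use, `call_llm` would be whatever function sends the prompt to your model provider. The threshold is the main design choice: set it too low and users get answers to questions they did not ask, set it too high and the cache rarely hits. A linear scan is fine for a sketch, though a production setup would more likely store embeddings in a vector index.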


Featured image credit: Donald Wu


Read: What Is Semantic Caching?
