How semantic caching reduces LLM API calls

Bill Doerrfeld | May 5, 2025

My latest for The New Stack explores semantic caching, an emerging strategy to optimize agentic AI.


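Semantic caching works like conventional caching, but it matches requests by meaning rather than by exact key, so a prompt that is worded differently can still reuse a stored response. It could eliminate a lot of redundant API calls to LLMs, reducing costs and improving performance.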
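
To make the idea concrete, here is a minimal sketch in Python. It is not taken from the article: the `SemanticCache` class, the `embed` helper, the `fake_llm` function, and the 0.8 threshold are all hypothetical names and values chosen for illustration. A real system would likely use an embedding model and a vector store; the bag-of-words vector here is only a toy stand-in, so it catches near-identical wording rather than true paraphrases.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache that reuses a response when a prompt is similar enough."""

    def __init__(self, llm_call, threshold=0.8):
        self.llm_call = llm_call      # function that performs the real API call
        self.threshold = threshold    # assumed cutoff; tune per application
        self.entries = []             # list of (vector, response) pairs

    def ask(self, prompt):
        vec = embed(prompt)
        # Find the stored prompt closest in meaning to the new one.
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]            # cache hit: no API call is made
        response = self.llm_call(prompt)  # cache miss: call the model
        self.entries.append((vec, response))
        return response


# Usage: a fake LLM that records how often it is actually called.
calls = []


def fake_llm(prompt):
    calls.append(prompt)
    return f"answer #{len(calls)}"


cache = SemanticCache(fake_llm)
cache.ask("What is the capital of France?")
cache.ask("what is the capital of france")  # same words, served from cache
cache.ask("How do I reset my password?")    # unrelated, triggers a new call
print(len(calls))  # 2
```

Three prompts result in only two calls to the model, which is the redundancy the technique aims to remove. The threshold is the main design choice: set it too low and users get answers to questions they did not ask, set it too high and the cache rarely hits.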


My latest for The New Stack explores semantic caching: what it is, how it works, and what the benefits are. According to the sources, semantic caching is poised to become a standard practice for optimizing how applications interact with AI, reducing latency and lowering the barrier to adoption as costs increase.


Featured image credit: Donald Wu


Read: What Is Semantic Caching?