Tips and tricks to reduce MCP token bloat
My latest for The New Stack shares techniques on optimizing MCP usage.
MCP servers can quickly drain context windows without the right guardrails. Thankfully, there are ways around this...
Today, my feature with The New Stack breaks down a number of practical techniques for reducing MCP token bloat as teams begin using multiple MCPs in real, scaled workflows.
Techniques include more intentional tool design, minimizing upfront context, progressive disclosure, better tool discovery, subagents, code mode, semantic caching, stronger prompting practices, and more.
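To make two of these ideas concrete, here is a minimal sketch of progressive disclosure combined with lightweight tool discovery: the model first sees only one-line summaries, and a full input schema is loaded only for the tool it picks. Everything here is hypothetical and for illustration only. The tool catalog, the function names, and the character-count proxy for token cost are assumptions, not part of any MCP SDK or of the techniques as the experts describe them.

```python
import json

# Hypothetical tool catalog; names, summaries, and schemas are invented for illustration.
TOOLS = {
    "create_issue": {
        "summary": "Create an issue in a tracker.",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string", "description": "Short issue title."},
                "body": {"type": "string", "description": "Issue description in Markdown."},
                "labels": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title"],
        },
    },
    "search_issues": {
        "summary": "Search issues by free-text query.",
        "schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Free-text search query."},
                "limit": {"type": "integer", "description": "Maximum results to return."},
            },
            "required": ["query"],
        },
    },
}


def list_tool_summaries(tools):
    """Upfront context: one short line per tool, with no schemas attached."""
    return [f"{name}: {spec['summary']}" for name, spec in sorted(tools.items())]


def describe_tool(tools, name):
    """On-demand detail: the full definition for a single tool."""
    spec = tools[name]
    return {"name": name, "description": spec["summary"], "inputSchema": spec["schema"]}


def rough_size(obj):
    """Length of the JSON form in characters, a crude stand-in for token cost."""
    return len(json.dumps(obj))


upfront = list_tool_summaries(TOOLS)
everything = [describe_tool(TOOLS, name) for name in TOOLS]

print("summaries only:", rough_size(upfront), "chars")
print("all full schemas:", rough_size(everything), "chars")
print("one schema on demand:", rough_size(describe_tool(TOOLS, "create_issue")), "chars")
```

With two tools the savings are small, but the gap grows with every tool and every server added, which is where the upfront cost of full schemas starts to matter.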
The big takeaway: as MCP gains real enterprise traction, it'll take smart approaches to optimize its use in software development.
Huge thank you to the experts who shared their knowledge with me for this piece! This one features, in order of appearance:
- Gil Feig, CTO and co-founder, Merge
- Christian Posta, VP and global field CTO, solo.io
- Alex Salazar, co-founder and CEO, Arcade.dev
- Marcin Klimek, senior technical product manager, SmartBear
- Kevin Swiber, API strategist, Layered System
- Neeraj Abhyankar, VP of data and AI, R Systems
- Ori Yitzhaki, chief product officer, Sonar
- Tom Moor, head of engineering, Linear
- Matt Martin, co-founder and CEO, Clockwise
- Ankit Jain, CEO, Aviator
- Melissa R., Director of AI, AppOmni
This is a space I expect will continue to evolve, and I hope to keep covering the emerging techniques for getting the most out of MCP in practice.











