Should AI agents scrape or integrate external data?

Bill Doerrfeld | February 3, 2026

My latest on InfoWorld explores the upsides and caveats of both approaches.

To scrape or integrate? It's an age-old question resurfacing for AI agent builders.

Excited to share my analysis today for InfoWorld, where I break down when it makes sense to scrape public web sources and when official API integrations are the better choice for external data.

The takeaway: agents need data. New interactive browser tools and scraping techniques help pull in real-time, supplementary signals. But scraping comes with fragility and legal downsides. As Deepak Singh puts it, "It's building on quicksand."

Scraping is no substitute for the predictable, validated, and governed integrations agents need to execute auditable workflows and real-world actions reliably.

This article features, in order of appearance:

- Or Lenchner, CEO, Bright Data
- Deepak Singh, CEO and co-founder, AvairAI Inc.
- Neeraj Abhyankar, VP, Data and AI, R Systems
- Gaurav Pathak, VP of AI and metadata, Informatica
- Keith Pijanowski, AI and ML solutions engineer, MinIO
- Krishna Subramanian, co-founder and COO, Komprise

Also shout-outs to reports from PwC (2025 AI Agents Survey), Tray.ai (2024 Enterprise Survey), Salt Security (2025 AI Agents Report), and McKinsey & Company (2025 State of AI Study), plus links to reporting from AI21 Labs, The Register, and WIRED.

Read: How should AI agents consume external data?