Should AI agents scrape or integrate external data?

Bill Doerrfeld | February 3, 2026

My latest on InfoWorld explores the upsides and caveats of both approaches.

To scrape or integrate? It's an age-old question resurfacing for AI agent builders.

Excited to share my analysis today for InfoWorld, where I break down when it makes sense to scrape public web sources, and when official API integrations are the better choice for external data.

The takeaway: agents need data. New interactive browser tools and scraping techniques help pull in real-time, supplementary signals. But scraping comes with fragility and legal downsides. As Deepak Singh puts it, "It's building on quicksand."

Scraping is no substitute for the predictable, validated, and governed integrations agents need to execute auditable workflows and real-world actions reliably.

This article features, in order of appearance:

- Or Lenchner, CEO, Bright Data
- Deepak Singh, CEO and co-founder, AvairAI Inc.
- Neeraj Abhyankar, VP, Data and AI, R Systems
- Gaurav Pathak, VP of AI and metadata, Informatica
- Keith Pijanowski, AI and ML solutions engineer, MinIO
- Krishna Subramanian, co-founder and COO, Komprise

Also shout-outs to reports from PwC (2025 AI Agents Survey), Tray.ai (2024 Enterprise Survey), Salt Security (2025 AI Agents Report), and McKinsey & Company (2025 State of AI Study), plus links to reporting from AI21 Labs, The Register, and WIRED.

Read: How should AI agents consume external data?
