Should AI agents scrape or integrate external data?

Bill Doerrfeld | February 3, 2026

My latest on InfoWorld explores the upsides and caveats of both approaches.

To scrape or integrate? It's an age-old question resurfacing for AI agent builders.

Excited to share my analysis today for InfoWorld, where I break down when it makes sense to scrape public web sources, and when official API integrations are the better choice for external data.

The takeaway: agents need data. New interactive browser tools and scraping techniques help pull in real-time, supplementary signals. But scraping comes with fragility and legal downsides. As Deepak Singh puts it, "It's building on quicksand."

Scraping is no substitute for the predictable, validated, and governed integrations agents need to execute auditable workflows and real-world actions reliably.

This article features, in order of appearance:

- Or Lenchner, CEO, Bright Data
- Deepak Singh, CEO and co-founder, AvairAI Inc.
- Neeraj Abhyankar, VP, Data and AI, R Systems
- Gaurav Pathak, VP of AI and metadata, Informatica
- Keith Pijanowski, AI and ML solutions engineer, MinIO
- Krishna Subramanian, co-founder and COO, Komprise

Also shout-outs to reports from PwC (2025 AI Agents Survey), Tray.ai (2024 Enterprise Survey), Salt Security (2025 AI Agents Report), and McKinsey & Company (2025 State of AI Study), plus links to reporting from AI21 Labs, The Register, and WIRED.

Read: How should AI agents consume external data?
