How to Scrape a Website With Gumloop

Last updated: March 26, 2026

Use a Gumloop agent with a web scraping MCP tool. Add Firecrawl, Exa, or Parallel to your agent's toolset, then tell the agent what you want to scrape in plain language — the agent picks the right strategy automatically.

Which Approach Should You Use?

Gumloop agents don't have web access by default. To scrape or research websites, connect one or more MCP tools to your agent. The agent then picks the right tool based on what you ask it to do.

Tool	Best for
Firecrawl MCP	Full-site crawls, bulk scraping, bot-protected sites, screenshots, structured extraction
Exa MCP	Research and discovery — finding relevant pages, extracting content, similarity search, cited answers
Parallel MCP	Web search + content extraction + website change monitoring

Not sure which MCP tool to pick? See the comparison below.

Comparing the MCP Options

Capability	Firecrawl	Exa	Parallel
Single page scrape	Yes	Yes	Yes
Full site crawl	Yes	No	No
Bulk URL scraping	Yes	Yes	No
Web search	Yes	Yes (neural + keyword)	Yes
Find similar pages	No	Yes	No
Change monitoring	No	No	Yes
Autonomous site navigation	Yes (Deep Extract)	Yes (Research Task)	Yes (Task Run)
Bot-protected sites	Best — residential proxies available	Good	Good
Best for	Crawling whole sites, bulk scraping, screenshots	Research, discovery, cited answers	Monitoring + light extraction

General guidance:

- Scraping full websites or large batches of URLs → Firecrawl
- Researching a topic, finding relevant pages, getting cited answers → Exa
- Monitoring a competitor page for changes → Parallel
- Building a complex research agent that needs all of the above → add all three
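The comparison table and guidance above can also be read as a lookup: list the capabilities a job needs, then keep the tools that cover all of them. The sketch below is only an illustration of that reasoning. The capability keys and the `tools_for` helper are hypothetical names made up for this example, not part of Gumloop or any of the MCP tools, and the matrix is transcribed by hand from the table.

```python
# Capability matrix transcribed from the comparison table.
# The key names are hypothetical labels chosen for this sketch.
CAPABILITIES = {
    "Firecrawl": {"single_page", "full_site_crawl", "bulk_urls",
                  "web_search", "autonomous_navigation"},
    "Exa": {"single_page", "bulk_urls", "web_search",
            "find_similar", "autonomous_navigation"},
    "Parallel": {"single_page", "web_search",
                 "change_monitoring", "autonomous_navigation"},
}


def tools_for(required):
    """Return the tools, in table order, that cover every required capability."""
    needed = set(required)
    known = set().union(*CAPABILITIES.values())
    unknown = needed - known
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    return [name for name, caps in CAPABILITIES.items() if needed <= caps]


print(tools_for(["full_site_crawl"]))
print(tools_for(["find_similar", "change_monitoring"]))
```

An empty result, as in the second call, means no single tool covers the job, which is the case where adding more than one tool to the agent makes sense.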

Setting Up MCP Tools

The setup is the same for all three:

1. Open your agent and click + App in the Tools section.
2. Search for Firecrawl, Exa, or Parallel.
3. Select it and click Add Selected.
4. Click Save in the top-right corner.

Common Issues

The site returns empty results or blocked errors
Some sites use bot protection (such as Cloudflare) to block automated requests. Switch to the Firecrawl MCP tool, which can use residential proxies and handles these restrictions better than Exa or Parallel.
"The provided URL is not reachable"
The URL is not publicly accessible or has a DNS issue. Check that:
- The URL includes https://
- The site is not behind a VPN or firewall
- The domain is spelled correctly
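If you want to run the first and last of these checks yourself before handing a URL to the agent, a minimal Python sketch might look like the following. It is a local pre-check written for this example, not something Gumloop provides, and `check_url` is a hypothetical helper name. It only tests the URL scheme and whether the hostname resolves from your machine, so it cannot tell you whether the site is reachable from outside a VPN or firewall.

```python
import socket
from urllib.parse import urlsplit


def check_url(url, resolve=socket.getaddrinfo):
    """Return a list of problems found with the URL (empty if none were found).

    `resolve` is injectable so the DNS step can be replaced when testing.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return ["missing http:// or https:// scheme"]
    host = parts.hostname
    if not host:
        return ["no hostname in URL"]
    try:
        # A failed lookup usually points to a typo in the domain or a DNS issue.
        resolve(host, 443)
    except socket.gaierror:
        return [f"DNS lookup failed for {host}"]
    return []
```

For example, `check_url("example.com/pricing")` reports the missing scheme without touching the network, because the check stops at the first problem it finds.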
The content is behind a login
Scraping tools only see what a logged-out visitor sees. If the platform has a native Gumloop integration (Confluence, Notion, Google Drive, HubSpot), use that instead. It authenticates via API and gives you reliable access without scraping. For example, to pull Confluence pages into your agent, add the Confluence MCP tool rather than scraping the URL.

Firecrawl screenshots stop loading after a while
Screenshot URLs from Firecrawl are signed, temporary URLs, so they expire. Download images to your own storage within the same agent session if you need them long-term.
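If you end up saving a screenshot yourself rather than having the agent do it, the idea is simply to copy the bytes somewhere you control while the link is still valid. The sketch below uses only the Python standard library and assumes you already have the signed URL as a string. `save_copy` and the directory and file names are hypothetical choices for this example, not a Gumloop or Firecrawl API.

```python
import shutil
import urllib.request
from pathlib import Path


def save_copy(signed_url, dest_dir, filename):
    """Download signed_url into dest_dir/filename and return the saved path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / filename
    # Stream the response to disk instead of loading the whole image into memory.
    with urllib.request.urlopen(signed_url, timeout=30) as response, \
            open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Run it as soon as the URL is returned; once the signature expires, the same call will fail with an HTTP error.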

For anything else, reach out at support@gumloop.com or in the shared Slack channel.