How to Scrape a Website With Gumloop

Last updated: March 26, 2026

Use a Gumloop agent with a web scraping MCP tool. Add Firecrawl, Exa, or Parallel to your agent's toolset, then tell the agent what you want to scrape in plain language — the agent picks the right strategy automatically.

Which Approach Should You Use?

Gumloop agents don't have web access by default. To scrape or research websites, connect one or more MCP tools to your agent. The agent then picks the right tool based on what you ask it to do.

| Tool | Best for |
| --- | --- |
| Firecrawl MCP | Full-site crawls, bulk scraping, bot-protected sites, screenshots, structured extraction |
| Exa MCP | Research and discovery: finding relevant pages, extracting content, similarity search, cited answers |
| Parallel MCP | Web search, content extraction, and website change monitoring |

Not sure which MCP tool to pick? See the comparison below.

Comparing the MCP Options

| Capability | Firecrawl | Exa | Parallel |
| --- | --- | --- | --- |
| Single page scrape | Yes | Yes | Yes |
| Full site crawl | Yes | No | No |
| Bulk URL scraping | Yes | Yes | No |
| Web search | Yes | Yes (neural + keyword) | Yes |
| Find similar pages | No | Yes | No |
| Change monitoring | No | No | Yes |
| Autonomous site navigation | Yes (Deep Extract) | Yes (Research Task) | Yes (Task Run) |
| Bot-protected sites | Best (residential proxies available) | Good | Good |
| Best for | Crawling whole sites, bulk scraping, screenshots | Research, discovery, cited answers | Monitoring + light extraction |

General guidance:

  • Scraping full websites or large batches of URLs → Firecrawl

  • Researching a topic, finding relevant pages, getting cited answers → Exa

  • Monitoring a competitor page for changes → Parallel

  • You have a complex research agent → add all of them
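If it helps to reason about the guidance above mechanically, the yes/no rows of the comparison table can be written down as a small lookup. This is only an illustrative sketch: the `CAPABILITIES` dictionary and the `tools_for` helper are hypothetical names invented here, not part of Gumloop or any of the MCP tools, and the agent itself needs no such code.

```python
# Yes/No rows copied from the comparison table in this article.
# Hypothetical helper for planning only; it does not call Gumloop.
ALL_TOOLS = {"Firecrawl", "Exa", "Parallel"}

CAPABILITIES = {
    "single_page_scrape": {"Firecrawl", "Exa", "Parallel"},
    "full_site_crawl": {"Firecrawl"},
    "bulk_url_scraping": {"Firecrawl", "Exa"},
    "web_search": {"Firecrawl", "Exa", "Parallel"},
    "find_similar_pages": {"Exa"},
    "change_monitoring": {"Parallel"},
}


def tools_for(*needs):
    """Return the tools that cover every requested capability, sorted by name."""
    unknown = [need for need in needs if need not in CAPABILITIES]
    if unknown:
        raise KeyError(f"unknown capability: {unknown}")
    candidates = set(ALL_TOOLS)
    for need in needs:
        # Keep only the tools that also support this capability.
        candidates &= CAPABILITIES[need]
    return sorted(candidates)


if __name__ == "__main__":
    print(tools_for("full_site_crawl"))
    print(tools_for("bulk_url_scraping", "find_similar_pages"))
    # An empty result means no single tool covers everything,
    # which is the "add all of them" case from the guidance.
    print(tools_for("full_site_crawl", "change_monitoring"))
```

An empty list is the signal that the agent needs more than one tool attached.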

Setting Up MCP Tools

The setup is the same for all three:

  1. Open your agent and click + App in the Tools section.

  2. Search for Firecrawl, Exa, or Parallel.

  3. Select it and click Add Selected.

  4. Click Save in the top-right corner.


Common Issues

  • The site returns empty results or blocked errors

    Some sites use bot protection (like Cloudflare) to block automated requests. Switch to Firecrawl MCP, which can route requests through residential proxies and handles these restrictions significantly better.

  • "The provided URL is not reachable"

    The URL is not publicly accessible or has a DNS issue. Check that:

    • The URL includes https://

    • The site is not behind a VPN or firewall

    • The domain is spelled correctly

  • The content is behind a login

    Scraping tools only see what a logged-out visitor sees. If the platform has a native Gumloop integration (Confluence, Notion, Google Drive, HubSpot), use that instead. It authenticates via API and gives you reliable access without scraping. For example, to pull Confluence pages into your agent, add the Confluence MCP tool rather than scraping the URL.

  • Firecrawl screenshots stop loading after a while

    Screenshot URLs from Firecrawl are signed temporary URLs. Download images to your own storage within the same agent session if you need them long-term.
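
For the "The provided URL is not reachable" case above, the first and third checks (scheme and spelling) can be sanity-tested locally before handing a URL to the agent. The sketch below is a hypothetical helper, not a Gumloop feature: `precheck_url` and its `resolve` parameter are names made up for illustration. A clean result only means the URL is well-formed and the domain resolves from your machine. It does not prove the scraper can reach the site, and a domain that resolves only on your VPN can still pass here.

```python
import socket
from urllib.parse import urlparse


def precheck_url(url, resolve=socket.gethostbyname):
    """Return a list of problems found; an empty list means the basic checks passed.

    `resolve` is injectable so the DNS step can be replaced in tests.
    """
    problems = []
    parsed = urlparse(url)

    # Check 1: the URL should carry an explicit scheme such as https://
    if parsed.scheme not in ("http", "https"):
        problems.append("URL should start with https://")

    host = parsed.hostname
    if not host:
        problems.append("URL has no hostname")
        return problems

    # Check 2: a failed DNS lookup usually means a misspelled or private domain.
    try:
        resolve(host)
    except OSError:
        problems.append(
            f"DNS lookup failed for {host}; check spelling or VPN/firewall"
        )
    return problems


if __name__ == "__main__":
    for candidate in ("https://example.com/pricing", "example.com/pricing"):
        print(candidate, "->", precheck_url(candidate) or "looks OK")
```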


For anything else, reach out at support@gumloop.com or in the shared Slack channel.