Firecrawl: Anti-Bot Web Scraping

OpenClaw integrates with Firecrawl, a content extraction service designed to get past bot detection and handle complex, JavaScript-heavy websites.

Why use Firecrawl?

Anti-Bot Circumvention: Successfully scrapes sites that block standard HTTP requests (e.g., Cloudflare challenges).
JavaScript Rendering: Handles single-page applications (SPAs) that require JS to load content.
Smart Caching: Reduces API costs and latency by caching results for 2 days by default (configurable via maxAgeMs).
Fallback Mechanism: Automatically used by the web_fetch tool when local extraction fails.

Setup

Get an API Key: Create an account at Firecrawl.dev and generate an API key.
Configure OpenClaw: Add the key to your configuration file (~/.openclaw/openclaw.json) or environment variables.

Configuration

{
  tools: {
    web: {
      fetch: {
        firecrawl: {
          apiKey: "FIRECRAWL_API_KEY_HERE",
          baseUrl: "https://api.firecrawl.dev",
          onlyMainContent: true,     // Extract only the article body
          maxAgeMs: 172800000,       // Cache duration (default: 2 days)
          timeoutSeconds: 60
        }
      }
    }
  }
}
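
To make the units in this block concrete, here is a minimal Python sketch that checks the cache arithmetic and shows how the settings might translate into a Firecrawl scrape request. The helper names (`days_to_ms`, `build_scrape_request`), the `/v1/scrape` path, and the body field names `maxAge` and `timeout` (both in milliseconds) are assumptions for illustration; OpenClaw's actual mapping may differ.

```python
# Values copied from the configuration block above.
FIRECRAWL_CONFIG = {
    "apiKey": "FIRECRAWL_API_KEY_HERE",
    "baseUrl": "https://api.firecrawl.dev",
    "onlyMainContent": True,
    "maxAgeMs": 172_800_000,
    "timeoutSeconds": 60,
}


def days_to_ms(days: float) -> int:
    """Convert a cache lifetime in days to milliseconds."""
    return int(days * 24 * 60 * 60 * 1000)


def build_scrape_request(url: str, config: dict) -> dict:
    """Hypothetical helper: describe the HTTP request without sending it."""
    return {
        # Assumed endpoint path; not stated in this document.
        "endpoint": config["baseUrl"].rstrip("/") + "/v1/scrape",
        "headers": {
            "Authorization": f"Bearer {config['apiKey']}",
            "Content-Type": "application/json",
        },
        "body": {
            "url": url,
            "formats": ["markdown"],
            "onlyMainContent": config["onlyMainContent"],
            # Assumed field names, both expressed in milliseconds.
            "maxAge": config["maxAgeMs"],
            "timeout": config["timeoutSeconds"] * 1000,
        },
    }


request = build_scrape_request("https://example.com/article", FIRECRAWL_CONFIG)
print(days_to_ms(2) == FIRECRAWL_CONFIG["maxAgeMs"])
print(request["endpoint"])
```

To shorten the cache window to one day, `days_to_ms(1)` gives the value to place in maxAgeMs.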

How it works

The web_fetch tool tries three extraction methods in order, moving to the next only when the previous one fails:

Local Readability: Attempts to fetch and parse the page locally (fastest, free).
Firecrawl: If the local fetch fails (e.g., 403 Forbidden, 401 Unauthorized, or empty content), web_fetch retries the request through Firecrawl.
Basic HTML: Final fallback to raw HTML extraction if all else fails.
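
The fallback order above can be sketched in Python as a loop over extractors. This is an illustration of the pattern, not OpenClaw's implementation: `fetch_with_fallback`, `FetchResult`, `ExtractionError`, and the three stub extractors are hypothetical names, and the stubs return canned values instead of making network calls.

```python
from dataclasses import dataclass
from typing import Callable


class ExtractionError(Exception):
    """Raised by an extractor that could not produce content."""


@dataclass
class FetchResult:
    extractor: str  # name of the step that succeeded
    content: str


Extractor = Callable[[str], str]


def fetch_with_fallback(url: str, extractors: list[tuple[str, Extractor]]) -> FetchResult:
    """Try each extractor in order; return the first non-empty result."""
    failures = []
    for name, extract in extractors:
        try:
            content = extract(url)
        except ExtractionError as exc:
            failures.append(f"{name}: {exc}")
            continue
        if content.strip():
            return FetchResult(extractor=name, content=content)
        # Empty content counts as a failure and triggers the next step.
        failures.append(f"{name}: empty content")
    raise ExtractionError("; ".join(failures))


# Stub extractors standing in for the three steps (no real network access).
def local_readability(url: str) -> str:
    raise ExtractionError("403 Forbidden")


def firecrawl(url: str) -> str:
    return "# Article title\n\nMain content."


def basic_html(url: str) -> str:
    return "<html><body>Main content.</body></html>"


result = fetch_with_fallback(
    "https://example.com/article",
    [
        ("local-readability", local_readability),
        ("firecrawl", firecrawl),
        ("basic-html", basic_html),
    ],
)
print(result.extractor)
```

Here the local step is blocked, so the result comes from the Firecrawl step and the raw HTML fallback is never called.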

Stealth Mode

Firecrawl escalates to stealth proxies only when a standard request is blocked:

OpenClaw requests proxy: "auto".
Firecrawl attempts a standard request first.
If blocked, it automatically retries with stealth proxies (residential IPs) to bypass defenses.
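
The escalation described above happens on Firecrawl's side, but its logic can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: `scrape_with_auto_proxy` is a hypothetical function, the tier labels "basic" and "stealth" are placeholders, and the fake `attempt` callback returns canned HTTP status codes.

```python
from typing import Callable

# Tier order assumed for illustration: standard request first, stealth second.
PROXY_TIERS = ("basic", "stealth")


def scrape_with_auto_proxy(url: str, attempt: Callable[[str, str], int]) -> str:
    """Return the first proxy tier whose attempt yields HTTP 200."""
    statuses = {}
    for tier in PROXY_TIERS:
        status = attempt(url, tier)
        if status == 200:
            return tier
        statuses[tier] = status  # remember why this tier was rejected
    raise RuntimeError(f"all proxy tiers blocked for {url}: {statuses}")


def fake_attempt(url: str, tier: str) -> int:
    """Simulated site that blocks standard requests but not residential IPs."""
    return 200 if tier == "stealth" else 403


used_tier = scrape_with_auto_proxy("https://example.com/protected", fake_attempt)
print(used_tier)
```

Because the standard attempt comes first, pages that do not block it never incur the stealth retry.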

See Web Tools for more details on the complete web scraping toolkit.