Firecrawl: Anti-Bot Web Scraping
Firecrawl
OpenClaw integrates with Firecrawl, a powerful content extraction service designed to bypass bot detection and handle complex, JavaScript-heavy websites.
Why use Firecrawl?
- Anti-Bot Circumvention: Successfully scrapes sites that block standard HTTP requests (e.g., Cloudflare challenges).
- JavaScript Rendering: Handles single-page applications (SPAs) that require JS to load content.
- Smart Caching: Reduces API costs and latency by caching results for up to 2 days (configurable).
- Fallback Mechanism: Automatically used by the web_fetch tool when local extraction fails.
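
The two-day cache window corresponds to the maxAgeMs value, which is expressed in milliseconds. As a minimal TypeScript sketch of that arithmetic (the helper names daysToMs and isCacheFresh are hypothetical and not part of OpenClaw or Firecrawl), the conversion and a simple freshness check might look like this:

```typescript
// Milliseconds in one day: 24 h * 60 min * 60 s * 1000 ms.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: convert a cache window in days to milliseconds.
function daysToMs(days: number): number {
  return Math.round(days * MS_PER_DAY);
}

// Hypothetical helper: a cached result is reusable while its age is within maxAgeMs.
function isCacheFresh(fetchedAtMs: number, nowMs: number, maxAgeMs: number): boolean {
  return nowMs - fetchedAtMs <= maxAgeMs;
}

const twoDays = daysToMs(2); // 172800000, the two-day default in milliseconds
```

A shorter window, such as daysToMs(0.5) for twelve hours, trades more Firecrawl calls for fresher content.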

- Get an API Key: Create an account at Firecrawl.dev and generate an API key.
- Configure OpenClaw: Add the key to your configuration file (~/.openclaw/openclaw.json) or environment variables.
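
To illustrate how a key could be picked up from either location, here is a rough TypeScript sketch. It rests on two assumptions that this page does not confirm: that the environment variable is named FIRECRAWL_API_KEY, and that the configuration file takes precedence over the environment. The function resolveFirecrawlApiKey and the OpenClawConfig type are hypothetical names used only for illustration.

```typescript
// Illustrative subset of the config tree; only the apiKey path is modeled.
interface OpenClawConfig {
  tools?: { web?: { fetch?: { firecrawl?: { apiKey?: string } } } };
}

// ASSUMPTION: the variable name FIRECRAWL_API_KEY and the config-first
// precedence are guesses for this sketch, not documented behavior.
function resolveFirecrawlApiKey(
  config: OpenClawConfig,
  env: Record<string, string | undefined>,
): string | undefined {
  // Prefer a non-empty key from the configuration file.
  const fromConfig = config.tools?.web?.fetch?.firecrawl?.apiKey?.trim();
  if (fromConfig) return fromConfig;

  // Otherwise fall back to the environment; empty strings count as unset.
  const fromEnv = env["FIRECRAWL_API_KEY"]?.trim();
  return fromEnv || undefined;
}
```

In a Node.js process the second argument would typically be process.env. Passing the environment in explicitly keeps the function easy to test.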
Configuration
    {
      tools: {
        web: {
          fetch: {
            firecrawl: {
              apiKey: "FIRECRAWL_API_KEY_HERE",
              baseUrl: "https://api.firecrawl.dev",
              onlyMainContent: true, // Extract only the article body
              maxAgeMs: 172800000, // Cache duration (default: 2 days)
              timeoutSeconds: 60
            }
          }
        }
      }
    }

How it works
The web_fetch tool uses a smart waterfall strategy to retrieve content:
- Local Readability: Attempts to fetch and parse the page locally (fastest, free).
- Firecrawl: If local fetch fails (e.g., 403 Forbidden, 401 Unauthorized, or empty content), it seamlessly retries using Firecrawl.
- Basic HTML: Final fallback to raw HTML extraction if all else fails.
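
The waterfall above can be sketched in TypeScript as an ordered list of extraction stages, each tried only when the previous one fails. This is a simplified illustration, not OpenClaw's actual implementation: the types FetchAttempt and Extractor, the functions isFailed and fetchWithWaterfall, and the rule that any status of 400 or above counts as a failure are all assumptions made for the example.

```typescript
// Result of a single extraction attempt (illustrative shape).
interface FetchAttempt {
  status: number;  // HTTP status code returned for the attempt
  content: string; // extracted text, possibly empty
}

type Extractor = (url: string) => Promise<FetchAttempt>;

// ASSUMPTION: an attempt fails on any 4xx/5xx status (covers 401 and 403)
// or when no readable content came back.
function isFailed(attempt: FetchAttempt): boolean {
  return attempt.status >= 400 || attempt.content.trim().length === 0;
}

// Try each stage in order and return the first usable result.
async function fetchWithWaterfall(
  url: string,
  stages: Array<{ name: string; run: Extractor }>,
): Promise<{ source: string; content: string }> {
  for (const stage of stages) {
    try {
      const attempt = await stage.run(url);
      if (!isFailed(attempt)) {
        return { source: stage.name, content: attempt.content };
      }
    } catch {
      // Network errors and timeouts fall through to the next stage.
    }
  }
  throw new Error(`All extraction stages failed for ${url}`);
}
```

Called with stages named, for example, "readability", "firecrawl", and "basic-html" in that order, the function mirrors the three steps listed above. Keeping the stages as data makes the order easy to inspect and change.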
Stealth Mode
Firecrawl uses an intelligent proxy system:
- OpenClaw requests proxy: "auto".
- Firecrawl attempts a standard request first.
- If blocked, it automatically retries with stealth proxies (residential IPs) to bypass defenses.
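
As a rough illustration of what such a request could carry, the TypeScript sketch below builds a scrape request body from the settings shown in the configuration. Only proxy: "auto" comes from this page. The other field names (formats, maxAge, timeout in milliseconds) reflect one reading of Firecrawl's scrape endpoint and should be checked against the Firecrawl API reference. FirecrawlSettings and buildScrapeRequest are hypothetical names.

```typescript
// Subset of the OpenClaw Firecrawl settings from the configuration section.
interface FirecrawlSettings {
  onlyMainContent: boolean;
  maxAgeMs: number;
  timeoutSeconds: number;
}

// Hypothetical helper: map OpenClaw settings onto a scrape request body.
// ASSUMPTION: field names other than `proxy` are not confirmed by this page.
function buildScrapeRequest(url: string, settings: FirecrawlSettings) {
  return {
    url,
    formats: ["markdown"],                    // assumed output format
    onlyMainContent: settings.onlyMainContent,
    maxAge: settings.maxAgeMs,                // accept cached results up to this age
    proxy: "auto" as const,                   // standard request first, stealth on block
    timeout: settings.timeoutSeconds * 1000,  // seconds converted to milliseconds
  };
}
```

With the default settings shown earlier, the body would carry a timeout of 60000 ms and a maxAge of 172800000 ms.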
See Web Tools for more details on the complete web scraping toolkit.