How to Solve Captcha in Crawlee with CapSolver Integration

Lucas Mitchell

Automation Engineer

24-Dec-2025

TL;DR: Crawlee crawlers often hit CAPTCHA barriers. Integrating CapSolver lets you solve reCAPTCHA, Turnstile, and more, so scraping workflows stay stable and automated.

When building crawlers with Crawlee, running into CAPTCHAs is almost unavoidable, especially on modern sites with aggressive bot protection. Even well-configured Playwright or HTTP crawlers can get blocked once reCAPTCHA, Turnstile, or similar challenges appear.

This guide focuses on a practical approach: using CapSolver to handle CAPTCHA challenges directly inside Crawlee workflows. Instead of fighting browser fingerprints endlessly, you'll see how to detect common CAPTCHA types, solve them programmatically, and keep your crawlers running reliably in real-world scraping scenarios.

What is Crawlee?

Crawlee is a web scraping and browser automation library for Node.js designed to build reliable crawlers that appear human-like and fly under the radar of modern bot protections. Built with TypeScript, it provides both high-level simplicity and low-level customization.

Key Features of Crawlee

  • Unified Interface: Single API for both HTTP and headless browser crawling
  • Anti-Bot Stealth: Automatic browser fingerprint generation, TLS fingerprint replication, and human-like behavior
  • Smart Queue Management: Persistent URL queue with breadth-first and depth-first crawling options
  • Auto-Scaling: Automatic resource scaling based on system load
  • Proxy Rotation: Built-in proxy rotation and session management
  • Multiple Browser Support: Works with Playwright and Puppeteer across Chrome, Firefox, and WebKit

Crawler Types

Crawlee offers multiple crawler types for different use cases:

Crawler Type | Description
CheerioCrawler | Ultra-fast HTTP crawler using Cheerio for HTML parsing
PlaywrightCrawler | Full browser automation with Playwright for JavaScript-heavy sites
PuppeteerCrawler | Full browser automation with Puppeteer for JavaScript rendering
JSDOMCrawler | HTTP crawler with JSDOM for JavaScript execution without a browser

What is CapSolver?

CapSolver is a leading CAPTCHA solving service that provides AI-powered solutions for bypassing various CAPTCHA challenges. With support for multiple CAPTCHA types and lightning-fast response times, CapSolver integrates seamlessly into automated workflows.

Supported CAPTCHA Types

  • reCAPTCHA v2 (Image & Invisible)
  • reCAPTCHA v3
  • Cloudflare Turnstile
  • AWS WAF
  • And many more...

Why Integrate CapSolver with Crawlee?

When building Crawlee crawlers that interact with protected websites, CAPTCHA challenges can halt your entire scraping pipeline. Here's why the integration matters:

  1. Uninterrupted Crawling: Crawlers continue extracting data without manual intervention
  2. Scalable Operations: Handle multiple CAPTCHA challenges across concurrent crawling sessions
  3. Cost-Effective: Pay only for successfully solved CAPTCHAs
  4. High Success Rates: Industry-leading accuracy for all supported CAPTCHA types

Installation

First, install the required packages:

bash
npm install crawlee playwright axios

Or with yarn:

bash
yarn add crawlee playwright axios

Creating a CapSolver Utility for Crawlee

Here's a reusable CapSolver utility class that can be used across your Crawlee projects:

Basic CapSolver Service

typescript
import axios from 'axios';

const CAPSOLVER_API_KEY = 'YOUR_CAPSOLVER_API_KEY';

interface TaskResult {
    status: string;
    solution?: {
        gRecaptchaResponse?: string;
        token?: string;
    };
    errorDescription?: string;
}

class CapSolverService {
    private apiKey: string;
    private baseUrl = 'https://api.capsolver.com';

    constructor(apiKey: string = CAPSOLVER_API_KEY) {
        this.apiKey = apiKey;
    }

    async createTask(taskData: object): Promise<string> {
        const response = await axios.post(`${this.baseUrl}/createTask`, {
            clientKey: this.apiKey,
            task: taskData
        });

        if (response.data.errorId !== 0) {
            throw new Error(`CapSolver error: ${response.data.errorDescription}`);
        }

        return response.data.taskId;
    }

    async getTaskResult(taskId: string, maxAttempts = 60): Promise<TaskResult> {
        for (let i = 0; i < maxAttempts; i++) {
            await this.sleep(2000);

            const response = await axios.post(`${this.baseUrl}/getTaskResult`, {
                clientKey: this.apiKey,
                taskId
            });

            if (response.data.status === 'ready') {
                return response.data;
            }

            if (response.data.status === 'failed') {
                throw new Error(`Task failed: ${response.data.errorDescription}`);
            }
        }

        throw new Error('Timeout waiting for CAPTCHA solution');
    }

    private sleep(ms: number): Promise<void> {
        return new Promise(resolve => setTimeout(resolve, ms));
    }

    async solveReCaptchaV2(websiteUrl: string, websiteKey: string): Promise<string> {
        const taskId = await this.createTask({
            type: 'ReCaptchaV2TaskProxyLess',
            websiteURL: websiteUrl,
            websiteKey
        });

        const result = await this.getTaskResult(taskId);
        return result.solution?.gRecaptchaResponse || '';
    }

    async solveReCaptchaV3(
        websiteUrl: string,
        websiteKey: string,
        pageAction = 'submit'
    ): Promise<string> {
        const taskId = await this.createTask({
            type: 'ReCaptchaV3TaskProxyLess',
            websiteURL: websiteUrl,
            websiteKey,
            pageAction
        });

        const result = await this.getTaskResult(taskId);
        return result.solution?.gRecaptchaResponse || '';
    }

    async solveTurnstile(websiteUrl: string, websiteKey: string): Promise<string> {
        const taskId = await this.createTask({
            type: 'AntiTurnstileTaskProxyLess',
            websiteURL: websiteUrl,
            websiteKey
        });

        const result = await this.getTaskResult(taskId);
        return result.solution?.token || '';
    }
}

export const capSolver = new CapSolverService();
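
The service above hard-codes the key in `CAPSOLVER_API_KEY`. As a minimal sketch of an alternative, the key could be read from the environment instead. The `getApiKey` helper and the environment variable name are illustrative assumptions, not part of CapSolver or Crawlee:

```typescript
// Hypothetical helper: resolve the CapSolver key from an environment-like object
// so the real key never has to be committed to source control.
type Env = Record<string, string | undefined>;

function getApiKey(env: Env, name = 'CAPSOLVER_API_KEY'): string {
    const raw = env[name];
    const key = raw ? raw.trim() : '';

    // Reject a missing value and the placeholder used in the snippet above
    if (!key || key === 'YOUR_CAPSOLVER_API_KEY') {
        throw new Error(`Missing CapSolver API key: set the ${name} environment variable`);
    }
    return key;
}

// Usage in Node.js (assumption): getApiKey(process.env)
```

Since the `CapSolverService` constructor already accepts an `apiKey` argument, the returned value could be passed there in place of the constant.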

Solving Different CAPTCHA Types with Crawlee

reCAPTCHA v2 with PlaywrightCrawler

typescript
import { PlaywrightCrawler, Dataset } from 'crawlee';
import { capSolver } from './capsolver-service';

const RECAPTCHA_SITE_KEY = 'YOUR_SITE_KEY';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);

        // Check if page has reCAPTCHA
        const hasRecaptcha = await page.$('.g-recaptcha');

        if (hasRecaptcha) {
            log.info('reCAPTCHA detected, solving...');

            // Get the site key from the page
            const siteKey = await page.$eval(
                '.g-recaptcha',
                (el) => el.getAttribute('data-sitekey')
            ) || RECAPTCHA_SITE_KEY;

            // Solve the CAPTCHA
            const token = await capSolver.solveReCaptchaV2(request.url, siteKey);

            // Inject the token - the textarea is hidden, so we use JavaScript
            await page.$eval('#g-recaptcha-response', (el: HTMLTextAreaElement, token: string) => {
                el.style.display = 'block';
                el.value = token;
            }, token);

            // Submit the form
            await page.click('button[type="submit"]');
            await page.waitForLoadState('networkidle');

            log.info('reCAPTCHA solved successfully!');
        }

        // Extract data after CAPTCHA is solved
        const title = await page.title();
        const content = await page.locator('body').innerText();

        await Dataset.pushData({
            title,
            content: content.slice(0, 1000)
        });
    },

    maxRequestsPerCrawl: 50,
    headless: true
});

await crawler.run(['https://example.com/protected-page']);

reCAPTCHA v3 with PlaywrightCrawler

typescript
import { PlaywrightCrawler, Dataset } from 'crawlee';
import { capSolver } from './capsolver-service';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);

        // reCAPTCHA v3 is invisible, detect by script
        const recaptchaScript = await page.$('script[src*="recaptcha/api.js?render="]');

        if (recaptchaScript) {
            log.info('reCAPTCHA v3 detected, solving...');

            // Extract site key from the script src
            // ("render=explicit" is a reCAPTCHA v2 loading mode, not a v3 site key)
            const scriptSrc = await recaptchaScript.getAttribute('src') || '';
            const siteKeyMatch = scriptSrc.match(/render=([^&]+)/);
            const siteKey = siteKeyMatch && siteKeyMatch[1] !== 'explicit' ? siteKeyMatch[1] : '';

            if (siteKey) {
                // Solve reCAPTCHA v3
                const token = await capSolver.solveReCaptchaV3(
                    request.url,
                    siteKey,
                    'submit'
                );

                // Inject token into hidden input using JavaScript
                await page.$eval('input[name="g-recaptcha-response"]', (el: HTMLInputElement, token: string) => {
                    el.value = token;
                }, token);

                log.info('reCAPTCHA v3 token injected!');
            }
        }

        // Continue with form submission or data extraction
        const title = await page.title();
        const url = page.url();

        await Dataset.pushData({ title, url });
    }
});

await crawler.run(['https://example.com/v3-protected']);
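
The site key extraction in the handler above can also be pulled out into a small pure function, which makes it easy to check against sample URLs. This is a sketch only: `extractV3SiteKey` is a hypothetical helper name, and the keys in the usage comments are made-up placeholders.

```typescript
// Hypothetical helper: pull the reCAPTCHA v3 site key out of an api.js URL.
// Returns null when the `render` parameter is absent, or when it holds one of
// the v2 loading modes ("explicit" / "onload") rather than a site key.
function extractV3SiteKey(scriptSrc: string): string | null {
    const match = scriptSrc.match(/[?&]render=([^&#]+)/);
    if (!match) return null;

    const value = match[1];
    return value === 'explicit' || value === 'onload' ? null : value;
}

// extractV3SiteKey('https://www.google.com/recaptcha/api.js?render=PLACEHOLDER_KEY')
//   returns 'PLACEHOLDER_KEY'
// extractV3SiteKey('https://www.google.com/recaptcha/api.js?onload=cb&render=explicit')
//   returns null
```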

Cloudflare Turnstile with PlaywrightCrawler

typescript
import { PlaywrightCrawler, Dataset } from 'crawlee';
import { capSolver } from './capsolver-service';

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);

        // Check for Turnstile widget
        const hasTurnstile = await page.$('.cf-turnstile');

        if (hasTurnstile) {
            log.info('Cloudflare Turnstile detected, solving...');

            // Get site key
            const siteKey = await page.$eval(
                '.cf-turnstile',
                (el) => el.getAttribute('data-sitekey')
            );

            if (siteKey) {
                // Solve Turnstile
                const token = await capSolver.solveTurnstile(request.url, siteKey);

                // Inject token using JavaScript (hidden input)
                await page.$eval('input[name="cf-turnstile-response"]', (el: HTMLInputElement, token: string) => {
                    el.value = token;
                }, token);

                // Submit form
                await page.click('button[type="submit"]');
                await page.waitForLoadState('networkidle');

                log.info('Turnstile solved successfully!');
            }
        }

        // Extract data
        const title = await page.title();
        const content = await page.locator('body').innerText();

        await Dataset.pushData({
            title,
            content: content.slice(0, 500)
        });
    }
});

await crawler.run(['https://example.com/turnstile-protected']);

Advanced Integration: Auto-Detecting CAPTCHA Type

Here's an advanced crawler that automatically detects and solves different CAPTCHA types:

typescript
import { PlaywrightCrawler, Dataset } from 'crawlee';
import { capSolver } from './capsolver-service';

interface CaptchaInfo {
    type: 'recaptcha-v2' | 'recaptcha-v3' | 'turnstile' | 'none';
    siteKey: string | null;
}

async function detectCaptcha(page: any): Promise<CaptchaInfo> {
    // Check for reCAPTCHA v2
    const recaptchaV2 = await page.$('.g-recaptcha');
    if (recaptchaV2) {
        const siteKey = await page.$eval('.g-recaptcha', (el: Element) =>
            el.getAttribute('data-sitekey')
        );
        return { type: 'recaptcha-v2', siteKey };
    }

    // Check for reCAPTCHA v3
    const recaptchaV3Script = await page.$('script[src*="recaptcha/api.js?render="]');
    if (recaptchaV3Script) {
        const scriptSrc = await recaptchaV3Script.getAttribute('src') || '';
        const match = scriptSrc.match(/render=([^&]+)/);
        // "render=explicit" is a reCAPTCHA v2 loading mode, not a v3 site key
        const siteKey = match && match[1] !== 'explicit' ? match[1] : null;
        return { type: 'recaptcha-v3', siteKey };
    }

    // Check for Turnstile
    const turnstile = await page.$('.cf-turnstile');
    if (turnstile) {
        const siteKey = await page.$eval('.cf-turnstile', (el: Element) =>
            el.getAttribute('data-sitekey')
        );
        return { type: 'turnstile', siteKey };
    }

    return { type: 'none', siteKey: null };
}

async function solveCaptcha(
    page: any,
    url: string,
    captchaInfo: CaptchaInfo
): Promise<void> {
    if (!captchaInfo.siteKey || captchaInfo.type === 'none') return;

    let token: string;

    switch (captchaInfo.type) {
        case 'recaptcha-v2':
            token = await capSolver.solveReCaptchaV2(url, captchaInfo.siteKey);
            // Hidden textarea - use JavaScript to set value
            await page.$eval('#g-recaptcha-response', (el: HTMLTextAreaElement, t: string) => {
                el.style.display = 'block';
                el.value = t;
            }, token);
            break;

        case 'recaptcha-v3':
            token = await capSolver.solveReCaptchaV3(url, captchaInfo.siteKey);
            // Hidden input - use JavaScript to set value
            await page.$eval('input[name="g-recaptcha-response"]', (el: HTMLInputElement, t: string) => {
                el.value = t;
            }, token);
            break;

        case 'turnstile':
            token = await capSolver.solveTurnstile(url, captchaInfo.siteKey);
            // Hidden input - use JavaScript to set value
            await page.$eval('input[name="cf-turnstile-response"]', (el: HTMLInputElement, t: string) => {
                el.value = t;
            }, token);
            break;
    }
}

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, log, enqueueLinks }) {
        log.info(`Processing ${request.url}`);

        // Auto-detect CAPTCHA
        const captchaInfo = await detectCaptcha(page);

        if (captchaInfo.type !== 'none') {
            log.info(`Detected ${captchaInfo.type}, solving...`);
            await solveCaptcha(page, request.url, captchaInfo);

            // Submit form if exists
            const submitBtn = await page.$('button[type="submit"], input[type="submit"]');
            if (submitBtn) {
                await submitBtn.click();
                await page.waitForLoadState('networkidle');
            }

            log.info('CAPTCHA solved successfully!');
        }

        // Extract data
        const title = await page.title();
        const url = page.url();
        const text = await page.locator('body').innerText();

        await Dataset.pushData({
            title,
            url,
            text: text.slice(0, 1000)
        });

        // Continue crawling
        await enqueueLinks();
    },

    maxRequestsPerCrawl: 100
});

await crawler.run(['https://example.com']);
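
The `switch` in `solveCaptcha` pairs each detected type with a solver method. The same pairing can be written as data, which keeps the task payload easy to inspect before any request is sent. The sketch below is one possible shape: `buildTask` and `TASK_TYPES` are hypothetical names, while the task type strings are the ones already used by `CapSolverService` in this article.

```typescript
type SolvableCaptcha = 'recaptcha-v2' | 'recaptcha-v3' | 'turnstile';

interface CapSolverTask {
    type: string;
    websiteURL: string;
    websiteKey: string;
    pageAction?: string;
}

// Task type names copied from the CapSolverService class shown earlier
const TASK_TYPES: Record<SolvableCaptcha, string> = {
    'recaptcha-v2': 'ReCaptchaV2TaskProxyLess',
    'recaptcha-v3': 'ReCaptchaV3TaskProxyLess',
    'turnstile': 'AntiTurnstileTaskProxyLess'
};

// Hypothetical helper: build the `task` object passed to createTask()
function buildTask(
    type: SolvableCaptcha,
    url: string,
    siteKey: string,
    pageAction = 'submit'
): CapSolverTask {
    const task: CapSolverTask = {
        type: TASK_TYPES[type],
        websiteURL: url,
        websiteKey: siteKey
    };

    // Only reCAPTCHA v3 tasks carry a page action in the service above
    if (type === 'recaptcha-v3') {
        task.pageAction = pageAction;
    }
    return task;
}
```

With a builder like this, supporting another CAPTCHA type would mean adding one entry to the map rather than another solver method.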

How to Submit CAPTCHA Tokens

Each CAPTCHA type requires a different submission method in the browser context:

reCAPTCHA v2/v3 - Token Injection

typescript
async function submitRecaptchaToken(page: any, token: string): Promise<void> {
    // The response textarea is hidden - use JavaScript to set the value
    await page.$eval('#g-recaptcha-response', (el: HTMLTextAreaElement, token: string) => {
        el.style.display = 'block';
        el.value = token;
    }, token);

    // Also set hidden input if exists (common in custom implementations)
    try {
        await page.$eval('input[name="g-recaptcha-response"]', (el: HTMLInputElement, token: string) => {
            el.value = token;
        }, token);
    } catch (e) {
        // Input might not exist
    }

    // Submit the form
    await page.click('form button[type="submit"]');
}

Turnstile - Token Injection

typescript
async function submitTurnstileToken(page: any, token: string): Promise<void> {
    // Set token in hidden input using JavaScript
    await page.$eval('input[name="cf-turnstile-response"]', (el: HTMLInputElement, token: string) => {
        el.value = token;
    }, token);

    // Submit the form
    await page.click('form button[type="submit"]');
}

Using CapSolver Browser Extension with Crawlee

For scenarios where you want automatic CAPTCHA solving, you can load the CapSolver browser extension:

typescript
import { PlaywrightCrawler } from 'crawlee';
import path from 'path';

const crawler = new PlaywrightCrawler({
    launchContext: {
        launchOptions: {
            // Load CapSolver extension
            args: [
                `--disable-extensions-except=${path.resolve('./capsolver-extension')}`,
                `--load-extension=${path.resolve('./capsolver-extension')}`
            ],
            headless: false // Extensions require headed mode
        }
    },

    async requestHandler({ page, request, log }) {
        log.info(`Processing ${request.url}`);

        // The extension will automatically solve CAPTCHAs
        // Wait for potential CAPTCHA to be solved
        await page.waitForTimeout(5000);

        // Continue with scraping
        const title = await page.title();
        const content = await page.locator('body').innerText();

        console.log({ title, content });
    }
});

await crawler.run(['https://example.com/captcha-page']);

Best Practices

1. Error Handling with Retries

typescript
async function solveWithRetry(
    solverFn: () => Promise<string>,
    maxRetries = 3
): Promise<string> {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
            return await solverFn();
        } catch (error) {
            if (attempt === maxRetries - 1) throw error;

            const delay = Math.pow(2, attempt) * 1000; // Exponential backoff
            await new Promise(resolve => setTimeout(resolve, delay));
        }
    }
    throw new Error('Max retries exceeded');
}

// Usage
const token = await solveWithRetry(() =>
    capSolver.solveReCaptchaV2(url, siteKey)
);
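
To see what `solveWithRetry` actually waits between attempts, the delay calculation can be separated from the retry loop. The sketch below reproduces the same `2^attempt * 1000` formula. The upper bound (`maxDelayMs`) is an added assumption that the original function does not have, and both helper names are hypothetical.

```typescript
// Delay in milliseconds before the retry that follows failed attempt `attempt`
// (0-based), capped at `maxDelayMs`. The cap is an assumption added here.
function backoffDelay(attempt: number, baseMs = 1000, maxDelayMs = 30000): number {
    return Math.min(baseMs * Math.pow(2, attempt), maxDelayMs);
}

// Full wait schedule for `maxRetries` attempts. There is no wait after the
// final attempt, because the error is rethrown at that point.
function backoffSchedule(maxRetries: number, baseMs = 1000, maxDelayMs = 30000): number[] {
    const waits: number[] = [];
    for (let attempt = 0; attempt < maxRetries - 1; attempt++) {
        waits.push(backoffDelay(attempt, baseMs, maxDelayMs));
    }
    return waits;
}

// backoffSchedule(3) returns [1000, 2000], matching the defaults of solveWithRetry
```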

2. Balance Management

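typescript
import axios from 'axios';

const CAPSOLVER_API_KEY = 'YOUR_CAPSOLVER_API_KEY';

async function checkBalance(apiKey: string): Promise<number> {
    const response = await axios.post('https://api.capsolver.com/getBalance', {
        clientKey: apiKey
    });

    // Surface API errors (for example an invalid key) instead of reporting a zero balance
    if (response.data.errorId) {
        throw new Error(`CapSolver error: ${response.data.errorDescription}`);
    }

    return response.data.balance || 0;
}

// Check before starting crawler
const balance = await checkBalance(CAPSOLVER_API_KEY);
if (balance < 1) {
    console.warn('Low CapSolver balance! Please recharge.');
}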

3. Session Management for Multiple Pages

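typescript
import { PlaywrightCrawler, Dataset } from 'crawlee';
import { capSolver } from './capsolver-service';

// Cache solved tokens for same domain/key combinations.
// Note: reCAPTCHA and Turnstile tokens are normally accepted only once by the
// target site, so reuse a cached token only where the site is known to tolerate it.
const tokenCache = new Map<string, { token: string; timestamp: number }>();
const TOKEN_TTL = 90000; // 90 seconds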

async function getCachedToken(
    url: string,
    siteKey: string,
    solverFn: () => Promise<string>
): Promise<string> {
    const cacheKey = `${new URL(url).hostname}:${siteKey}`;
    const cached = tokenCache.get(cacheKey);

    if (cached && Date.now() - cached.timestamp < TOKEN_TTL) {
        return cached.token;
    }

    const token = await solverFn();
    tokenCache.set(cacheKey, { token, timestamp: Date.now() });
    return token;
}
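
`getCachedToken` can hand the same token to several pages within the TTL. Because reCAPTCHA and Turnstile verification generally accepts a given token only once, a stricter variant hands each stored token out a single time. The sketch below is one way to do that: `SingleUseTokenStore` is a hypothetical class, and the injectable clock exists only to make expiry easy to test.

```typescript
interface StoredToken {
    token: string;
    expiresAt: number;
}

// Hypothetical store: each token can be taken once and only before it expires
class SingleUseTokenStore {
    private tokens = new Map<string, StoredToken>();
    private ttlMs: number;
    private now: () => number;

    constructor(ttlMs = 90000, now: () => number = Date.now) {
        this.ttlMs = ttlMs;
        this.now = now;
    }

    put(key: string, token: string): void {
        this.tokens.set(key, { token, expiresAt: this.now() + this.ttlMs });
    }

    // Removes the entry on every call, so a second take() for the same key
    // returns null. Expired entries also return null.
    take(key: string): string | null {
        const entry = this.tokens.get(key);
        this.tokens.delete(key);

        if (!entry || this.now() >= entry.expiresAt) {
            return null;
        }
        return entry.token;
    }
}
```

A store like this suits pre-solving: tokens are requested shortly before they are needed, and a `null` from `take()` signals that a fresh solve is required.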

4. Proxy Integration

typescript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

const proxyConfiguration = new ProxyConfiguration({
    // Placeholder proxy URLs; replace with your own hosts and credentials
    proxyUrls: [
        'http://user:pass@proxy1.example.com:8080',
        'http://user:pass@proxy2.example.com:8080',
        'http://user:pass@proxy3.example.com:8080'
    ]
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,

    async requestHandler({ page, request, log, proxyInfo }) {
        log.info(`Using proxy: ${proxyInfo?.url}`);

        // Your CAPTCHA solving and scraping logic here
    }
});
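
The handler above logs `proxyInfo?.url`, which includes the proxy username and password when they are embedded in the URL. A small sketch of stripping credentials before logging follows. `redactProxyUrl` is a hypothetical helper that relies only on the standard `URL` class.

```typescript
// Hypothetical helper: remove credentials from a proxy URL before logging it
function redactProxyUrl(proxyUrl: string | undefined): string {
    if (!proxyUrl) return 'none';

    try {
        const parsed = new URL(proxyUrl);
        parsed.username = '';
        parsed.password = '';
        return parsed.toString();
    } catch (e) {
        // new URL() throws on strings it cannot parse
        return 'invalid proxy URL';
    }
}

// Usage inside the request handler (assumption):
// log.info(`Using proxy: ${redactProxyUrl(proxyInfo?.url)}`);
```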

Complete Example: E-commerce Scraper with CAPTCHA Handling

typescript
import { PlaywrightCrawler, Dataset, ProxyConfiguration } from 'crawlee';
import { capSolver } from './capsolver-service';

interface Product {
    name: string;
    price: string;
    url: string;
    image: string;
}

const proxyConfiguration = new ProxyConfiguration({
    // Placeholder proxy URL; replace with your own host and credentials
    proxyUrls: ['http://user:pass@proxy.example.com:8080']
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    maxRequestsPerCrawl: 200,
    maxConcurrency: 5,

    async requestHandler({ page, request, log, enqueueLinks }) {
        log.info(`Scraping: ${request.url}`);

        // Check for any CAPTCHA
        const hasRecaptcha = await page.$('.g-recaptcha');
        const hasTurnstile = await page.$('.cf-turnstile');

        if (hasRecaptcha) {
            const siteKey = await page.$eval(
                '.g-recaptcha',
                (el) => el.getAttribute('data-sitekey')
            );

            if (siteKey) {
                log.info('Solving reCAPTCHA...');
                const token = await capSolver.solveReCaptchaV2(request.url, siteKey);

                // Inject token using JavaScript (hidden element)
                await page.$eval('#g-recaptcha-response', (el: HTMLTextAreaElement, t: string) => {
                    el.style.display = 'block';
                    el.value = t;
                }, token);
                await page.click('button[type="submit"]');
                await page.waitForLoadState('networkidle');
            }
        }

        if (hasTurnstile) {
            const siteKey = await page.$eval(
                '.cf-turnstile',
                (el) => el.getAttribute('data-sitekey')
            );

            if (siteKey) {
                log.info('Solving Turnstile...');
                const token = await capSolver.solveTurnstile(request.url, siteKey);

                // Inject token using JavaScript (hidden element)
                await page.$eval('input[name="cf-turnstile-response"]', (el: HTMLInputElement, t: string) => {
                    el.value = t;
                }, token);
                await page.click('button[type="submit"]');
                await page.waitForLoadState('networkidle');
            }
        }

        // Extract product data using Playwright locators
        const productCards = await page.locator('.product-card').all();
        const products: Product[] = [];

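        for (const card of productCards) {
            products.push({
                name: await card.locator('.product-name').innerText().catch(() => ''),
                price: await card.locator('.product-price').innerText().catch(() => ''),
                // first() avoids strict-mode errors when a card holds several links or images;
                // catch() keeps one incomplete card from failing the whole request
                url: (await card.locator('a').first().getAttribute('href').catch(() => null)) || '',
                image: (await card.locator('img').first().getAttribute('src').catch(() => null)) || ''
            });
        }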

        if (products.length > 0) {
            await Dataset.pushData(products);
            log.info(`Extracted ${products.length} products`);
        }

        // Enqueue pagination and category links
        await enqueueLinks({
            globs: ['**/products/**', '**/page/**', '**/category/**']
        });
    },

    failedRequestHandler({ request, log }) {
        log.error(`Request failed: ${request.url}`);
    }
});

// Start crawling
await crawler.run(['https://example-store.com/products']);

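// Export results
// exportToCSV() stores the CSV under the given key in the default key-value store
// (with local storage this is under ./storage/key_value_stores/default/)
const dataset = await Dataset.open();
await dataset.exportToCSV('products');

console.log('Scraping complete! Results exported as "products" to the default key-value store.');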

Conclusion

Integrating CapSolver with Crawlee unlocks the full potential of web scraping for Node.js developers. By combining Crawlee's robust crawling infrastructure with CapSolver's industry-leading CAPTCHA solving capabilities, you can build reliable scrapers that handle even the most challenging bot protection mechanisms.

Whether you're building data extraction pipelines, price monitoring systems, or content aggregation tools, the Crawlee + CapSolver combination provides the reliability and scalability needed for production environments.


Ready to get started? Sign up for CapSolver and use bonus code CRAWLEE for an extra 6% bonus on every recharge!


FAQ

What is Crawlee?

Crawlee is a web scraping and browser automation library for Node.js designed to build reliable crawlers. It supports both HTTP-based crawling (with Cheerio/JSDOM) and full browser automation (with Playwright/Puppeteer), and includes built-in features like proxy rotation, session management, and anti-bot stealth.

How does CapSolver integrate with Crawlee?

CapSolver integrates with Crawlee through a service class that wraps the CapSolver API. Within your crawler's request handler, you can detect CAPTCHA challenges and use CapSolver to solve them, then inject the tokens back into the page.

What types of CAPTCHAs can CapSolver solve?

CapSolver supports a wide range of CAPTCHA types including reCAPTCHA v2, reCAPTCHA v3, Cloudflare Turnstile, AWS WAF, GeeTest, and many more.

How much does CapSolver cost?

CapSolver offers competitive pricing based on the type and volume of CAPTCHAs solved. Visit capsolver.com for current pricing details. Use code CRAWLEE for a 6% bonus on your first recharge.

Can I use CapSolver with other Node.js frameworks?

Yes! CapSolver provides a REST API that can be integrated with any Node.js framework, including Express, Puppeteer standalone, Selenium, and more.

Is Crawlee free to use?

Yes, Crawlee is open-source and released under the Apache 2.0 license. The framework is free to use, though you may incur costs for proxy services and CAPTCHA solving services like CapSolver.

How do I find the CAPTCHA site key?

The site key is typically found in the page's HTML source. Look for:

  • reCAPTCHA: data-sitekey attribute on .g-recaptcha element
  • Turnstile: data-sitekey attribute on .cf-turnstile element
  • Or check network requests for the key in API calls
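
When only the raw HTML is available (for example, the response body in an HTTP-based crawler), the `data-sitekey` lookup described above can be sketched with a regular expression. `findSiteKey` is a hypothetical helper, and the key in the usage comment is a made-up placeholder.

```typescript
// Hypothetical helper: return the first data-sitekey attribute value in raw HTML,
// or null when the page has no such attribute
function findSiteKey(html: string): string | null {
    const match = html.match(/data-sitekey\s*=\s*["']([^"']+)["']/i);
    return match ? match[1] : null;
}

// findSiteKey('<div class="cf-turnstile" data-sitekey="PLACEHOLDER"></div>')
//   returns 'PLACEHOLDER'
```

A regular expression is enough for a quick check. For pages that render the widget with JavaScript, the attribute may not be present in the initial HTML, so the browser-based detection shown earlier is the safer route.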

Which Crawlee crawler type should I use?

  • CheerioCrawler: Best for fast, simple HTML scraping without JavaScript
  • PlaywrightCrawler: Best for JavaScript-heavy sites and CAPTCHA solving (recommended for CapSolver integration)
  • PuppeteerCrawler: Alternative to Playwright with similar capabilities
  • JSDOMCrawler: Good middle-ground with basic JavaScript support

Compliance Disclaimer: The information provided on this blog is for informational purposes only. CapSolver is committed to compliance with all applicable laws and regulations. The use of the CapSolver network for illegal, fraudulent, or abusive activities is strictly prohibited and will be investigated. Our captcha-solving solutions enhance user experience while ensuring 100% compliance in helping solve captcha difficulties during public data crawling. We encourage responsible use of our services. For more information, please visit our Terms of Service and Privacy Policy.