Sep 05, 2024

How to Use ScrapeGraph AI for Web Scraping

Lucas Mitchell

Automation Engineer

What is ScrapeGraph AI?

ScrapeGraph AI is a Python web scraping library that leverages LLMs and graph-based logic to build scraping pipelines for websites and local documents (including XML, HTML, JSON, Markdown, and more). Simply specify the data you want to extract, and the library will handle the rest!

The library provides several features:

  • Support for many LLM providers: GPT, Gemini, Groq, Azure, Hugging Face.
  • Local models via Ollama.
  • Proxy support for handling requests behind proxies.

Prerequisites

Before using ScrapeGraph AI, install the library together with the CapSolver SDK, then download the Playwright browsers:

bash
pip install scrapegraphai capsolver

playwright install

Getting Started with ScrapeGraph AI

Here's a basic example of how to use ScrapeGraph AI with OpenAI to scrape a webpage:

python
import json
from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_APIKEY",
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,
    "headless": False,
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the quotes with their description",
    source="https://quotes.toscrape.com/",
    config=graph_config
)

# Run the pipeline
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
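
The value returned by `run()` is a Python dictionary whose exact keys depend on how the model interprets the prompt. The following is a minimal sketch for turning such a result into CSV, assuming (hypothetically) that the model returns a list of objects under some top-level key such as `quotes`. The helper names `extract_rows` and `rows_to_csv` are illustrative and are not part of ScrapeGraph AI.

```python
import csv
import io


def extract_rows(result):
    """Return the first non-empty list of dicts found in the result, or []."""
    if isinstance(result, list):
        candidates = [result]
    elif isinstance(result, dict):
        candidates = list(result.values())
    else:
        candidates = []
    for value in candidates:
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            return value
    return []


def rows_to_csv(rows):
    """Serialize a list of dicts to CSV text; the header is the union of all keys."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing keys are written as empty cells
    return buffer.getvalue()


# Hypothetical result shape; the real output depends on the model and the prompt
sample = {"quotes": [{"text": "Be yourself.", "author": "Oscar Wilde"}]}
print(rows_to_csv(extract_rows(sample)))
```

Because the structure is not guaranteed, it is worth checking that `extract_rows` returned something before writing a file.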

Here's the same example using a local LLM served by Ollama:

python
import json
from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "model": "ollama/llama3.1",
        "temperature": 0,
        "format": "json",  # Ollama needs the format to be specified explicitly
        # "base_url": "http://localhost:11434",  # optionally set the Ollama server URL
    },
    "verbose": True,
    "headless": False
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the quotes with their description",
    source="https://quotes.toscrape.com/",
    config=graph_config
)

# Run the pipeline
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
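
Since the OpenAI and Ollama examples differ only in the `llm` block, the choice between them can be driven by an environment variable. This is a small sketch under stated assumptions: the variable names `SCRAPER_BACKEND` and `OPENAI_API_KEY` and the helper `build_graph_config` are hypothetical, while the config keys are the ones used in the examples above.

```python
import os


def build_graph_config(backend=None):
    """Build a config dict matching either the OpenAI or the Ollama example."""
    backend = (backend or os.getenv("SCRAPER_BACKEND", "openai")).lower()
    if backend == "openai":
        llm = {
            "api_key": os.getenv("OPENAI_API_KEY", "YOUR_OPENAI_APIKEY"),
            "model": "openai/gpt-4o-mini",
        }
    elif backend == "ollama":
        llm = {
            "model": "ollama/llama3.1",
            "temperature": 0,
            "format": "json",  # Ollama needs the format to be specified explicitly
        }
    else:
        raise ValueError(f"Unknown backend: {backend!r}")
    return {"llm": llm, "verbose": True, "headless": False}


config = build_graph_config("ollama")
print(config["llm"]["model"])
```

The returned dictionary can be passed as `config=` to `SmartScraperGraph` exactly as in the examples above.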

Handling Captchas with CapSolver and ScrapeGraph AI

In this section, we'll integrate CapSolver with ScrapeGraph AI to bypass captchas. CapSolver is an external service that solves various types of captchas, including reCAPTCHA v2, which is commonly used on websites.

We will demonstrate solving reCAPTCHA v2 with CapSolver and then scraping the content of a page that requires solving the captcha first.
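
To request a solution, CapSolver needs the page URL and the site key of the reCAPTCHA widget. On pages that embed reCAPTCHA v2 through the standard HTML widget, the key appears as a `data-sitekey` attribute. The sketch below finds it with the standard library only; `SiteKeyParser`, `find_site_key`, and the sample markup with its placeholder key are illustrative names, not part of CapSolver or ScrapeGraph AI.

```python
from html.parser import HTMLParser


class SiteKeyParser(HTMLParser):
    """Record the first data-sitekey attribute seen in the document."""

    def __init__(self):
        super().__init__()
        self.site_key = None

    def handle_starttag(self, tag, attrs):
        if self.site_key is None:
            value = dict(attrs).get("data-sitekey")
            if value:
                self.site_key = value


def find_site_key(html):
    """Return the reCAPTCHA site key found in the HTML, or None if absent."""
    parser = SiteKeyParser()
    parser.feed(html)
    return parser.site_key


# Placeholder markup; a real page would carry its own key
sample_html = '<form><div class="g-recaptcha" data-sitekey="EXAMPLE_SITE_KEY"></div></form>'
print(find_site_key(sample_html))
```

Pages that render the widget from JavaScript may not expose the attribute in the initial HTML, in which case the key has to be located in the page's scripts instead.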

Bonus Code

Claim your bonus code for top captcha solutions at CapSolver: scrape. After redeeming it, you will get an extra 5% bonus on each recharge, unlimited times.

Example: Solving reCAPTCHA v2 with CapSolver and ScrapeGraph AI

python
import capsolver
import os
import json
from scrapegraphai.graphs import SmartScraperGraph

# Read sensitive values from environment variables; the defaults are placeholders
PROXY = os.getenv("PROXY", "http://username:password@host:port")
capsolver.api_key = os.getenv("CAPSOLVER_API_KEY", "Your Capsolver API Key")
PAGE_URL = os.getenv("PAGE_URL", "PAGE_URL")
PAGE_KEY = os.getenv("PAGE_SITE_KEY", "PAGE_SITE_KEY")

def solve_recaptcha_v2(url, key):
    """Ask CapSolver for a reCAPTCHA v2 token for the given page URL and site key."""
    solution = capsolver.solve({
        "type": "ReCaptchaV2Task",
        "websiteURL": url,
        "websiteKey": key,
        "proxy": PROXY
    })
    # Accept the token either at the top level or nested under "solution"
    return solution.get("solution", solution)["gRecaptchaResponse"]

def main():
    print("Solving reCaptcha v2")
    solution = solve_recaptcha_v2(PAGE_URL, PAGE_KEY)
    print("Solution: ", solution)
    # The token still has to be submitted to the protected page (typically in its
    # g-recaptcha-response field); that step is site-specific and not shown here.

    # Define the configuration for the scraping pipeline
    graph_config = {
        "llm": {
            "api_key": "YOUR_OPENAI_APIKEY",
            "model": "openai/gpt-4o-mini",
        },
        "verbose": True,
        "headless": False,
    }

    # Create the SmartScraperGraph instance
    smart_scraper_graph = SmartScraperGraph(
        prompt="Find the description of each quote.",
        source="https://quotes.toscrape.com/",
        config=graph_config
    )

    # Run the pipeline
    result = smart_scraper_graph.run()
    print(json.dumps(result, indent=4))

if __name__ == "__main__":
    main()
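
The `PROXY` string passed to CapSolver can in principle be reused for page loading as well. ScrapeGraph AI's documentation describes a `loader_kwargs` entry holding a `proxy` dictionary with `server`, `username`, and `password`; treat those key names as an assumption to verify against the version you have installed. The helper `proxy_to_loader_kwargs` and the example proxy address are hypothetical.

```python
from urllib.parse import urlsplit


def proxy_to_loader_kwargs(proxy_url):
    """Split a proxy URL into the server/username/password parts of a loader config."""
    parts = urlsplit(proxy_url)
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    proxy = {"server": server}
    if parts.username:
        proxy["username"] = parts.username
    if parts.password:
        proxy["password"] = parts.password
    return {"proxy": proxy}


# Example address only; replace with a real proxy before use
loader_kwargs = proxy_to_loader_kwargs("http://user:secret@proxy.example.com:8080")
print(loader_kwargs)
```

Under that assumption, the result would be added to the pipeline configuration as `graph_config["loader_kwargs"] = loader_kwargs`. Note that the placeholder default `http://username:password@host:port` must be replaced first, because `port` is not a valid port number and `urlsplit` raises `ValueError` when it is read.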

Conclusion

With ScrapeGraph AI, you can efficiently scrape websites while handling the complexities of proxies and captchas. Combining it with CapSolver lets you get past reCAPTCHA v2 challenges, enabling access to content that would otherwise be difficult to scrape.

Feel free to extend this script to suit your scraping needs and experiment with additional features offered by ScrapeGraph AI. Always ensure that your scraping activities respect website terms of service and legal guidelines.

Happy scraping!
