How to Use ScrapeGraph AI for Web Scraping

Lucas Mitchell

Automation Engineer

04-Sep-2024

What is ScrapeGraph AI?

ScrapeGraph AI is a Python web scraping library that leverages LLMs and graph-based logic to build scraping pipelines for websites and local documents (including XML, HTML, JSON, Markdown, and more). Simply specify the data you want to extract, and the library will handle the rest!

The library provides several features:

  • Support for many LLM providers: GPT, Gemini, Groq, Azure, Hugging Face.
  • Support for local models through Ollama.
  • Proxy support for routing requests through a proxy.
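
To make the proxy feature concrete, here is a minimal sketch that splits a proxy URL into separate server, username, and password fields. The helper name `proxy_settings_from_url` and the example proxy address are hypothetical, and placing the result under a `"loader_kwargs"` key in the graph configuration is an assumption to verify against the ScrapeGraph AI documentation for your installed version.

```python
from urllib.parse import unquote, urlsplit


def proxy_settings_from_url(proxy_url: str) -> dict:
    """Split 'scheme://user:pass@host:port' into server/username/password fields."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"Not a valid proxy URL: {proxy_url!r}")

    # Rebuild the server address without the credentials
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"

    settings = {"server": server}
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings


# Assumption: the proxy dict is passed under "loader_kwargs" in graph_config
graph_config_fragment = {
    "loader_kwargs": {
        "proxy": proxy_settings_from_url("http://user:secret@proxy.example.com:8080"),
    },
}
print(graph_config_fragment)
```

Keeping the credentials in separate fields avoids embedding them in the server address, which is how browser-based loaders usually expect them.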

Prerequisites

Before you start using ScrapeGraph AI, install the library, the CapSolver SDK, and the Playwright browsers:

bash
pip install scrapegraphai capsolver

playwright install
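
After running the commands above, a quick sanity check can confirm that the Python packages are importable. This is a minimal sketch: the helper name `missing_modules` is hypothetical, and the module names are assumed to match the package names. It does not verify that the Playwright browser binaries were downloaded.

```python
from importlib.util import find_spec

# Assumed top-level module names for the packages installed above
REQUIRED_MODULES = ("scrapegraphai", "capsolver", "playwright")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```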

Getting Started with ScrapeGraph AI

Here's a basic example of how to use ScrapeGraph AI with OpenAI to scrape a webpage:

python
import json
from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "api_key": "YOUR_OPENAI_APIKEY",
        "model": "openai/gpt-4o-mini",
    },
    "verbose": True,
    "headless": False,
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the quotes with their description",
    source="https://quotes.toscrape.com/",
    config=graph_config
)

# Run the pipeline
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
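
Hard-coding the API key is fine for a quick test, but a small helper can build the same configuration from an environment variable instead. The sketch below is one possible approach: the function name `build_openai_graph_config` and the variable name `OPENAI_API_KEY` are assumptions, and the dictionary layout simply mirrors the example above.

```python
import os


def build_openai_graph_config(env=None, model="openai/gpt-4o-mini", headless=True):
    """Build a graph_config dict, reading the API key from the environment."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError("Set OPENAI_API_KEY before running the scraper")
    return {
        "llm": {"api_key": api_key, "model": model},
        "verbose": True,
        "headless": headless,
    }


# Demo with an explicit mapping so the snippet runs without a real key
demo_config = build_openai_graph_config({"OPENAI_API_KEY": "sk-demo"})
print(demo_config["llm"]["model"])
```

The resulting dictionary can be passed as `config=` to `SmartScraperGraph` in place of the literal `graph_config`.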

Here's a basic example of how to use ScrapeGraph AI with a local LLM (Ollama) to scrape a webpage:

python
import json
from scrapegraphai.graphs import SmartScraperGraph

# Define the configuration for the scraping pipeline
graph_config = {
    "llm": {
        "model": "ollama/llama3.1",
        "temperature": 0,
        "format": "json",  # Ollama needs the format to be specified explicitly
        # "base_url": "http://localhost:11434",  # optionally override the Ollama server URL
    },
    "verbose": True,
    "headless": False
}

# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the quotes with their description",
    source="https://quotes.toscrape.com/",
    config=graph_config
)

# Run the pipeline
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
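
Because the local example depends on a running Ollama server, it can help to check that the server responds before starting the pipeline. The sketch below uses only the standard library. The helper name `ollama_is_reachable` is hypothetical, and the default address is the one shown in the commented `base_url` above.

```python
import urllib.request


def ollama_is_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP GET to base_url answers with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers malformed URLs
        return False


if ollama_is_reachable():
    print("Ollama server is reachable.")
else:
    print("Ollama server did not respond; start it before running the scraper.")
```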

Handling Captchas with CapSolver and ScrapeGraph AI

In this section, we integrate CapSolver with ScrapeGraph AI to get past captchas. CapSolver is an external service that solves various types of captchas, including reCAPTCHA v2, which is commonly used on websites.

We will solve a reCAPTCHA v2 challenge with CapSolver and then scrape the content of a page that requires the captcha to be solved first.

Bonus Code

Claim your bonus code for top captcha solutions at CapSolver: scrape. After redeeming it, you get an extra 5% bonus on each recharge, with unlimited uses.

Example: Solving reCAPTCHA v2 with CapSolver and ScrapeGraph AI

python
import capsolver
import os
import json
from scrapegraphai.graphs import SmartScraperGraph

# Read sensitive values from environment variables; the fallbacks are placeholders
PROXY = os.getenv("PROXY", "http://username:password@host:port")
capsolver.api_key = os.getenv("CAPSOLVER_API_KEY", "Your Capsolver API Key")
PAGE_URL = os.getenv("PAGE_URL", "PAGE_URL")
PAGE_KEY = os.getenv("PAGE_SITE_KEY", "PAGE_SITE_KEY")

def solve_recaptcha_v2(url, key):
    """Ask CapSolver to solve the reCAPTCHA v2 on `url` and return the response token."""
    solution = capsolver.solve({
        "type": "ReCaptchaV2Task",
        "websiteURL": url,
        "websiteKey": key,
        "proxy": PROXY
    })
    # Accept either a nested {"solution": {...}} result or a flat one
    payload = solution.get("solution", solution)
    return payload["gRecaptchaResponse"]

def main():
    print("Solving reCaptcha v2")
    solution = solve_recaptcha_v2(PAGE_URL, PAGE_KEY)
    # The token is only printed here; submitting it to the target page is site-specific
    print("Solution: ", solution)

    # Define the configuration for the scraping pipeline
    graph_config = {
        "llm": {
            "api_key": os.getenv("OPENAI_API_KEY", "YOUR_OPENAI_APIKEY"),
            "model": "openai/gpt-4o-mini",
        },
        "verbose": True,
        "headless": False,
    }

    # Create the SmartScraperGraph instance
    smart_scraper_graph = SmartScraperGraph(
        prompt="Find the description of each quote.",
        source="https://quotes.toscrape.com/",
        config=graph_config
    )

    # Run the pipeline
    result = smart_scraper_graph.run()
    print(json.dumps(result, indent=4))

if __name__ == "__main__":
    main()
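
The example prints the solved token but does not show where it goes. A site protected by reCAPTCHA v2 typically expects the token in a form field named `g-recaptcha-response`, submitted together with the rest of the form. The sketch below only builds such a request body. The helper name `build_form_body` and the sample field values are hypothetical, and the endpoint and remaining fields depend entirely on the target site.

```python
from urllib.parse import urlencode


def build_form_body(token, fields=None):
    """Return URL-encoded form data that carries the reCAPTCHA v2 token."""
    payload = dict(fields or {})
    # Standard field name used by reCAPTCHA v2 form submissions
    payload["g-recaptcha-response"] = token
    return urlencode(payload).encode("utf-8")


# Placeholder token and field; a real token comes from the captcha solver
body = build_form_body("EXAMPLE_TOKEN", {"username": "demo"})
print(body)
```

The encoded body can then be sent with any HTTP client as an `application/x-www-form-urlencoded` POST to the form's action URL.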

Conclusion

With ScrapeGraph AI, you can scrape websites efficiently while handling the complexities of proxies and captchas. Combining it with CapSolver lets you solve reCAPTCHA v2 challenges and reach content that would otherwise be difficult to scrape.

Feel free to extend this script to suit your scraping needs and experiment with additional features offered by ScrapeGraph AI. Always ensure that your scraping activities respect website terms of service and legal guidelines.

Happy scraping!

Compliance Disclaimer: The information provided on this blog is for informational purposes only. CapSolver is committed to compliance with all applicable laws and regulations. The use of the CapSolver network for illegal, fraudulent, or abusive activities is strictly prohibited and will be investigated. Our captcha-solving solutions enhance user experience while ensuring 100% compliance in helping solve captcha difficulties during public data crawling. We encourage responsible use of our services. For more information, please visit our Terms of Service and Privacy Policy.
