How to Use ScrapeGraph AI for Web Scraping

Lucas Mitchell
Automation Engineer
04-Sep-2024
How to Use ScrapeGraph AI for Web Scraping
What is ScrapeGraph AI?
ScrapeGraph AI is a Python web scraping library that leverages LLMs and graph-based logic to build scraping pipelines for websites and local documents (including XML, HTML, JSON, Markdown, and more). Simply specify the data you want to extract, and the library will handle the rest!
The library provides several features:
- Support many LLMs: GPT, Gemini, Groq, Azure, Hugging Face
- Local Models: Ollama.
- Proxy support for handling requests behind proxies.
Prerequisites
Before you dive into using ScrapeGraph AI, ensure you have the following installed:
bash
pip install scrapegraphai capsolver
playwright install
Getting Started with ScrapeGraph AI
Here's a basic example of how to use ScrapeGraph AI with OpenAI to scrape a webpage:
python
import json
from scrapegraphai.graphs import SmartScraperGraph
# Define the configuration for the scraping pipeline
graph_config = {
"llm": {
"api_key": "YOUR_OPENAI_APIKEY",
"model": "openai/gpt-4o-mini",
},
"verbose": True,
"headless": False,
}
# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
prompt="List me all the quotes with their description",
source="https://quotes.toscrape.com/",
config=graph_config
)
# Run the pipeline
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
Here's a basic example of how to use ScrapeGraph AI with Local LLM (Ollama) to scrape a webpage:
python
import json
from scrapegraphai.graphs import SmartScraperGraph
# Define the configuration for the scraping pipeline
graph_config = {
"llm": {
"model": "ollama/llama3.1",
"temperature": 0,
"format": "json", # Ollama needs the format to be specified explicitly
# "base_url": "http://localhost:11434", # set ollama URL arbitrarily
},
"verbose": True,
"headless": False
}
# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
prompt="List me all the quotes with their description",
source="https://quotes.toscrape.com/",
config=graph_config
)
# Run the pipeline
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
Handling Captchas with CapSolver and ScrapeGraph AI
In this section, we'll explore how to integrate Capsolver with ScrapeGraph AI to bypass captchas. CapSolver is an external service that helps in solving various types of captchas, including ReCaptcha V2, which is commonly used on websites.
We will demonstrate solving ReCaptcha V2 using Capsolver and then scraping the content of a page that requires solving the captcha first.
Bonus Code
Claim Your Bonus Code for top captcha solutions; CapSolver: scrape. After redeeming it, you will get an extra 5% bonus after each recharge, Unlimited

Example: Solving ReCaptcha V2 with Capsolver and ScrapeGraph AI
python
import capsolver
import os
import json
from scrapegraphai.graphs import SmartScraperGraph
# Consider using environment variables for sensitive information
PROXY = os.getenv("PROXY", "http://username:password@host:port")
capsolver.api_key = os.getenv("CAPSOLVER_API_KEY", "Your Capsolver API Key")
PAGE_URL = os.getenv("PAGE_URL", "PAGE_URL")
PAGE_KEY = os.getenv("PAGE_SITE_KEY", "PAGE_SITE_KEY")
def solve_recaptcha_v2(url, key):
solution = capsolver.solve({
"type": "ReCaptchaV2Task",
"websiteURL": url,
"websiteKey": key,
"proxy": PROXY
})
return solution['solution']['gRecaptchaResponse']
def main():
print("Solving reCaptcha v2")
solution = solve_recaptcha_v2(PAGE_URL, PAGE_KEY)
print("Solution: ", solution)
# Define the configuration for the scraping pipeline
graph_config = {
"llm": {
"api_key": "YOUR_OPENAI_APIKEY",
"model": "openai/gpt-4o-mini",
},
"verbose": True,
"headless": False,
}
# Create the SmartScraperGraph instance
smart_scraper_graph = SmartScraperGraph(
prompt="Find the description of each quote.",
source="https://quotes.toscrape.com/",
config=graph_config
)
# Run the pipeline
result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
Conclusion
With ScrapeGraph AI, you can efficiently scrape websites while handling the complexities of proxies and captchas. Combining it with Capsolver allows you to bypass ReCaptcha V2 challenges seamlessly, enabling access to content that would otherwise be difficult to scrape.
Feel free to extend this script to suit your scraping needs and experiment with additional features offered by ScrapeGraph AI. Always ensure that your scraping activities respect website terms of service and legal guidelines.
Happy scraping!
Compliance Disclaimer: The information provided on this blog is for informational purposes only. CapSolver is committed to compliance with all applicable laws and regulations. The use of the CapSolver network for illegal, fraudulent, or abusive activities is strictly prohibited and will be investigated. Our captcha-solving solutions enhance user experience while ensuring 100% compliance in helping solve captcha difficulties during public data crawling. We encourage responsible use of our services. For more information, please visit our Terms of Service and Privacy Policy.
More

How to Automatically Solve CAPTCHAs with NanoClaw and CapSolver
Step-by-step guide to use CapSolver with NanoClaw for automatically solving reCAPTCHA, Turnstile, AWS WAF, and other CAPTCHAs. Works with Claude AI agents, zero code, and multiple browsers.

Ethan Collins
20-Mar-2026

How to Solve CAPTCHA Challenges for AI Agents: Data Extraction with n8n, CapSolver, and OpenClaw
Learn how to automate CAPTCHA solving for AI agents using n8n, CapSolver, and OpenClaw. Build a server-side pipeline to extract data from protected websites without browser automation or manual steps.

Ethan Collins
20-Mar-2026

How to Solve CAPTCHA with TinyFish AgentQL โ Step-by-Step Guide Using CapSolver
Learn how to integrate CapSolver with TinyFish AgentQL to automatically solve CAPTCHAs like reCAPTCHA and Cloudflare Turnstile. Step-by-step tutorial with Python and JavaScript SDK examples for seamless AI-powered web automation.

Ethan Collins
19-Mar-2026

How to Solve CAPTCHA with Vercel Agent Browser โ Step-by-Step Guide Using CapSolver
Learn how to integrate CapSolver with Agent Browser to handle CAPTCHAs and build reliable AI automation workflows.

Ethan Collins
18-Mar-2026

How to Use CapSolver in n8n: The Complete Guide to Solving CAPTCHA in Your Workflows
Learn how to integrate CapSolver with n8n to solve CAPTCHAs and build reliable automation workflows with ease.

Lucas Mitchell
18-Mar-2026

How to Solve Cloudflare Turnstile Using CapSolver and n8n
Build a Cloudflare Turnstile solver API using CapSolver and n8n. Learn how to automate token solving, submit it to websites, and extract protected data with no coding.

Ethan Collins
18-Mar-2026

