Sep 09, 2024

How to Use Requests (Python Library) for Web Scraping

Lucas Mitchell

Automation Engineer

Web scraping allows you to extract data from websites, but websites may implement anti-scraping measures such as captchas or rate-limiting. In this guide, we’ll introduce the Requests library and provide an example of how to scrape data from a live website: Quotes to Scrape. Additionally, we'll explore how to handle reCAPTCHA v2 challenges using Requests and Capsolver.

What is Requests?

Requests is a simple and powerful Python library used to make HTTP requests. It's widely used for tasks like interacting with APIs, downloading web pages, and scraping data. With its user-friendly API, it's easy to send requests, handle sessions, and deal with HTTP headers and cookies.

Key Features:

Simple API for sending requests
Support for sessions and cookies
Automatic redirect handling and built-in proxy support
Custom headers for simulating browser requests
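
Once Requests is installed, the features above can be sketched without touching the network by building a request on a Session and inspecting it before it is sent. The URL, header value, and cookie below are illustrative assumptions, not values from a real site.

```python
import requests

# A Session keeps headers and cookies across requests.
session = requests.Session()
session.headers.update({"User-Agent": "my-scraper/0.1"})  # illustrative value
session.cookies.set("session_id", "abc123")  # hypothetical cookie

# Build the request without sending it, so we can inspect what would go out.
request = requests.Request(
    "GET",
    "https://example.com/search",  # placeholder URL
    params={"q": "quotes", "page": 2},
)
prepared = session.prepare_request(request)

print(prepared.url)
print(prepared.headers["User-Agent"])
print(prepared.headers.get("Cookie"))
```

Calling session.send(prepared) would actually send the request; the sketch stops short of that so it runs offline.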

Prerequisites

Install the Requests library using pip, together with Beautiful Soup, which the example below uses to parse HTML:

pip install requests beautifulsoup4

Example: Scraping Quotes to Scrape

Let’s start with a basic web scraping example where we’ll extract quotes from the Quotes to Scrape website using Requests.

import requests
from bs4 import BeautifulSoup

# URL of the page to scrape
url = 'http://quotes.toscrape.com/'

# Send a GET request to the page; the timeout stops the script from hanging indefinitely
response = requests.get(url, timeout=10)

# Check if the request was successful
if response.status_code == 200:
    # Parse the page content using BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all the quotes on the page
    quotes = soup.find_all('span', class_='text')

    # Print each quote
    for quote in quotes:
        print(quote.text)
else:
    print(f"Failed to retrieve the page. Status Code: {response.status_code}")

Explanation:

We send a GET request to the Quotes to Scrape website.
We use BeautifulSoup to parse the HTML content.
We extract and print all the quotes found on the page.
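
The same approach can be extended to pull the author and tags for each quote. The following is a minimal sketch that parses an inline HTML sample rather than the live page; the sample markup and the parse_quotes helper are assumptions written to resemble a quote block on the site, so the selectors may need adjusting against the real HTML.

```python
from bs4 import BeautifulSoup

# Sample markup written to resemble one quote block on the site (an assumption).
SAMPLE_HTML = """
<div class="quote">
  <span class="text">"A sample quote."</span>
  <span>by <small class="author">Jane Doe</small></span>
  <div class="tags">
    <a class="tag" href="/tag/sample/">sample</a>
    <a class="tag" href="/tag/demo/">demo</a>
  </div>
</div>
"""

def parse_quotes(html):
    """Return a list of dicts with the text, author, and tags of each quote."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.find_all("div", class_="quote"):
        text = block.find("span", class_="text")
        author = block.find("small", class_="author")
        results.append({
            # Guard against missing elements instead of raising AttributeError.
            "text": text.get_text(strip=True) if text else None,
            "author": author.get_text(strip=True) if author else None,
            "tags": [tag.get_text(strip=True) for tag in block.find_all("a", class_="tag")],
        })
    return results

for quote in parse_quotes(SAMPLE_HTML):
    print(quote)
```

In a real run, response.text from the earlier example would be passed to parse_quotes in place of SAMPLE_HTML.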

How to Solve reCAPTCHA v2 with Requests

Some websites, however, may employ reCAPTCHA to prevent scraping. In this case, solving reCAPTCHA is necessary before accessing content. Using Capsolver alongside Requests, we can automate the captcha-solving process.

Prerequisites

Install the Capsolver library:

pip install capsolver requests

Example: Solving reCAPTCHA v2

Below is a sample script that solves reCAPTCHA v2 challenges using Capsolver and sends a request with the solved captcha token:

import capsolver
import requests

# Consider using environment variables for sensitive information
PROXY = "http://username:password@host:port"
capsolver.api_key = "Your Capsolver API Key"
PAGE_URL = "https://example.com"
PAGE_KEY = "Your-Site-Key"

def solve_recaptcha_v2(url, key):
    solution = capsolver.solve({
        "type": "ReCaptchaV2Task",
        "websiteURL": url,
        "websiteKey": key,
        "proxy": PROXY
    })
    return solution['solution']['gRecaptchaResponse']

def main():
    print("Solving reCaptcha v2")
    solution = solve_recaptcha_v2(PAGE_URL, PAGE_KEY)
    print("Solution: ", solution)

    # Headers to simulate browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }

    # Data payload with the captcha solution
    data = {
        'g-recaptcha-response': solution
    }

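    # Submit the captcha solution as form data. A form payload belongs in a POST request;
    # PAGE_URL is a placeholder for the endpoint that the protected form submits to.
    response = requests.post(PAGE_URL, headers=headers, data=data, proxies={"http": PROXY, "https": PROXY}, timeout=30)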

    # Check the response status and print the content if successful
    if response.status_code == 200:
        print("Successfully bypassed captcha and fetched the page!")
        print(response.text)
    else:
        print(f"Failed to fetch the page. Status Code: {response.status_code}")

if __name__ == "__main__":
    main()

Explanation:

Capsolver API: The solve_recaptcha_v2 function sends the site’s key and URL to Capsolver, along with proxy information, to obtain a solved captcha token.
Sending the request: Once the captcha is solved, the g-recaptcha-response is included in the request data payload and sent with custom headers to the target URL.
Simulating browser requests: We send a custom User-Agent header so the request resembles one from a browser, which lowers the chance of being flagged as a bot.
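
The script's comment suggests keeping sensitive values in environment variables. One way to do that is sketched below; the variable names CAPSOLVER_API_KEY and SCRAPER_PROXY and the load_settings helper are hypothetical, chosen only for illustration.

```python
import os

def load_settings(env=os.environ):
    """Read scraper secrets from environment variables instead of source code."""
    api_key = env.get("CAPSOLVER_API_KEY")  # hypothetical variable name
    if not api_key:
        raise RuntimeError("Set CAPSOLVER_API_KEY before running the script")
    proxy = env.get("SCRAPER_PROXY")  # hypothetical variable name, optional
    return {
        "api_key": api_key,
        # Requests accepts proxies=None, so a missing proxy needs no special casing.
        "proxies": {"http": proxy, "https": proxy} if proxy else None,
    }

# Example with an explicit mapping so the sketch runs without real credentials.
settings = load_settings({"CAPSOLVER_API_KEY": "placeholder-key"})
print(settings["proxies"])
```

The returned api_key could then be assigned to capsolver.api_key, and the proxies value passed as the proxies argument in the Requests call.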

Web Scraping Best Practices

When web scraping, it is essential to be ethical and follow best practices:

Respect robots.txt: Always check the website's robots.txt to ensure scraping is permitted.
Rate Limiting: Introduce delays between requests to avoid overwhelming the website and reduce the risk of getting blocked.
Use Proxies: Rotate proxies to prevent IP blocks, especially when scraping at scale.
Spoof Headers: Simulate browser behavior by using custom headers like User-Agent.
Use TLS
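Match Your Chrome Version: Send the same set of headers as the Chrome version named in your User-Agent.
Match Header Order: Send those headers in the same order that Chrome version does.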
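
The first two practices can be sketched with the standard library alone. The robots.txt content, the user agent name, and the crawl helper below are assumptions for illustration; a real scraper would load the target site's own robots.txt and pass in a function that performs the actual request.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download the site's own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "my-scraper"  # illustrative name

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url):
    """Return True if robots.txt permits USER_AGENT to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)

def polite_delay(default=1.0):
    """Return the number of seconds to wait between requests."""
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default

def crawl(urls, fetch, sleep=time.sleep):
    """Fetch each permitted URL in turn, pausing after every request."""
    results = []
    for url in urls:
        if not allowed(url):
            continue  # skip paths that robots.txt disallows
        results.append(fetch(url))
        sleep(polite_delay())
    return results

urls = ["https://example.com/page/1/", "https://example.com/private/data"]
# A stand-in fetch function and a no-op sleep keep this demo offline and instant.
print(crawl(urls, fetch=lambda url: f"fetched {url}", sleep=lambda seconds: None))
```

Passing sleep as a parameter is a design choice that makes the delay easy to disable in tests; in normal use the default time.sleep applies.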

Conclusion

The Requests library offers an easy and efficient way to scrape websites, while handling advanced scenarios such as reCAPTCHA can be achieved with Capsolver. Always ensure your scraping activities comply with the website’s terms of service and legal guidelines.

Happy scraping!
