How to Use Requests (Python Library) for Web Scraping

Lucas Mitchell
Automation Engineer
04-Sep-2024

Web scraping allows you to extract data from websites, but many sites deploy anti-scraping measures such as CAPTCHAs or rate limiting. In this guide, we'll introduce the Requests library and walk through an example of scraping a live website, Quotes to Scrape. We'll also look at how to handle reCAPTCHA v2 challenges using Requests and Capsolver.
What is Requests?
Requests is a simple and powerful Python library used to make HTTP requests. It's widely used for tasks like interacting with APIs, downloading web pages, and scraping data. With its user-friendly API, it's easy to send requests, handle sessions, and deal with HTTP headers and cookies.
Key Features:
- Simple API for sending requests
- Support for sessions and cookies
- Automatic handling of redirects and proxies
- Custom headers for simulating browser requests
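The session and header features above can be seen without any network traffic at all: a `Session` keeps cookies and default headers across requests, and per-request headers are merged on top of the session's defaults. A small sketch (the user-agent string here is just a placeholder):

```python
import requests

# A Session persists cookies and default headers across all requests made with it
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0 (compatible; demo)"})

# prepare_request() shows how Requests merges session defaults with
# per-request settings before anything is sent over the wire
prepared = session.prepare_request(
    requests.Request("GET", "http://quotes.toscrape.com/",
                     headers={"Accept": "text/html"})
)
print(prepared.headers["User-Agent"])  # the session default is applied
print(prepared.headers["Accept"])      # the per-request header is merged in
```

Any request made through the same session reuses these defaults, plus any cookies the server has set along the way.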
Prerequisites
Install the Requests library using pip:
```bash
pip install requests
```
Example: Scraping Quotes to Scrape
Let's start with a basic web scraping example where we'll extract quotes from the Quotes to Scrape website using Requests.
```python
import requests
from bs4 import BeautifulSoup

# URL of the page to scrape
url = 'http://quotes.toscrape.com/'

# Send a GET request to the page
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the page content using BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all the quotes on the page
    quotes = soup.find_all('span', class_='text')

    # Print each quote
    for quote in quotes:
        print(quote.text)
else:
    print(f"Failed to retrieve the page. Status Code: {response.status_code}")
```
Explanation:
- We send a GET request to the Quotes to Scrape website.
- We use BeautifulSoup to parse the HTML content.
- We extract and print all the quotes found on the page.
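The same approach extends to multiple pages. Quotes to Scrape links each page to the next through a `li.next` element, so (assuming the site's current markup) a pagination loop can be sketched like this:

```python
import requests
from bs4 import BeautifulSoup

def parse_quotes(html):
    """Extract quote texts and the relative URL of the next page, if any."""
    soup = BeautifulSoup(html, "html.parser")
    quotes = [span.text for span in soup.find_all("span", class_="text")]
    next_link = soup.select_one("li.next > a")
    next_url = next_link["href"] if next_link else None
    return quotes, next_url

def scrape_all(base_url="http://quotes.toscrape.com"):
    """Follow 'next' links until the last page and collect every quote."""
    url, all_quotes = "/", []
    while url:
        response = requests.get(base_url + url)
        response.raise_for_status()
        quotes, url = parse_quotes(response.text)
        all_quotes.extend(quotes)
    return all_quotes

# Usage (fetches ~10 pages from the live site):
# for quote in scrape_all():
#     print(quote)
```

Separating the parsing (`parse_quotes`) from the fetching (`scrape_all`) also makes the parser easy to test against saved HTML without hitting the site.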
How to Solve reCAPTCHA v2 with Requests
Some websites, however, may employ reCAPTCHA to prevent scraping. In this case, solving reCAPTCHA is necessary before accessing content. Using Capsolver alongside Requests, we can automate the captcha-solving process.
Prerequisites
Install the Capsolver library:
```bash
pip install capsolver requests
```
Example: Solving reCAPTCHA v2
Below is a sample script that solves reCAPTCHA v2 challenges using Capsolver and sends a request with the solved captcha token:
```python
import capsolver
import requests

# Consider using environment variables for sensitive information
PROXY = "http://username:password@host:port"
capsolver.api_key = "Your Capsolver API Key"
PAGE_URL = "https://example.com"
PAGE_KEY = "Your-Site-Key"

def solve_recaptcha_v2(url, key):
    solution = capsolver.solve({
        "type": "ReCaptchaV2Task",
        "websiteURL": url,
        "websiteKey": key,
        "proxy": PROXY
    })
    return solution['solution']['gRecaptchaResponse']

def main():
    print("Solving reCaptcha v2")
    solution = solve_recaptcha_v2(PAGE_URL, PAGE_KEY)
    print("Solution: ", solution)

    # Headers to simulate a browser
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }

    # Data payload with the captcha solution
    data = {
        'g-recaptcha-response': solution
    }

    # Submit the solved captcha token to the target page
    # (forms carrying g-recaptcha-response are POSTed, so a GET body would be ignored)
    response = requests.post(PAGE_URL, headers=headers, data=data,
                             proxies={"http": PROXY, "https": PROXY})

    # Check the response status and print the content if successful
    if response.status_code == 200:
        print("Successfully bypassed captcha and fetched the page!")
        print(response.text)
    else:
        print(f"Failed to fetch the page. Status Code: {response.status_code}")

if __name__ == "__main__":
    main()
```
Explanation:
- Capsolver API: The `solve_recaptcha_v2` function sends the site's key and URL to Capsolver, along with proxy information, to obtain a solved captcha token.
- Sending the request: Once the captcha is solved, the `g-recaptcha-response` token is included in the request data payload and sent with custom headers to the target URL.
- Simulating browser requests: We use a custom `User-Agent` header to avoid detection as a bot.
Web Scraping Best Practices
When web scraping, it is essential to be ethical and follow best practices:
- Respect `robots.txt`: Always check the website's `robots.txt` to ensure scraping is permitted.
- Rate Limiting: Introduce delays between requests to avoid overwhelming the website and reduce the risk of getting blocked.
- Use Proxies: Rotate proxies to prevent IP blocks, especially when scraping at scale.
- Spoof Headers: Simulate browser behavior by using custom headers like `User-Agent`.
- Use a Consistent TLS Fingerprint: Anti-bot systems inspect the TLS handshake, so it should be consistent with the browser you claim to be.
- Match Headers to Your Claimed Browser: Send the headers a real Chrome build of that version would send, in the order Chrome sends them.
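The first two practices are easy to automate with the standard library. The following sketch (function names and the user-agent string are illustrative, not part of any official API) checks a `robots.txt` body with `urllib.robotparser` and wraps `Session.get` with a fixed delay and a simple retry on rate-limit or server errors:

```python
import time
import urllib.robotparser
import requests

def is_allowed(robots_txt, user_agent, url):
    """Parse a robots.txt body and check whether `url` may be fetched."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

def polite_get(session, url, delay=1.0, retries=3):
    """GET with a fixed delay before each attempt, retrying on 429/5xx."""
    for _ in range(retries):
        time.sleep(delay)
        response = session.get(url, timeout=10)
        if response.status_code not in (429, 500, 502, 503):
            return response
    return response

# Usage sketch:
# session = requests.Session()
# robots = session.get("http://quotes.toscrape.com/robots.txt", timeout=10).text
# if is_allowed(robots, "my-scraper", "http://quotes.toscrape.com/"):
#     page = polite_get(session, "http://quotes.toscrape.com/")
```

For production scraping you would typically add jitter to the delay and exponential backoff rather than a fixed interval, but the shape of the loop is the same.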
Conclusion
The Requests library offers an easy and efficient way to scrape websites, while handling advanced scenarios such as reCAPTCHA can be achieved with Capsolver. Always ensure your scraping activities comply with the website's terms of service and legal guidelines.
Happy scraping!
Compliance Disclaimer: The information provided on this blog is for informational purposes only. CapSolver is committed to compliance with all applicable laws and regulations. The use of the CapSolver network for illegal, fraudulent, or abusive activities is strictly prohibited and will be investigated. Our captcha-solving solutions enhance user experience while ensuring 100% compliance in helping solve captcha difficulties during public data crawling. We encourage responsible use of our services. For more information, please visit our Terms of Service and Privacy Policy.