How to Use aiohttp for Web Scraping

Lucas Mitchell

Automation Engineer

23-Sep-2024

What is aiohttp?

aiohttp is an asynchronous HTTP client/server framework for Python. It builds on Python's asyncio library to run network operations concurrently, which makes it well suited to web scraping, web development, and other network-bound work.

Features:

  • Asynchronous I/O: Built on top of asyncio for non-blocking network operations.
  • Client and Server Support: Provides both HTTP client and server implementations.
  • WebSocket Support: Native support for the WebSocket protocol on both client and server.
  • High Performance: Efficient handling of multiple connections simultaneously.
  • Extensibility: Supports middlewares, signals, and plugins for advanced customization.
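
As a minimal sketch of the client, server, and concurrency features listed above, the following starts a small aiohttp.web server on a local port and fetches three URLs from it at once with asyncio.gather. The /hello/{name} route, the handler, and the helper names are illustrative assumptions, not something aiohttp provides.

```python
import asyncio
import aiohttp
from aiohttp import web

async def hello(request):
    # Server side: echo the path parameter so each response is distinguishable.
    return web.Response(text=f"hello {request.match_info['name']}")

async def fetch_text(session, url):
    # Client side: one non-blocking GET that returns the response body.
    async with session.get(url) as response:
        return await response.text()

async def main():
    app = web.Application()
    app.router.add_get('/hello/{name}', hello)
    runner = web.AppRunner(app)
    await runner.setup()
    # Port 0 asks the OS for any free port; read the real one back afterwards.
    site = web.TCPSite(runner, '127.0.0.1', 0)
    await site.start()
    port = runner.addresses[0][1]
    try:
        async with aiohttp.ClientSession() as session:
            urls = [f'http://127.0.0.1:{port}/hello/{name}' for name in ('a', 'b', 'c')]
            # gather() runs the three requests concurrently and keeps input order.
            return await asyncio.gather(*(fetch_text(session, url) for url in urls))
    finally:
        await runner.cleanup()

if __name__ == '__main__':
    print(asyncio.run(main()))
```

Because everything runs against 127.0.0.1, the sketch can be tried without touching an external site.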

Prerequisites

Before you start using aiohttp, ensure you have:

Getting Started with aiohttp

Installation

Install aiohttp using pip:

bash
pip install aiohttp

Basic Example: Making a GET Request

Here's how to perform a simple GET request using aiohttp:

python
import asyncio
import aiohttp

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            status = response.status
            text = await response.text()
            print(f'Status Code: {status}')
            print('Response Body:', text)

if __name__ == '__main__':
    asyncio.run(fetch('https://httpbin.org/get'))
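
Real scraping targets can be slow or unreachable, so a request usually needs a timeout and some error handling. The sketch below is one possible way to add both. The function name fetch_safely and the choice to return None on failure are assumptions made for illustration.

```python
import asyncio
import aiohttp

async def fetch_safely(url, total_timeout=10):
    # Bound the whole request (connect + read) so a stalled server cannot hang the scraper.
    timeout = aiohttp.ClientTimeout(total=total_timeout)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.get(url) as response:
                # Turn 4xx/5xx statuses into aiohttp.ClientResponseError.
                response.raise_for_status()
                return await response.text()
    except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
        # ClientError covers connection failures and bad statuses.
        print(f'Request to {url} failed: {exc!r}')
        return None
```

Calling asyncio.run(fetch_safely('https://httpbin.org/get')) should return the body on success and None when the request fails or times out.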

Web Scraping Example: Scraping Quotes from a Website

Let's scrape the Quotes to Scrape website to extract quotes and their authors:

python
import asyncio
import aiohttp
from bs4 import BeautifulSoup

async def fetch_content(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def scrape_quotes():
    url = 'http://quotes.toscrape.com/'
    html = await fetch_content(url)
    soup = BeautifulSoup(html, 'html.parser')
    quotes = soup.find_all('div', class_='quote')
    for quote in quotes:
        text = quote.find('span', class_='text').get_text(strip=True)
        author = quote.find('small', class_='author').get_text(strip=True)
        print(f'{text} - {author}')

if __name__ == '__main__':
    asyncio.run(scrape_quotes())

Output:

“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.” - Albert Einstein
“It is our choices, Harry, that show what we truly are, far more than our abilities.” - J.K. Rowling
... (additional quotes)
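
The example above fetches a single page. When scraping several pages, it is common to share one session and cap the number of requests in flight so the target site is not flooded. A minimal sketch using asyncio.Semaphore follows. The helper names and the default limit of 3 are assumptions.

```python
import asyncio
import aiohttp

async def fetch_page(session, url, semaphore):
    # The semaphore caps how many requests are in flight at once.
    async with semaphore:
        async with session.get(url) as response:
            response.raise_for_status()
            return await response.text()

async def fetch_pages(urls, max_concurrency=3):
    semaphore = asyncio.Semaphore(max_concurrency)
    # One shared session reuses connections across all requests.
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_page(session, url, semaphore) for url in urls]
        # Results come back in the same order as the input URLs.
        return await asyncio.gather(*tasks)
```

Assuming the site's pagination follows the /page/N/ pattern, a call such as fetch_pages([f'http://quotes.toscrape.com/page/{n}/' for n in range(1, 4)]) would return the HTML of the first three pages, each of which can then be parsed with BeautifulSoup as shown earlier.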

Handling Captchas with CapSolver and aiohttp

In this section, we'll integrate CapSolver with aiohttp to handle captchas. CapSolver is an external service that solves various types of captchas, including ReCaptcha V2 and V3.

We'll demonstrate solving ReCaptcha V2 with CapSolver and then requesting a page that requires the captcha token.

Example: Solving ReCaptcha V2 with CapSolver and aiohttp

First, install the CapSolver package:

bash
pip install capsolver

Now, here's how you can solve a ReCaptcha V2 and use the solution in your request:

python
import asyncio
import os
import aiohttp
import capsolver

# Set your CapSolver API key
capsolver.api_key = os.getenv("CAPSOLVER_API_KEY", "Your CapSolver API Key")
PAGE_URL = os.getenv("PAGE_URL", "https://example.com")  # Page URL with captcha
SITE_KEY = os.getenv("SITE_KEY", "SITE_KEY")             # Captcha site key

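async def solve_recaptcha_v2():
    # capsolver.solve() is a blocking call, so run it in a worker thread
    # to avoid stalling the event loop (asyncio.to_thread needs Python 3.9+).
    solution = await asyncio.to_thread(capsolver.solve, {
        "type": "ReCaptchaV2TaskProxyless",
        "websiteURL": PAGE_URL,
        "websiteKey": SITE_KEY
    })
    # The token may be nested under 'solution' or returned at the top level,
    # depending on the response shape; accept either.
    return solution.get('solution', solution)['gRecaptchaResponse']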

async def access_protected_page():
    captcha_response = await solve_recaptcha_v2()
    print("Captcha Solved!")

    async with aiohttp.ClientSession() as session:
        data = {
            'g-recaptcha-response': captcha_response,
            # Include other form data if required by the website
        }
        async with session.post(PAGE_URL, data=data) as response:
            content = await response.text()
            print('Page Content:', content)

if __name__ == '__main__':
    asyncio.run(access_protected_page())

Note: Replace PAGE_URL with the URL of the page containing the captcha and SITE_KEY with the site key of the captcha. The site key is usually found in the page's HTML source code within the captcha widget.
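
The capsolver package makes synchronous calls. As an alternative, the task can be created and polled with aiohttp directly, which keeps the whole flow asynchronous. The sketch below assumes CapSolver exposes createTask and getTaskResult HTTP endpoints under https://api.capsolver.com and that the JSON fields are named clientKey, taskId, errorId, errorDescription, status, and solution. Check these names against the current CapSolver API documentation before relying on them. The helper names solve_task and _post_json are hypothetical.

```python
import asyncio
import aiohttp

API_BASE = 'https://api.capsolver.com'  # assumed base URL of the task API

async def _post_json(session, url, payload):
    # POST a JSON body and fail loudly if the API reports an error.
    async with session.post(url, json=payload) as response:
        data = await response.json()
    if data.get('errorId'):
        raise RuntimeError(f"{url} failed: {data.get('errorDescription')}")
    return data

async def solve_task(session, client_key, task, api_base=API_BASE,
                     poll_interval=3.0, max_polls=40):
    # Step 1: create the task and remember its id (field names are assumptions).
    created = await _post_json(session, f'{api_base}/createTask',
                               {'clientKey': client_key, 'task': task})
    query = {'clientKey': client_key, 'taskId': created['taskId']}
    # Step 2: poll until the task is ready or we give up.
    for _ in range(max_polls):
        # Sleeping with asyncio keeps other coroutines running while we wait.
        await asyncio.sleep(poll_interval)
        result = await _post_json(session, f'{api_base}/getTaskResult', query)
        if result.get('status') == 'ready':
            return result['solution']
    raise TimeoutError('captcha task was not ready in time')
```

Under these assumptions, passing the same task dictionary used in the example above would return a solution dictionary from which gRecaptchaResponse can be read.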

Handling Proxies with aiohttp

To route your requests through a proxy, specify the proxy parameter:

python
import asyncio
import aiohttp

async def fetch(url, proxy):
    async with aiohttp.ClientSession() as session:
        async with session.get(url, proxy=proxy) as response:
            return await response.text()

async def main():
    proxy = 'http://username:password@proxyserver:port'
    url = 'https://httpbin.org/ip'
    content = await fetch(url, proxy)
    print('Response Body:', content)

if __name__ == '__main__':
    asyncio.run(main())
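
Proxy credentials do not have to be embedded in the proxy URL. aiohttp also accepts them through the proxy_auth parameter as an aiohttp.BasicAuth object. The sketch below shows one way to build those arguments. The helper names build_proxy_kwargs and fetch_via_proxy, and the 15-second timeout, are assumptions for illustration.

```python
import asyncio
import aiohttp

def build_proxy_kwargs(proxy_url, username=None, password=None):
    # Hypothetical helper: collect the proxy-related arguments for session.get().
    kwargs = {'proxy': proxy_url}
    if username is not None:
        # BasicAuth produces the Proxy-Authorization header for the proxy.
        kwargs['proxy_auth'] = aiohttp.BasicAuth(username, password or '')
    return kwargs

async def fetch_via_proxy(url, proxy_url, username=None, password=None):
    # A timeout matters more with proxies, which can be slow or dead.
    timeout = aiohttp.ClientTimeout(total=15)
    proxy_kwargs = build_proxy_kwargs(proxy_url, username, password)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(url, **proxy_kwargs) as response:
            return await response.text()
```

Keeping the credentials separate avoids having to URL-escape special characters in the username or password.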

Handling Cookies with aiohttp

You can manage cookies using CookieJar:

python
import asyncio
import aiohttp

async def main():
    jar = aiohttp.CookieJar()
    async with aiohttp.ClientSession(cookie_jar=jar) as session:
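        # Use the response as a context manager so the connection is released.
        async with session.get('https://httpbin.org/cookies/set?name=value') as response:
            await response.read()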
        # Display the cookies
        for cookie in jar:
            print(f'{cookie.key}: {cookie.value}')

if __name__ == '__main__':
    asyncio.run(main())
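
A session sends stored cookies back automatically on later requests to the same site, which is what keeps a login alive while scraping. The sketch below demonstrates this against a local aiohttp.web server. The /set and /show routes and the session_id cookie are made up for the demo. Note that CookieJar ignores cookies from hosts given as IP addresses unless it is created with unsafe=True, which the demo needs because it talks to 127.0.0.1.

```python
import asyncio
import aiohttp
from aiohttp import web

async def set_cookie(request):
    # Server side: hand the client a cookie.
    response = web.Response(text='cookie set')
    response.set_cookie('session_id', 'abc123')
    return response

async def show_cookie(request):
    # Server side: report the cookie the client sent back.
    return web.Response(text=request.cookies.get('session_id', 'missing'))

async def main():
    app = web.Application()
    app.router.add_get('/set', set_cookie)
    app.router.add_get('/show', show_cookie)
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, '127.0.0.1', 0)
    await site.start()
    base = f'http://127.0.0.1:{runner.addresses[0][1]}'
    # unsafe=True lets the jar keep cookies from an IP-address host.
    jar = aiohttp.CookieJar(unsafe=True)
    try:
        async with aiohttp.ClientSession(cookie_jar=jar) as session:
            async with session.get(f'{base}/set') as response:
                await response.read()
            # The jar attaches session_id to this second request automatically.
            async with session.get(f'{base}/show') as response:
                return await response.text()
    finally:
        await runner.cleanup()

if __name__ == '__main__':
    print(asyncio.run(main()))
```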

Advanced Usage: Custom Headers and POST Requests

You can send custom headers and perform POST requests with aiohttp:

python
import asyncio
import aiohttp

async def main():
    headers = {
        'User-Agent': 'Mozilla/5.0 (compatible)',
        'Accept-Language': 'en-US,en;q=0.5',
    }
    data = {
        'username': 'testuser',
        'password': 'testpass',
    }
    async with aiohttp.ClientSession() as session:
        async with session.post('https://httpbin.org/post', headers=headers, data=data) as response:
            json_response = await response.json()
            print('Response JSON:', json_response)

if __name__ == '__main__':
    asyncio.run(main())
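
The data= argument used above sends a form-encoded body. Many APIs expect JSON instead, which aiohttp sends when the payload is passed as json=. Headers given to ClientSession apply to every request made through it. The sketch below compares the two encodings against a local echo server. The /echo route and the example-scraper/1.0 User-Agent are assumptions for the demo.

```python
import asyncio
import aiohttp
from aiohttp import web

async def echo(request):
    # Server side: report how the client encoded the body.
    if request.content_type == 'application/json':
        body = await request.json()
    else:
        body = dict(await request.post())
    return web.json_response({
        'content_type': request.content_type,
        'user_agent': request.headers.get('User-Agent'),
        'body': body,
    })

async def main():
    app = web.Application()
    app.router.add_post('/echo', echo)
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, '127.0.0.1', 0)
    await site.start()
    url = f'http://127.0.0.1:{runner.addresses[0][1]}/echo'
    payload = {'username': 'testuser'}
    # Headers passed to ClientSession are sent with every request it makes.
    headers = {'User-Agent': 'example-scraper/1.0'}
    try:
        async with aiohttp.ClientSession(headers=headers) as session:
            # data= sends application/x-www-form-urlencoded.
            async with session.post(url, data=payload) as response:
                form_result = await response.json()
            # json= serializes the payload and sends application/json.
            async with session.post(url, json=payload) as response:
                json_result = await response.json()
        return form_result, json_result
    finally:
        await runner.cleanup()

if __name__ == '__main__':
    print(asyncio.run(main()))
```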

Bonus Code

Claim your Bonus Code for top captcha solutions at CapSolver: scrape. After redeeming it, you will get an extra 5% bonus after each recharge, unlimited times.

Conclusion

With aiohttp, you can efficiently perform asynchronous web scraping tasks and handle multiple network operations concurrently. Integrating it with CapSolver allows you to solve captchas like ReCaptcha V2, enabling access to content that might otherwise be restricted.

Feel free to expand upon these examples to suit your specific needs. Always remember to respect the terms of service of the websites you scrape and adhere to legal guidelines.

Happy scraping!

Compliance Disclaimer: The information provided on this blog is for informational purposes only. CapSolver is committed to compliance with all applicable laws and regulations. The use of the CapSolver network for illegal, fraudulent, or abusive activities is strictly prohibited and will be investigated. Our captcha-solving solutions enhance user experience while ensuring 100% compliance in helping solve captcha difficulties during public data crawling. We encourage responsible use of our services. For more information, please visit our Terms of Service and Privacy Policy.
