CAPSOLVER
Blog
How to Solve TLS/JA3 Fingerprinting in Web Scraping with curl_cffi

How to Solve TLS/JA3 Fingerprinting in Web Scraping with curl_cffi

Logo of CapSolver

Lucas Mitchell

Automation Engineer

27-Aug-2024

Struggling with anti-bot measures while scraping websites? curl_cffi, an advanced Python library wrapping around the cURL tool, can help you bypass these barriers effectively. By mimicking browser behavior and leveraging cURL's features, curl_cffi enhances your scraper’s ability to avoid detection and perform smoothly. In this guide, we’ll explore how curl_cffi works, how to use it for various tasks, and address its limitations. We’ll also discuss potential solutions to overcome these limitations.

What is curl_cffi?

curl_cffi is a Python library designed for network requests, similar to libraries like requests and httpx. However, unlike these libraries, curl_cffi can simulate browser TLS/JA3 and HTTP/2 fingerprints. curl-impersonate is a command-line tool that can simulate four major browsers and perform TLS and HTTP handshakes just like a real browser. curl_cffi wraps curl-impersonate into a Python library using cffi.

Struggling with the repeated failure to completely solve the irritating captcha?

Discover seamless automatic captcha solving with CapSolver AI-powered Auto Web Unblock technology!

Claim Your Bonus Code for top captcha solutions; CapSolver: WEBS. After redeeming it, you will get an extra 5% bonus after each recharge, Unlimited

What is TLS/JA3 Fingerprinting?

Today, most websites use HTTPS. To establish an HTTPS connection, a TLS handshake occurs between the server and client, exchanging information such as supported TLS versions and encryption algorithms. Different clients have varying characteristics, and these details are often stable, allowing servers to identify requests as coming from typical user browsers or automated scripts. JA3 is a common algorithm used to generate TLS fingerprints. It works by concatenating these characteristics and computing an MD5 hash.

Using curl_cffi

The usage of curl_cffi is quite similar to requests. Here’s how you can use requests to get the JA3 fingerprint:

python Copy
import requests

url = "https://tls.browserleaks.com/json"
r = requests.get(url)
print(r.json())

You might get results like this:

json Copy
{
    "user_agent": "python-requests/2.32.3",
    "ja3_hash": "8d9f7747675e24454cd9b7ed35c58707",
    "ja3_text": "771,4866-4867-4865-49196-49200-49195-49199-52393-52392-159-158-52394-49327-49325-49326-49324-49188-49192-49187-49191-49162-49172-49161-49171-49315-49311-49314-49310-107-103-57-51-157-156-49313-49309-49312-49308-61-60-53-47-255,0-11-10-16-22-23-49-13-43-45-51-21,29-23-30-25-24,0-1-2",
    "ja3n_hash": "a790a1e311289ac1543f411f6ffceddf",
    "ja3n_text": "771,4866-4867-4865-49196-49200-49195-49199-52393-52392-159-158-52394-49327-49325-49326-49324-49188-49192-49187-49191-49162-49172-49161-49171-49315-49311-49314-49310-107-103-57-51-157-156-49313-49309-49312-49308-61-60-53-47-255,0-10-11-13-16-21-22-23-43-45-49-51,29-23-30-25-24,0-1-2",
    "akamai_hash": "",
    "akamai_text": ""
}

If you make repeated requests, you’ll find that your JA3 hash remains the same. However, starting from Chrome version 110, the TLS ClientHello extension order is randomized, making it easier for website developers to block libraries like requests based on JA3 fingerprints. If your requests consistently show the same JA3 fingerprint, they might be identified as coming from a single user, increasing the likelihood of being flagged as a bot.

Here’s how to use curl_cffi to simulate a real JA3 fingerprint:

python Copy
from curl_cffi import requests

url = "https://tls.browserleaks.com/json"
r = requests.get(url, impersonate="chrome124")
print(r.json())

The impersonate parameter allows you to specify the browser and version to simulate. Supported browsers include Chrome, Chrome Android, Edge, and Safari, with versions continually updated. For detailed information, refer to the curl_cffi GitHub repository. With curl_cffi, the JA3 fingerprint will align with that of a real browser, and starting from Chrome 110, the JA3 fingerprint will change with each request:

json Copy
{
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "ja3_hash": "c97c8dac4ca1de968fe230de54f3e0f3",
    "ja3_text": "771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,16-10-27-18-5-51-23-17513-45-35-43-13-65281-0-11-65037,25497-29-23-24,0",
    "ja3n_hash": "4c9ce26028c11d7544da00d3f7e4f45c",
    "ja3n_text": "771,4865-4866-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-5-10-11-13-16-18-23-27-35-43-45-51-17513-65037-65281,25497-29-23-24,0",
    "akamai_hash": "52d84b11737d980aef856699f885ca86",
    "akamai_text": "1:65536;2:0;4:6291456;6:262144|15663105|0|m,a,s,p"
}

Addressing curl_cffi Limitations

While curl_cffi can simulate real JA3 fingerprints and potentially avoid bot challenges and blocks, it may not always be sufficient. Many sites implement advanced bot protection mechanisms. These systems use complex images and difficult-to-read JavaScript challenges to differentiate between humans and bots. Sometimes, even with a genuine and randomized JA3 fingerprint, bypassing these challenges is unavoidable.

If you encounter CAPTCHA challenges, regardless of the request library you use, they may be unavoidable. However, there’s no need to worry. CapSolver provides a solution to these problems. CapSolver uses AI-based automated web unlocking technology to solve various bot challenges in seconds. Whether dealing with images or complex problems, CapSolver can handle it efficiently. If the solution fails, you won’t incur any costs.

CapSolver also offers a browser extension that automatically resolves CAPTCHAs during data scraping with Selenium. Additionally, an API solution is available for resolving CAPTCHAs and obtaining tokens in frameworks like Scrapy. Everything can be accomplished in just a few seconds. For more details, refer to the CapSolver documentation.

Conclusion

By integrating curl_cffi into your web scraping setup, you can effectively mimic real browser behavior to overcome TLS/JA3 fingerprinting challenges. While curl_cffi provides robust tools for handling these challenges, advanced CAPTCHA and bot detection systems still pose significant obstacles. CapSolver offers a complementary solution to tackle these CAPTCHA challenges seamlessly, ensuring your scraping activities run smoothly.

For further insights and resources, check out the CapSolver website and explore the curl_cffi GitHub repository.

Compliance Disclaimer: The information provided on this blog is for informational purposes only. CapSolver is committed to compliance with all applicable laws and regulations. The use of the CapSolver network for illegal, fraudulent, or abusive activities is strictly prohibited and will be investigated. Our captcha-solving solutions enhance user experience while ensuring 100% compliance in helping solve captcha difficulties during public data crawling. We encourage responsible use of our services. For more information, please visit our Terms of Service and Privacy Policy.

More

AI-powered Image Recognition: The Basics and How to Solve it
AI-powered Image Recognition: The Basics and How to Solve it

Say goodbye to image CAPTCHA struggles – CapSolver Vision Engine solves them fast, smart, and hassle-free!

Logo of CapSolver

Lucas Mitchell

24-Apr-2025

Best User Agents for Web Scraping & How to Use Them
Best User Agents for Web Scraping & How to Use Them

A guide to the best user agents for web scraping and their effective use to avoid detection. Explore the importance of user agents, types, and how to implement them for seamless and undetectable web scraping.

Logo of CapSolver

Ethan Collins

07-Mar-2025

What is a Captcha? Can Captcha Track You?
What is a Captcha? Can Captcha Track You?

Ever wondered what a CAPTCHA is and why websites make you solve them? Learn how CAPTCHAs work, whether they track you, and why they’re crucial for web security. Plus, discover how to bypass CAPTCHAs effortlessly with CapSolver for web scraping and automation.

Logo of CapSolver

Lucas Mitchell

05-Mar-2025

How to Solve Cloudflare JS Challenge for Web Scraping and Automation
How to Solve Cloudflare JS Challenge for Web Scraping and Automation

Learn how to solve Cloudflare's JavaScript Challenge for seamless web scraping and automation. Discover effective strategies, including using headless browsers, proxy rotation, and leveraging CapSolver's advanced CAPTCHA-solving capabilities.

Cloudflare
Logo of CapSolver

Rajinder Singh

05-Mar-2025

Cloudflare TLS Fingerprinting: What It Is and How to Solve It
Cloudflare TLS Fingerprinting: What It Is and How to Solve It

Learn about Cloudflare's use of TLS fingerprinting for security, how it detects and blocks bots, and explore effective methods to solve it for web scraping and automated browsing tasks.

Cloudflare
Logo of CapSolver

Lucas Mitchell

28-Feb-2025

Why do I keep getting asked to verify I'm not a robot?
Why do I keep getting asked to verify I'm not a robot?

Learn why Google prompts you to verify you're not a robot and explore solutions like using CapSolver’s API to solve CAPTCHA challenges efficiently.

Logo of CapSolver

Ethan Collins

27-Feb-2025