# How to Solve CAPTCHA with Selenium and Node.js when Scraping

Lucas Mitchell

Automation Engineer

15-Oct-2024

As someone who builds scrapers, I've been in this situation before. You're deep into a web scraping project, everything is going well, and then, bang, a flood of CAPTCHAs pops up and disrupts the whole process. You've got Selenium and Node.js set up, your scraper is running perfectly, and the CAPTCHA brings everything to a screeching halt. I know that feeling all too well. Don't worry though: there are ways around this, and today I'm going to show you how to handle these CAPTCHAs with Selenium and Node.js so your scraping project keeps moving without missing a beat.

Why Do Websites Use CAPTCHAs?

Before getting into solutions, it’s important to understand why CAPTCHAs exist. Websites use CAPTCHAs to distinguish between human users and automated bots. CAPTCHAs can be triggered when suspicious behavior is detected, such as multiple requests from the same IP or other signs of automation.

These mechanisms help protect websites from spam, bot traffic, and malicious activity. While this is good for website owners, it's a significant hurdle for web scrapers who need to access and gather data legally.

Struggling with the repeated failure to completely solve the irritating captcha?

Discover seamless automatic captcha solving with CapSolver AI-powered Auto Web Unblock technology!

Claim your bonus code for top captcha solutions at CapSolver: WEBS. After redeeming it, you get an extra 5% bonus after each recharge, with no limit.

Why Use Node.js?

Before diving into the technicalities of solving CAPTCHAs, it's important to understand why Node.js is an excellent choice for this task:

  1. Asynchronous Nature: Node.js's non-blocking, event-driven architecture makes it ideal for handling I/O-heavy operations like web scraping and API requests. This means you can perform multiple tasks simultaneously without waiting for each task to complete sequentially.
  2. Rich Ecosystem: Node.js has a vast ecosystem of libraries and modules available through npm (Node Package Manager). These libraries simplify various aspects of web scraping and automation, such as handling HTTP requests, browser automation, and CAPTCHA solving.
  3. JavaScript Everywhere: Using Node.js allows you to use JavaScript on both the client and server sides. This unification can simplify your codebase and make it easier to share logic and data between different parts of your application.
  4. Performance: Node.js is built on the V8 JavaScript engine, known for its high performance and efficient handling of asynchronous operations. This ensures that your scraping tasks are performed quickly and efficiently.
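
To make the first point concrete, here is a minimal sketch of running several asynchronous jobs at once while capping how many are in flight. The helper name `runWithLimit` is my own for illustration and is not part of Node.js or any library. Each task is assumed to be a function that returns a promise, such as a page load or an API request.

```javascript
// Run async tasks with a cap on how many are in flight at once.
// `tasks` is an array of functions that each return a promise.
// Results come back in the same order as the tasks.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task nobody has claimed yet

  // Each worker keeps claiming the next unclaimed task until none are left.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  // Start at most `limit` workers and wait for all of them to finish.
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Example usage (hypothetical scrapePage function):
//   const jobs = urls.map(url => () => scrapePage(url));
//   const pages = await runWithLimit(jobs, 3);
```

Capping concurrency is a deliberate choice here: firing every request at once is exactly the kind of traffic pattern that tends to trigger CAPTCHAs in the first place.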

Can Selenium with Node.js Solve CAPTCHA?

From my experience, you can definitely configure Selenium with Node.js to solve CAPTCHA challenges. But, depending on how the website is set up, you’ve got two approaches to consider.

On some websites, CAPTCHAs only pop up if the anti-bot system suspects unusual activity, such as automated browser behavior. In these cases, you can avoid the CAPTCHA entirely by mimicking natural user actions: the anti-bot system never flags you, and you sail right through without ever facing a challenge.
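
One small piece of mimicking natural user actions is pacing. The sketch below types text one character at a time with a random pause between keystrokes. The helper names (`randomDelay`, `sleep`, `typeLikeHuman`) and the default timings are my own assumptions. The only thing it expects from Selenium is an element with a `sendKeys()` method, which a `WebElement` provides.

```javascript
// Random integer delay in [minMs, maxMs], so actions are not evenly spaced.
function randomDelay(minMs, maxMs) {
  return Math.floor(minMs + Math.random() * (maxMs - minMs + 1));
}

// Promise-based pause.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Type text one character at a time with a pause between keystrokes.
// `element` is expected to expose sendKeys(), as a Selenium WebElement does.
// The 50-200 ms defaults are illustrative, not tuned values.
async function typeLikeHuman(element, text, minMs = 50, maxMs = 200) {
  for (const char of text) {
    await element.sendKeys(char);
    await sleep(randomDelay(minMs, maxMs));
  }
}

// Example usage inside a Selenium script:
//   const box = await driver.findElement(By.name('q'));
//   await typeLikeHuman(box, 'web scraping');
```

Pacing alone will not defeat a serious anti-bot system, but it removes one of the easier signals to spot.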

However, some websites build the CAPTCHA right into the page and display it to every visitor, regardless of what bot detection concludes. In this case, you have to solve the CAPTCHA to access the content. That's why most scrapers turn to third-party CAPTCHA-solving services, which are by far the most common and effective way to deal with the problem. Some of these services rely on manual labor, which is slow and expensive, so we don't recommend them. Instead, we recommend services that use AI-powered Auto Web Unblock technology, which we introduce in detail below.
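
When the CAPTCHA is embedded in the page, a solving service usually needs the widget's site key. As a rough sketch, assuming the widget is declared in the HTML with a `data-sitekey` attribute (the usual markup for Turnstile and reCAPTCHA widgets), you can pull it out of the page source. The function name `findSiteKey` is hypothetical, and widgets rendered purely through JavaScript calls won't be caught this way.

```javascript
// Pull the first data-sitekey attribute value out of an HTML string.
// Returns null when no such attribute is present in the markup.
function findSiteKey(html) {
  const match = html.match(/data-sitekey\s*=\s*["']([^"']+)["']/i);
  return match ? match[1] : null;
}

// Example usage inside a Selenium script:
//   const html = await driver.getPageSource();
//   const siteKey = findSiteKey(html);
//   if (siteKey) { /* a CAPTCHA widget is embedded in this page */ }
```

A `null` result doesn't prove the page is CAPTCHA-free. It only means the key isn't visible in the static markup.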

Below, we cover both approaches: methods that can keep CAPTCHAs from appearing in the first place, and a fast, accurate, and economical third-party service for solving them at scale. Follow along as we explore each one.

Method #1: Using Undetected ChromeDriver with Selenium and Node.js

Let me start by sharing a free method I’ve found effective: using Undetected ChromeDriver with Selenium.

To understand why this approach works, it's important to first take a look at how standard Selenium operates. Essentially, Selenium uses ChromeDriver—a small executable that controls Chromium browsers. This executable acts as the middleman between the Selenium WebDriver and the browser itself.

Now, here’s the problem I ran into: the regular ChromeDriver leaks quite a bit of information about the automation to the target site. When a website has anti-bot measures in place, using the standard ChromeDriver often leads to being flagged. You might find yourself up against an impossible challenge like Cloudflare Turnstile CAPTCHA.

That's where Undetected ChromeDriver came in handy for me. It's a modified version of the regular ChromeDriver, built to avoid detection. By using techniques like fingerprint spoofing and hiding the typical automation signals, this tool makes Selenium seem much more human. I've noticed that it can often avoid CAPTCHAs altogether by mimicking normal user behavior.

However, it’s not foolproof. While Undetected ChromeDriver has worked for me on sites with basic bot protection, it’s not always successful. Sites with more advanced systems can still catch on, leaving this method ineffective.
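
To give a feel for what "hiding the typical automation signals" means, here is a much simpler stand-in: plain `selenium-webdriver` with a couple of Chrome options. To be clear, this is not Undetected ChromeDriver and covers far less. The helper names `stealthArguments` and `buildDriver` and the window size are my own assumptions. The options calls (`addArguments`, `excludeSwitches`, `setChromeOptions`) are part of the `selenium-webdriver` package.

```javascript
// Chrome command-line arguments commonly used to reduce obvious automation signals.
function stealthArguments() {
  return [
    // Intended to stop Chrome from advertising automation via navigator.webdriver.
    '--disable-blink-features=AutomationControlled',
    // An ordinary desktop window size (illustrative value).
    '--window-size=1366,768',
  ];
}

// Build a Chrome driver with those arguments applied.
// selenium-webdriver is required lazily so stealthArguments() stays dependency-free.
// Requires: npm install selenium-webdriver
async function buildDriver() {
  const { Builder } = require('selenium-webdriver');
  const chrome = require('selenium-webdriver/chrome');

  const options = new chrome.Options();
  options.addArguments(...stealthArguments());
  options.excludeSwitches('enable-automation'); // drop the default automation switch

  return new Builder().forBrowser('chrome').setChromeOptions(options).build();
}

// Example usage:
//   const driver = await buildDriver();
//   await driver.get('https://www.yourwebsite.com');
```

These flags only cover the most visible signals, so treat them as a starting point rather than a substitute for a purpose-built tool.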

If you're interested in setting it up yourself, I recommend checking out a detailed guide on using Undetected ChromeDriver with Node.js. Just keep in mind that for more heavily guarded websites, this solution might not always be enough.

Method #2: Using Third-Party CAPTCHA-Solving Services

While Undetected ChromeDriver can sometimes help solve CAPTCHA challenges by mimicking natural behavior, it’s not always reliable. Many websites deploy more advanced anti-bot protections that can still detect automation tools, regardless of how human-like they appear. This is where using a third-party CAPTCHA-solving service becomes the most practical solution, especially when dealing with large-scale web scraping operations.

Why Choose Third-Party CAPTCHA Solvers?

There are several reasons why third-party services are generally the preferred approach when handling CAPTCHAs during web scraping:

  1. Accuracy and Reliability: Automated CAPTCHA-solving services leverage advanced machine learning algorithms to solve CAPTCHAs with a high success rate. These solutions are specifically designed to solve different types of CAPTCHA challenges efficiently, including complex ones like Google reCAPTCHA and Cloudflare's Turnstile.

  2. Scalability: For large-scale scraping projects, relying solely on tools like Undetected ChromeDriver can be both unreliable and time-consuming. Third-party services, on the other hand, are built to handle large volumes of CAPTCHA challenges with minimal downtime, allowing your scraping tasks to run smoothly without interruptions.

  3. Cost-Effectiveness: While you might think that using a paid service adds to your costs, consider the potential time and resource savings. Solving CAPTCHAs manually or repeatedly troubleshooting automation errors can take up valuable time, especially in high-volume scraping projects. By automating this aspect, you can focus on the actual data collection rather than CAPTCHA-solving logistics.

  4. Consistency Across Multiple Websites: The variety of CAPTCHA challenges (such as reCAPTCHA, hCaptcha, Cloudflare) deployed across different websites can make it difficult for DIY solutions to keep up. Third-party services often support multiple CAPTCHA types, ensuring that you’re covered no matter what protection the target website uses.

Now that we’ve covered why third-party solutions are often the most effective route, let me introduce CapSolver—a leading service in the CAPTCHA-solving space.

Why CapSolver?

CapSolver stands out as a fast, reliable, and scalable third-party CAPTCHA-solving solution that supports a wide range of CAPTCHA types. Whether you're dealing with reCAPTCHA v2 or v3, hCaptcha, or even the latest Cloudflare Turnstile, CapSolver has you covered.

Here’s why I recommend CapSolver:

  • Fast Service and Technical Support
    CapSolver is committed to fast responses and efficient service. Its technical team is experienced and can quickly provide support and solutions when CAPTCHA recognition problems come up.

  • Quick Update Speed
    CapSolver runs a monitoring system that responds as soon as a service needs to be updated or maintained. The team continuously improves its CAPTCHA recognition algorithms so the system keeps pace with changes to CAPTCHAs and keeps returning accurate results.

  • Rich Service Support Types
    CapSolver supports more CAPTCHA types than any other supplier in the market, including reCAPTCHA (v2/v3/Enterprise), hCaptcha (Normal/Enterprise), Cloudflare, ImageToText, DataDome, GeeTest V3/V4, AWS Captcha, and more. Together these handle over 95% of CAPTCHA needs worldwide and cover all mainstream CAPTCHA service types.

  • Detailed API Functions and Documentation Tutorials
    CapSolver provides a comprehensive API that makes its CAPTCHA recognition services easy for developers to integrate. The documentation covers basic API usage as well as advanced configuration and fixes for common problems, helping you apply CapSolver's technology efficiently in your projects.

  • Extension Services
    In addition to the API, CapSolver offers a browser extension, which gives people who don't program a more convenient way to deal with CAPTCHA challenges. The extension recognizes the most popular CAPTCHA types.

How to Integrate CapSolver with Selenium and Node.js

Integrating CapSolver into your Selenium and Node.js project is straightforward. Based on my own process, here are the steps I suggest:

  1. Install the CapSolver SDK: First, install the CapSolver Node.js SDK by running the following command in your project directory:

    npm install capsolver-node
  2. Set Up API Key: Once you’ve installed the SDK, you’ll need an API key from CapSolver. Head to the CapSolver website and create an account to get your key.

  3. CAPTCHA Handling in Your Code: Here's how I implemented CapSolver in my project to solve CAPTCHA challenges. This example calls the CapSolver HTTP API directly with axios:

// npm install axios
const axios = require('axios');

const api_key = "YOUR_API_KEY";  // Replace with your actual API key
const site_key = "0x4XXXXXXXXXXXXXXXXX";  // Replace with the site key
const site_url = "https://www.yourwebsite.com";  // Replace with the target site URL
const max_attempts = 120;  // Added safeguard: stop polling after about two minutes

async function capsolver() {
  // Describe the task: a Turnstile challenge on the target page
  const payload = {
    clientKey: api_key,
    task: {
      type: 'AntiTurnstileTaskProxyLess',
      websiteKey: site_key,
      websiteURL: site_url,
      metadata: {
          action: ''  // Optional action metadata
      }
    }
  };

  try {
    // Step 1: create the task and read back its ID
    const res = await axios.post("https://api.capsolver.com/createTask", payload);
    const task_id = res.data.taskId;
    if (!task_id) {
      console.log("Failed to create task:", res.data);
      return;
    }
    console.log("Got taskId:", task_id);

    // Step 2: poll once per second until the task is ready, fails, or we give up
    for (let attempt = 0; attempt < max_attempts; attempt++) {
      await new Promise(resolve => setTimeout(resolve, 1000)); // Delay for 1 second

      const getResultPayload = {clientKey: api_key, taskId: task_id};
      const resp = await axios.post("https://api.capsolver.com/getTaskResult", getResultPayload);
      const status = resp.data.status;

      if (status === "ready") {
        return resp.data.solution.token; // Return the solved token
      }
      if (status === "failed" || resp.data.errorId) {
        console.log("Solve failed! response:", resp.data);
        return;
      }
    }
    console.log("Timed out waiting for the solution");
  } catch (error) {
    console.error("Error:", error);
  }
}

// The token is undefined if task creation, solving, or polling failed
capsolver().then(token => {
  console.log(token);  // Output the solved CAPTCHA token
});
  4. Integrate CAPTCHA Solution into Selenium: After receiving the CAPTCHA solution, inject the token into the page using Selenium WebDriver, then submit the form to pass the CAPTCHA.

  5. Run Your Scraper: With CapSolver integrated into your Selenium script, you're ready to run your scraper without worrying about CAPTCHA interruptions.
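
The injection step can be sketched roughly as follows. It assumes the target page uses a Turnstile widget that has rendered its hidden `cf-turnstile-response` input, and that the site reads the token from that field when the form is submitted. Both are assumptions about the target page, and some sites use a JavaScript callback instead. The function name `injectTurnstileToken` is hypothetical. `executeScript` is the standard WebDriver method.

```javascript
// Script executed inside the page. arguments[0] is the solved token.
// Assumes the Turnstile widget has rendered its hidden response input.
const INJECT_SCRIPT = `
  const input = document.querySelector('input[name="cf-turnstile-response"]');
  if (!input) return false;
  input.value = arguments[0];
  return true;
`;

// Write the token into the page; resolves to true when the field was found.
// `driver` is a Selenium WebDriver (anything exposing executeScript works).
async function injectTurnstileToken(driver, token) {
  if (!token) {
    throw new Error('No token to inject');
  }
  return driver.executeScript(INJECT_SCRIPT, token);
}

// Example usage, with `token` coming from the solving step:
//   const injected = await injectTurnstileToken(driver, token);
//   if (injected) {
//     await driver.findElement(By.css('form')).submit();
//   }
```

If the function resolves to `false`, the response field was not on the page, so check whether the widget has finished loading or whether the site expects the token some other way.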

By integrating CapSolver into your scraping project, you’ll solve CAPTCHA challenges effortlessly and ensure that your automation runs smoothly and efficiently.

Conclusion

Handling CAPTCHAs while web scraping is one of the biggest challenges I've faced, but with the right tools, I’ve learned how to overcome these obstacles. Whether I opt for Undetected ChromeDriver or choose a more robust solution, I can ensure that my web scraping efforts continue without interruptions.

For anyone scraping on a larger scale, I believe relying on a CAPTCHA solving service is a smart investment. It’s fast, efficient, and built for scalability—allowing my scraper to focus on gathering data instead of getting stuck on CAPTCHAs.

If you're ready to take the plunge and experience the benefits of CapSolver for yourself, sign up on the CapSolver website. You'll be solving CAPTCHAs in no time!

Compliance Disclaimer: The information provided on this blog is for informational purposes only. CapSolver is committed to compliance with all applicable laws and regulations. The use of the CapSolver network for illegal, fraudulent, or abusive activities is strictly prohibited and will be investigated. Our captcha-solving solutions enhance user experience while ensuring 100% compliance in helping solve captcha difficulties during public data crawling. We encourage responsible use of our services. For more information, please visit our Terms of Service and Privacy Policy.
