How to Use RestSharp (C# Library) for Web Scraping

Lucas Mitchell

Automation Engineer

17-Sep-2024

Web scraping is an essential technique for extracting data from websites, but modern web applications often implement security measures like CAPTCHA challenges to prevent automated access. CAPTCHA challenges, such as Google reCAPTCHA, are designed to differentiate between human users and bots, making it challenging for automated scripts to scrape content effectively.

To overcome these obstacles, developers can leverage tools and services that simplify HTTP requests and handle CAPTCHA solving. RestSharp is a powerful and easy-to-use C# library that simplifies the process of making HTTP requests to RESTful APIs. When combined with an HTML parser like HtmlAgilityPack, it becomes a robust solution for web scraping tasks.

However, encountering CAPTCHA challenges during scraping can halt your automation process. This is where Capsolver comes into play. Capsolver offers API-based solutions to solve CAPTCHAs programmatically, enabling your scraping scripts to bypass these challenges and access the desired content seamlessly.

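This guide covers scraping a page with RestSharp and HtmlAgilityPack, solving reCAPTCHA v2 and v3 through the Capsolver API, and best practices for web scraping in C#.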

Web Scraping with RestSharp

In C#, RestSharp is a popular library for handling HTTP requests and interacting with RESTful APIs. It simplifies many aspects of HTTP communication compared to the built-in HttpClient. You can combine RestSharp with an HTML parser like HtmlAgilityPack to extract data from web pages.

Prerequisites

  • Install the RestSharp library using NuGet Package Manager:

    Install-Package RestSharp
  • Install the HtmlAgilityPack library to help parse HTML content:

    Install-Package HtmlAgilityPack
  • Install Newtonsoft.Json to handle JSON responses:

    Install-Package Newtonsoft.Json

Example: Scraping "Quotes to Scrape"

Let’s scrape quotes from the Quotes to Scrape website using RestSharp and HtmlAgilityPack.

using System;
using System.Threading.Tasks;
using HtmlAgilityPack;
using RestSharp;

class Program
{
    static async Task Main(string[] args)
    {
        string url = "http://quotes.toscrape.com/";

        // Initialize RestSharp client
        var client = new RestClient(url);

        // Create a GET request (RestSharp 107+ names the enum member Method.Get; older releases use Method.GET)
        var request = new RestRequest("", Method.Get);

        // Execute the request
        var response = await client.ExecuteAsync(request);

        if (response.IsSuccessful)
        {
            // Parse the page content using HtmlAgilityPack
            HtmlDocument htmlDoc = new HtmlDocument();
            htmlDoc.LoadHtml(response.Content);

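            // Find all the quotes on the page (SelectNodes returns null when nothing matches)
            var quotes = htmlDoc.DocumentNode.SelectNodes("//span[@class='text']");
            if (quotes == null)
            {
                Console.WriteLine("No quotes found on the page.");
                return;
            }

            // Print each quote, decoding HTML entities such as &#39;
            foreach (var quote in quotes)
            {
                Console.WriteLine(HtmlEntity.DeEntitize(quote.InnerText));
            }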
        }
        else
        {
            Console.WriteLine($"Failed to retrieve the page. Status Code: {response.StatusCode}");
        }
    }
}

Explanation:

  • RestSharp Client and Request: Initializes a RestClient with the target URL and creates a RestRequest for the GET method.
  • Executing the Request: Sends the request asynchronously and checks if the response is successful.
  • HtmlAgilityPack: Parses the HTML content from the response and extracts quotes by selecting elements with the class text.
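
To illustrate the parsing step a little further, here is a minimal sketch that pulls each quote together with its author by scoping XPath queries to one quote block at a time. It assumes the page wraps every quote in a `div` with class `quote` and puts the author in a `small` element with class `author`; the inline HTML sample and the `QuoteParser` and `QuoteParserDemo` names are made up for illustration, so check the live markup before relying on these selectors.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class QuoteParser
{
    // Returns (text, author) pairs for every quote block found in the HTML.
    public static List<(string Text, string Author)> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Text, string Author)>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var quoteNodes = doc.DocumentNode.SelectNodes("//div[@class='quote']");
        if (quoteNodes == null)
        {
            return results;
        }

        foreach (var node in quoteNodes)
        {
            // The leading "." keeps the XPath relative to the current quote block.
            var textNode = node.SelectSingleNode(".//span[@class='text']");
            var authorNode = node.SelectSingleNode(".//small[@class='author']");
            if (textNode == null || authorNode == null)
            {
                continue; // skip blocks that do not match the assumed layout
            }

            // InnerText keeps entities such as &#39; so decode them explicitly.
            results.Add((
                HtmlEntity.DeEntitize(textNode.InnerText).Trim(),
                HtmlEntity.DeEntitize(authorNode.InnerText).Trim()));
        }

        return results;
    }
}

public static class QuoteParserDemo
{
    public static void Main()
    {
        // Inline sample (hypothetical) so the sketch runs without a network call.
        string html = @"
<div class='quote'>
  <span class='text'>It&#39;s a sample quote.</span>
  <span>by <small class='author'>Jane Doe</small></span>
</div>";

        foreach (var (text, author) in QuoteParser.Parse(html))
        {
            Console.WriteLine($"{author}: {text}");
        }
    }
}
```

In a real scraper, `response.Content` from the RestSharp call would be passed to `QuoteParser.Parse` in place of the inline sample.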

Solving reCAPTCHA v2 & reCAPTCHA v3 with Capsolver using RestSharp

When a website employs reCAPTCHA v2 or v3 for security, you can solve the CAPTCHA using the Capsolver API. Below is how you can integrate Capsolver with RestSharp to solve reCAPTCHA challenges.

Prerequisites

  • Newtonsoft.Json is used to handle JSON parsing from Capsolver responses:

    Install-Package Newtonsoft.Json

Example: Solving reCAPTCHA v2 with Capsolver

In this section, we will demonstrate how to solve reCAPTCHA v2 challenges using the Capsolver API and RestSharp.

using System;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using RestSharp;

class Program
{
    private static readonly string apiUrl = "https://api.capsolver.com";
    private static readonly string clientKey = "YOUR_API_KEY"; // Replace with your Capsolver API Key

    static async Task Main(string[] args)
    {
        try
        {
            // Step 1: Create a task for solving reCAPTCHA v2
            string taskId = await CreateTask();
            Console.WriteLine("Task ID: " + taskId);

            // Step 2: Retrieve the result of the task
            string taskResult = await GetTaskResult(taskId);
            Console.WriteLine("Task Result (CAPTCHA Token): " + taskResult);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Error: " + ex.Message);
        }
    }

    // Method to create a new CAPTCHA-solving task
    private static async Task<string> CreateTask()
    {
        // Initialize RestSharp client
        var client = new RestClient(apiUrl);

        // Request payload
        var requestBody = new
        {
            clientKey = clientKey,
            task = new
            {
                type = "ReCaptchaV2TaskProxyLess", // Task type for reCAPTCHA v2 without proxy
                websiteURL = "https://www.example.com", // The website URL to solve CAPTCHA for
                websiteKey = "SITE_KEY_HERE" // reCAPTCHA site key
            }
        };

        // Create a POST request (Method.Post in RestSharp 107+; Method.POST in older releases)
        var request = new RestRequest("createTask", Method.Post);
        request.AddJsonBody(requestBody);

        // Execute the request
        var response = await client.ExecuteAsync(request);

        if (!response.IsSuccessful)
        {
            throw new Exception("Failed to create task: " + response.Content);
        }

        JObject jsonResponse = JObject.Parse(response.Content);
        if (jsonResponse["errorId"].ToString() != "0")
        {
            throw new Exception("Error creating task: " + jsonResponse["errorDescription"]);
        }

        // Return the task ID to be used in the next step
        return jsonResponse["taskId"].ToString();
    }

    // Method to retrieve the result of a CAPTCHA-solving task
    private static async Task<string> GetTaskResult(string taskId)
    {
        // Initialize RestSharp client
        var client = new RestClient(apiUrl);

        // Request payload
        var requestBody = new
        {
            clientKey = clientKey,
            taskId = taskId
        };

        // Create a POST request (Method.Post in RestSharp 107+; Method.POST in older releases)
        var request = new RestRequest("getTaskResult", Method.Post);
        request.AddJsonBody(requestBody);

        // Poll for the result of the task every 5 seconds
        while (true)
        {
            var response = await client.ExecuteAsync(request);

            if (!response.IsSuccessful)
            {
                throw new Exception("Failed to get task result: " + response.Content);
            }

            JObject jsonResponse = JObject.Parse(response.Content);
            if (jsonResponse["errorId"].ToString() != "0")
            {
                throw new Exception("Error getting task result: " + jsonResponse["errorDescription"]);
            }

            // If the task is ready, return the CAPTCHA token
            if (jsonResponse["status"].ToString() == "ready")
            {
                return jsonResponse["solution"]["gRecaptchaResponse"].ToString();
            }

            // Wait for 5 seconds before checking again
            Console.WriteLine("Task is still processing, waiting 5 seconds...");
            await Task.Delay(5000);
        }
    }
}

Explanation:

  1. CreateTask Method:

    • RestSharp Client and Request: Initializes a RestClient and creates a RestRequest for the createTask endpoint with the POST method.
    • Request Payload: Sets up the necessary parameters including clientKey, websiteURL, websiteKey, and specifies the task type as ReCaptchaV2TaskProxyLess.
    • Execution: Sends the request and parses the response to retrieve the taskId.
  2. GetTaskResult Method:

    • RestSharp Client and Request: Initializes a RestClient and creates a RestRequest for the getTaskResult endpoint with the POST method.
    • Polling: Continuously polls the task status every 5 seconds until it is completed (status: ready).
    • Result Retrieval: Once the task is ready, it extracts the gRecaptchaResponse token, which the target site expects in its g-recaptcha-response form field when the protected form is submitted.
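
Once a token has been retrieved, it still has to be sent to the target site. The sketch below shows one way this might look, assuming the site accepts a plain form POST with the token in the `g-recaptcha-response` field. The `https://www.example.com` base URL, the `submit` path, and the `TokenSubmitter` name are placeholders, and a real form will usually need its other fields as well. It uses `ExecutePostAsync`, so the HTTP method enum is not needed.

```csharp
using System;
using System.Threading.Tasks;
using RestSharp;

public static class TokenSubmitter
{
    // Builds the form request that carries the solved token.
    public static RestRequest BuildRequest(string formPath, string token)
    {
        var request = new RestRequest(formPath);

        // reCAPTCHA v2 forms post the token in this field.
        // On a POST, RestSharp sends it as a form-encoded body parameter.
        request.AddParameter("g-recaptcha-response", token);

        return request;
    }

    public static async Task Main()
    {
        // Hypothetical target site and path; replace with the real form endpoint.
        var client = new RestClient("https://www.example.com");
        var request = BuildRequest("submit", "TOKEN_FROM_GetTaskResult");

        var response = await client.ExecutePostAsync(request);
        Console.WriteLine($"Status: {(int)response.StatusCode}");
    }
}
```

Tokens expire quickly, so the submission should follow the solve step without a long pause.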

Example: Solving reCAPTCHA v3 with Capsolver

In this section, we will demonstrate how to solve reCAPTCHA v3 challenges using the Capsolver API and RestSharp.

using System;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using RestSharp;

class Program
{
    private static readonly string apiUrl = "https://api.capsolver.com";
    private static readonly string clientKey = "YOUR_API_KEY"; // Replace with your Capsolver API Key

    static async Task Main(string[] args)
    {
        try
        {
            // Step 1: Create a task for solving reCAPTCHA v3
            string taskId = await CreateTask();
            Console.WriteLine("Task ID: " + taskId);

            // Step 2: Retrieve the result of the task
            string taskResult = await GetTaskResult(taskId);
            Console.WriteLine("Task Result (CAPTCHA Token): " + taskResult);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Error: " + ex.Message);
        }
    }

    // Method to create a new CAPTCHA-solving task
    private static async Task<string> CreateTask()
    {
        // Initialize RestSharp client
        var client = new RestClient(apiUrl);

        // Request payload
        var requestBody = new
        {
            clientKey = clientKey,
            task = new
            {
                type = "ReCaptchaV3TaskProxyLess", // Task type for reCAPTCHA v3 without proxy
                websiteURL = "https://www.example.com", // The website URL to solve CAPTCHA for
                websiteKey = "SITE_KEY_HERE", // reCAPTCHA site key
                minScore = 0.3, // Desired minimum score
                pageAction = "your_action" // Action name defined on the site
            }
        };

        // Create a POST request (Method.Post in RestSharp 107+; Method.POST in older releases)
        var request = new RestRequest("createTask", Method.Post);
        request.AddJsonBody(requestBody);

        // Execute the request
        var response = await client.ExecuteAsync(request);

        if (!response.IsSuccessful)
        {
            throw new Exception("Failed to create task: " + response.Content);
        }

        JObject jsonResponse = JObject.Parse(response.Content);
        if (jsonResponse["errorId"].ToString() != "0")
        {
            throw new Exception("Error creating task: " + jsonResponse["errorDescription"]);
        }

        // Return the task ID to be used in the next step
        return jsonResponse["taskId"].ToString();
    }

    // Method to retrieve the result of a CAPTCHA-solving task
    private static async Task<string> GetTaskResult(string taskId)
    {
        // Initialize RestSharp client
        var client = new RestClient(apiUrl);

        // Request payload
        var requestBody = new
        {
            clientKey = clientKey,
            taskId = taskId
        };

        // Create a POST request (Method.Post in RestSharp 107+; Method.POST in older releases)
        var request = new RestRequest("getTaskResult", Method.Post);
        request.AddJsonBody(requestBody);

        // Poll for the result of the task every 5 seconds
        while (true)
        {
            var response = await client.ExecuteAsync(request);

            if (!response.IsSuccessful)
            {
                throw new Exception("Failed to get task result: " + response.Content);
            }

            JObject jsonResponse = JObject.Parse(response.Content);
            if (jsonResponse["errorId"].ToString() != "0")
            {
                throw new Exception("Error getting task result: " + jsonResponse["errorDescription"]);
            }

            // If the task is ready, return the CAPTCHA token
            if (jsonResponse["status"].ToString() == "ready")
            {
                return jsonResponse["solution"]["gRecaptchaResponse"].ToString();
            }

            // Wait for 5 seconds before checking again
            Console.WriteLine("Task is still processing, waiting 5 seconds...");
            await Task.Delay(5000);
        }
    }
}

Bonus Code

Claim your bonus code for CapSolver: scrape. After redeeming it, you get an extra 5% bonus on each recharge, an unlimited number of times.

Explanation:

  1. CreateTask Method:

    • RestSharp Client and Request: Sets up a RestClient and RestRequest for the createTask endpoint.
    • Request Payload: Includes additional parameters like minScore and pageAction specific to reCAPTCHA v3.
    • Execution: Sends the request and retrieves the taskId.
  2. GetTaskResult Method:

    • Similar to the v2 example, it polls the Capsolver API for the task result and retrieves the CAPTCHA token once the task is ready.
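
Both examples poll inside `while (true)`, so a task that never becomes ready would keep the program waiting indefinitely. One possible refinement is to cap the number of attempts. The helper below is a sketch of that idea and uses only the base class library. The `Poller` and `PollerDemo` names and the attempt and delay values are illustrative, and the fake check stands in for a real `getTaskResult` call.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Poller
{
    // Calls checkAsync until it returns a non-null result or maxAttempts is reached.
    public static async Task<string> PollAsync(
        Func<Task<string?>> checkAsync,
        int maxAttempts,
        TimeSpan delay,
        CancellationToken cancellationToken = default)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            string? result = await checkAsync();
            if (result != null)
            {
                return result;
            }

            // No need to wait after the final attempt.
            if (attempt < maxAttempts)
            {
                await Task.Delay(delay, cancellationToken);
            }
        }

        throw new TimeoutException($"No result after {maxAttempts} attempts.");
    }
}

public static class PollerDemo
{
    public static async Task Main()
    {
        int calls = 0;

        // Stand-in for a getTaskResult request: not ready twice, then a token.
        Func<Task<string?>> fakeCheck = () =>
            Task.FromResult<string?>(++calls < 3 ? null : "token-abc");

        string token = await Poller.PollAsync(
            fakeCheck, maxAttempts: 5, delay: TimeSpan.FromMilliseconds(10));

        Console.WriteLine($"{token} after {calls} calls");
    }
}
```

To use it with Capsolver, the check delegate would send the `getTaskResult` request once, return the token when the status is ready, and return null otherwise.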

Web Scraping Best Practices in C#

When using web scraping tools in C#, always follow these best practices:

  • Respect robots.txt: Ensure that the website allows web scraping by checking the robots.txt file.
  • Rate Limiting: Avoid making too many requests in a short period to prevent getting blocked by the website.
  • Proxy Rotation: Use proxies to distribute requests across multiple IPs to avoid being flagged as a bot.
  • Spoof Headers: Simulate browser-like requests by adding custom headers, such as User-Agent, to your HTTP requests.
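
As a rough illustration of the first two points, the sketch below checks paths against `Disallow` rules and pauses between requests. It is deliberately simplified: it reads only groups that apply to all agents (`User-agent: *`), ignores `Allow` rules and wildcard patterns, and uses a fixed one-second delay. The sample robots.txt content, the paths, and the `RobotsTxt` and `PoliteCrawlDemo` names are all made up.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class RobotsTxt
{
    // Collects Disallow path prefixes from groups that apply to all agents ("*").
    public static List<string> DisallowedForAll(string robotsTxt)
    {
        var disallowed = new List<string>();
        bool inWildcardGroup = false;
        bool lastWasUserAgent = false;

        foreach (var rawLine in robotsTxt.Split('\n'))
        {
            // Strip comments and surrounding whitespace (including '\r').
            string line = rawLine.Split('#')[0].Trim();
            int colon = line.IndexOf(':');
            if (colon < 0) continue;

            string field = line.Substring(0, colon).Trim().ToLowerInvariant();
            string value = line.Substring(colon + 1).Trim();

            if (field == "user-agent")
            {
                // Consecutive User-agent lines share one group; otherwise a new group starts.
                if (!lastWasUserAgent) inWildcardGroup = false;
                if (value == "*") inWildcardGroup = true;
                lastWasUserAgent = true;
            }
            else
            {
                lastWasUserAgent = false;
                if (field == "disallow" && inWildcardGroup && value.Length > 0)
                {
                    disallowed.Add(value);
                }
            }
        }

        return disallowed;
    }

    // A path is allowed when it starts with none of the disallowed prefixes.
    public static bool IsAllowed(string path, List<string> disallowed)
    {
        foreach (var prefix in disallowed)
        {
            if (path.StartsWith(prefix, StringComparison.Ordinal)) return false;
        }
        return true;
    }
}

public static class PoliteCrawlDemo
{
    public static async Task Main()
    {
        // Hypothetical robots.txt content; a real crawler would download it first.
        string robots = "User-agent: *\nDisallow: /private/\n";
        var disallowed = RobotsTxt.DisallowedForAll(robots);
        var delay = TimeSpan.FromSeconds(1); // assumed pause between requests

        foreach (var path in new[] { "/page/1/", "/private/data", "/page/2/" })
        {
            if (!RobotsTxt.IsAllowed(path, disallowed))
            {
                Console.WriteLine($"Skipping {path} (disallowed)");
                continue;
            }

            Console.WriteLine($"Would fetch {path}");
            await Task.Delay(delay); // simple fixed-interval rate limit
        }
    }
}
```

A production crawler would more likely use a dedicated robots.txt parser and an adaptive delay, but the structure stays the same: check first, then fetch, then wait.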

Conclusion

By using RestSharp for web scraping and Capsolver for CAPTCHA solving, you can effectively automate interactions with websites that employ CAPTCHA challenges. Always ensure that your web scraping activities comply with the target website's terms of service and legal requirements.

Happy scraping!
