How to Use HttpClient (C# Library) for Web Scraping

Ethan Collins
Pattern Recognition Specialist
13-Sep-2024

CAPTCHA challenges, such as Google reCAPTCHA, are commonly used by websites to block bots and prevent automated access to their content. To bypass such challenges programmatically, you can use services like Capsolver that offer API-based solutions to solve these CAPTCHAs.
In this guide, we'll show you how to:
- Scrape websites using C# HttpClient and HtmlAgilityPack.
- Solve reCAPTCHA challenges using the Capsolver API.
Web Scraping with C# HttpClient
In C#, the HttpClient class is commonly used to send HTTP requests and receive responses from websites. You can combine this with an HTML parser like HtmlAgilityPack to extract data from web pages.
Prerequisites
- Install the HtmlAgilityPack library using NuGet Package Manager to help parse HTML content:
powershell
Install-Package HtmlAgilityPack
- Install Newtonsoft.Json to handle JSON responses:
powershell
Install-Package Newtonsoft.Json
Example: Scraping "Quotes to Scrape"
Let’s scrape quotes from the Quotes to Scrape website using HttpClient and HtmlAgilityPack.
csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

class Program
{
    // Reuse a single HttpClient instance for the lifetime of the application
    private static readonly HttpClient client = new HttpClient();

    static async Task Main(string[] args)
    {
        string url = "http://quotes.toscrape.com/";

        // Send a GET request to the page
        HttpResponseMessage response = await client.GetAsync(url);
        if (response.IsSuccessStatusCode)
        {
            // Parse the page content using HtmlAgilityPack
            string pageContent = await response.Content.ReadAsStringAsync();
            HtmlDocument htmlDoc = new HtmlDocument();
            htmlDoc.LoadHtml(pageContent);

            // Find all the quotes on the page.
            // SelectNodes returns null (not an empty collection) when nothing matches.
            var quotes = htmlDoc.DocumentNode.SelectNodes("//span[@class='text']");
            if (quotes == null)
            {
                Console.WriteLine("No quotes found on the page.");
                return;
            }

            // Print each quote
            foreach (var quote in quotes)
            {
                Console.WriteLine(quote.InnerText);
            }
        }
        else
        {
            Console.WriteLine($"Failed to retrieve the page. Status Code: {response.StatusCode}");
        }
    }
}
Explanation:
- HttpClient: Sends a GET request to the website.
- HtmlAgilityPack: Parses the HTML content and extracts quotes by selecting span elements with the class text.
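The same approach extends to pulling several fields from each quote. The following is a minimal sketch, assuming each quote sits in a div with class quote that contains a span with class text and a small element with class author. The QuoteExtractor class and the inline sample markup are illustrative assumptions, not the site's exact HTML. It also guards against SelectNodes returning null when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class QuoteExtractor
{
    // Returns (text, author) pairs; an empty list when no quote blocks match.
    public static List<(string Text, string Author)> Extract(string html)
    {
        var results = new List<(string Text, string Author)>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var blocks = doc.DocumentNode.SelectNodes("//div[@class='quote']");
        if (blocks == null)
        {
            return results;
        }

        foreach (var block in blocks)
        {
            // The leading "." keeps each query relative to the current block.
            var text = block.SelectSingleNode(".//span[@class='text']");
            var author = block.SelectSingleNode(".//small[@class='author']");
            if (text == null || author == null)
            {
                continue; // Skip blocks that do not have both parts.
            }

            // DeEntitize decodes HTML entities such as &#39; in the raw text.
            results.Add((HtmlEntity.DeEntitize(text.InnerText).Trim(),
                         HtmlEntity.DeEntitize(author.InnerText).Trim()));
        }
        return results;
    }
}

class Program
{
    static void Main()
    {
        // Simplified sample markup (an assumption, not the site's exact HTML).
        string html =
            "<div class='quote'><span class='text'>It&#39;s a start.</span>" +
            "<small class='author'>Jane Doe</small></div>";

        foreach (var (text, author) in QuoteExtractor.Extract(html))
        {
            Console.WriteLine($"{text} - {author}");
        }
    }
}
```

Querying relative to each quote block keeps a quote paired with its own author even if one of the two elements is missing somewhere on the page.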
Solving reCAPTCHA v2 & v3 with Capsolver using HttpClient
When a website protects its pages with reCAPTCHA v2 or v3, you can obtain a solution token through the Capsolver API. The examples below integrate Capsolver with HttpClient: one request creates a solving task, and a second request polls for its result.
Prerequisites
- Newtonsoft.Json is used to parse the JSON responses returned by Capsolver:
powershell
Install-Package Newtonsoft.Json
Example: Solving reCAPTCHA v2 with Capsolver
In this section, we will demonstrate how to solve reCAPTCHA v2 challenges using the Capsolver API and HttpClient.
csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class Program
{
    private static readonly string apiUrl = "https://api.capsolver.com";
    private static readonly string clientKey = "YOUR_API_KEY"; // Replace with your Capsolver API Key

    // Reuse a single HttpClient instance instead of creating one per request
    private static readonly HttpClient client = new HttpClient();

    static async Task Main(string[] args)
    {
        try
        {
            // Step 1: Create a task for solving reCAPTCHA v2
            string taskId = await CreateTask();
            Console.WriteLine("Task ID: " + taskId);

            // Step 2: Retrieve the result of the task
            string taskResult = await GetTaskResult(taskId);
            Console.WriteLine("Task Result (CAPTCHA Token): " + taskResult);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Error: " + ex.Message);
        }
    }

    // Method to create a new CAPTCHA-solving task
    private static async Task<string> CreateTask()
    {
        // Request payload
        var requestBody = new
        {
            clientKey = clientKey,
            task = new
            {
                type = "ReCaptchaV2TaskProxyLess", // Task type for reCAPTCHA v2 without proxy
                websiteURL = "", // The website URL to solve CAPTCHA for
                websiteKey = "" // reCAPTCHA site key
            }
        };

        // Send the request to create the task
        var content = new StringContent(JsonConvert.SerializeObject(requestBody), Encoding.UTF8, "application/json");
        HttpResponseMessage response = await client.PostAsync($"{apiUrl}/createTask", content);
        string responseContent = await response.Content.ReadAsStringAsync();
        if (!response.IsSuccessStatusCode)
        {
            throw new Exception("Failed to create task: " + responseContent);
        }

        JObject jsonResponse = JObject.Parse(responseContent);
        if (jsonResponse["errorId"].ToString() != "0")
        {
            throw new Exception("Error creating task: " + jsonResponse["errorDescription"]);
        }

        // Return the task ID to be used in the next step
        return jsonResponse["taskId"].ToString();
    }

    // Method to retrieve the result of a CAPTCHA-solving task
    private static async Task<string> GetTaskResult(string taskId)
    {
        // Request payload, serialized once and reused for every poll
        string requestJson = JsonConvert.SerializeObject(new
        {
            clientKey = clientKey,
            taskId = taskId
        });

        // Poll for the result of the task every 5 seconds
        while (true)
        {
            // Build fresh content for each request; some runtimes dispose it after sending
            var content = new StringContent(requestJson, Encoding.UTF8, "application/json");
            HttpResponseMessage response = await client.PostAsync($"{apiUrl}/getTaskResult", content);
            string responseContent = await response.Content.ReadAsStringAsync();
            if (!response.IsSuccessStatusCode)
            {
                throw new Exception("Failed to get task result: " + responseContent);
            }

            JObject jsonResponse = JObject.Parse(responseContent);
            if (jsonResponse["errorId"].ToString() != "0")
            {
                throw new Exception("Error getting task result: " + jsonResponse["errorDescription"]);
            }

            // If the task is ready, return the CAPTCHA token
            if (jsonResponse["status"].ToString() == "ready")
            {
                return jsonResponse["solution"]["gRecaptchaResponse"].ToString();
            }

            // Wait for 5 seconds before checking again
            Console.WriteLine("Task is still processing, waiting 5 seconds...");
            await Task.Delay(5000);
        }
    }
}
Explanation:
- CreateTask Method:
  - This method sends a POST request to Capsolver's /createTask endpoint to create a new task for solving a reCAPTCHA v2 challenge.
  - The request includes the clientKey, websiteURL, and websiteKey, and specifies the task type as ReCaptchaV2TaskProxyLess.
  - The method returns a taskId, which will be used to retrieve the task result.
- GetTaskResult Method:
  - This method sends a POST request to the /getTaskResult endpoint to check the result of the previously created task.
  - It keeps polling the task status every 5 seconds until the task is completed (status: ready).
  - Once the task is ready, it returns the gRecaptchaResponse, which can be used to bypass the CAPTCHA.
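The token returned by GetTaskResult still has to reach the target site. A minimal sketch of that last step follows, assuming the protected page is a plain HTML form that reads the token from the g-recaptcha-response field, which is the field reCAPTCHA adds to forms. The TokenSubmitter class, the username field, and the submit URL passed to SubmitAsync are hypothetical; a real site may expect different fields or a JSON body.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class TokenSubmitter
{
    // Builds a form body that carries the solved token.
    // "g-recaptcha-response" is the field reCAPTCHA adds to HTML forms;
    // "username" is a hypothetical extra field for illustration.
    public static FormUrlEncodedContent BuildForm(string token, string username)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("username", username),
            new KeyValuePair<string, string>("g-recaptcha-response", token)
        });
    }

    // Posts the form to a caller-supplied (hypothetical) endpoint
    // and returns the numeric HTTP status code.
    public static async Task<int> SubmitAsync(HttpClient client, string submitUrl, string token)
    {
        using (var form = BuildForm(token, "demo-user"))
        {
            HttpResponseMessage response = await client.PostAsync(submitUrl, form);
            return (int)response.StatusCode;
        }
    }
}

class Program
{
    static async Task Main()
    {
        // Print the encoded body instead of sending it, so the sketch runs offline.
        var form = TokenSubmitter.BuildForm("TOKEN_FROM_CAPSOLVER", "demo-user");
        Console.WriteLine(await form.ReadAsStringAsync());
    }
}
```

Tokens are short-lived, so in practice the form is usually submitted right after the task reports ready.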
Example: Solving reCAPTCHA v3 with Capsolver
In this section, we will demonstrate how to solve reCAPTCHA v3 challenges using the Capsolver API and HttpClient.
csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class Program
{
    private static readonly string apiUrl = "https://api.capsolver.com";
    private static readonly string clientKey = "YOUR_API_KEY"; // Replace with your Capsolver API Key

    // Reuse a single HttpClient instance instead of creating one per request
    private static readonly HttpClient client = new HttpClient();

    static async Task Main(string[] args)
    {
        try
        {
            // Step 1: Create a task for solving reCAPTCHA v3
            string taskId = await CreateTask();
            Console.WriteLine("Task ID: " + taskId);

            // Step 2: Retrieve the result of the task
            string taskResult = await GetTaskResult(taskId);
            Console.WriteLine("Task Result (CAPTCHA Token): " + taskResult);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Error: " + ex.Message);
        }
    }

    // Method to create a new CAPTCHA-solving task
    private static async Task<string> CreateTask()
    {
        // Request payload
        var requestBody = new
        {
            clientKey = clientKey,
            task = new
            {
                type = "ReCaptchaV3TaskProxyLess", // Task type for reCAPTCHA v3 without proxy
                websiteURL = "", // The website URL to solve CAPTCHA for
                websiteKey = "" // reCAPTCHA site key
            }
        };

        // Send the request to create the task
        var content = new StringContent(JsonConvert.SerializeObject(requestBody), Encoding.UTF8, "application/json");
        HttpResponseMessage response = await client.PostAsync($"{apiUrl}/createTask", content);
        string responseContent = await response.Content.ReadAsStringAsync();
        if (!response.IsSuccessStatusCode)
        {
            throw new Exception("Failed to create task: " + responseContent);
        }

        JObject jsonResponse = JObject.Parse(responseContent);
        if (jsonResponse["errorId"].ToString() != "0")
        {
            throw new Exception("Error creating task: " + jsonResponse["errorDescription"]);
        }

        // Return the task ID to be used in the next step
        return jsonResponse["taskId"].ToString();
    }

    // Method to retrieve the result of a CAPTCHA-solving task
    private static async Task<string> GetTaskResult(string taskId)
    {
        // Request payload, serialized once and reused for every poll
        string requestJson = JsonConvert.SerializeObject(new
        {
            clientKey = clientKey,
            taskId = taskId
        });

        // Poll for the result of the task every 5 seconds
        while (true)
        {
            // Build fresh content for each request; some runtimes dispose it after sending
            var content = new StringContent(requestJson, Encoding.UTF8, "application/json");
            HttpResponseMessage response = await client.PostAsync($"{apiUrl}/getTaskResult", content);
            string responseContent = await response.Content.ReadAsStringAsync();
            if (!response.IsSuccessStatusCode)
            {
                throw new Exception("Failed to get task result: " + responseContent);
            }

            JObject jsonResponse = JObject.Parse(responseContent);
            if (jsonResponse["errorId"].ToString() != "0")
            {
                throw new Exception("Error getting task result: " + jsonResponse["errorDescription"]);
            }

            // If the task is ready, return the CAPTCHA token
            if (jsonResponse["status"].ToString() == "ready")
            {
                return jsonResponse["solution"]["gRecaptchaResponse"].ToString();
            }

            // Wait for 5 seconds before checking again
            Console.WriteLine("Task is still processing, waiting 5 seconds...");
            await Task.Delay(5000);
        }
    }
}
Explanation:
- CreateTask Method:
  - This method sends a POST request to Capsolver's /createTask endpoint to create a new task for solving a reCAPTCHA v3 challenge.
  - The request includes the clientKey, websiteURL, and websiteKey, and specifies the task type as ReCaptchaV3TaskProxyLess.
  - The method returns a taskId, which will be used to retrieve the task result.
- GetTaskResult Method:
  - This method sends a POST request to the /getTaskResult endpoint to check the result of the previously created task.
  - It keeps polling the task status every 5 seconds until the task is completed (status: ready).
  - Once the task is ready, it returns the gRecaptchaResponse, which can be used to bypass the CAPTCHA.
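Polling until the task reports ready has no upper bound, so a task that never completes would keep the program waiting. One possible way to cap the wait is sketched below. The Poller class and its PollAsync method are hypothetical helpers, not part of the Capsolver API, and the check function here is a stand-in for a real getTaskResult call that returns null while the task is still processing.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Poller
{
    // Calls checkAsync until it returns a non-null value, waiting `interval`
    // between attempts. Throws TimeoutException after maxAttempts tries.
    public static async Task<string> PollAsync(
        Func<Task<string>> checkAsync,
        TimeSpan interval,
        int maxAttempts,
        CancellationToken cancellationToken = default)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            string result = await checkAsync();
            if (result != null)
            {
                return result; // Task is ready.
            }

            if (attempt < maxAttempts)
            {
                // Task.Delay throws if the token is cancelled while waiting.
                await Task.Delay(interval, cancellationToken);
            }
        }
        throw new TimeoutException($"No result after {maxAttempts} attempts.");
    }
}

class Program
{
    static async Task Main()
    {
        // Stand-in for a getTaskResult call: reports a token on the third check.
        int calls = 0;
        Func<Task<string>> fakeCheck = () =>
            Task.FromResult(++calls >= 3 ? "FAKE_TOKEN" : null);

        string token = await Poller.PollAsync(
            fakeCheck, TimeSpan.FromMilliseconds(10), maxAttempts: 5);
        Console.WriteLine($"{token} after {calls} checks"); // prints "FAKE_TOKEN after 3 checks"
    }
}
```

With a 5-second interval, maxAttempts of 24 would bound the wait at roughly two minutes; the right limit depends on how long tasks usually take.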
Web Scraping Best Practices in C#
When using web scraping tools in C#, always follow these best practices:
- Respect robots.txt: Ensure that the website allows web scraping by checking the robots.txt file.
- Rate Limiting: Avoid making too many requests in a short period to prevent getting blocked by the website.
- Proxy Rotation: Use proxies to distribute requests across multiple IPs to avoid being flagged as a bot.
- Spoof Headers: Simulate browser-like requests by adding custom headers, such as User-Agent, to your HTTP requests.
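Several of these practices can be configured directly on HttpClient. The sketch below is one way to do it, assuming a fixed pause between requests is enough rate limiting for the target site. The PoliteScraper class, the User-Agent string, the two-second delay, and the page URLs are illustrative assumptions; the optional proxy address is a placeholder you would supply yourself.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class PoliteScraper
{
    // Creates an HttpClient with a browser-like User-Agent and an optional proxy.
    // Both the User-Agent string and any proxy address are caller-supplied placeholders.
    public static HttpClient CreateClient(string userAgent, string proxyUrl = null)
    {
        var handler = new HttpClientHandler();
        if (!string.IsNullOrEmpty(proxyUrl))
        {
            handler.Proxy = new WebProxy(proxyUrl);
            handler.UseProxy = true;
        }

        var client = new HttpClient(handler);
        client.DefaultRequestHeaders.UserAgent.ParseAdd(userAgent);
        return client;
    }

    // Fetches URLs one at a time, pausing after each request as simple rate limiting.
    public static async Task<List<string>> FetchAllAsync(
        HttpClient client, IEnumerable<string> urls, TimeSpan delay)
    {
        var pages = new List<string>();
        foreach (string url in urls)
        {
            pages.Add(await client.GetStringAsync(url));
            await Task.Delay(delay);
        }
        return pages;
    }
}

class Program
{
    static async Task Main()
    {
        using (HttpClient client = PoliteScraper.CreateClient(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleScraper/1.0"))
        {
            // Assumed page URLs on the demo site used earlier in this guide.
            var urls = new[]
            {
                "http://quotes.toscrape.com/page/1/",
                "http://quotes.toscrape.com/page/2/"
            };
            var pages = await PoliteScraper.FetchAllAsync(client, urls, TimeSpan.FromSeconds(2));
            Console.WriteLine($"Fetched {pages.Count} pages.");
        }
    }
}
```

Rotating proxies would mean creating one client per proxy address and distributing URLs among them; that is left out here to keep the sketch short.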
Conclusion
By using HttpClient for web scraping and Capsolver for CAPTCHA solving, you can effectively automate interactions with websites that employ CAPTCHA challenges. Always ensure that your web scraping activities comply with the target website's terms of service and legal requirements.
Happy scraping!
This guide integrates web scraping using HtmlAgilityPack and demonstrates how to handle reCAPTCHA challenges with Capsolver, using only HttpClient in C#.