When you're working on web scraping projects, changing the User-Agent string is one of the most effective ways to prevent your scraper from getting blocked or flagged as a bot. Web servers often use the User-Agent string to identify the type of client (e.g., browser, bot, or scraper) accessing their resources. If your scraper sends the same User-Agent on each request, you run the risk of being detected and potentially blocked. In this article, we'll explore how to change the User-Agent in Go Colly, a popular web scraping framework in Go, to make your scraping efforts more effective and resilient.
What Is Colly?
Colly is a fast and elegant crawling framework for Gophers. It provides a clean interface for writing any kind of crawler, scraper, or spider. With Colly you can easily extract structured data from websites for a wide range of applications, such as data mining, data processing, or archiving.
What Is Colly User Agent?
The User Agent is a string sent in the request headers that lets servers identify the client's operating system and version, browser type and version, and other details.
For normal browsers, the User Agent strings look like this:
- Google Chrome version 128 on Windows:
  Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36
- Firefox version 130 on Windows:
  Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:130.0) Gecko/20100101 Firefox/130.0
However, in Colly the default User Agent is:
colly - https://github.com/gocolly/colly
In the context of data scraping, one of the most common anti-scraping measures is to determine whether the request is coming from a normal browser by examining the User Agent. This helps in identifying bots.
Colly's default User Agent effectively announces to the target website, "I am a bot." This makes it easy for websites to detect and block scrapers running Colly with its default settings.
Bonus Code
Claim your bonus code for top CAPTCHA solutions at CapSolver: WEBS. After redeeming it, you will get an extra 5% bonus after each recharge, with no limit.
Why Change the User-Agent?
Before diving into the code, let's take a quick look at why changing the User-Agent is crucial:
- Avoid Detection: Many websites use anti-bot mechanisms that analyze incoming User-Agent strings to detect suspicious or repetitive patterns. If your scraper sends the same User-Agent in every request, it becomes an easy target for detection.
- Mimic Real Browsers: By changing the User-Agent string, your scraper can mimic real browsers such as Chrome, Firefox, or Safari, making it less likely to be flagged as a bot.
- Enhance User Experience and Solve CAPTCHA: Many websites use CAPTCHA challenges to verify that a user is not a bot, ensuring a more secure browsing experience. However, for automation tasks, these challenges can interrupt the workflow. If your scraper encounters such CAPTCHA challenges, you can integrate tools like CapSolver to automatically solve them, allowing your automation to continue smoothly without interruptions.
How to Set Up Custom User Agent in Colly
Handling User Agents in Colly
We can check the value of our User Agent by visiting https://httpbin.org/user-agent. Three Colly methods are all we need for this:
- Visit: start a request to the target URL
- OnResponse: register a callback that processes each response
- OnError: register a callback that handles request errors
Here's a complete code example to access httpbin and print the User Agent:
package main

import (
    "log"

    "github.com/gocolly/colly"
)

func main() {
    // Create a new collector
    c := colly.NewCollector()

    // Register the OnResponse callback and print the response body
    c.OnResponse(func(r *colly.Response) {
        log.Println(string(r.Body))
    })

    // Handle request errors
    c.OnError(func(r *colly.Response, err error) {
        log.Println("Request failed, err:", err)
    })

    // Start scraping
    err := c.Visit("https://httpbin.org/user-agent")
    if err != nil {
        log.Fatal(err)
    }
}
This will output:
{
  "user-agent": "colly - https://github.com/gocolly/colly"
}
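Note: the examples in this article use the v1 import path github.com/gocolly/colly. If your project is on Colly v2, the import path is github.com/gocolly/colly/v2, and the code shown here works unchanged.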
Customizing User Agents
Colly provides the colly.UserAgent option to set a custom User Agent when creating a collector. You can define a list of User Agent strings and pick one at random; note that the value chosen this way applies to every request made by that collector (a per-request variant is shown after the example). Here's an example:
package main

import (
    "log"
    "math/rand"

    "github.com/gocolly/colly"
)

var userAgents = []string{
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:130.0) Gecko/20100101 Firefox/130.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36 Edg/128.0.0.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36 OPR/113.0.0.0",
}

func main() {
    // Create a new collector with a randomly chosen User Agent.
    // Note: on Go versions before 1.20, seed math/rand first or the same
    // entry will be picked on every run.
    c := colly.NewCollector(
        colly.UserAgent(userAgents[rand.Intn(len(userAgents))]),
    )

    // Register the OnResponse callback and print the response body
    c.OnResponse(func(r *colly.Response) {
        log.Println(string(r.Body))
    })

    // Handle request errors
    c.OnError(func(r *colly.Response, err error) {
        log.Println("Request failed, err:", err)
    })

    // Start scraping
    err := c.Visit("https://httpbin.org/user-agent")
    if err != nil {
        log.Fatal(err)
    }
}
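The colly.UserAgent option fixes the header for the lifetime of the collector. If you want a genuinely different User Agent on each request, a minimal sketch (reusing the userAgents slice from the example above) is to set the header in an OnRequest callback:

// Rotate the User-Agent header on every request instead of once per collector
c := colly.NewCollector()

c.OnRequest(func(r *colly.Request) {
    // Pick a fresh User Agent for this specific request
    r.Headers.Set("User-Agent", userAgents[rand.Intn(len(userAgents))])
})

The rest of the program (OnResponse, OnError, Visit) stays the same as in the previous example.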
Using fake-useragent Library
Instead of maintaining a custom User Agent list, we can use the fake-useragent library to generate random User Agents. Here's an example:
package main

import (
    "log"

    browser "github.com/EDDYCJY/fake-useragent"
    "github.com/gocolly/colly"
)

func main() {
    // Create a new collector with a random User Agent from fake-useragent
    c := colly.NewCollector(
        colly.UserAgent(browser.Random()),
    )

    // Register the OnResponse callback and print the response body
    c.OnResponse(func(r *colly.Response) {
        log.Println(string(r.Body))
    })

    // Handle request errors
    c.OnError(func(r *colly.Response, err error) {
        log.Println("Request failed, err:", err)
    })

    // Start scraping
    err := c.Visit("https://httpbin.org/user-agent")
    if err != nil {
        log.Fatal(err)
    }
}
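If the fake-useragent package is not already in your module, fetch it first (for example with go get github.com/EDDYCJY/fake-useragent). Since browser.Random() returns a new random User Agent string on each call, you can also combine it with the OnRequest approach shown earlier to rotate the header on every request.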
Integrating CapSolver
While randomizing User Agents in Colly can help avoid being identified as a bot to some extent, it may not be sufficient when facing more sophisticated anti-bot challenges. Examples of such challenges include reCAPTCHA, hCaptcha, DataDome, AWS WAF, GeeTest, Cloudflare Turnstile, and others. These systems check the validity of your request headers, verify your browser fingerprint, assess the risk of your IP, and may require complex JS encryption parameters or difficult image recognition tasks.
These challenges can significantly hinder your data scraping efforts. However, there's no need to worry - all of the aforementioned bot challenges can be handled by CapSolver. CapSolver uses AI-based Auto Web Unblock technology to automatically solve CAPTCHAs. All complex tasks can be successfully resolved within seconds.
The official website provides SDKs in multiple languages, making it easy to integrate into your project. You can refer to the CapSolver documentation for more information on how to implement this solution in your scraping projects.
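As a rough illustration of what that integration usually looks like, here is a minimal Go sketch of the task-based flow used by CapSolver's HTTP API: create a task, then poll for its result. The endpoint paths, task type, and field names below follow the pattern described in the CapSolver documentation but should be verified there; the API key, website URL, and site key are placeholders.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    "time"
)

// post sends a JSON payload to the given URL and decodes the JSON reply.
func post(url string, payload map[string]interface{}) (map[string]interface{}, error) {
    body, err := json.Marshal(payload)
    if err != nil {
        return nil, err
    }
    resp, err := http.Post(url, "application/json", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    var out map[string]interface{}
    err = json.NewDecoder(resp.Body).Decode(&out)
    return out, err
}

func main() {
    apiKey := "YOUR_CAPSOLVER_API_KEY" // placeholder

    // 1. Create a task (here a proxyless reCAPTCHA v2 task; see the docs for other task types).
    created, err := post("https://api.capsolver.com/createTask", map[string]interface{}{
        "clientKey": apiKey,
        "task": map[string]interface{}{
            "type":       "ReCaptchaV2TaskProxyLess",
            "websiteURL": "https://example.com/login",  // placeholder target page
            "websiteKey": "SITE_KEY_FROM_TARGET_PAGE",  // placeholder site key
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    taskID := created["taskId"]

    // 2. Poll for the result until the task is solved.
    for i := 0; i < 30; i++ {
        time.Sleep(3 * time.Second)
        result, err := post("https://api.capsolver.com/getTaskResult", map[string]interface{}{
            "clientKey": apiKey,
            "taskId":    taskID,
        })
        if err != nil {
            log.Fatal(err)
        }
        if result["status"] == "ready" {
            if solution, ok := result["solution"].(map[string]interface{}); ok {
                // Attach this token to your Colly request (e.g. as a form field), as required by the target site.
                fmt.Println("CAPTCHA token:", solution["gRecaptchaResponse"])
            }
            return
        }
    }
    log.Fatal("timed out waiting for the CAPTCHA to be solved")
}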
Conclusion
Changing the User-Agent in Go Colly is a crucial technique for effective and resilient web scraping. By implementing custom User-Agents, you can significantly reduce the risk of your scraper being detected and blocked by target websites. Here's a summary of the key points we've covered:
- We've learned why changing the User-Agent is important for web scraping projects.
- We've explored different methods to set custom User-Agents in Colly, including:
  - Using a predefined list of User-Agents
  - Implementing random selection from this list
  - Utilizing the fake-useragent library for more diverse options
- We've discussed how these techniques can help mimic real browser behavior and avoid detection.
- For more advanced anti-bot challenges, we've introduced the concept of using specialized tools like CapSolver to handle CAPTCHAs and other complex verification systems.
Remember, while changing User-Agents is an effective strategy, it's just one part of responsible and efficient web scraping. Always respect websites' terms of service and robots.txt files, implement rate limiting, and consider the ethical implications of your scraping activities.
By combining these techniques with other best practices in web scraping, you can create more robust and reliable scrapers using Go Colly. As web technologies continue to evolve, staying updated with the latest scraping techniques and tools will be crucial for maintaining the effectiveness of your web scraping projects.
Note on Compliance
Important: When engaging in web scraping, it's crucial to adhere to legal and ethical guidelines. Always ensure that you have permission to scrape the target website, and respect the site's robots.txt file and terms of service. CapSolver firmly opposes the misuse of our services for any non-compliant activities. Misuse of automated tools to bypass CAPTCHAs without proper authorization can lead to legal consequences. Make sure your scraping activities are compliant with all applicable laws and regulations to avoid potential issues.