How to Use geziyor for Web Scraping

Aloísio Vítor

Image Processing Expert

27-Sep-2024

Geziyor: A Powerful Web Scraping Framework for Go

Geziyor is a modern web scraping framework for Go, designed to offer powerful tools for scraping websites and extracting data efficiently. Unlike many traditional scraping libraries, Geziyor emphasizes ease of use while providing highly customizable scraping workflows.

Key Features:

Concurrency Support: It supports asynchronous operations, allowing you to scrape multiple pages concurrently, which boosts performance.
Request Customization: Easily modify HTTP requests, including headers, cookies, and custom parameters.
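Throttling: Constant or randomized request delays and a requests-per-second limit pace your traffic, which helps avoid triggering anti-scraping mechanisms.
Built-in Caching and Export: It caches HTTP responses (in memory, on disk, or in LevelDB) to avoid redundant requests, and exports scraped data to formats such as JSON and CSV.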
Extensibility: Offers hooks to extend functionality or handle events like request/response interception, custom middlewares, and more.
Supports Proxies: Easily integrate proxies for rotating IPs or bypassing restrictions.

Prerequisites

To use Geziyor, ensure you have:

Go 1.12+ installed from the official Go website.
Basic knowledge of the Go language.

Installation

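To install Geziyor, run the following inside your Go module:

```bash
# If the project is not a module yet, create one first with `go mod init <module-name>`.
go get -u github.com/geziyor/geziyor
```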

Basic Example: Web Scraping with Geziyor

Here is a simple example to scrape a website and print the titles of the articles:

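```go
package main

import (
    "log"

    "github.com/PuerkitoBio/goquery"
    "github.com/geziyor/geziyor"
    "github.com/geziyor/geziyor/client"
)

func main() {
    geziyor.NewGeziyor(&geziyor.Options{
        StartURLs: []string{"https://news.ycombinator.com"},
        ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
            // HTMLDoc is only populated for HTML responses.
            if r.HTMLDoc == nil {
                return
            }
            // Hacker News wraps each story link in <span class="titleline">;
            // the older ".storylink" class no longer matches anything.
            r.HTMLDoc.Find(".titleline > a").Each(func(i int, s *goquery.Selection) {
                log.Println(s.Text())
            })
        },
    }).Start()
}
```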

Advanced Example: Scraping with Custom Headers and POST Requests

Sometimes, you need to simulate a more complex interaction with the server, like logging in or interacting with dynamic websites. In this example, we will show how to send a custom header and a POST request.

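```go
package main

import (
    "log"
    "strings"

    "github.com/geziyor/geziyor"
    "github.com/geziyor/geziyor/client"
)

func main() {
    geziyor.NewGeziyor(&geziyor.Options{
        StartRequestsFunc: func(g *geziyor.Geziyor) {
            // client.NewRequest mirrors http.NewRequest: method, URL, body reader.
            body := strings.NewReader(`{"username": "test", "password": "123"}`)
            req, err := client.NewRequest("POST", "https://httpbin.org/post", body)
            if err != nil {
                log.Println("building request:", err)
                return
            }
            // client.Request embeds *http.Request, so headers are set the usual way.
            req.Header.Set("Content-Type", "application/json")

            // Do schedules the request and hands the response to the callback.
            g.Do(req, g.Opt.ParseFunc)
        },
        ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
            log.Println(string(r.Body))
        },
    }).Start()
}
```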

Handling Cookies and Sessions in Geziyor

You might need to manage cookies or maintain sessions during scraping. Geziyor simplifies cookie management by automatically handling cookies for each request, and you can also customize the cookie handling process if needed.

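```go
package main

import (
    "log"

    "github.com/geziyor/geziyor"
    "github.com/geziyor/geziyor/client"
)

func main() {
    geziyor.NewGeziyor(&geziyor.Options{
        StartRequestsFunc: func(g *geziyor.Geziyor) {
            // This endpoint sets a cookie and redirects to /cookies,
            // which echoes back the cookies the client sent.
            g.Get("https://httpbin.org/cookies/set?name=value", g.Opt.ParseFunc)
        },
        ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
            // Cookies set by this particular response (its Set-Cookie headers).
            // After the redirect this list is usually empty.
            log.Println("Set-Cookie:", r.Cookies())
            // The echoed body shows which cookies were sent with the request.
            log.Println("Body:", string(r.Body))
        },
    }).Start()
}
```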

Using Proxies with Geziyor

To scrape a website while avoiding IP restrictions or blocks, you can route your requests through a proxy. Here's how to configure proxy support with Geziyor:

```go
package main

import (
    "log"

    "github.com/geziyor/geziyor"
    "github.com/geziyor/geziyor/client"
)

func main() {
    geziyor.NewGeziyor(&geziyor.Options{
        StartURLs: []string{"https://httpbin.org/ip"},
        // ProxyFunc chooses the proxy for each request. RoundRobinProxy
        // rotates through the URLs it is given; a single URL means one fixed proxy.
        ProxyFunc: client.RoundRobinProxy("http://username:password@proxyserver:8080"),
        ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
            log.Println(string(r.Body))
        },
    }).Start()
}
```
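If you need your own selection logic for rotating IPs, a proxy chooser is just a function from a request to a proxy URL. The following standard-library sketch rotates through a list in round-robin order. `rotatingProxy` and the proxy hostnames are hypothetical; the returned function has the signature used by `http.Transport.Proxy`, and it is assumed here that Geziyor's proxy option accepts a function of the same shape.

```go
package main

import (
    "errors"
    "fmt"
    "net/http"
    "net/url"
    "sync/atomic"
)

// rotatingProxy returns a function that hands out the given proxies in
// round-robin order. Its signature matches http.Transport.Proxy.
func rotatingProxy(rawURLs ...string) (func(*http.Request) (*url.URL, error), error) {
    if len(rawURLs) == 0 {
        return nil, errors.New("no proxy URLs given")
    }
    proxies := make([]*url.URL, 0, len(rawURLs))
    for _, raw := range rawURLs {
        u, err := url.Parse(raw)
        if err != nil {
            return nil, fmt.Errorf("parsing proxy %q: %w", raw, err)
        }
        proxies = append(proxies, u)
    }
    var next uint64
    return func(*http.Request) (*url.URL, error) {
        // The atomic counter keeps rotation safe under concurrent requests.
        i := atomic.AddUint64(&next, 1) - 1
        return proxies[i%uint64(len(proxies))], nil
    }, nil
}

func main() {
    pick, err := rotatingProxy("http://proxy-a.example:8080", "http://proxy-b.example:8080")
    if err != nil {
        fmt.Println("error:", err)
        return
    }
    // Each call returns the next proxy in the list, wrapping around at the end.
    for i := 0; i < 3; i++ {
        u, _ := pick(nil)
        fmt.Println(u.Host)
    }
}
```

Parsing the URLs once up front means a malformed proxy address fails at startup rather than in the middle of a crawl.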

Handling Captchas with Geziyor

While Geziyor doesn’t natively solve captchas, you can integrate it with a captcha-solving service such as CapSolver. Here's how you can use CapSolver to solve captchas in conjunction with Geziyor.

Example: Solving ReCaptcha V2 Using Geziyor and CapSolver

First, you need to integrate CapSolver and handle requests for captcha challenges.

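```go
package main

import (
    "bytes"
    "encoding/json"
    "errors"
    "fmt"
    "log"
    "net/http"
    "net/url"
    "strings"
    "time"

    "github.com/geziyor/geziyor"
    "github.com/geziyor/geziyor/client"
)

const capsolverKey = "YourKey"

// postJSON sends payload as JSON to endpoint and decodes the JSON reply into out.
func postJSON(endpoint string, payload, out interface{}) error {
    data, err := json.Marshal(payload)
    if err != nil {
        return err
    }
    resp, err := http.Post(endpoint, "application/json", bytes.NewReader(data))
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    return json.NewDecoder(resp.Body).Decode(out)
}

// createTask submits a ReCaptcha V2 task and returns its task ID.
func createTask(websiteURL, websiteKey string) (string, error) {
    payload := map[string]interface{}{
        "clientKey": capsolverKey,
        "task": map[string]interface{}{
            "type":       "ReCaptchaV2TaskProxyLess",
            "websiteURL": websiteURL,
            "websiteKey": websiteKey,
        },
    }

    var result struct {
        ErrorID          int    `json:"errorId"`
        ErrorDescription string `json:"errorDescription"`
        TaskID           string `json:"taskId"`
    }
    if err := postJSON("https://api.capsolver.com/createTask", payload, &result); err != nil {
        return "", err
    }
    if result.ErrorID != 0 || result.TaskID == "" {
        return "", fmt.Errorf("createTask failed: %s", result.ErrorDescription)
    }
    return result.TaskID, nil
}

// getTaskResult polls until the task is ready and returns the reCAPTCHA token.
func getTaskResult(taskID string) (string, error) {
    payload := map[string]interface{}{
        "clientKey": capsolverKey,
        "taskId":    taskID,
    }

    // Poll a bounded number of times instead of looping forever.
    for attempt := 0; attempt < 24; attempt++ {
        var result struct {
            ErrorID          int    `json:"errorId"`
            ErrorDescription string `json:"errorDescription"`
            Status           string `json:"status"`
            // The token is nested inside the "solution" object.
            Solution struct {
                GRecaptchaResponse string `json:"gRecaptchaResponse"`
            } `json:"solution"`
        }
        if err := postJSON("https://api.capsolver.com/getTaskResult", payload, &result); err != nil {
            return "", err
        }
        if result.ErrorID != 0 || result.Status == "failed" {
            return "", fmt.Errorf("getTaskResult failed: %s", result.ErrorDescription)
        }
        if result.Status == "ready" {
            return result.Solution.GRecaptchaResponse, nil
        }

        time.Sleep(5 * time.Second)
    }
    return "", errors.New("timed out waiting for captcha solution")
}

func main() {
    geziyor.NewGeziyor(&geziyor.Options{
        StartRequestsFunc: func(g *geziyor.Geziyor) {
            taskID, err := createTask("https://example.com", "6LcR_okUAAAAAPYrPe-HK_0RULO1aZM15ENyM-Mf")
            if err != nil {
                log.Println("creating task:", err)
                return
            }
            token, err := getTaskResult(taskID)
            if err != nil {
                log.Println("fetching solution:", err)
                return
            }

            // Submit the token as a URL-encoded form field.
            form := url.Values{"g-recaptcha-response": {token}}
            req, err := client.NewRequest("POST", "https://example.com/submit", strings.NewReader(form.Encode()))
            if err != nil {
                log.Println("building request:", err)
                return
            }
            req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
            g.Do(req, g.Opt.ParseFunc)
        },
        ParseFunc: func(g *geziyor.Geziyor, r *client.Response) {
            log.Println("Captcha Solved:", string(r.Body))
        },
    }).Start()
}
```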

Performance Optimizations with Geziyor

Geziyor excels at handling high-volume scraping tasks, but performance can be further optimized by adjusting certain options:

Concurrency: Increase ConcurrentRequests to allow multiple parallel requests.
Request Delay: Implement a delay between requests to avoid detection.

Example with concurrency and delay:

```go
package main

import (
    "time"

    "github.com/geziyor/geziyor"
    "github.com/geziyor/geziyor/client"
)

func main() {
    geziyor.NewGeziyor(&geziyor.Options{
        StartURLs:          []string{"https://example.com"},
        ParseFunc:          func(g *geziyor.Geziyor, r *client.Response) {},
        ConcurrentRequests: 10,
        // RequestDelay is a time.Duration; a bare 2 would mean 2 nanoseconds.
        RequestDelay: 2 * time.Second,
    }).Start()
}
```

Bonus Code

Claim your Bonus Code for top captcha solutions at CapSolver: scrape. After redeeming it, you will get an extra 5% bonus after each recharge, unlimited times.

Conclusion

Geziyor is a powerful, fast, and flexible web scraping framework for Go, making it a great choice for developers looking to build scalable scraping systems. Its built-in support for concurrency, customizable requests, and the ability to integrate with external services like CapSolver make it an ideal tool for both simple and advanced scraping tasks.

Whether you're collecting data from blogs, e-commerce sites, or building custom scraping pipelines, Geziyor has the features you need to get started quickly and efficiently.

Compliance Disclaimer: The information provided on this blog is for informational purposes only. CapSolver is committed to compliance with all applicable laws and regulations. The use of the CapSolver network for illegal, fraudulent, or abusive activities is strictly prohibited and will be investigated. Our captcha-solving solutions enhance user experience while ensuring 100% compliance in helping solve captcha difficulties during public data crawling. We encourage responsible use of our services. For more information, please visit our Terms of Service and Privacy Policy.
