Integrating Katana with CapSolver: Automated CAPTCHA Solving for Web Crawling

Emma Foster
Machine Learning Engineer
09-Jan-2026

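Web crawling is an essential technique for security researchers, penetration testers, and data analysts. Modern websites, however, increasingly use CAPTCHAs to block automated access. This guide shows how to integrate Katana (ProjectDiscovery's powerful web crawler) with CapSolver (a leading CAPTCHA-solving service) to build a crawling solution that handles CAPTCHA challenges automatically.
What you will learn
- Setting up Katana in headless browser mode
- Integrating CapSolver's API for automated CAPTCHA solving
- Handling reCAPTCHA v2 and Cloudflare Turnstile
- Complete, runnable code examples for each CAPTCHA type
- Best practices for efficient and responsible crawling
What is Katana?
Katana is a next-generation web crawling framework developed by ProjectDiscovery. It is designed for speed and flexibility, which makes it well suited to security reconnaissance and automation pipelines.
Key features
- Dual crawling modes: standard HTTP crawling and headless browser automation
- JavaScript support: parses and crawls JavaScript-rendered content
- Flexible configuration: custom headers, cookies, form filling, and scope control
- Multiple output formats: plain text, JSON, or JSONL
Installation
bash
# Requires Go 1.24+
CGO_ENABLED=1 go install github.com/projectdiscovery/katana/cmd/katana@latest
Basic usage
bash
katana -u https://example.com -headless
What is CapSolver?
CapSolver is an AI-based CAPTCHA-solving service that provides fast and reliable solutions for a variety of CAPTCHA types.
Supported CAPTCHA types
- reCAPTCHA: v2 and Enterprise
- Cloudflare: Turnstile and Challenge
- AWS WAF: bypassing WAF protection
- And more
API workflow
CapSolver uses a task-based API model:
- Create a task: submit the CAPTCHA parameters (type, siteKey, URL)
- Get the task ID: receive a unique task identifier
- Poll for the result: check the task status until the solution is ready
- Receive the token: obtain the solved CAPTCHA token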
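The workflow starts with a createTask request. The following is a minimal sketch of building that request body in Go, using only the field names that appear in this guide's own listings (clientKey, task, type, websiteURL, websiteKey). The helper name buildCreateTaskBody and the placeholder values are illustrative assumptions and not part of any CapSolver SDK.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// createTaskRequest mirrors the createTask body used in this guide:
// a clientKey plus a task object with type, websiteURL and websiteKey.
type createTaskRequest struct {
	ClientKey string      `json:"clientKey"`
	Task      captchaTask `json:"task"`
}

type captchaTask struct {
	Type       string `json:"type"`
	WebsiteURL string `json:"websiteURL"`
	WebsiteKey string `json:"websiteKey"`
}

// buildCreateTaskBody is a hypothetical helper that serializes the
// request for the first step of the workflow. It rejects empty keys so
// that an obviously invalid request is never sent.
func buildCreateTaskBody(apiKey, taskType, pageURL, siteKey string) ([]byte, error) {
	if apiKey == "" || siteKey == "" {
		return nil, fmt.Errorf("apiKey and siteKey must be non-empty")
	}
	return json.Marshal(createTaskRequest{
		ClientKey: apiKey,
		Task: captchaTask{
			Type:       taskType,
			WebsiteURL: pageURL,
			WebsiteKey: siteKey,
		},
	})
}

func main() {
	// Placeholder values only; substitute a real key and site key.
	body, err := buildCreateTaskBody("YOUR_API_KEY", "ReCaptchaV2TaskProxyLess",
		"https://example.com", "SITE_KEY_PLACEHOLDER")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(body))
}
```

Typed structs are used here instead of nested maps so that a misspelled field name is caught at compile time; the full listings in this guide use maps, which works equally well.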
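Prerequisites
Before starting, make sure you have:
- Go 1.24+
- A CapSolver API key (sign up on the CapSolver website)
- Chrome browser (for headless mode)
Set the API key as an environment variable:
bash
export CAPSOLVER_API_KEY="YOUR_API_KEY"
Integration architecture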
┌─────────────────────────┐
│ Go application          │
│ (go-rod browser)        │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ Target website          │
│ (with CAPTCHA)          │
└───────────┬─────────────┘
            │
     CAPTCHA detected
            │
            ▼
┌─────────────────────────┐
│ Extract parameters      │
│ (siteKey, URL, type)    │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ CapSolver API           │
│ createTask()            │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ Poll for result         │
│ getTaskResult()         │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ Inject token            │
│ into the page           │
└───────────┬─────────────┘
            │
            ▼
┌─────────────────────────┐
│ Continue crawling       │
└─────────────────────────┘
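The flow in the diagram can be read as three pluggable stages: detect, solve, inject. The sketch below models them as function fields so that each stage can be swapped or faked in tests. The pipeline type and its field names are hypothetical and exist only for illustration; the real listings in this guide wire these stages to go-rod and the CapSolver API.

```go
package main

import (
	"errors"
	"fmt"
)

// pipeline groups the three stages from the diagram. All names here are
// illustrative seams, not part of Katana, go-rod, or CapSolver.
type pipeline struct {
	detect func(html string) (siteKey string, found bool)
	solve  func(pageURL, siteKey string) (token string, err error)
	inject func(token string) error
}

// run returns true when a CAPTCHA was found and its token injected, and
// false when the page had no CAPTCHA and crawling can simply continue.
func (p pipeline) run(pageURL, html string) (bool, error) {
	siteKey, found := p.detect(html)
	if !found {
		return false, nil
	}
	token, err := p.solve(pageURL, siteKey)
	if err != nil {
		return false, fmt.Errorf("solve: %w", err)
	}
	if token == "" {
		return false, errors.New("solver returned an empty token")
	}
	if err := p.inject(token); err != nil {
		return false, fmt.Errorf("inject: %w", err)
	}
	return true, nil
}

func main() {
	// Stub stages that stand in for real detection, solving and injection.
	p := pipeline{
		detect: func(html string) (string, bool) { return "demo-site-key", html != "" },
		solve:  func(pageURL, siteKey string) (string, error) { return "demo-token", nil },
		inject: func(token string) error { return nil },
	}
	solved, err := p.run("https://example.com", "<html>...</html>")
	fmt.Println(solved, err)
}
```

Keeping the stages separate means the solver is only called after detection succeeds, which matches the "detect before solving" advice later in this guide.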
Solving reCAPTCHA v2 with CapSolver
reCAPTCHA v2 is the most common CAPTCHA type and presents an "I'm not a robot" checkbox or an image challenge. Below is a complete, runnable script that solves reCAPTCHA v2:
go
// reCAPTCHA v2 solver - complete example
// Usage: go run main.go
// Requires: CAPSOLVER_API_KEY environment variable
package main
import (
"bytes"
"encoding/json"
"fmt"
"io"
"log"
"net/http"
"os"
"strings"
"time"
"github.com/go-rod/rod"
"github.com/go-rod/rod/lib/launcher"
)
// Configuration
var (
CAPSOLVER_API_KEY = os.Getenv("CAPSOLVER_API_KEY")
CAPSOLVER_API = "https://api.capsolver.com"
)
// API response types
type CreateTaskResponse struct {
ErrorID int `json:"errorId"`
ErrorCode string `json:"errorCode"`
ErrorDescription string `json:"errorDescription"`
TaskID string `json:"taskId"`
}
type GetTaskResultResponse struct {
ErrorID int `json:"errorId"`
ErrorCode string `json:"errorCode"`
ErrorDescription string `json:"errorDescription"`
Status string `json:"status"`
Solution struct {
GRecaptchaResponse string `json:"gRecaptchaResponse"`
} `json:"solution"`
}
type BalanceResponse struct {
ErrorID int `json:"errorId"`
Balance float64 `json:"balance"`
}
// CapsolverClient handles API communication
type CapsolverClient struct {
APIKey string
Client *http.Client
}
// NewCapsolverClient creates a new Capsolver client
func NewCapsolverClient(apiKey string) *CapsolverClient {
return &CapsolverClient{
APIKey: apiKey,
Client: &http.Client{Timeout: 120 * time.Second},
}
}
// GetBalance returns the account balance
func (c *CapsolverClient) GetBalance() (float64, error) {
payload := map[string]string{"clientKey": c.APIKey}
jsonData, _ := json.Marshal(payload)
resp, err := c.Client.Post(CAPSOLVER_API+"/getBalance", "application/json", bytes.NewBuffer(jsonData))
if err != nil {
return 0, err
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var result BalanceResponse
json.Unmarshal(body, &result)
if result.ErrorID != 0 {
return 0, fmt.Errorf("balance check failed")
}
return result.Balance, nil
}
// SolveRecaptchaV2 solves a reCAPTCHA v2 challenge
func (c *CapsolverClient) SolveRecaptchaV2(websiteURL, siteKey string) (string, error) {
log.Printf("Creating reCAPTCHA v2 task for %s", websiteURL)
// Create the task
task := map[string]interface{}{
"type": "ReCaptchaV2TaskProxyLess",
"websiteURL": websiteURL,
"websiteKey": siteKey,
}
payload := map[string]interface{}{
"clientKey": c.APIKey,
"task": task,
}
jsonData, _ := json.Marshal(payload)
resp, err := c.Client.Post(CAPSOLVER_API+"/createTask", "application/json", bytes.NewBuffer(jsonData))
if err != nil {
return "", fmt.Errorf("failed to create task: %w", err)
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var createResult CreateTaskResponse
json.Unmarshal(body, &createResult)
if createResult.ErrorID != 0 {
return "", fmt.Errorf("API error: %s - %s", createResult.ErrorCode, createResult.ErrorDescription)
}
log.Printf("Task created: %s", createResult.TaskID)
// Poll for the result
for i := 0; i < 120; i++ {
result, err := c.getTaskResult(createResult.TaskID)
if err != nil {
return "", err
}
if result.Status == "ready" {
log.Printf("CAPTCHA solved successfully!")
return result.Solution.GRecaptchaResponse, nil
}
if result.Status == "failed" {
return "", fmt.Errorf("task failed: %s", result.ErrorDescription)
}
if i%10 == 0 {
log.Printf("Waiting for solution... (%ds)", i)
}
time.Sleep(1 * time.Second)
}
return "", fmt.Errorf("timed out waiting for solution")
}
func (c *CapsolverClient) getTaskResult(taskID string) (*GetTaskResultResponse, error) {
payload := map[string]string{
"clientKey": c.APIKey,
"taskId": taskID,
}
jsonData, _ := json.Marshal(payload)
resp, err := c.Client.Post(CAPSOLVER_API+"/getTaskResult", "application/json", bytes.NewBuffer(jsonData))
if err != nil {
return nil, err
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var result GetTaskResultResponse
json.Unmarshal(body, &result)
return &result, nil
}
// extractSiteKey extracts the reCAPTCHA site key from the page HTML
func extractSiteKey(html string) string {
// Look for the data-sitekey attribute
patterns := []string{
`data-sitekey="`,
`data-sitekey='`,
`"sitekey":"`,
`'sitekey':'`,
}
for _, pattern := range patterns {
if idx := strings.Index(html, pattern); idx != -1 {
start := idx + len(pattern)
end := start
for end < len(html) && html[end] != '"' && html[end] != '\'' {
end++
}
if end > start {
return html[start:end]
}
}
}
return ""
}
// injectRecaptchaToken injects the solved token into the page
func injectRecaptchaToken(page *rod.Page, token string) error {
js := fmt.Sprintf(`
(function() {
// Set the response textarea
var responseField = document.getElementById('g-recaptcha-response');
if (responseField) {
responseField.style.display = 'block';
responseField.value = '%s';
}
// Also set any hidden textareas
var textareas = document.querySelectorAll('textarea[name="g-recaptcha-response"]');
for (var i = 0; i < textareas.length; i++) {
textareas[i].value = '%s';
}
// Trigger the callback function if one exists
if (typeof ___grecaptcha_cfg !== 'undefined') {
var clients = ___grecaptcha_cfg.clients;
for (var key in clients) {
var client = clients[key];
if (client) {
// Try to find and invoke the callback
try {
var callback = client.callback ||
(client.Q && client.Q.callback) ||
(client.S && client.S.callback);
if (typeof callback === 'function') {
callback('%s');
}
} catch(e) {}
}
}
}
return true;
})();
`, token, token, token)
_, err := page.Eval(js)
return err
}
func main() {
// Check the API key
if CAPSOLVER_API_KEY == "" {
log.Fatal("CAPSOLVER_API_KEY environment variable is required")
}
// Target URL - Google's reCAPTCHA demo page
targetURL := "https://www.google.com/recaptcha/api2/demo"
log.Println("==============================================")
log.Println("Katana + Capsolver - reCAPTCHA v2 Demo")
log.Println("==============================================")
// Initialize the Capsolver client
client := NewCapsolverClient(CAPSOLVER_API_KEY)
// Check the balance
balance, err := client.GetBalance()
if err != nil {
log.Printf("Warning: could not check balance: %v", err)
} else {
log.Printf("Capsolver balance: $%.2f", balance)
}
// Launch the browser
log.Println("Launching browser...")
path, _ := launcher.LookPath()
u := launcher.New().Bin(path).Headless(true).MustLaunch()
browser := rod.New().ControlURL(u).MustConnect()
defer browser.MustClose()
// Navigate to the target
log.Printf("Navigating to: %s", targetURL)
page := browser.MustPage(targetURL)
page.MustWaitLoad()
time.Sleep(2 * time.Second)
// Get the page HTML
html := page.MustHTML()
// Check for reCAPTCHA
if !strings.Contains(html, "g-recaptcha") && !strings.Contains(html, "grecaptcha") {
log.Fatal("No reCAPTCHA found on page")
}
log.Println("reCAPTCHA detected!")
// Extract the site key
siteKey := extractSiteKey(html)
if siteKey == "" {
log.Fatal("Could not extract site key")
}
log.Printf("Site key: %s", siteKey)
// Solve the CAPTCHA
log.Println("Solving CAPTCHA with Capsolver...")
token, err := client.SolveRecaptchaV2(targetURL, siteKey)
if err != nil {
log.Fatalf("Failed to solve CAPTCHA: %v", err)
}
// Log only a prefix; the built-in min (Go 1.21+) avoids a panic on short tokens
log.Printf("Token received: %s...", token[:min(50, len(token))])
// Inject the token
log.Println("Injecting token into page...")
err = injectRecaptchaToken(page, token)
if err != nil {
log.Fatalf("Failed to inject token: %v", err)
}
// Submit the form
log.Println("Submitting form...")
submitBtn := page.MustElement("#recaptcha-demo-submit")
submitBtn.MustClick()
// Wait for the result
time.Sleep(3 * time.Second)
// Check the result
newHTML := page.MustHTML()
if strings.Contains(newHTML, "Verification Success") || strings.Contains(newHTML, "success") {
log.Println("==============================================")
log.Println("SUCCESS! reCAPTCHA solved and verified!")
log.Println("==============================================")
} else {
log.Println("Form submitted - check the page for the result")
}
// Get the page title
title := page.MustEval(`document.title`).String()
log.Printf("Final page title: %s", title)
}
Setup and run
bash
# Create the project
mkdir katana-recaptcha-v2
cd katana-recaptcha-v2
go mod init katana-recaptcha-v2
# Install dependencies
go get github.com/go-rod/rod@latest
# Set the API key
export CAPSOLVER_API_KEY="YOUR_API_KEY"
# Run
go run main.go
Solving Cloudflare Turnstile with CapSolver
Cloudflare Turnstile is a privacy-focused CAPTCHA alternative. Here is a complete script:
go
// Cloudflare Turnstile solver - complete example
// Usage: go run main.go
// Requires: CAPSOLVER_API_KEY environment variable
package main
import (
"bytes"
"encoding/json"
"fmt"
"io"
"log"
"net/http"
"os"
"regexp"
"strings"
"time"
"github.com/go-rod/rod"
"github.com/go-rod/rod/lib/launcher"
)
// Configuration
var (
CAPSOLVER_API_KEY = os.Getenv("CAPSOLVER_API_KEY")
CAPSOLVER_API = "https://api.capsolver.com"
)
// API response types
type CreateTaskResponse struct {
ErrorID int `json:"errorId"`
ErrorCode string `json:"errorCode"`
ErrorDescription string `json:"errorDescription"`
TaskID string `json:"taskId"`
}
type GetTaskResultResponse struct {
ErrorID int `json:"errorId"`
ErrorCode string `json:"errorCode"`
ErrorDescription string `json:"errorDescription"`
Status string `json:"status"`
Solution struct {
Token string `json:"token"`
} `json:"solution"`
}
type BalanceResponse struct {
ErrorID int `json:"errorId"`
Balance float64 `json:"balance"`
}
// CapsolverClient handles API communication
type CapsolverClient struct {
APIKey string
Client *http.Client
}
// NewCapsolverClient creates a new Capsolver client
func NewCapsolverClient(apiKey string) *CapsolverClient {
return &CapsolverClient{
APIKey: apiKey,
Client: &http.Client{Timeout: 120 * time.Second},
}
}
// GetBalance returns the account balance
func (c *CapsolverClient) GetBalance() (float64, error) {
payload := map[string]string{"clientKey": c.APIKey}
jsonData, _ := json.Marshal(payload)
resp, err := c.Client.Post(CAPSOLVER_API+"/getBalance", "application/json", bytes.NewBuffer(jsonData))
if err != nil {
return 0, err
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var result BalanceResponse
json.Unmarshal(body, &result)
return result.Balance, nil
}
// SolveTurnstile solves a Cloudflare Turnstile challenge
func (c *CapsolverClient) SolveTurnstile(websiteURL, siteKey string) (string, error) {
log.Printf("Creating Turnstile task for %s", websiteURL)
// Create the task
task := map[string]interface{}{
"type": "AntiTurnstileTaskProxyLess",
"websiteURL": websiteURL,
"websiteKey": siteKey,
}
payload := map[string]interface{}{
"clientKey": c.APIKey,
"task": task,
}
jsonData, _ := json.Marshal(payload)
resp, err := c.Client.Post(CAPSOLVER_API+"/createTask", "application/json", bytes.NewBuffer(jsonData))
if err != nil {
return "", fmt.Errorf("failed to create task: %w", err)
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var createResult CreateTaskResponse
json.Unmarshal(body, &createResult)
if createResult.ErrorID != 0 {
return "", fmt.Errorf("API error: %s - %s", createResult.ErrorCode, createResult.ErrorDescription)
}
log.Printf("Task created: %s", createResult.TaskID)
// Poll for the result
for i := 0; i < 120; i++ {
result, err := c.getTaskResult(createResult.TaskID)
if err != nil {
return "", err
}
if result.Status == "ready" {
log.Printf("Turnstile solved successfully!")
return result.Solution.Token, nil
}
if result.Status == "failed" {
return "", fmt.Errorf("task failed: %s", result.ErrorDescription)
}
if i%10 == 0 {
log.Printf("Waiting for solution... (%ds)", i)
}
time.Sleep(1 * time.Second)
}
return "", fmt.Errorf("timed out waiting for solution")
}
func (c *CapsolverClient) getTaskResult(taskID string) (*GetTaskResultResponse, error) {
payload := map[string]string{
"clientKey": c.APIKey,
"taskId": taskID,
}
jsonData, _ := json.Marshal(payload)
resp, err := c.Client.Post(CAPSOLVER_API+"/getTaskResult", "application/json", bytes.NewBuffer(jsonData))
if err != nil {
return nil, err
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var result GetTaskResultResponse
json.Unmarshal(body, &result)
return &result, nil
}
// extractTurnstileSiteKey extracts the Turnstile site key from page HTML
func extractTurnstileSiteKey(html string) string {
// Pattern 1: data-sitekey attribute on cf-turnstile div
patterns := []string{
`cf-turnstile[^>]*data-sitekey=['"]([^'"]+)['"]`,
`data-sitekey=['"]([^'"]+)['"][^>]*class=['"][^'"]*cf-turnstile`,
`turnstile\.render\s*\([^,]+,\s*\{[^}]*sitekey['":\s]+['"]([^'"]+)['"]`,
`sitekey['":\s]+['"]([0-9a-zA-Z_-]+)['"]`,
}
for _, pattern := range patterns {
re := regexp.MustCompile(pattern)
matches := re.FindStringSubmatch(html)
if len(matches) > 1 {
return matches[1]
}
}
return ""
}
// injectTurnstileToken injects the solved token into the page
func injectTurnstileToken(page *rod.Page, token string) error {
js := fmt.Sprintf(`
(function() {
// Set the cf-turnstile-response field
var responseField = document.querySelector('[name="cf-turnstile-response"]');
if (responseField) {
responseField.value = '%s';
}
// Also try to find by ID
var byId = document.getElementById('cf-turnstile-response');
if (byId) {
byId.value = '%s';
}
// Create hidden input if needed
if (!responseField && !byId) {
var input = document.createElement('input');
input.type = 'hidden';
input.name = 'cf-turnstile-response';
input.value = '%s';
var form = document.querySelector('form');
if (form) {
form.appendChild(input);
}
}
// Try to trigger callback
if (window.turnstile && window.turnstileCallback) {
window.turnstileCallback('%s');
}
return true;
})();
`, token, token, token, token)
_, err := page.Eval(js)
return err
}
func main() {
// Check API key
if CAPSOLVER_API_KEY == "" {
log.Fatal("CAPSOLVER_API_KEY environment variable is required")
}
// Target URL - Replace with a site using Cloudflare Turnstile
targetURL := "https://example.com"
log.Println("==============================================")
log.Println("Katana + Capsolver - Turnstile Demo")
log.Println("==============================================")
// Initialize Capsolver client
client := NewCapsolverClient(CAPSOLVER_API_KEY)
// Check balance
balance, err := client.GetBalance()
if err != nil {
log.Printf("Warning: Could not check balance: %v", err)
} else {
log.Printf("Capsolver balance: $%.2f", balance)
}
// Launch browser
log.Println("Launching browser...")
path, _ := launcher.LookPath()
u := launcher.New().Bin(path).Headless(true).MustLaunch()
browser := rod.New().ControlURL(u).MustConnect()
defer browser.MustClose()
// Navigate to target
log.Printf("Navigating to: %s", targetURL)
page := browser.MustPage(targetURL)
page.MustWaitLoad()
time.Sleep(2 * time.Second)
// Get page HTML
html := page.MustHTML()
// Check for Turnstile
if !strings.Contains(html, "cf-turnstile") && !strings.Contains(html, "turnstile") {
log.Println("No Turnstile found on page")
log.Println("Tip: Replace targetURL with a site that uses Cloudflare Turnstile")
return
}
log.Println("Cloudflare Turnstile detected!")
// Extract site key
siteKey := extractTurnstileSiteKey(html)
if siteKey == "" {
log.Fatal("Could not extract site key")
}
log.Printf("Site key: %s", siteKey)
// Solve Turnstile
log.Println("Solving Turnstile with Capsolver...")
token, err := client.SolveTurnstile(targetURL, siteKey)
if err != nil {
log.Fatalf("Failed to solve Turnstile: %v", err)
}
log.Printf("Token received: %s...", token[:min(50, len(token))])
// Inject token
log.Println("Injecting token into page...")
err = injectTurnstileToken(page, token)
if err != nil {
log.Fatalf("Failed to inject token: %v", err)
}
log.Println("==============================================")
log.Println("SUCCESS! Turnstile token injected!")
log.Println("==============================================")
// Get page title
title := page.MustEval(`document.title`).String()
log.Printf("Page title: %s", title)
}
func min(a, b int) int {
if a < b {
return a
}
return b
}
### Turnstile Key Points
1. **Task Type**: Use `AntiTurnstileTaskProxyLess`
2. **Response Field**: Turnstile uses `cf-turnstile-response` instead of `g-recaptcha-response`
3. **Faster Solving**: Turnstile typically solves faster than reCAPTCHA (1-10 seconds)
4. **Token Field**: Solution is in `solution.token` instead of `solution.gRecaptchaResponse`
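Points 2 and 4 come down to reading a different solution field per task type. The sketch below shows one way to pick the right field when decoding a getTaskResult response. The function name tokenFromResult is hypothetical, and the sample JSON is hand-written for illustration rather than captured API output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taskSolution holds both token fields named in this guide; only one is
// expected to be populated for a given task type.
type taskSolution struct {
	GRecaptchaResponse string `json:"gRecaptchaResponse"`
	Token              string `json:"token"`
}

type taskResult struct {
	Status   string       `json:"status"`
	Solution taskSolution `json:"solution"`
}

// tokenFromResult (hypothetical helper) returns solution.token for
// Turnstile tasks and solution.gRecaptchaResponse otherwise.
func tokenFromResult(taskType string, raw []byte) (string, error) {
	var res taskResult
	if err := json.Unmarshal(raw, &res); err != nil {
		return "", fmt.Errorf("decode result: %w", err)
	}
	if res.Status != "ready" {
		return "", fmt.Errorf("task not ready (status %q)", res.Status)
	}
	token := res.Solution.GRecaptchaResponse
	if taskType == "AntiTurnstileTaskProxyLess" {
		token = res.Solution.Token
	}
	if token == "" {
		return "", fmt.Errorf("no token in solution for task type %q", taskType)
	}
	return token, nil
}

func main() {
	// Hand-written sample payload, not real API output.
	sample := []byte(`{"status":"ready","solution":{"token":"demo-turnstile-token"}}`)
	token, err := tokenFromResult("AntiTurnstileTaskProxyLess", sample)
	fmt.Println(token, err)
}
```

Unlike the listings in this guide, the sketch checks the json.Unmarshal error and rejects an empty token, so a malformed response is reported instead of being injected into the page.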
---
## Universal CAPTCHA Crawler
Here's a complete, modular crawler that automatically detects and handles the two CAPTCHA types covered above (reCAPTCHA v2 and Cloudflare Turnstile):
```go
// Universal CAPTCHA Crawler - Complete Example
// Automatically detects and solves reCAPTCHA v2 and Turnstile
// Usage: go run main.go -url "https://example.com"
// Requires: CAPSOLVER_API_KEY environment variable
package main
import (
"bytes"
"encoding/json"
"flag"
"fmt"
"io"
"log"
"net/http"
"os"
"regexp"
"strings"
"time"
"github.com/go-rod/rod"
"github.com/go-rod/rod/lib/launcher"
)
// ============================================
// Configuration
// ============================================
var (
CAPSOLVER_API_KEY = os.Getenv("CAPSOLVER_API_KEY")
CAPSOLVER_API = "https://api.capsolver.com"
)
// CaptchaType represents different CAPTCHA types
type CaptchaType string
const (
RecaptchaV2 CaptchaType = "recaptcha_v2"
Turnstile CaptchaType = "turnstile"
Unknown CaptchaType = "unknown"
)
// CaptchaInfo contains extracted CAPTCHA parameters
type CaptchaInfo struct {
Type CaptchaType
SiteKey string
}
// ============================================
// API Types
// ============================================
type CreateTaskResponse struct {
ErrorID int `json:"errorId"`
ErrorCode string `json:"errorCode"`
ErrorDescription string `json:"errorDescription"`
TaskID string `json:"taskId"`
}
type GetTaskResultResponse struct {
ErrorID int `json:"errorId"`
ErrorCode string `json:"errorCode"`
ErrorDescription string `json:"errorDescription"`
Status string `json:"status"`
Solution struct {
GRecaptchaResponse string `json:"gRecaptchaResponse"`
Token string `json:"token"`
} `json:"solution"`
}
type BalanceResponse struct {
ErrorID int `json:"errorId"`
Balance float64 `json:"balance"`
}
// ============================================
// Capsolver Client
// ============================================
type CapsolverClient struct {
APIKey string
Client *http.Client
}
func NewCapsolverClient(apiKey string) *CapsolverClient {
return &CapsolverClient{
APIKey: apiKey,
Client: &http.Client{Timeout: 120 * time.Second},
}
}
func (c *CapsolverClient) GetBalance() (float64, error) {
payload := map[string]string{"clientKey": c.APIKey}
jsonData, _ := json.Marshal(payload)
resp, err := c.Client.Post(CAPSOLVER_API+"/getBalance", "application/json", bytes.NewBuffer(jsonData))
if err != nil {
return 0, err
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var result BalanceResponse
json.Unmarshal(body, &result)
return result.Balance, nil
}
func (c *CapsolverClient) Solve(info *CaptchaInfo, websiteURL string) (string, error) {
switch info.Type {
case RecaptchaV2:
return c.solveRecaptchaV2(websiteURL, info.SiteKey)
case Turnstile:
return c.solveTurnstile(websiteURL, info.SiteKey)
default:
return "", fmt.Errorf("unsupported CAPTCHA type: %s", info.Type)
}
}
func (c *CapsolverClient) solveRecaptchaV2(websiteURL, siteKey string) (string, error) {
task := map[string]interface{}{
"type": "ReCaptchaV2TaskProxyLess",
"websiteURL": websiteURL,
"websiteKey": siteKey,
}
return c.solveTask(task, "recaptcha")
}
func (c *CapsolverClient) solveTurnstile(websiteURL, siteKey string) (string, error) {
task := map[string]interface{}{
"type": "AntiTurnstileTaskProxyLess",
"websiteURL": websiteURL,
"websiteKey": siteKey,
}
return c.solveTask(task, "turnstile")
}
func (c *CapsolverClient) solveTask(task map[string]interface{}, tokenType string) (string, error) {
// Create task
payload := map[string]interface{}{
"clientKey": c.APIKey,
"task": task,
}
jsonData, _ := json.Marshal(payload)
resp, err := c.Client.Post(CAPSOLVER_API+"/createTask", "application/json", bytes.NewBuffer(jsonData))
if err != nil {
return "", fmt.Errorf("failed to create task: %w", err)
}
defer resp.Body.Close()
body, _ := io.ReadAll(resp.Body)
var createResult CreateTaskResponse
json.Unmarshal(body, &createResult)
if createResult.ErrorID != 0 {
return "", fmt.Errorf("API error: %s - %s", createResult.ErrorCode, createResult.ErrorDescription)
}
log.Printf("Task created: %s", createResult.TaskID)
// Poll for result
for i := 0; i < 120; i++ {
getPayload := map[string]string{
"clientKey": c.APIKey,
"taskId": createResult.TaskID,
}
jsonData, _ := json.Marshal(getPayload)
resp, err := c.Client.Post(CAPSOLVER_API+"/getTaskResult", "application/json", bytes.NewBuffer(jsonData))
if err != nil {
return "", err
}
body, _ := io.ReadAll(resp.Body)
resp.Body.Close()
var result GetTaskResultResponse
json.Unmarshal(body, &result)
if result.Status == "ready" {
if tokenType == "turnstile" {
return result.Solution.Token, nil
}
return result.Solution.GRecaptchaResponse, nil
}
if result.Status == "failed" {
return "", fmt.Errorf("task failed: %s", result.ErrorDescription)
}
if i%10 == 0 {
log.Printf("Waiting for solution... (%ds)", i)
}
time.Sleep(1 * time.Second)
}
return "", fmt.Errorf("timeout waiting for solution")
}
// ============================================
// CAPTCHA Detection
// ============================================
func DetectCaptcha(html string) *CaptchaInfo {
// Check for reCAPTCHA v2 (checkbox)
if strings.Contains(html, "g-recaptcha") {
siteKey := extractDataSiteKey(html, "g-recaptcha")
if siteKey != "" {
return &CaptchaInfo{
Type: RecaptchaV2,
SiteKey: siteKey,
}
}
}
// Check for Cloudflare Turnstile
if strings.Contains(html, "cf-turnstile") || strings.Contains(html, "challenges.cloudflare.com/turnstile") {
siteKey := extractDataSiteKey(html, "cf-turnstile")
if siteKey != "" {
return &CaptchaInfo{
Type: Turnstile,
SiteKey: siteKey,
}
}
}
return nil
}
func extractDataSiteKey(html, className string) string {
pattern := fmt.Sprintf(`class=['"][^'"]*%s[^'"]*['"][^>]*data-sitekey=['"]([^'"]+)['"]`, className)
re := regexp.MustCompile(pattern)
matches := re.FindStringSubmatch(html)
if len(matches) > 1 {
return matches[1]
}
// Alternative pattern
pattern = fmt.Sprintf(`data-sitekey=['"]([^'"]+)['"][^>]*class=['"][^'"]*%s`, className)
re = regexp.MustCompile(pattern)
matches = re.FindStringSubmatch(html)
if len(matches) > 1 {
return matches[1]
}
// Generic sitekey pattern
re = regexp.MustCompile(`data-sitekey=['"]([^'"]+)['"]`)
matches = re.FindStringSubmatch(html)
if len(matches) > 1 {
return matches[1]
}
return ""
}
// ============================================
// Token Injection
// ============================================
func InjectToken(page *rod.Page, token string, captchaType CaptchaType) error {
var js string
switch captchaType {
case RecaptchaV2:
js = fmt.Sprintf(`
(function() {
var responseField = document.getElementById('g-recaptcha-response');
if (responseField) {
responseField.style.display = 'block';
responseField.value = '%s';
}
var textareas = document.querySelectorAll('textarea[name="g-recaptcha-response"]');
for (var i = 0; i < textareas.length; i++) {
textareas[i].value = '%s';
}
if (typeof ___grecaptcha_cfg !== 'undefined') {
var clients = ___grecaptcha_cfg.clients;
for (var key in clients) {
var client = clients[key];
if (client) {
try {
var callback = client.callback ||
(client.Q && client.Q.callback) ||
(client.S && client.S.callback);
if (typeof callback === 'function') {
callback('%s');
}
} catch(e) {}
}
}
}
return true;
})();
`, token, token, token)
case Turnstile:
js = fmt.Sprintf(`
(function() {
var responseField = document.querySelector('[name="cf-turnstile-response"]');
if (responseField) {
responseField.value = '%s';
}
if (!responseField) {
var input = document.createElement('input');
input.type = 'hidden';
input.name = 'cf-turnstile-response';
input.value = '%s';
var form = document.querySelector('form');
if (form) form.appendChild(input);
}
if (window.turnstile && window.turnstileCallback) {
window.turnstileCallback('%s');
}
return true;
})();
`, token, token, token)
default:
return fmt.Errorf("unsupported CAPTCHA type: %s", captchaType)
}
_, err := page.Eval(js)
return err
}
// ============================================
// Crawler
// ============================================
type CrawlResult struct {
URL string
Title string
Success bool
CaptchaFound bool
CaptchaType CaptchaType
CaptchaSolved bool
Error string
}
func Crawl(browser *rod.Browser, client *CapsolverClient, targetURL string) *CrawlResult {
result := &CrawlResult{
URL: targetURL,
Success: false,
}
// Navigate to the target page
log.Printf("Navigating to: %s", targetURL)
page := browser.MustPage(targetURL)
defer page.MustClose()
page.MustWaitLoad()
time.Sleep(2 * time.Second)
// Get the page HTML
html := page.MustHTML()
// Detect the CAPTCHA
captchaInfo := DetectCaptcha(html)
if captchaInfo != nil && captchaInfo.Type != Unknown {
result.CaptchaFound = true
result.CaptchaType = captchaInfo.Type
log.Printf("CAPTCHA detected: %s (siteKey: %s)", captchaInfo.Type, captchaInfo.SiteKey)
// Solve the CAPTCHA
log.Println("Solving CAPTCHA with Capsolver...")
token, err := client.Solve(captchaInfo, targetURL)
if err != nil {
result.Error = fmt.Sprintf("failed to solve CAPTCHA: %v", err)
log.Printf("Error: %s", result.Error)
return result
}
log.Printf("Token received: %s...", token[:min(50, len(token))])
// Inject the token
log.Println("Injecting token...")
err = InjectToken(page, token, captchaInfo.Type)
if err != nil {
result.Error = fmt.Sprintf("failed to inject token: %v", err)
log.Printf("Error: %s", result.Error)
return result
}
result.CaptchaSolved = true
log.Println("Token injected successfully!")
// Try to submit the form
submitForm(page)
time.Sleep(3 * time.Second)
} else {
log.Println("No CAPTCHA detected on page")
}
// Get the final page info
result.Title = page.MustEval(`document.title`).String()
result.Success = true
return result
}
func submitForm(page *rod.Page) {
selectors := []string{
"button[type='submit']",
"input[type='submit']",
"#recaptcha-demo-submit",
".submit-button",
}
for _, selector := range selectors {
js := fmt.Sprintf(`
(function() {
var btn = document.querySelector('%s');
if (btn && btn.offsetParent !== null) {
btn.click();
return true;
}
return false;
})();
`, selector)
result := page.MustEval(js)
if result.Bool() {
log.Printf("Clicked submit button: %s", selector)
return
}
}
}
func min(a, b int) int {
if a < b {
return a
}
return b
}
// ============================================
// Main
// ============================================
func main() {
// Parse flags
targetURL := flag.String("url", "https://www.google.com/recaptcha/api2/demo", "Target URL to crawl")
headless := flag.Bool("headless", true, "Run the browser in headless mode")
checkBalance := flag.Bool("balance", false, "Only check the account balance")
flag.Parse()
// Check the API key
if CAPSOLVER_API_KEY == "" {
log.Fatal("CAPSOLVER_API_KEY environment variable is required")
}
log.Println("==============================================")
log.Println("Katana + Capsolver - Universal CAPTCHA Crawler")
log.Println("==============================================")
// Initialize the client
client := NewCapsolverClient(CAPSOLVER_API_KEY)
// Check the balance
balance, err := client.GetBalance()
if err != nil {
log.Printf("Warning: could not check balance: %v", err)
} else {
log.Printf("Capsolver balance: $%.2f", balance)
}
if *checkBalance {
return
}
// Launch the browser
log.Println("Launching browser...")
path, _ := launcher.LookPath()
u := launcher.New().Bin(path).Headless(*headless).MustLaunch()
browser := rod.New().ControlURL(u).MustConnect()
defer browser.MustClose()
// Crawl
result := Crawl(browser, client, *targetURL)
// Print the result
log.Println("==============================================")
log.Println("Crawl result")
log.Println("==============================================")
log.Printf("URL: %s", result.URL)
log.Printf("Title: %s", result.Title)
log.Printf("Success: %v", result.Success)
log.Printf("CAPTCHA found: %v", result.CaptchaFound)
if result.CaptchaFound {
log.Printf("CAPTCHA type: %s", result.CaptchaType)
log.Printf("CAPTCHA solved: %v", result.CaptchaSolved)
}
if result.Error != "" {
log.Printf("Error: %s", result.Error)
}
log.Println("==============================================")
}
Usage
bash
# Create the project
mkdir katana-universal-crawler
cd katana-universal-crawler
go mod init katana-universal-crawler
# Install dependencies
go get github.com/go-rod/rod@latest
# Set the API key
export CAPSOLVER_API_KEY="YOUR_API_KEY"
# Run with the default target (reCAPTCHA v2 demo)
go run main.go
# Run against a custom URL
go run main.go -url "https://example.com"
# Only check the balance
go run main.go -balance
# Run with a visible browser
go run main.go -headless=false
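Best Practices
1. Performance optimization
- Use proxyless task types: ReCaptchaV2TaskProxyLess uses CapSolver's internal proxies, which speeds up solving
- Parallel processing: start solving the CAPTCHA while the other page elements are still loading
- Token caching: reCAPTCHA tokens are valid for about 2 minutes; cache them where possible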
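2. Cost management
- Detect before solving: call CapSolver only when a CAPTCHA is actually present
- Validate the site key: make sure the extracted key is valid before calling the API
- Monitor usage: track API calls to keep costs under control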
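One way to approach the site-key check is a cheap shape test before any paid API call. The sketch below is an assumption-heavy heuristic: `plausibleSiteKey` is a hypothetical helper, and the pattern (20 to 100 characters from letters, digits, `_`, and `-`) is a guess at what extracted keys look like, not a rule published by any CAPTCHA vendor. The sample key is made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// siteKeyPattern is an assumed, deliberately loose shape for site keys.
// Tighten or relax it based on the keys you actually encounter.
var siteKeyPattern = regexp.MustCompile(`^[A-Za-z0-9_-]{20,100}$`)

// plausibleSiteKey reports whether key looks like something worth
// sending to the solver. It cannot prove that the key is valid.
func plausibleSiteKey(key string) bool {
	return siteKeyPattern.MatchString(key)
}

func main() {
	candidates := []string{
		"6Lc_example-site-key_0123456789ABCDEFabc", // made-up key with a plausible shape
		"",                    // extraction found nothing
		"<div class=\"g-re",   // extraction captured markup by mistake
	}
	for _, key := range candidates {
		fmt.Printf("%q plausible: %v\n", key, plausibleSiteKey(key))
	}
}
```

A check like this mostly catches extraction bugs, such as empty strings or captured markup, which would otherwise turn into failed and possibly billed tasks.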
3. Error handling
go
func SolveWithRetry(client *CapsolverClient, info *CaptchaInfo, url string, maxRetries int) (string, error) {
	var lastErr error
	for i := 0; i < maxRetries; i++ {
		token, err := client.Solve(info, url)
		if err == nil {
			return token, nil
		}
		lastErr = err
		log.Printf("Attempt %d failed: %v", i+1, err)
		// Exponential backoff: 1s, 2s, 4s, ... (skipped after the final attempt)
		if i < maxRetries-1 {
			time.Sleep(time.Duration(1<<i) * time.Second)
		}
	}
	return "", fmt.Errorf("failed after %d attempts: %w", maxRetries, lastErr)
}
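4. Rate limiting
Space out requests appropriately to avoid detection:
go
// RateLimiter enforces a minimum interval between requests.
// It needs the "sync" and "time" packages.
type RateLimiter struct {
	requests    int
	interval    time.Duration
	lastRequest time.Time
	mu          sync.Mutex
}

// Wait blocks until at least interval has passed since the previous request.
func (r *RateLimiter) Wait() {
	r.mu.Lock()
	defer r.mu.Unlock()
	elapsed := time.Since(r.lastRequest)
	if elapsed < r.interval {
		time.Sleep(r.interval - elapsed)
	}
	r.lastRequest = time.Now()
}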
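Troubleshooting
Common errors
| Error | Cause | Solution |
|---|---|---|
| ERROR_ZERO_BALANCE | Insufficient credit | Top up your CapSolver account |
| ERROR_CAPTCHA_UNSOLVABLE | Invalid site key | Verify the extraction logic |
| ERROR_INVALID_TASK_DATA | Missing parameters | Check the task struct |
| context deadline exceeded | Timeout | Increase the timeout or check the network |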
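Debugging tips
- Enable a visible browser: set Headless(false) to see what is happening
- Log network traffic: monitor requests to pinpoint problems
- Save screenshots: capture the page state for debugging
- Validate tokens: log the token format before injecting it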
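FAQ
Q: Can I use Katana on CAPTCHA pages without headless mode?
A: No. CAPTCHA pages require JavaScript rendering, which only works in headless mode.
Q: How long is a CAPTCHA token valid?
A: reCAPTCHA tokens: about 2 minutes. Turnstile: depends on the configuration.
Q: What is the average solve time?
A: reCAPTCHA v2: 5-15 seconds. Turnstile: 1-10 seconds.
Q: Can I use my own proxy?
A: Yes. Use the task type without the "ProxyLess" suffix and supply the proxy configuration.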
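Conclusion
Integrating CapSolver with Katana gives your web crawling work robust CAPTCHA handling. The complete scripts above can be copied directly into your Go projects.
Ready to get started? Sign up at CapSolver and boost your crawler's performance!
💡 Exclusive offer for Katana integration users:
To celebrate this integration, we are offering an exclusive 6% bonus code, Katana, to CapSolver users who sign up through this tutorial. Enter the code when topping up in the dashboard to receive an extra 6% credit immediately.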
Documentation
- Katana GitHub repository
- Katana documentation
- CapSolver documentation
- Go Rod browser automation
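Compliance Disclaimer: The information provided in this blog is for informational purposes only. CapSolver is committed to complying with all applicable laws and regulations. Using the CapSolver network for illegal, fraudulent, or abusive activities is strictly prohibited, and any such activity will be investigated. Our captcha-solving solutions help with captcha challenges during public data crawling while ensuring 100% compliance. We encourage responsible use of our services. For more information, please see our Terms of Service and Privacy Policy.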