How to Solve Visual Puzzles in n8n with CapSolver Vision Engine



Ethan Collins

Pattern Recognition Specialist

18-Mar-2026

Visual puzzles are everywhere: slider CAPTCHAs that require dragging a piece to the correct position, rotation challenges where you align an image, object-selection grids, and animated GIF text recognition. These are not traditional text-based CAPTCHAs, and they are not the token-based challenges (like reCAPTCHA or Turnstile) that return a string you submit with a form. They are image-based visual challenges where the input is a picture and the output is a measurement: a distance in pixels, an angle in degrees, a set of coordinates, or recognized text.

That is what CapSolver's Vision Engine solves. It uses AI to analyze the visual puzzle image and return the precise answer your automation needs to proceed.

In this guide, you will learn how to use the Vision Engine in n8n through the CapSolver community node. The walkthrough covers the core Solver API workflow and a practical Slider Puzzle Solver that fetches puzzle images, converts them to base64, solves the slider, and returns the distance in pixels.

Important: Vision Engine is a Recognition operation, not a Token operation. That means the result comes back instantly in a single API call: there is no polling, no getTaskResult loop, and no timeout waiting. You send the image, you get the answer.


How Vision Engine Differs from Other CapSolver Operations

Most CapSolver operations in n8n are Token tasks. You submit site parameters (URL, site key, proxy), CapSolver solves the challenge in the background, and your workflow polls for the result. The output is a token string that you then submit to the target site.

Vision Engine works differently:

Aspect      Token Operations (reCAPTCHA, Turnstile, etc.)   Vision Engine (Recognition)
Resource    Token                                           Recognition
Input       Website URL, site key, proxy                    Base64 image(s), module name
Processing  Async: poll for result                          Instant: single API call
Output      Token string                                    Pixels, degrees, coordinates, or text
Proxy       Often required                                  Not needed
Use case    Submit token to bypass challenge gate           Interpret visual puzzle to automate interaction
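In code, the two patterns look roughly like this. This is a sketch with the HTTP calls abstracted into injected functions; the names mirror the CapSolver endpoints (createTask, getTaskResult), but nothing here is the node's actual implementation:

```javascript
// Token pattern: createTask returns only a taskId; the solution arrives later via polling.
async function solveTokenTask(createTask, getTaskResult) {
  const { taskId } = await createTask();
  let result;
  do {
    await new Promise((resolve) => setTimeout(resolve, 50)); // real workflows poll every few seconds
    result = await getTaskResult(taskId);
  } while (result.status === 'processing');
  return result.solution; // e.g. a token to submit to the target site
}

// Recognition pattern (Vision Engine): the solution is in the createTask response itself.
async function solveRecognitionTask(createTask) {
  const { solution } = await createTask();
  return solution; // e.g. { distance: 142 } for a slider puzzle
}
```

The injected-function style also makes the two flows easy to test without network access.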

Vision Engine is closer to Image To Text (OCR) than to reCAPTCHA solving, but it goes beyond simple text recognition. Where Image To Text reads characters from a static image, Vision Engine understands spatial relationships: it can calculate how far to drag a slider piece, what angle to rotate an image, which areas of a picture match a question, or what text is hidden in an animated GIF.


Available Modules

Vision Engine supports multiple AI models, each designed for a specific type of visual puzzle:

Module    Purpose                               Input                                                           Returns
slider_1  Slider puzzle solving                 image (puzzle piece) + imageBackground (background with slot)   Distance in pixels
rotate_1  Single image rotation                 image + imageBackground                                         Angle in degrees
rotate_2  Multi-image rotation (inner + outer)  image (inner image)                                             Angle in degrees
shein     Object/area selection                 image + question (what to select)                               rects array of bounding boxes [{x1, y1, x2, y2}]
ocr_gif   Animated GIF text recognition         image (base64 of the GIF)                                       Recognized text string

When to Use Each Module

slider_1: The most common visual CAPTCHA type. The user sees a background image with a missing piece and a separate puzzle piece. The goal is to determine how many pixels to the right the piece must be dragged. Both image (the puzzle piece) and imageBackground (the full background with the slot) are required.

rotate_1: A single image that must be rotated to the correct orientation. Both image and imageBackground are required. The engine returns the angle in degrees.

rotate_2: Two concentric images (an inner image and an outer ring). The inner image must be rotated to align with the outer. Only image is needed. The engine returns the angle.

shein: Used for challenges that ask "select the matching items" or "tap the correct area." Requires image plus a question parameter describing what to find. Returns bounding-box coordinates for each matching area.

ocr_gif: Animated GIFs where text flashes across frames, making it unreadable for standard OCR. The engine analyzes the animation and extracts the text.
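The per-module input rules above can be captured in a small helper. This is an illustrative sketch, not part of the community node; the task shape (type "VisionEngine" plus the parameters listed above) follows the node settings described later in this guide:

```javascript
// Build a Vision Engine task object, enforcing each module's required inputs.
const MODULES = ['slider_1', 'rotate_1', 'rotate_2', 'shein', 'ocr_gif'];

function buildVisionEngineTask({ module, image, imageBackground = '', question = '', websiteURL = '' }) {
  if (!MODULES.includes(module)) throw new Error(`Unsupported module: ${module}`);
  if (!image) throw new Error('image (raw base64) is required for every module');
  if ((module === 'slider_1' || module === 'rotate_1') && !imageBackground) {
    throw new Error(`${module} requires imageBackground`);
  }
  if (module === 'shein' && !question) throw new Error('shein requires a question');

  const task = { type: 'VisionEngine', module, image };
  if (imageBackground) task.imageBackground = imageBackground;
  if (question) task.question = question;
  if (websiteURL) task.websiteURL = websiteURL;
  return task;
}
```

Calling it with a slider puzzle, for example, yields `{ type: 'VisionEngine', module: 'slider_1', image: ..., imageBackground: ... }`; an ocr_gif task needs only the image.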


Prerequisites

Before you start, make sure you have:

  1. An n8n instance (self-hosted or cloud)
  2. A CapSolver account with an API key and balance (sign up here)
  3. The CapSolver community node installed in n8n (n8n-nodes-capsolver)
  4. A configured CapSolver credential in n8n (Settings > Credentials > CapSolver API)

No proxy is required for Vision Engine tasks.


CapSolver Node Settings for Vision Engine

In the n8n CapSolver node, configure these settings:

Setting          Value
Resource         Recognition
Operation        Vision Engine
module           The model name (e.g., slider_1, rotate_1, ocr_gif)
image            Base64-encoded image string (no data:image/...;base64, prefix)
imageBackground  Base64-encoded background image (required for slider_1 and rotate_1, otherwise optional)
question         Text question (required only for the shein module)
websiteURL       Source page URL (optional; can improve accuracy)

The type field is automatically set to VisionEngine when you select the Vision Engine operation.

Base64 Image Requirements

The image and imageBackground fields must be raw base64 strings, with no data URI prefix and no newlines:

  • Correct: /9j/4AAQSkZJRgABA... (raw base64)
  • Wrong: data:image/jpeg;base64,/9j/4AAQSkZJRgABA... (has prefix)

If your source image is a URL, you must fetch it first and convert it to base64. If it already has the data:image/...;base64, prefix, strip it before passing it to the CapSolver node.
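In a Code node (or any Node.js context), that cleanup takes only a couple of lines; the function names here are illustrative:

```javascript
// Strip a data URI prefix (if present) and any whitespace/newlines,
// leaving the raw base64 the CapSolver node expects.
function toRawBase64(input) {
  return input.replace(/^data:[^;]+;base64,/, '').replace(/\s+/g, '');
}

// Converting a fetched image held in a Node.js Buffer is a one-liner:
function bufferToRawBase64(buf) {
  return buf.toString('base64');
}

console.log(toRawBase64('data:image/jpeg;base64,/9j/4AAQ\nSkZJRg==')); // "/9j/4AAQSkZJRg=="
```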


Workflow 1: Vision Engine - Solver API

This workflow exposes Vision Engine as a simple REST API endpoint. Send a POST request with the module name and base64 image(s), and get the solution back as JSON.

Node Flow

Receive Solver Request (Webhook POST)
  → Validate Input (Code)
    → Solve Visual Puzzle (CapSolver - Recognition - Vision Engine)
      → Vision Engine Error? (IF)
        → true:  Respond to Webhook Error
        → false: Respond to Webhook (Success)

How It Works

1. Receive Solver Request

A webhook endpoint accepts POST requests with a JSON body containing:

{
  "module": "slider_1",
  "image": "/9j/4AAQSkZJRgABA...",
  "imageBackground": "/9j/4AAQSkZJRgABA...",
  "question": "",
  "websiteURL": ""
}

2. Validate Input

The Code node checks that image exists and that module is one of the supported values (slider_1, rotate_1, rotate_2, shein, ocr_gif). If validation fails, it sets an error field.

3. Solve Visual Puzzle

The CapSolver node is configured with:

  • Resource: Recognition
  • Operation: Vision Engine
  • module: from the request body
  • image: from the request body
  • imageBackground: from the request body (empty string if not provided)
  • question: from the request body (empty string if not provided)

Since this is a Recognition task, the result returns instantly.

4. Error Handling

The IF node checks for errors. If the CapSolver node returned an error (wrong module, invalid image, etc.), the error response webhook fires. Otherwise, the success response returns the solution.

Expected Request and Response

Slider puzzle request:

curl -X POST https://your-n8n-instance.com/webhook/vision-engine-solver \
  -H "Content-Type: application/json" \
  -d '{
    "module": "slider_1",
    "image": "BASE64_PUZZLE_PIECE",
    "imageBackground": "BASE64_BACKGROUND"
  }'

Success response:

{
  "solution": {
    "distance": 142,
    "module": "slider_1"
  }
}

GIF OCR request:

curl -X POST https://your-n8n-instance.com/webhook/vision-engine-solver \
  -H "Content-Type: application/json" \
  -d '{
    "module": "ocr_gif",
    "image": "BASE64_GIF_DATA"
  }'

Success response:

{
  "solution": {
    "text": "x7Km9",
    "module": "ocr_gif"
  }
}

Import This Workflow

Workflow JSON:
{
  "name": "Vision Engine - Solver API",
  "nodes": [
    {
      "parameters": {
        "content": "## Vision Engine - Solver API\n\n**Who it's for:** Developers and automation teams that need to solve visual puzzles (sliders, rotations, object selection, GIF OCR) via a simple REST endpoint.\n\n**What it does:** Accepts a base64-encoded image and a module name, sends it to CapSolver's Vision Engine, and returns the solution instantly.\n\n**How it works:**\n1. Webhook receives POST with `module`, `image`, and optional `imageBackground` / `question`\n2. Code node validates the input (image exists, module is valid)\n3. CapSolver Recognition node solves the visual puzzle\n4. Returns solution or error as JSON\n\n**Setup:**\n1. Add your CapSolver API key under **Settings → Credentials**\n2. Activate the workflow\n3. POST to `/webhook/vision-engine-solver` with your image data",
        "height": 560,
        "width": 460,
        "color": 1
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [-920, -380],
      "id": "sticky-ve-main-001",
      "name": "Sticky Note"
    },
    {
      "parameters": {
        "content": "### Input Validation\nChecks that `image` is present and `module` is one of: slider_1, rotate_1, rotate_2, shein, ocr_gif",
        "height": 480,
        "width": 440,
        "color": 6
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [-100, -280],
      "id": "sticky-ve-section-002",
      "name": "Sticky Note1"
    },
    {
      "parameters": {
        "content": "### CapSolver Vision Engine\nRecognition resource: instant result, no polling. Returns distance, angle, coordinates, or text depending on module.",
        "height": 480,
        "width": 440,
        "color": 6
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [380, -280],
      "id": "sticky-ve-section-003",
      "name": "Sticky Note2"
    },
    {
      "parameters": {
        "content": "### Error Handling\nChecks for CapSolver errors (invalid image, unsupported module, etc.) and returns structured error response.",
        "height": 480,
        "width": 440,
        "color": 6
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [860, -280],
      "id": "sticky-ve-section-004",
      "name": "Sticky Note3"
    },
    {
      "parameters": {
        "content": "### Webhook Trigger\nPOST /webhook/vision-engine-solver with JSON body containing module, image, and optional imageBackground / question.",
        "height": 480,
        "width": 440,
        "color": 6
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [-580, -280],
      "id": "sticky-ve-section-005",
      "name": "Sticky Note4"
    },
    {
      "parameters": {
        "content": "### Success Response\nReturns the full solution object from CapSolver โ€” contents vary by module type.",
        "height": 480,
        "width": 440,
        "color": 6
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [1340, -280],
      "id": "sticky-ve-section-006",
      "name": "Sticky Note5"
    },
    {
      "parameters": {
        "content": "### Error Response\nReturns the error message from CapSolver or from input validation.",
        "height": 480,
        "width": 440,
        "color": 6
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [1340, 240],
      "id": "sticky-ve-section-007",
      "name": "Sticky Note6"
    },
    {
      "parameters": {
        "httpMethod": "POST",
        "path": "vision-engine-solver",
        "responseMode": "responseNode",
        "options": {}
      },
      "type": "n8n-nodes-base.webhook",
      "typeVersion": 2,
      "position": [-540, 0],
      "id": "ve-api-11111111-1111-1111-1111-111111111101",
      "name": "Receive Solver Request",
      "webhookId": "ve-api-11111111-aaaa-bbbb-cccc-111111111101"
    },
    {
      "parameters": {
        "jsCode": "const body = $input.first().json.body || {};\nconst validModules = ['slider_1', 'rotate_1', 'rotate_2', 'shein', 'ocr_gif'];\n\nconst module = (body.module || '').trim();\nconst image = (body.image || '').trim();\nconst imageBackground = (body.imageBackground || '').trim();\nconst question = (body.question || '').trim();\nconst websiteURL = (body.websiteURL || '').trim();\n\n// Validate required fields\nif (!image) {\n  return [{ json: { error: 'Missing required field: image (base64 encoded)' } }];\n}\n\nif (!module) {\n  return [{ json: { error: 'Missing required field: module' } }];\n}\n\nif (!validModules.includes(module)) {\n  return [{ json: { error: `Invalid module: ${module}. Must be one of: ${validModules.join(', ')}` } }];\n}\n\n// Module-specific validation: slider_1 and rotate_1 need a background image\nif ((module === 'slider_1' || module === 'rotate_1') && !imageBackground) {\n  return [{ json: { error: `Module ${module} requires imageBackground` } }];\n}\n\nif (module === 'shein' && !question) {\n  return [{ json: { error: 'Module shein requires a question parameter' } }];\n}\n\nreturn [{ json: {\n  module,\n  image,\n  imageBackground,\n  question,\n  websiteURL,\n  validated: true\n} }];"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [-60, 0],
      "id": "ve-api-11111111-1111-1111-1111-111111111102",
      "name": "Validate Input"
    },
    {
      "parameters": {
        "resource": "Recognition",
        "operation": "Vision Engine",
        "module": "={{ $json.module }}",
        "image": "={{ $json.image }}",
        "imageBackground": "={{ $json.imageBackground || '' }}",
        "question": "={{ $json.question || '' }}",
        "websiteURL": "={{ $json.websiteURL || '' }}"
      },
      "type": "n8n-nodes-capsolver.capSolver",
      "typeVersion": 1,
      "position": [420, 0],
      "id": "ve-api-11111111-1111-1111-1111-111111111103",
      "name": "Solve Visual Puzzle",
      "onError": "continueRegularOutput",
      "credentials": {
        "capSolverApi": {
          "id": "YOUR_CREDENTIAL_ID",
          "name": "CapSolver account"
        }
      }
    },
    {
      "parameters": {
        "conditions": {
          "options": {
            "caseSensitive": true,
            "leftValue": "",
            "typeValidation": "loose",
            "version": 2
          },
          "conditions": [
            {
              "id": "ve-err-001",
              "leftValue": "={{ $json.error }}",
              "operator": {
                "type": "string",
                "operation": "isNotEmpty",
                "singleValue": true
              }
            }
          ],
          "combinator": "and"
        },
        "options": {}
      },
      "type": "n8n-nodes-base.if",
      "typeVersion": 2.2,
      "position": [900, 0],
      "id": "ve-api-11111111-1111-1111-1111-111111111104",
      "name": "Vision Engine Error?"
    },
    {
      "parameters": {
        "respondWith": "json",
        "responseBody": "={{ JSON.stringify($json.data) }}",
        "options": {}
      },
      "type": "n8n-nodes-base.respondToWebhook",
      "typeVersion": 1.1,
      "position": [1380, -80],
      "id": "ve-api-11111111-1111-1111-1111-111111111105",
      "name": "Respond to Webhook"
    },
    {
      "parameters": {
        "respondWith": "json",
        "responseBody": "={{ JSON.stringify({ error: $json.error }) }}",
        "options": {
          "responseCode": 400
        }
      },
      "type": "n8n-nodes-base.respondToWebhook",
      "typeVersion": 1.1,
      "position": [1380, 120],
      "id": "ve-api-11111111-1111-1111-1111-111111111106",
      "name": "Respond to Webhook Error"
    }
  ],
  "connections": {
    "Receive Solver Request": {
      "main": [
        [
          {
            "node": "Validate Input",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Validate Input": {
      "main": [
        [
          {
            "node": "Solve Visual Puzzle",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Solve Visual Puzzle": {
      "main": [
        [
          {
            "node": "Vision Engine Error?",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Vision Engine Error?": {
      "main": [
        [
          {
            "node": "Respond to Webhook Error",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Respond to Webhook",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "active": false,
  "settings": {
    "executionOrder": "v1"
  }
}

Workflow 2: Slider Puzzle Solver - Fetch & Solve

This workflow demonstrates a practical end-to-end slider puzzle solver. It fetches the puzzle piece and background images from URLs, converts them to base64, sends them to Vision Engine with the slider_1 module, and returns the pixel distance needed to complete the slider.

This is the pattern you would use when integrating slider CAPTCHA solving into a larger automation: the returned distance tells your browser automation (Puppeteer, Playwright, Selenium) exactly how far to drag the slider handle.
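As an illustration, the distance can be split into ease-out mouse-move offsets so the drag looks less robotic. The Playwright usage in the comments below is a sketch with a hypothetical '.slider-handle' selector, not code from this workflow:

```javascript
// Split a pixel distance into mouse-move offsets with a cubic ease-out profile.
// The offsets always sum exactly to the requested distance.
function dragSteps(distance, steps = 20) {
  const offsets = [];
  let covered = 0;
  for (let i = 1; i <= steps; i++) {
    const eased = Math.round(distance * (1 - Math.pow(1 - i / steps, 3)));
    offsets.push(eased - covered);
    covered = eased;
  }
  return offsets;
}

// Possible Playwright usage (selector and coordinates are assumptions):
// const box = await page.locator('.slider-handle').boundingBox();
// await page.mouse.move(box.x + 5, box.y + 5);
// await page.mouse.down();
// let pos = box.x + 5;
// for (const dx of dragSteps(142)) {        // 142 = distance from Vision Engine
//   pos += dx;
//   await page.mouse.move(pos, box.y + 5, { steps: 2 });
// }
// await page.mouse.up();
```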

Node Flow

Schedule Trigger (every 1 hour) ─┐
                                 ├→ Set Puzzle Config → Fetch Puzzle Image → Fetch Background Image
Webhook Trigger (POST) ──────────┘    → Convert Images to Base64 → Solve Slider Puzzle
                                        → Slider Error? → Format Solution → Return Result
                                                        → Format Error → Return Error

How It Works

1. Dual Triggers

  • Schedule Trigger: runs every hour for automated testing or recurring puzzle-solving
  • Webhook Trigger: on-demand activation from another workflow or external service

2. Set Puzzle Config

Defines the URLs for the puzzle piece image and background image, plus any optional websiteURL for improved accuracy. In a real integration, these URLs would come from the target site's CAPTCHA challenge response.

3. Fetch Puzzle Image + Fetch Background Image

Two HTTP Request nodes download the puzzle piece and background images as binary data.

4. Convert Images to Base64

A Code node converts both binary images to raw base64 strings, stripping any data:image/...;base64, prefix.

5. Solve Slider Puzzle

The CapSolver node with:

  • Resource: Recognition
  • Operation: Vision Engine
  • module: slider_1
  • image: base64 puzzle piece
  • imageBackground: base64 background

Returns the distance in pixels instantly.
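Downstream code then only needs to pull the distance out of the node's output. The helper below mirrors the fallback logic in this workflow's Format Solution node (slide_distance is kept as a defensive alternate field name, as in that node):

```javascript
// Extract the slider distance from a CapSolver node output item.
function extractDistance(item) {
  const solution = item.data?.solution || item.data || {};
  const distance = solution.distance ?? solution.slide_distance ?? null;
  return {
    success: distance !== null,
    module: 'slider_1',
    distance,
    unit: 'pixels',
    solvedAt: new Date().toISOString(),
  };
}

console.log(extractDistance({ data: { solution: { distance: 142 } } }).distance); // 142
```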

6. Check Result and Respond

The IF node checks for errors. On success, the solution is formatted and returned. On error, the error message is returned.

Expected Response

Success:

{
  "success": true,
  "module": "slider_1",
  "distance": 142,
  "unit": "pixels",
  "solvedAt": "2026-03-16T10:00:00.000Z"
}

Error:

{
  "success": false,
  "error": "ERROR_INVALID_IMAGE",
  "solvedAt": "2026-03-16T10:00:00.000Z"
}

Import This Workflow

Workflow JSON:
{
  "name": "Slider Puzzle Solver - Fetch & Solve - Vision Engine",
  "nodes": [
    {
      "parameters": {
        "content": "## Slider Puzzle Solver - Fetch & Solve\n\n**Who it's for:** Automation teams solving slider CAPTCHAs as part of browser automation or scraping pipelines.\n\n**What it does:** Fetches a slider puzzle image and its background from URLs, converts both to base64, sends them to CapSolver Vision Engine (slider_1 module), and returns the exact pixel distance to drag the slider.\n\n**How it works:**\n1. Schedule (every 1h) or Webhook triggers the flow\n2. Config node sets the puzzle image URLs\n3. Two HTTP Request nodes fetch the images\n4. Code node converts images to base64\n5. CapSolver Vision Engine solves the slider puzzle\n6. Returns distance in pixels for automation\n\n**Setup:**\n1. Add your CapSolver API key under **Settings → Credentials**\n2. Replace placeholder image URLs in Set Puzzle Config\n3. Activate and test",
        "height": 560,
        "width": 460,
        "color": 1
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [-1200, -380],
      "id": "sticky-slider-main-001",
      "name": "Sticky Note"
    },
    {
      "parameters": {
        "content": "### Triggers\nSchedule (hourly) or Webhook: both feed into the same puzzle-solving pipeline.",
        "height": 480,
        "width": 440,
        "color": 6
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [-860, -280],
      "id": "sticky-slider-section-002",
      "name": "Sticky Note1"
    },
    {
      "parameters": {
        "content": "### Puzzle Configuration\nSet the URLs for the puzzle piece and background images. In production, extract these from the target site's challenge response.",
        "height": 480,
        "width": 440,
        "color": 6
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [-380, -280],
      "id": "sticky-slider-section-003",
      "name": "Sticky Note2"
    },
    {
      "parameters": {
        "content": "### Image Fetching\nDownload both images as binary. The puzzle piece goes into image, the background (with slot) goes into imageBackground.",
        "height": 480,
        "width": 920,
        "color": 6
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [100, -280],
      "id": "sticky-slider-section-004",
      "name": "Sticky Note3"
    },
    {
      "parameters": {
        "content": "### Base64 Conversion\nConverts binary image data to raw base64 strings (no data URI prefix). Both images must be raw base64 for the CapSolver API.",
        "height": 480,
        "width": 440,
        "color": 6
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [1060, -280],
      "id": "sticky-slider-section-005",
      "name": "Sticky Note4"
    },
    {
      "parameters": {
        "content": "### Vision Engine Solve + Result Handling\nCapSolver returns the slider distance instantly. The result is formatted and returned via webhook or stored for downstream use.",
        "height": 480,
        "width": 1400,
        "color": 6
      },
      "type": "n8n-nodes-base.stickyNote",
      "typeVersion": 1,
      "position": [1540, -280],
      "id": "sticky-slider-section-006",
      "name": "Sticky Note5"
    },
    {
      "parameters": {
        "rule": {
          "interval": [
            {
              "field": "hours",
              "hoursInterval": 1
            }
          ]
        }
      },
      "type": "n8n-nodes-base.scheduleTrigger",
      "typeVersion": 1.3,
      "position": [-820, -60],
      "id": "ve-slider-22222222-2222-2222-2222-222222222201",
      "name": "Every 1 Hour"
    },
    {
      "parameters": {
        "httpMethod": "POST",
        "path": "slider-puzzle-solver",
        "responseMode": "responseNode",
        "options": {}
      },
      "type": "n8n-nodes-base.webhook",
      "typeVersion": 2.1,
      "position": [-820, 140],
      "id": "ve-slider-22222222-2222-2222-2222-222222222202",
      "name": "Webhook Trigger",
      "webhookId": "ve-slider-22222222-aaaa-bbbb-cccc-222222222202",
      "onError": "continueRegularOutput"
    },
    {
      "parameters": {
        "assignments": {
          "assignments": [
            {
              "id": "cfg-001",
              "name": "puzzleImageURL",
              "value": "={{ $json.body?.puzzleImageURL || 'https://example.com/captcha/puzzle-piece.png' }}",
              "type": "string"
            },
            {
              "id": "cfg-002",
              "name": "backgroundImageURL",
              "value": "={{ $json.body?.backgroundImageURL || 'https://example.com/captcha/background.png' }}",
              "type": "string"
            },
            {
              "id": "cfg-003",
              "name": "websiteURL",
              "value": "={{ $json.body?.websiteURL || '' }}",
              "type": "string"
            }
          ]
        },
        "options": {}
      },
      "type": "n8n-nodes-base.set",
      "typeVersion": 3.4,
      "position": [-340, 0],
      "id": "ve-slider-22222222-2222-2222-2222-222222222203",
      "name": "Set Puzzle Config"
    },
    {
      "parameters": {
        "url": "={{ $json.puzzleImageURL }}",
        "options": {
          "response": {
            "response": {
              "responseFormat": "file"
            }
          }
        }
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.3,
      "position": [140, -60],
      "id": "ve-slider-22222222-2222-2222-2222-222222222204",
      "name": "Fetch Puzzle Image"
    },
    {
      "parameters": {
        "url": "={{ $('Set Puzzle Config').first().json.backgroundImageURL }}",
        "options": {
          "response": {
            "response": {
              "responseFormat": "file"
            }
          }
        }
      },
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4.3,
      "position": [540, -60],
      "id": "ve-slider-22222222-2222-2222-2222-222222222205",
      "name": "Fetch Background Image"
    },
    {
      "parameters": {
        "jsCode": "// This node's input comes from Fetch Background Image; the puzzle piece\n// was fetched by the earlier Fetch Puzzle Image node.\nconst config = $('Set Puzzle Config').first().json;\n\n// Background image: binary data on the current input item\nconst bgBinary = $input.first().binary;\nif (!bgBinary || !bgBinary.data) {\n  return [{ json: { error: 'Failed to fetch background image - no binary data returned' } }];\n}\nconst bgBuffer = await this.helpers.getBinaryDataBuffer(0, 'data');\nconst backgroundBase64 = bgBuffer.toString('base64');\n\n// Puzzle piece: pull the binary from the Fetch Puzzle Image node directly\nconst puzzleBinary = $('Fetch Puzzle Image').first().binary;\nif (!puzzleBinary || !puzzleBinary.data) {\n  return [{ json: { error: 'Failed to fetch puzzle image - no binary data returned' } }];\n}\n// n8n keeps in-memory binary payloads as base64 in binary.data.data\nconst puzzleBase64 = puzzleBinary.data.data;\n\nreturn [{ json: {\n  image: puzzleBase64,\n  imageBackground: backgroundBase64,\n  websiteURL: config.websiteURL || '',\n  module: 'slider_1'\n} }];"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [1100, 0],
      "id": "ve-slider-22222222-2222-2222-2222-222222222206",
      "name": "Convert Images to Base64"
    },
    {
      "parameters": {
        "resource": "Recognition",
        "operation": "Vision Engine",
        "module": "={{ $json.module }}",
        "image": "={{ $json.image }}",
        "imageBackground": "={{ $json.imageBackground }}",
        "websiteURL": "={{ $json.websiteURL || '' }}"
      },
      "type": "n8n-nodes-capsolver.capSolver",
      "typeVersion": 1,
      "position": [1580, 0],
      "id": "ve-slider-22222222-2222-2222-2222-222222222207",
      "name": "Solve Slider Puzzle",
      "onError": "continueRegularOutput",
      "credentials": {
        "capSolverApi": {
          "id": "YOUR_CREDENTIAL_ID",
          "name": "CapSolver account"
        }
      }
    },
    {
      "parameters": {
        "conditions": {
          "options": {
            "caseSensitive": true,
            "leftValue": "",
            "typeValidation": "loose",
            "version": 2
          },
          "conditions": [
            {
              "id": "slider-err-001",
              "leftValue": "={{ $json.error }}",
              "operator": {
                "type": "string",
                "operation": "isNotEmpty",
                "singleValue": true
              }
            }
          ],
          "combinator": "and"
        },
        "options": {}
      },
      "type": "n8n-nodes-base.if",
      "typeVersion": 2.2,
      "position": [1900, 0],
      "id": "ve-slider-22222222-2222-2222-2222-222222222208",
      "name": "Slider Error?"
    },
    {
      "parameters": {
        "jsCode": "const solution = $input.first().json.data?.solution || $input.first().json.data || {};\nconst distance = solution.distance || solution.slide_distance || null;\n\nreturn [{ json: {\n  success: true,\n  module: 'slider_1',\n  distance: distance,\n  unit: 'pixels',\n  rawSolution: solution,\n  solvedAt: new Date().toISOString()\n} }];"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [2200, -80],
      "id": "ve-slider-22222222-2222-2222-2222-222222222209",
      "name": "Format Solution"
    },
    {
      "parameters": {
        "jsCode": "return [{ json: {\n  success: false,\n  error: $input.first().json.error || 'Unknown Vision Engine error',\n  solvedAt: new Date().toISOString()\n} }];"
      },
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [2200, 120],
      "id": "ve-slider-22222222-2222-2222-2222-222222222210",
      "name": "Format Error"
    },
    {
      "parameters": {
        "respondWith": "json",
        "responseBody": "={{ JSON.stringify($json) }}",
        "options": {}
      },
      "type": "n8n-nodes-base.respondToWebhook",
      "typeVersion": 1.5,
      "position": [2500, -80],
      "id": "ve-slider-22222222-2222-2222-2222-222222222211",
      "name": "Return Result"
    },
    {
      "parameters": {
        "respondWith": "json",
        "responseBody": "={{ JSON.stringify($json) }}",
        "options": {
          "responseCode": 400
        }
      },
      "type": "n8n-nodes-base.respondToWebhook",
      "typeVersion": 1.5,
      "position": [2500, 120],
      "id": "ve-slider-22222222-2222-2222-2222-222222222212",
      "name": "Return Error"
    }
  ],
  "connections": {
    "Every 1 Hour": {
      "main": [
        [
          {
            "node": "Set Puzzle Config",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Webhook Trigger": {
      "main": [
        [
          {
            "node": "Set Puzzle Config",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Set Puzzle Config": {
      "main": [
        [
          {
            "node": "Fetch Puzzle Image",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Fetch Puzzle Image": {
      "main": [
        [
          {
            "node": "Fetch Background Image",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Fetch Background Image": {
      "main": [
        [
          {
            "node": "Convert Images to Base64",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Convert Images to Base64": {
      "main": [
        [
          {
            "node": "Solve Slider Puzzle",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Solve Slider Puzzle": {
      "main": [
        [
          {
            "node": "Slider Error?",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Slider Error?": {
      "main": [
        [
          {
            "node": "Format Error",
            "type": "main",
            "index": 0
          }
        ],
        [
          {
            "node": "Format Solution",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Format Solution": {
      "main": [
        [
          {
            "node": "Return Result",
            "type": "main",
            "index": 0
          }
        ]
      ]
    },
    "Format Error": {
      "main": [
        [
          {
            "node": "Return Error",
            "type": "main",
            "index": 0
          }
        ]
      ]
    }
  },
  "active": false,
  "settings": {
    "executionOrder": "v1"
  }
}

Test It

Testing the Solver API

Once you have configured the CapSolver credential and activated the workflow, test the Solver API:

Slider puzzle:

bash
curl -X POST https://your-n8n-instance.com/webhook/vision-engine-solver \
  -H "Content-Type: application/json" \
  -d '{
    "module": "slider_1",
    "image": "BASE64_PUZZLE_PIECE_HERE",
    "imageBackground": "BASE64_BACKGROUND_HERE"
  }'

Rotation puzzle:

bash
curl -X POST https://your-n8n-instance.com/webhook/vision-engine-solver \
  -H "Content-Type: application/json" \
  -d '{
    "module": "rotate_1",
    "image": "BASE64_IMAGE_TO_ROTATE"
  }'

Object selection (shein):

bash
curl -X POST https://your-n8n-instance.com/webhook/vision-engine-solver \
  -H "Content-Type: application/json" \
  -d '{
    "module": "shein",
    "image": "BASE64_IMAGE",
    "question": "Select all shoes"
  }'

GIF OCR:

bash
curl -X POST https://your-n8n-instance.com/webhook/vision-engine-solver \
  -H "Content-Type: application/json" \
  -d '{
    "module": "ocr_gif",
    "image": "BASE64_GIF_DATA"
  }'

Testing the Slider Puzzle Solver

bash
curl -X POST https://your-n8n-instance.com/webhook/slider-puzzle-solver \
  -H "Content-Type: application/json" \
  -d '{
    "puzzleImageURL": "https://example.com/captcha/puzzle-piece.png",
    "backgroundImageURL": "https://example.com/captcha/background.png",
    "websiteURL": "https://example.com"
  }'

A response with a numeric distance value confirms the full pipeline worked — images fetched, converted to base64, Vision Engine solved the slider, and the pixel distance was returned.


Understanding the Response

Vision Engine returns different solution shapes depending on the module:

slider_1

json
{
  "solution": {
    "distance": 142
  }
}

The distance is in pixels — this is how far the slider handle must be dragged to the right to complete the puzzle.
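
Downstream automation usually replays this distance as several small mouse movements rather than one jump. A minimal sketch (the easing curve and step count are illustrative assumptions, not part of the CapSolver API):

```javascript
// Sketch: split a solved slider distance into human-like drag offsets.
// Replay these with your own browser automation (e.g. mouse move events);
// the easing curve here is an assumption, not part of the API response.
function dragSteps(distance, stepCount = 10) {
  const steps = [];
  let moved = 0;
  for (let i = 1; i <= stepCount; i++) {
    // Ease-out: big moves first, small corrections near the target
    const target = Math.round(distance * (1 - Math.pow(1 - i / stepCount, 2)));
    steps.push(target - moved);
    moved = target;
  }
  return steps; // offsets always sum to exactly `distance`
}

console.log(dragSteps(142).reduce((a, b) => a + b, 0)); // 142
```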

rotate_1 / rotate_2

json
{
  "solution": {
    "angle": 73
  }
}

The angle is in degrees — this is how much the image must be rotated (clockwise) to reach the correct orientation.
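
Rotation widgets are often driven by a horizontal slider rather than rotated directly. A minimal sketch for converting the angle to a drag offset, assuming a track whose full width corresponds to 360 degrees (measure the real widget; some tracks map to 180):

```javascript
// Sketch: map a rotation angle to a slider drag offset.
// `trackWidth` and the 360-degree mapping are assumptions about the
// widget, not values returned by Vision Engine.
function angleToOffset(angle, trackWidth) {
  return Math.round((angle / 360) * trackWidth);
}

console.log(angleToOffset(73, 300)); // 61
```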

shein

json
{
  "solution": {
    "rects": [
      { "x1": 45, "y1": 120, "x2": 180, "y2": 250 },
      { "x1": 300, "y1": 90, "x2": 420, "y2": 210 }
    ]
  }
}

Each rect in the array is a bounding box (top-left and bottom-right coordinates) for a matching area in the image.
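
To click the matched objects, browser automation typically needs one point per box. A small sketch that derives the center of each rect:

```javascript
// Sketch: turn Vision Engine bounding boxes into click coordinates
// (the center of each box).
function clickPoints(rects) {
  return rects.map(({ x1, y1, x2, y2 }) => ({
    x: Math.round((x1 + x2) / 2),
    y: Math.round((y1 + y2) / 2),
  }));
}

console.log(clickPoints([{ x1: 45, y1: 120, x2: 180, y2: 250 }]));
// [ { x: 113, y: 185 } ]
```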

ocr_gif

json
{
  "solution": {
    "text": "x7Km9"
  }
}

The text is the recognized string from the animated GIF.


Adapting for Other Module Types

The Solver API workflow already supports all five modules via the request body. To build a dedicated workflow for a different module — say, rotate_1 for rotation puzzles — the changes are minimal:

  1. Config node: Replace puzzleImageURL / backgroundImageURL with just the rotation image URL
  2. Fetch nodes: Only one HTTP Request needed (no background image for rotate_1)
  3. CapSolver node: Change module to rotate_1
  4. Format Solution: Extract angle instead of distance

For shein, you would also add the question parameter to the config and pass it through to the CapSolver node.
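
These module/parameter pairings can be enforced with a small guard before the CapSolver node. A sketch of a Code-node-style check (the required-field map reflects the combinations described in this guide):

```javascript
// Sketch: fail fast on invalid module/parameter combinations so no
// API credits are spent on requests that cannot succeed.
const REQUIRED = {
  slider_1: ['image', 'imageBackground'],
  rotate_1: ['image'],
  rotate_2: ['image'],
  shein: ['image', 'question'],
  ocr_gif: ['image'],
};

// Returns null when valid, otherwise a human-readable error string.
function validateRequest(body) {
  const required = REQUIRED[body.module];
  if (!required) return `Unknown module: ${body.module}`;
  const missing = required.filter((field) => !body[field]);
  return missing.length ? `Missing fields: ${missing.join(', ')}` : null;
}

console.log(validateRequest({ module: 'rotate_1', image: 'abc' })); // null
```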


Troubleshooting

"ERROR_INVALID_IMAGE"

The base64 string is malformed or empty. Check that:

  • The image was fetched successfully (HTTP 200)
  • The binary-to-base64 conversion produced a non-empty string
  • The data:image/...;base64, prefix was stripped
  • The base64 string has no newlines or spaces
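
Several of these checks can be folded into one normalization step before the CapSolver node. A minimal sketch:

```javascript
// Sketch: normalize a base64 image string by stripping any data-URI
// prefix and removing all whitespace (newlines, spaces).
function cleanBase64(input) {
  return input
    .replace(/^data:image\/[a-z+]+;base64,/i, '')
    .replace(/\s+/g, '');
}

console.log(cleanBase64('data:image/png;base64,iVBO\nRw0K')); // iVBORw0K
```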

"ERROR_INVALID_MODULE"

The module value does not match any supported module. Use exactly one of: slider_1, rotate_1, rotate_2, shein, ocr_gif.

Distance Returns 0 or Null

The images may not be a valid slider puzzle pair. Check that:

  • image is the puzzle piece (the small draggable fragment)
  • imageBackground is the full background with the missing slot visible
  • Both images are from the same challenge instance
  • The images are not corrupted or too small

CapSolver Node Shows No "Vision Engine" Option

Make sure you have n8n-nodes-capsolver version 1.x or later installed. The Vision Engine operation was added in recent versions. Update the community node if needed:

  1. Go to Settings > Community Nodes
  2. Find n8n-nodes-capsolver
  3. Update to the latest version
  4. Restart n8n

Webhook Returns 404

The workflow must be active for the webhook to be live. Import the workflow, configure credentials, then toggle the workflow to active in n8n.


Best Practices

  1. Use raw base64 strings — always strip the data:image/...;base64, prefix before passing images to the CapSolver node.

  2. Match images to modules — slider_1 needs both image and imageBackground. rotate_1 needs only image. shein needs image plus question. Using the wrong combination will fail or return incorrect results.

  3. Fetch images fresh — visual puzzle images are typically single-use and expire quickly. Fetch them as close to solve time as possible.

  4. Vision Engine is instant — unlike Token operations that poll for results, Recognition operations return immediately. Your workflow does not need retry logic or polling delays.

  5. No proxy needed — Vision Engine analyzes images server-side. There is no browser interaction with a target site, so no proxy is required.

  6. Validate before solving — check that the image data is present and the module name is valid before calling the CapSolver node. This avoids wasting API credits on requests that will fail.

  7. Use websiteURL when available — while optional, providing the source page URL can improve accuracy for some puzzle types.

  8. Handle module-specific responses — different modules return different fields (distance, angle, rects, text). Your downstream logic should check which module was used and extract the correct field.
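
Point 8 in particular can be handled with a simple module-to-field map. A sketch (field names match the response examples shown earlier):

```javascript
// Sketch: normalize module-specific solutions into one consistent shape
// for downstream nodes. Field names follow the documented responses.
const FIELD = {
  slider_1: 'distance',
  rotate_1: 'angle',
  rotate_2: 'angle',
  shein: 'rects',
  ocr_gif: 'text',
};

function extractAnswer(module, solution) {
  return { module, answer: solution[FIELD[module]] };
}

console.log(extractAnswer('slider_1', { distance: 142 }));
// { module: 'slider_1', answer: 142 }
```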

Ready to get started? Sign up for CapSolver and use bonus code n8n for an extra 8% bonus on your first recharge!


Conclusion

Vision Engine fills a different gap than CapSolver's Token operations. Where reCAPTCHA, Turnstile, and Cloudflare Challenge solving returns tokens that you submit to bypass a gate, Vision Engine returns measurements that your automation uses to interact with visual puzzles — dragging sliders, rotating images, selecting objects, or reading animated text.

The key differences to remember:

  • Recognition resource, not Token — instant results, no polling
  • Base64 images in, measurements out — pixels, degrees, coordinates, or text
  • No proxy needed — the AI analyzes images server-side
  • Five modules — each designed for a specific visual puzzle type

The two workflows in this article cover the two most common integration patterns:

  1. Solver API — a generic webhook endpoint that accepts any module and returns the solution
  2. Slider Puzzle Solver — a complete fetch-convert-solve pipeline for slider CAPTCHAs

Both import as inactive. Configure your CapSolver credential, replace the placeholder values, activate the workflow, and test.


Frequently Asked Questions

How is Vision Engine different from Image To Text (OCR)?

Image To Text recognizes characters in a static image — standard OCR. Vision Engine goes further: it understands spatial relationships in visual puzzles. It can calculate slider distances, rotation angles, object bounding boxes, and even read text from animated GIFs. They are both Recognition operations (instant result, no polling), but they solve different types of problems.

Do I need a proxy for Vision Engine?

No. Vision Engine analyzes the images you provide server-side. There is no browser session, no cookie, and no interaction with a target website. Proxies are not needed and the CapSolver node does not accept a proxy parameter for Vision Engine tasks.

Can I solve multiple puzzles in one workflow execution?

Yes. The CapSolver node processes one item at a time, but n8n's item-based execution means you can pass multiple items through the node. Each item gets its own Recognition call and returns its own solution. Use a Split In Batches node or feed multiple items from a Code node.
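
In a Code node, that means emitting one item per puzzle. A minimal sketch (in an actual n8n Code node you would `return toItems(urls)` directly; the URLs are placeholders):

```javascript
// Sketch: build one n8n-style item per puzzle image. The CapSolver node
// then makes one Recognition call for each item.
function toItems(urls) {
  return urls.map((url) => ({
    json: { module: 'slider_1', puzzleImageURL: url },
  }));
}

console.log(toItems(['https://example.com/captcha/piece-1.png']).length); // 1
```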

What image formats are supported?

The image and imageBackground fields accept base64-encoded JPEG, PNG, GIF, and WebP. The base64 string must be raw — no data:image/...;base64, prefix, no newlines.
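
Because each format's file signature encodes to a fixed base64 prefix, you can sanity-check the format cheaply before sending. A sketch using those well-known prefixes:

```javascript
// Sketch: detect the image format from the base64 encoding of each
// file signature (PNG, JPEG, GIF, WebP magic bytes).
function detectFormat(b64) {
  if (b64.startsWith('iVBOR')) return 'png';   // \x89PNG
  if (b64.startsWith('/9j/')) return 'jpeg';   // \xFF\xD8\xFF
  if (b64.startsWith('R0lGOD')) return 'gif';  // GIF8
  if (b64.startsWith('UklGR')) return 'webp';  // RIFF
  return 'unknown';
}

console.log(detectFormat('R0lGODlhAQAB')); // gif
```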

How do I get the puzzle images from a real site?

In a real slider CAPTCHA integration, the target site serves the puzzle images as part of the challenge response. Typically you would:

  1. Load the page (via HTTP Request or browser automation)
  2. Extract the image URLs from the CAPTCHA widget's DOM or network requests
  3. Fetch the images
  4. Convert to base64
  5. Send to Vision Engine

The Slider Puzzle Solver workflow demonstrates steps 3-5. Steps 1-2 depend on the specific target site.

What does the question parameter do?

The question parameter is only used by the shein module. It tells the AI what to look for in the image — for example, "Select all shoes" or "Tap the matching items." For all other modules, leave it empty.

Can I use Vision Engine for hCaptcha image challenges?

Vision Engine's modules (slider_1, rotate_1, rotate_2, shein, ocr_gif) are designed for specific visual puzzle types. hCaptcha image classification challenges use a different approach. Check the CapSolver documentation for hCaptcha-specific solutions.

How fast is Vision Engine?

Vision Engine is a Recognition operation, which means the result comes back in a single API call — typically under 2 seconds. There is no polling loop, no getTaskResult calls, and no timeout waiting. This makes it significantly faster than Token operations, which can take 10-30 seconds to complete.

What happens if the image is too small or too large?

Very small images may not contain enough detail for accurate analysis. Very large images will increase the base64 payload size and may slow down the request. For best results, use the original resolution provided by the CAPTCHA challenge — do not resize the images.

Can I chain Vision Engine with browser automation?

Yes, and that is the intended use case for most real-world applications. The typical flow is:

  1. Browser automation (Puppeteer/Playwright via n8n) loads the page
  2. The CAPTCHA challenge appears with puzzle images
  3. Your workflow extracts the image URLs and fetches them
  4. Vision Engine returns the solution (distance, angle, etc.)
  5. Browser automation uses the solution to complete the challenge (drag slider, rotate image, click coordinates)

The Vision Engine workflow handles steps 3-4. Steps 1-2 and 5 are handled by your browser automation nodes.

Compliance Disclaimer: The information provided on this blog is for informational purposes only. CapSolver is committed to compliance with all applicable laws and regulations. The use of the CapSolver network for illegal, fraudulent, or abusive activities is strictly prohibited and will be investigated. Our captcha-solving solutions enhance user experience while ensuring 100% compliance in helping solve captcha difficulties during public data crawling. We encourage responsible use of our services. For more information, please visit our Terms of Service and Privacy Policy.
