How to Solve DataDome 403 Forbidden Error in Web Scraping | Complete Solution

CapSolver Blogger

05-Jun-2024

Encountering the DataDome 403 Forbidden error is a common challenge for web scrapers. DataDome, a robust bot protection service, uses advanced techniques to identify and block automated traffic. This guide explains what triggers the error and walks through practical ways to resolve it so you can keep access to the content you need.

Table of Contents

  • What Is DataDome and the 403 Forbidden Error
  • What Causes DataDome 403 Forbidden Error in Web Scraping?
  • How to Solve DataDome 403 Forbidden Error
    • Using CapSolver to Solve the 403 Forbidden Error
    • Use High-Quality Proxies
    • Modify Device Fingerprints
    • Monitor and Adjust Your Setup
  • Conclusion

What Is DataDome and the 403 Forbidden Error

DataDome is a leading bot protection and cybersecurity solution designed to detect and mitigate automated threats in real time.

DataDome's key features include:

  • AI-Powered Detection: Uses advanced machine learning models to differentiate between human and bot traffic.
  • Real-Time Protection: Continuously monitors and protects web applications, mobile apps, and APIs.
  • Comprehensive Coverage: Secures various digital touchpoints including websites, mobile apps, and APIs from automated threats.
  • Adaptive Security: Continuously evolves to address new and emerging threats.

Understanding the 403 Forbidden Error

The 403 Forbidden error is an HTTP status code that indicates that the server understands the request but refuses to authorize it. When DataDome is involved, this error typically signifies that the request has been identified as potentially malicious or coming from an automated source, and access to the requested resource is blocked.

Struggling with repeated failures to solve an irritating CAPTCHA?

Discover seamless automatic CAPTCHA solving with CapSolver's AI-powered Auto Web Unblock technology!

Claim your bonus code for top CAPTCHA solutions at CapSolver: WEBS. After redeeming it, you get an extra 5% bonus on every recharge, with no limit.

What Causes DataDome 403 Forbidden Error in Web Scraping?

To solve the problem, we must first understand what is causing it. Typically, DataDome 403 Forbidden errors occur when DataDome's security measures detect and block web scraping activity. This can happen for a variety of reasons:

  1. IP Blocking
    When you frequently scrape web content, your IP address may be detected and blocked. Using low-quality proxy IPs can also result in your access requests being denied with a 403 status code. To mitigate this, consider using high-quality, rotating proxies to distribute your requests and avoid detection.

  2. Device Environment Detection
    DataDome monitors the environment of the device you are using, including details such as GPU information and Canvas fingerprints. If your device environment appears abnormal, this can lead to your access requests being denied with a 403 status code. Furthermore, if your device performs any suspicious activities, DataDome may permanently ban your device. To avoid detection, ensure that your device environment mimics a typical user setup and avoid any unusual configurations or behaviors.

  3. Headless Browser Detection
    DataDome inspects your browser environment to identify headless browsers. If you are using Puppeteer or Selenium to drive a headless browser, DataDome is likely to flag it as an abnormal environment. To reduce the chance of detection, you can try to make your headless browser look like a regular one by adjusting browser settings and behaviors, such as enabling JavaScript, setting realistic user-agent strings, and managing cookies and local storage in a manner consistent with normal user activity.

How to Solve DataDome 403 Forbidden Error

Hitting a DataDome 403 Forbidden error can cut you off from the data you need and turn web scraping into a frustrating endeavour. Now that the causes are clear, the sections below outline several ways to resolve the error, starting with the quickest and easiest.

1. Using CapSolver to Solve the 403 Forbidden Error

A commonly recommended way to handle the CAPTCHA behind a DataDome 403 is a third-party solving service that plugs into your workflow through an API, so the block is dealt with automatically instead of by hand. One such service is CapSolver, a machine-learning-based CAPTCHA recognition solution that can solve DataDome challenges, and clear the resulting 403, within 3-5 seconds.

The following Python sample requests the target page and, if DataDome answers with a 403, sends the captcha URL to CapSolver and retries the request with the returned cookie:

# -*- coding: utf-8 -*-
import requests
import time


api_key = "API_KEY"     # TODO: your api key
proxy = "host:port:name:pass"   # TODO: your proxy
page_url = "https://www.targetsite.com/"   # TODO: target site
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"


def get_token(captcha_url):
    print("call capsolver...")
    data = {
        "clientKey": api_key,
        "task": {
            "type": 'DatadomeSliderTask',
            "websiteURL": page_url,
            "captchaUrl": captcha_url,
            "userAgent": user_agent,
            "proxy": proxy,
        },
    }
    uri = 'https://api.capsolver.com/createTask'
    res = requests.post(uri, json=data, timeout=30)
    resp = res.json()
    task_id = resp.get('taskId')
    if not task_id:
        print("create task error:", res.text)
        return None

    # Poll once per second, but give up after about two minutes instead of looping forever.
    for _ in range(120):
        time.sleep(1)
        data = {"clientKey": api_key, "taskId": task_id}
        res = requests.post('https://api.capsolver.com/getTaskResult', json=data, timeout=30)
        resp = res.json()
        status = resp.get('status', '')
        if status == "ready":
            # Keep only the value of the first name=value pair; the value itself may contain '='.
            cookie = resp['solution']['cookie']
            cookie = cookie.split(';')[0].split('=', 1)[1]
            print("successfully got cookie:", cookie)
            return cookie
        if status == "failed" or resp.get("errorId"):
            print("failed to get cookie:", res.text)
            return None
        print('solve datadome status:', status)
    print("timed out waiting for capsolver")
    return None


def format_proxy(px: str):
    if '@' not in px:
        sp = px.split(':')
        if len(sp) == 4:
            px = f'{sp[2]}:{sp[3]}@{sp[0]}:{sp[1]}'
    return {"http": f"http://{px}", "https": f"http://{px}"}


def request_site(cookie):
    headers = {
        'content-type': 'application/json',
        'user-agent': user_agent,
        'accept': 'application/json, text/plain, */*',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'cors',
        'sec-fetch-dest': 'empty',
        'referer': page_url,
        'accept-encoding': 'gzip, deflate, br, zstd',
        'accept-language': 'en-US,en;q=0.9',
    }
    if cookie:
        headers['cookie'] = "datadome=" + cookie

    print("request url:", page_url)
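    response = requests.get(page_url, headers=headers, proxies=format_proxy(proxy), timeout=30)
    print("response status_code:", response.status_code)
    if response.status_code == 403:
        # A DataDome block is expected to carry a JSON body with the captcha URL.
        try:
            captcha_url = response.json()['url']
        except (ValueError, KeyError):
            print("403 without a captcha url:", response.text[:200])
            return None
        print("captcha url: ", captcha_url)
        return captcha_url
    print('cookie is good!')
    return None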


def main():
    url = request_site("")
    if not url:
        return
    if 't=bv' in url:
        print("blocked captcha url is not supported")
        return
    cookie = get_token(url)
    if not cookie:
        return
    request_site(cookie)


if __name__ == '__main__':
    main()
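
The two string checks in the script (pulling the cookie value out of the solver's response and skipping `t=bv` URLs) can be isolated into small helpers that are easy to test without any network access. This is only a sketch: the function names `extract_datadome_value` and `is_solvable_captcha_url` are hypothetical, and the assumed shape of the solution string (`datadome=<value>; <attributes>`) is inferred from the parsing in the script above, not taken from CapSolver's documentation.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_datadome_value(cookie_string: str) -> Optional[str]:
    """Return the value of the first name=value pair, or None if it is malformed."""
    first_pair = cookie_string.split(';')[0].strip()
    # partition splits on the first '=' only, so values containing '=' survive intact.
    _name, sep, value = first_pair.partition('=')
    if not sep or not value:
        return None
    return value


def is_solvable_captcha_url(captcha_url: str) -> bool:
    """Return False for URLs whose query has t=bv, which the script treats as unsupported."""
    query = parse_qs(urlparse(captcha_url).query)
    return 'bv' not in query.get('t', [])
```

Unlike the substring test `'t=bv' in url`, the second helper parses the query string, so a parameter such as `cat=bv` would not be mistaken for a block.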

2. Use High-Quality Proxies

  • Avoid Low-Quality Proxies: Low-quality or free proxies are easily detected and blocked by DataDome. Invest in high-quality, rotating proxies that distribute your requests across multiple IP addresses.
  • Rotate IP Addresses Regularly: Rotating IP addresses regularly helps prevent any single IP from being flagged for suspicious activity.
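
As a rough illustration of rotation, the sketch below cycles through a pool of proxies in round-robin order and produces the `proxies` dictionary that `requests` expects. The proxy string format (`host:port:user:pass`) follows the sample script above; the names `to_requests_proxies` and `rotating_proxies` and the addresses in the usage comment are placeholders, not part of any library.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def to_requests_proxies(px: str) -> Dict[str, str]:
    """Convert 'host:port:user:pass' (or 'user:pass@host:port') to a requests proxies dict."""
    if '@' not in px:
        parts = px.split(':')
        if len(parts) == 4:
            px = f'{parts[2]}:{parts[3]}@{parts[0]}:{parts[1]}'
    return {"http": f"http://{px}", "https": f"http://{px}"}


def rotating_proxies(pool: List[str]) -> Iterator[Dict[str, str]]:
    """Return an endless round-robin iterator over the pool."""
    if not pool:
        raise ValueError("proxy pool is empty")
    return (to_requests_proxies(px) for px in cycle(pool))


# Usage (placeholder addresses):
# proxies = rotating_proxies(["10.0.0.1:8000:user:pass", "10.0.0.2:8000:user:pass"])
# requests.get(page_url, headers=headers, proxies=next(proxies), timeout=30)
```

Because the sample script passes the same proxy to CapSolver and to the follow-up request, it is probably safest to switch proxies between scraping sessions and not between solving a challenge and retrying the page.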

3. Modify Device Fingerprints

  • Spoof Device Information: Modify your device's fingerprint by altering GPU information, Canvas fingerprints, and other hardware details to appear more like a regular user.
  • Avoid Headless Browsers: Headless browsers are easily detected. If you must use them, configure them to appear as though they have a graphical interface. This can include setting proper user-agent strings, enabling images, and ensuring that JavaScript is fully operational.
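
For the headless-browser point, a small helper can collect Chrome command-line switches that make a headless session look closer to a windowed one. This is a sketch under assumptions: `chrome_args` is a hypothetical helper, the switches are standard Chrome flags, and they only address a few surface signals (window size, user agent, the automation flag). They do not touch the GPU or Canvas fingerprints mentioned above and are not guaranteed to get past DataDome.

```python
from typing import List


def chrome_args(user_agent: str, width: int = 1920, height: int = 1080) -> List[str]:
    """Build Chrome switches for a headless session with desktop-like settings."""
    return [
        "--headless=new",                                 # newer headless mode (assumes a recent Chrome)
        f"--window-size={width},{height}",                # a common desktop resolution
        f"--user-agent={user_agent}",                     # match the UA sent in your HTTP headers
        "--disable-blink-features=AutomationControlled",  # turns off Blink's automation-controlled feature
        "--lang=en-US",
    ]


# Usage with Selenium (requires the selenium package and a local Chrome):
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# for arg in chrome_args(user_agent):
#     options.add_argument(arg)
# driver = webdriver.Chrome(options=options)
```

Keeping the flags in one function makes it easier to use the same user agent for the browser and for any plain `requests` calls, since a mismatch between the two is itself an unusual signal.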

4. Monitor and Adjust Your Setup

  • Regularly Check for Changes: DataDome frequently updates its detection mechanisms. Regularly review and adjust your setup to stay ahead of new detection methods.
  • Test in Different Environments: Conduct tests in various environments to see how DataDome reacts. This can help you identify which configurations are most effective at avoiding detection.
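
One simple way to compare environments is to record the status code of every response under a label for the configuration that produced it, then look at the share of 403s per label. The class below is a minimal sketch of that idea; `BlockRateTracker` and the configuration labels are hypothetical, and a real setup would likely also want timestamps and persistence.

```python
from collections import defaultdict
from typing import Dict, Optional


class BlockRateTracker:
    """Count responses per configuration label and report the share that were 403s."""

    def __init__(self) -> None:
        self._total: Dict[str, int] = defaultdict(int)
        self._blocked: Dict[str, int] = defaultdict(int)

    def record(self, config: str, status_code: int) -> None:
        self._total[config] += 1
        if status_code == 403:
            self._blocked[config] += 1

    def block_rate(self, config: str) -> Optional[float]:
        """Fraction of recorded responses that were 403, or None if nothing was recorded."""
        total = self._total.get(config, 0)
        if total == 0:
            return None
        return self._blocked.get(config, 0) / total

    def best_config(self) -> Optional[str]:
        """Label with the lowest block rate so far, or None if nothing was recorded."""
        rates = {c: self.block_rate(c) for c in self._total}
        return min(rates, key=rates.get) if rates else None
```

Calling `tracker.record("residential-proxy", response.status_code)` after each request is enough to feed it; a sudden rise in one label's block rate is a hint that DataDome's detection has changed for that setup.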

Conclusion

DataDome is a powerful cybersecurity solution that provides comprehensive protection against automated threats, and it often returns 403 Forbidden errors when it detects suspicious activity. By understanding how DataDome operates and applying the techniques above to simulate legitimate user behavior, you can work around these restrictions and maintain access to your target resources.