How to Use Playwright in Ruby for Web Scraping

Ethan Collins
Pattern Recognition Specialist
08-Oct-2024
Web scraping has become an essential skill for gathering data from websites, whether for market analysis, academic research, or any data-driven project. Playwright is a browser automation tool that can scrape websites efficiently. It has official bindings for several languages, and the community-maintained playwright-ruby-client gem makes it available in Ruby. In this guide, we'll walk through how to set up and use Playwright in Ruby to scrape a website, using quotes.toscrape.com as an example.
What is Playwright?
Playwright is a modern automation framework for web testing, similar to Selenium but often faster to run, with support for the major browser engines: Chromium, Firefox, and WebKit. It offers browser automation tools for headless and headed scraping, page navigation, interacting with forms, and more.
Why Use Playwright with Ruby?
Ruby is a popular language known for its simplicity and developer-friendly syntax. By using Playwright with Ruby, you can leverage the power of modern browser automation while maintaining Ruby’s clean and easy-to-read code structure. Playwright is ideal for web scraping due to its speed, built-in wait-for conditions, and the ability to deal with dynamic content loaded by JavaScript.
Setting Up Playwright in Ruby
To start scraping with Playwright in Ruby, you'll need to set up a few things:
1. Install Ruby
Ensure you have Ruby installed on your machine. You can check this by running the following command in your terminal:
bash
ruby -v
If Ruby is not installed, you can install it via rbenv or directly from Ruby’s official site.
2. Install the Playwright Gem
Next, you’ll need to install the playwright-ruby-client gem. This gem provides Playwright bindings for Ruby, allowing you to interact with browsers programmatically.
Run the following command to install the gem:
bash
gem install playwright-ruby-client
3. Install Browsers
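After installing the gem, you need to install the browsers supported by Playwright. The gem drives browsers through the Playwright CLI, which is distributed as a Node.js package, so Node.js must be available on your machine. Run the following command:
bash
npx playwright install
This will download Chromium, Firefox, and WebKit for use with Playwright.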
Scraping Example: Scraping Quotes from a Website
Let’s dive into a simple scraping example where we’ll extract quotes from quotes.toscrape.com. The website contains famous quotes along with the authors, making it a great resource for scraping practice.
Step 1: Initialize Playwright and Launch a Browser
First, you need to initialize Playwright and launch a browser (Chromium in this case). Here's how to do that:
ruby
require 'playwright' # The playwright-ruby-client gem is loaded as 'playwright'

# Replace '/path/to/cli' with the path to your Playwright CLI executable
Playwright.create(playwright_cli_executable_path: '/path/to/cli') do |playwright|
  browser = playwright.chromium.launch(headless: true) # Launch headless browser
  page = browser.new_page
  page.goto('http://quotes.toscrape.com/')
  puts "Page title: #{page.title}" # Optional: print the page title to verify it loaded
  # Close the browser
  browser.close
end
In this snippet, Playwright opens the quotes.toscrape.com page in a headless Chromium browser.
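The '/path/to/cli' value is a placeholder for the Playwright CLI executable. As a minimal sketch, assuming the CLI was installed with npm into the project's node_modules directory (an assumption, not something this guide sets up), a hypothetical helper such as playwright_cli_path could resolve it and fall back to npx:

```ruby
# Hypothetical helper: resolve the Playwright CLI path for Playwright.create.
# Assumes the Node.js `playwright` package was installed locally with npm,
# which places an executable at node_modules/.bin/playwright.
def playwright_cli_path(project_root = Dir.pwd)
  local_cli = File.join(project_root, 'node_modules', '.bin', 'playwright')
  # Prefer the project-local CLI; otherwise fall back to letting npx locate it
  # (assumes the gem accepts a command string here).
  File.exist?(local_cli) ? local_cli : 'npx playwright'
end
```

The result can then be passed as playwright_cli_executable_path: playwright_cli_path in place of the placeholder.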
Step 2: Scrape Quotes and Authors
Now, we want to scrape the quotes and their authors from the page. To do this, we need to inspect the page structure and identify the elements containing the quotes and authors.
Here’s the code that extracts the quotes and their respective authors:
ruby
require 'playwright'

Playwright.create(playwright_cli_executable_path: '/path/to/cli') do |playwright|
  browser = playwright.chromium.launch(headless: true)
  page = browser.new_page
  page.goto('http://quotes.toscrape.com/')
  # Find all quote elements
  quotes = page.query_selector_all('.quote')
  quotes.each do |quote|
    # Each quote block holds the quote text and the author name
    text = quote.query_selector('.text').text_content.strip
    author = quote.query_selector('.author').text_content.strip
    puts "Quote: #{text} - Author: #{author}"
  end
  browser.close
end
This script uses Playwright to visit the website, extract the quote text and author, and then print them to the console. The .quote class targets each quote block, and we use .text and .author to extract the relevant information.
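Printing is fine for a quick check, but structured records are usually easier to work with afterwards. The sketch below wraps the same selectors (.quote, .text, .author) in a hypothetical extract_quotes helper that returns an array of hashes and skips blocks where either element is missing. It relies only on the query_selector_all, query_selector, and text_content calls already used above.

```ruby
# Hypothetical helper (not part of Playwright): collect quotes as hashes.
# `page` is any object exposing query_selector_all, as the Playwright page does.
def extract_quotes(page)
  page.query_selector_all('.quote').filter_map do |quote|
    text_el = quote.query_selector('.text')
    author_el = quote.query_selector('.author')
    # query_selector returns nil when nothing matches, so skip incomplete blocks
    next unless text_el && author_el

    { text: text_el.text_content.strip, author: author_el.text_content.strip }
  end
end
```

Inside the Playwright.create block, calling extract_quotes(page) after page.goto would give you one hash per quote, with :text and :author keys, ready to print or save.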
Step 3: Handle Pagination
The quotes website uses pagination, so you may want to scrape all pages, not just the first one. Here's how to handle pagination:
ruby
require 'playwright'

Playwright.create(playwright_cli_executable_path: '/path/to/cli') do |playwright|
  browser = playwright.chromium.launch(headless: true)
  page = browser.new_page
  page.goto('http://quotes.toscrape.com/')
  loop do
    # Extract every quote on the current page
    quotes = page.query_selector_all('.quote')
    quotes.each do |quote|
      text = quote.query_selector('.text').text_content.strip
      author = quote.query_selector('.author').text_content.strip
      puts "Quote: #{text} - Author: #{author}"
    end
    next_button = page.query_selector('li.next > a')
    break unless next_button # Exit loop if no next page
    next_button.click
    page.wait_for_load_state(state: 'load') # Wait for the next page to load (state is a keyword argument)
  end
  browser.close
end
This code loops through each page by clicking the "Next" button until there are no more pages. It continues to extract the quotes and authors from every page.
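Clicking works well here, but an unbounded loop can run for a long time on larger sites. As an alternative sketch, the hypothetical scrape_all_pages helper below follows the href of the li.next > a link with page.goto instead of clicking, and stops after a max_pages limit (the helper name and the limit are illustrative assumptions). It uses get_attribute to read the link target and Ruby's URI.join to resolve relative links.

```ruby
require 'uri'

# Hypothetical helper: walk paginated results by following the "Next" link's href.
# `page` is expected to behave like a Playwright page (goto, query_selector,
# query_selector_all). max_pages is an illustrative safety limit.
def scrape_all_pages(page, start_url, max_pages: 50)
  results = []
  url = start_url

  max_pages.times do
    page.goto(url)

    page.query_selector_all('.quote').each do |quote|
      results << {
        text: quote.query_selector('.text').text_content.strip,
        author: quote.query_selector('.author').text_content.strip
      }
    end

    next_link = page.query_selector('li.next > a')
    break unless next_link # No "Next" link means this was the last page

    # Resolve a relative href such as "/page/2/" against the current URL
    url = URI.join(url, next_link.get_attribute('href')).to_s
  end

  results
end
```

Navigating by URL avoids waiting on a click-triggered load, and the returned array can be processed once the browser has been closed.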
Step-by-Step Guide: Solving Captchas Using Playwright and CapSolver in Ruby
This guide explains how to solve reCAPTCHA using the CapSolver browser extension with Playwright in Ruby. The extension handles captchas inside the browser, so you don't have to write extra code to solve them directly.
Step 1: Install Playwright and Dependencies
First, ensure you have Playwright installed:
bash
gem install playwright-ruby-client
Step 2: Download and Configure the CapSolver Extension
- Download the CapSolver extension:
  - Download the CapSolver extension from the CapSolver GitHub releases page.
  - Unzip the extension into a directory at the root of your project, such as ./CapSolver.Browser.Extension.
- Configure the extension:
  - Locate the configuration file ./assets/config.json in the CapSolver extension directory.
  - Set the option enabledForcaptcha to true and set captchaMode to token for automatic solving.

Example config.json:
json
{
  "enabledForcaptcha": true,
  "captchaMode": "token"
  // other settings remain the same
}
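If you prefer to apply these settings from a script rather than by hand, the sketch below updates the two options with Ruby's standard json library. It assumes the file is plain JSON without comments, and the helper name enable_capsolver_token_mode is hypothetical; the keys are the ones named above.

```ruby
require 'json'

# Hypothetical helper: switch on the two options named in this guide while
# leaving every other setting in the file untouched.
# Assumes config_path points at a plain JSON file (no comments).
def enable_capsolver_token_mode(config_path)
  config = JSON.parse(File.read(config_path))
  config['enabledForcaptcha'] = true
  config['captchaMode'] = 'token'
  File.write(config_path, JSON.pretty_generate(config))
  config
end
```

For the layout described above, the path would be File.join('CapSolver.Browser.Extension', 'assets', 'config.json').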
Step 3: Set Up Playwright with the CapSolver Extension
Here’s how you can load the CapSolver extension into the Playwright browser:
- Require Playwright and set up paths:
ruby
require 'playwright'

# Get the path for the CapSolver extension directory
extension_path = File.join(Dir.pwd, 'CapSolver.Browser.Extension')
- Launch the browser with the CapSolver extension:
Use Playwright to launch a Chromium browser with the CapSolver extension loaded.
ruby
Playwright.create(playwright_cli_executable_path: '/path/to/cli') do |playwright|
  # launch_persistent_context returns a browser context rather than a browser,
  # and its options are passed as keyword arguments
  context = playwright.chromium.launch_persistent_context(
    '',
    headless: false, # Run with a visible browser for debugging
    args: [
      "--disable-extensions-except=#{extension_path}",
      "--load-extension=#{extension_path}"
    ]
  )
  page = context.new_page
  page.goto('https://quotes.toscrape.com/') # Replace with the target URL
  # Locate the captcha checkbox or frame and interact with it
  page.wait_for_selector('iframe', state: 'visible') # Adjust the selector to target the captcha iframe
  page.click('iframe') # Adjust the click for your captcha's interaction
  # Additional steps can be added based on the site’s requirements
  context.close
end
The steps for solving reCAPTCHA are the same as for other captcha types.
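A mistyped extension path can be easy to miss, because the browser may still start without the extension loaded. As a small sketch, the hypothetical extension_launch_args helper below checks that the directory contains a manifest.json, which an unpacked Chromium extension requires, before building the two flags used above.

```ruby
# Hypothetical helper: build the Chromium flags for loading an unpacked extension.
# Raises early if the directory does not look like an unpacked extension.
def extension_launch_args(extension_path)
  manifest = File.join(extension_path, 'manifest.json')
  raise ArgumentError, "No manifest.json found in #{extension_path}" unless File.file?(manifest)

  [
    "--disable-extensions-except=#{extension_path}",
    "--load-extension=#{extension_path}"
  ]
end
```

The returned array can be passed as the args option when launching the persistent context.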
Bonus Code
Claim your Bonus Code for top captcha solutions at CapSolver: scrape. After redeeming it, you will get an extra 5% bonus after each recharge, unlimited times.

Conclusion
Using Playwright in Ruby for web scraping offers an efficient and powerful way to extract data from websites. It handles both simple static content and pages whose content is loaded dynamically by JavaScript. In this tutorial, we scraped quotes and authors from a website, but Playwright can do much more, such as interacting with forms, taking screenshots, or even running browser-based tests.
If you're looking for a robust tool for web scraping in Ruby, Playwright is an excellent choice. It's easy to set up, fast, and flexible enough to handle various scraping tasks.
Compliance Disclaimer: The information provided on this blog is for informational purposes only. CapSolver is committed to compliance with all applicable laws and regulations. The use of the CapSolver network for illegal, fraudulent, or abusive activities is strictly prohibited and will be investigated. Our captcha-solving solutions enhance user experience while ensuring 100% compliance in helping solve captcha difficulties during public data crawling. We encourage responsible use of our services. For more information, please visit our Terms of Service and Privacy Policy.