What is Playwright?

We explore the features of Playwright that make it an awesome tool for web scraping and automation. Auto-awaiting, headless mode, recording scripts with Codegen, and more 🎭

What is Playwright and why use it?

We all know that technology moves fast, but even by modern standards, the rapid rise of Playwright is impressive. Microsoft released Playwright in 2020 as an open-source Node library to automate Chromium, Firefox, and WebKit with a single API. Playwright is a software library and framework that provides automated control of a web browser with a few lines of code, making it particularly useful for web testing, scraping, automating web page interaction, taking screenshots of web pages, and running automated tests for JavaScript libraries. While similar to Puppeteer, Cypress, and Selenium, there are some differences. Let’s find out what they are.

Is Playwright a headless browser?

Not exactly. Playwright can be run in headful or headless mode (without a graphical user interface). By default, Playwright runs in headless mode, which means you won’t see what is happening in the browser when you run your script, but it will run faster. When you write and debug your scripts, it’s advisable to disable headless mode so you can see what your script is doing:

const browser = await chromium.launch({ headless: false })

On the other hand, if performance matters most to you, headless mode is the way to go: the browser skips drawing a visible window, so scripts generally run faster and use fewer resources than they do in headful mode.
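Here is a minimal sketch of switching between the two modes without editing the script each time. The HEADFUL environment variable, the launchOptions helper, and the --run flag are hypothetical names chosen for this example; headless and slowMo are regular launch() options.

```javascript
// Build launch options from the environment.
// HEADFUL is a made-up variable name for this sketch, not a Playwright setting.
function launchOptions(env = process.env) {
  const headful = env.HEADFUL === '1';
  return {
    headless: !headful,
    // Slow every operation down (in ms) when watching, so each step is visible.
    slowMo: headful ? 250 : 0,
  };
}

async function main() {
  // Required here rather than at the top so the helper works on its own.
  const { chromium } = require('playwright');
  const browser = await chromium.launch(launchOptions());
  try {
    const page = await browser.newPage();
    await page.goto('https://playwright.dev');
    console.log(await page.title());
  } finally {
    await browser.close();
  }
}

// Start the browser only when asked to: node headless-demo.js --run
if (process.argv.includes('--run')) {
  main().catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
}
```

Running the script with HEADFUL=1 should open a visible, slowed-down browser window; without it, the same code runs headless.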

What about Puppeteer and Selenium?

Mention headless browsers, and the names Puppeteer and Selenium immediately spring to mind. So how do they compare to their younger sibling? Puppeteer supports only JavaScript and TypeScript and works with Chromium, with experimental support for Firefox. Playwright supports Chromium, Firefox, and WebKit (the engine behind Safari). Both Playwright and Selenium offer bindings for several programming languages, and Selenium covers one that Playwright doesn't (Ruby). But Playwright’s greatest advantage over Selenium is its auto-waiting function.

What languages does Playwright support?

Playwright works with some of the most popular programming languages, including JavaScript, Python, Java, and C#. Its support of Chromium, Firefox, and WebKit provides a wide range of cross-browser automation and web testing capabilities.
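To make the "single API" idea concrete, here is a small sketch that runs the same steps in all three engines. The function name titlesAcrossEngines and its optional second parameter are assumptions made for this example; the second parameter only exists so a stand-in object can be passed instead of the real library.

```javascript
// The three engine names exposed by the playwright package.
const ENGINES = ['chromium', 'firefox', 'webkit'];

// Visit `url` once per engine and collect the page title from each.
// `playwright` defaults to the real library; the require only happens
// when no second argument is supplied.
async function titlesAcrossEngines(url, playwright = require('playwright')) {
  const titles = {};
  for (const name of ENGINES) {
    const browser = await playwright[name].launch();
    try {
      const page = await browser.newPage();
      await page.goto(url);
      titles[name] = await page.title();
    } finally {
      // Always release the browser, even if navigation fails.
      await browser.close();
    }
  }
  return titles;
}
```

Calling titlesAcrossEngines('https://playwright.dev').then(console.log) should print one title per engine, provided all three browsers have been installed.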

What platform does Playwright support?

Playwright is a cross-platform framework. The browser binaries for Chromium, Firefox, and WebKit work across three platforms: Windows (and WSL), macOS (10.14 or above), and Linux (though you may need to install additional packages, depending on your Linux distribution).

How do I get started with Playwright?

One thing that isn’t said enough about Playwright: its documentation is superb. There you will find out how to install Playwright to get started.

You can install the VS Code extension. After installation, open the command palette and type Install Playwright. Alternatively, you can use the command line interface (CLI) and install Playwright using the appropriate package manager for your language. For example, npm with Node.js:

npm init playwright@latest

That will give you the browsers and files you need to begin:

playwright.config.ts
package.json
package-lock.json
tests/ 
    example.spec.ts
tests-examples/ 
    demo-todo-app.spec.ts

The tests folder contains a basic example test to get you started, and the tests-examples folder contains a more detailed example, with tests written to test a todo app.

Alternatively, you can simply add Playwright to your existing project by calling:

npm install playwright

Why use Playwright for web automation?


1. Faster communication with the Chrome DevTools Protocol

Most automation solutions use the WebDriver protocol to communicate with Chromium browsers. Playwright instead talks to them over the Chrome DevTools Protocol, which is faster and more direct. Playwright isn’t just for Chrome and Edge, though: it can also be configured to test sites in Firefox and in WebKit, the engine behind Safari.

2. The auto-waiting function

Cross-browser and cross-language support aside, the auto-waiting function is Playwright’s greatest advantage over Puppeteer and Selenium. You don’t have to figure out when something is clickable because Playwright performs those checks for you. Actions such as await page.click() wait until the target element is visible, stable, and enabled before clicking. When you need to wait explicitly for something in the browser to finish, you can use convenient APIs like await page.waitForSelector() or await page.waitForFunction().
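A short sketch of auto-waiting in practice. The HTML snippet, the element IDs, and the clickWhenReady function are all invented for this example; the function expects a Playwright page object, for instance one returned by browser.newPage().

```javascript
// A test page whose button starts disabled and is enabled one second later.
const DELAYED_BUTTON_HTML = `
  <button id="buy" disabled>Buy</button>
  <p id="status">idle</p>
  <script>
    const button = document.querySelector('#buy');
    button.addEventListener('click', () => {
      document.querySelector('#status').textContent = 'clicked';
    });
    setTimeout(() => { button.disabled = false; }, 1000);
  </script>`;

async function clickWhenReady(page) {
  await page.setContent(DELAYED_BUTTON_HTML);
  // No manual sleep needed: click() waits for the button to become enabled.
  await page.click('#buy');
  // waitForFunction() polls inside the page until the predicate is truthy.
  await page.waitForFunction(
    () => document.querySelector('#status').textContent === 'clicked'
  );
  return page.textContent('#status');
}
```

Notice there is no fixed delay anywhere in clickWhenReady: the one-second gap is absorbed by the click itself, which is the behavior the section above describes.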

3. You can record scripts with Codegen

The Playwright documentation includes a test generator that shows you how to record your scripts with Codegen. You just need a single CLI command to kick off:

npx playwright codegen playwright.dev

This will open up an interactive browser and the Playwright inspector. Every action in the browser will be recorded in the inspector. You can then replay and adjust the generated script. In other words, Playwright generates test script code based on your interaction with the page. That means you can author tests out of the box without having to write the script manually.

4. It has great debugging capabilities

Playwright has some excellent debugging features. You can debug scripts while you run them, which is handy during local development, and you can also analyze and debug failed tests. To open the Playwright Inspector in debug mode, run npx playwright test --debug to debug all tests or npx playwright test example --debug to debug a single test file. Alternatively, you can set the PWDEBUG environment variable (for example, PWDEBUG=1) to run your scripts in debug mode.

Why use Playwright for web scraping?

We’ve touched upon the brilliance of Playwright when it comes to web testing and automation, but its capabilities can also come in very handy when it comes to web scraping and data mining. Here’s why:

It can be very difficult to scrape some websites with tools that only download and parse HTML. Dynamic pages and browser fingerprinting are two of the biggest challenges. Because Playwright drives a real browser, in headless mode or not, it helps overcome both problems.

1. Loading dynamic pages

When it comes to pages loaded dynamically with AJAX or data rendered using JavaScript, you’ll need to render the page like a real user. HTML scrapers can’t do that. Headless browsers can. So, in such cases, you’ll need web scraping tools like Playwright Scraper or Puppeteer Scraper to load the page, execute its JavaScript and scrape the required data.
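The difference is easy to see in a small sketch. The HTML below stands in for a dynamic site: its list is empty in the raw markup and only filled in by JavaScript half a second after load. The snippet, the topic class, and the scrapeTopics function are assumptions made for this example; the function expects a Playwright page object.

```javascript
// Stand-in for a dynamic page: the list items are added by script after load.
const DYNAMIC_LIST_HTML = `
  <ul id="topics"></ul>
  <script>
    setTimeout(() => {
      for (const name of ['javascript', 'python', 'rust']) {
        const li = document.createElement('li');
        li.className = 'topic';
        li.textContent = name;
        document.querySelector('#topics').appendChild(li);
      }
    }, 500);
  </script>`;

async function scrapeTopics(page) {
  await page.setContent(DYNAMIC_LIST_HTML);
  // The raw HTML contains no items, so wait until the script has rendered them.
  await page.waitForSelector('li.topic');
  // Run the mapping inside the browser and return plain strings to Node.js.
  return page.$$eval('li.topic', (items) =>
    items.map((li) => li.textContent.trim())
  );
}
```

A plain HTML parser pointed at the same markup would find an empty list, because nothing ever executes the script for it.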



2. Combatting browser fingerprinting

Some websites now use fingerprinting to track users and block scraping bots. A scraper that uses a headless browser can emulate the fingerprint of a real device. Without a headless browser, it’s nearly impossible to pass the various anti-bot challenges that block your access to a website. This makes Puppeteer Scraper or Playwright Scraper your best bet when you are getting blocked.
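As a rough illustration, Playwright lets you set a few of the signals a fingerprint is built from when you create a browser context. The profile values and the newContextWithProfile helper below are hypothetical and purely illustrative, and real anti-bot systems inspect many more signals than these four, so treat this as a sketch of the idea rather than a way around any particular check.

```javascript
// Hypothetical desktop profile; the values are illustrative only.
const DESKTOP_PROFILE = {
  userAgent:
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  viewport: { width: 1366, height: 768 },
  locale: 'en-US',
  timezoneId: 'America/New_York',
};

// Create a context whose pages report the given profile.
// `browser` is a Playwright browser, e.g. from chromium.launch().
async function newContextWithProfile(browser, profile = DESKTOP_PROFILE) {
  // Copy the profile so the shared constant is never handed out directly.
  return browser.newContext({ ...profile });
}
```

Every page opened from the returned context shares the same user agent, viewport, locale, and time zone, which keeps those signals consistent with each other.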

Also, you can go even further and develop your own web scraper with Crawlee, a Node.js library that helps you pass those challenges automatically using Puppeteer or Playwright.
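For a feel of what that looks like, here is a minimal sketch of a Crawlee crawler that drives Playwright. It assumes the crawlee and playwright packages are installed; the crawl function, its optional second parameter, and the request limit of 20 are choices made for this example.

```javascript
// Runs once for every page; Crawlee supplies the fields of the context object.
async function requestHandler({ request, page, enqueueLinks, log }) {
  const title = await page.title();
  log.info(`${request.loadedUrl}: ${title}`);
  // Add links found on the page to the crawl queue.
  await enqueueLinks();
}

// `crawlee` defaults to the real library; the require only happens
// when no second argument is supplied.
async function crawl(startUrls, crawlee = require('crawlee')) {
  const crawler = new crawlee.PlaywrightCrawler({
    requestHandler,
    // Keep the sketch from crawling indefinitely.
    maxRequestsPerCrawl: 20,
  });
  await crawler.run(startUrls);
}
```

Calling crawl(['https://crawlee.dev']) should log the title of each visited page until the request limit is reached.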

Crawlee helps you build reliable scrapers fast. Quickly scrape data, store it, and avoid getting blocked with headless browsers, smart proxy rotation, and auto-generated human-like headers and fingerprints.

A brief web scraping tutorial with Playwright

If you want to find out more about Playwright and web scraping, this tutorial shows you how to build a scraper with Playwright in Node.js to extract data about GitHub topics.
