Playwright vs. Puppeteer: which is better?

Both are powerful Node.js libraries that belong in every web automation programmer's toolkit. So why would you choose one over the other?

You're probably familiar with how the legend goes: around 2020, several core members of the Puppeteer team moved from Google to Microsoft and started Playwright, and now we're stuck trying to pick one of these great libraries to automate actions in a browser or test a web page. Puppeteer remains the established, Chromium-centric solution from Google, while Playwright is the more recent, cross-browser library from Microsoft. But is that where the differences end? Is it apples and oranges? More like crocodiles and alligators 🐊

In this blog post, we will have a quick look at what unites these JavaScript libraries (besides being open source) and what sets them apart. We will also try to help you navigate your choice or transition between the two.

Why use Playwright or Puppeteer?

What can you accomplish with these libraries?

  • All kinds of automated testing: UI testing, end-to-end testing, performance testing, service worker testing, testing Chrome extensions, etc.
  • Controlling a browser programmatically
  • Emulating devices and environments: device type, timezone, location
  • Web scraping and data mining
  • RPA (robotic process automation) and workflow automation, such as submitting forms online or automating data entry
  • Creating an API for a website that doesn't provide a public one
  • Crawling a single-page application (SPA) and generating pre-rendered content
  • Taking screenshots, recording videos of your tests, and generating PDFs of web pages
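The device-emulation item above can be sketched with Playwright, which ships a registry of device descriptors (`devices`) and accepts locale, timezone, and geolocation options when you create a browser context. This is a minimal sketch, not production code: the helper name `openEmulatedPage` and the Berlin locale, timezone, and coordinates are illustrative assumptions, and the browser and device descriptor are passed in so the helper doesn't depend on a particular engine or import style.

```javascript
// Hypothetical helper: open `url` in a new browser context that emulates
// a device and a location.
// `browser` is a Playwright Browser (e.g. from chromium.launch()) and
// `device` is a descriptor such as devices['iPhone 13'] from 'playwright'.
async function openEmulatedPage(browser, device, url) {
  const context = await browser.newContext({
    ...device, // e.g. viewport, userAgent, isMobile, hasTouch
    // Illustrative values: emulate a user in Berlin
    locale: 'de-DE',
    timezoneId: 'Europe/Berlin',
    geolocation: { latitude: 52.52, longitude: 13.405 },
    permissions: ['geolocation'], // let the page read the emulated location
  });
  const page = await context.newPage();
  await page.goto(url);
  return page;
}
```

With Playwright installed, you might call it after `import { chromium, devices } from 'playwright';` as `const page = await openEmulatedPage(await chromium.launch(), devices['iPhone 13'], 'https://example.com');`, and close the browser when you're done.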

What is Puppeteer?

Puppeteer is an open-source Node.js library that provides a high-level API for programmatically controlling Chrome (or Chromium) and, to an extent, Firefox from JavaScript. By default, it runs the browser in headless mode. Puppeteer's strengths include:

  • Requires zero setup. Puppeteer is a native npm package, so running npm install puppeteer in your project is all it takes to start working with it.
  • Chrome-centric. Both the Chrome browser and the Puppeteer library are maintained by Google, which makes Puppeteer a reliable choice. The team accounts for version discrepancies by making sure that installing Puppeteer also downloads a compatible, working version of Chromium. There's an extra Chrome benefit for those who use Puppeteer to run their tests: thanks to its close ties to Chrome, it can record a timeline (performance) trace that helps you follow page performance and spot issues.
  • Community factor. Puppeteer launched in 2017, and about 20% of the contributions to its core library came from the community, which is quite a high share for an open-source project. So it comes as no surprise that the Puppeteer developer community remains active today. Check out the discussions on Dev.to, Stack Overflow, and the Puppeteer contribution page; there's a good chance your issue has already been raised there.
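The timeline trace mentioned above is exposed in Puppeteer through `page.tracing`. Here is a minimal sketch, assuming you already have a launched Puppeteer `browser`; the helper name `recordLoadTrace` and the default output path are illustrative choices, not part of Puppeteer. The resulting JSON file can be loaded into the Performance panel of Chrome DevTools.

```javascript
// Hypothetical helper: record a Chrome performance trace while a page loads.
// `browser` is a Puppeteer Browser, e.g. from puppeteer.launch().
async function recordLoadTrace(browser, url, tracePath = 'trace.json') {
  const page = await browser.newPage();
  try {
    // Start tracing before navigating so the whole page load is captured
    await page.tracing.start({ path: tracePath, screenshots: true });
    await page.goto(url);
    // stop() ends the trace and writes it to tracePath
    await page.tracing.stop();
  } finally {
    await page.close();
  }
  return tracePath;
}
```

A possible usage: `const browser = await puppeteer.launch(); await recordLoadTrace(browser, 'https://example.com'); await browser.close();`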

Puppeteer does fall short in a few areas, though, mainly related to versatility:

  • Node.js for the win. Puppeteer officially supports automation in JavaScript/TypeScript on Node.js only. No other programming languages are supported.
  • Lack of cross-browser support. Puppeteer excels at automating headless Chromium (and Chrome), but that's pretty much where its reach ends. At the time of writing, Firefox support was still an experimental work in progress after three years, and no other browser engines were supported.

Speaking of browsers...

What is a headless browser used for?

Headless browsers don't display a user interface. Instead, you write code that instructs the browser to open a web page, type text, click elements, and so on. This isn't just a personal preference (looking at you, people who prefer never to touch a mouse); headless mode is a very common approach in web automation. It usually lets your actions run faster, since the browser doesn't have to draw a visible window and UI.

Libraries like Puppeteer and Playwright launch the browser in headless mode by default. If you need to see what's happening, you can switch to headful (non-headless) mode with a single launch option, headless: false, which also works with Playwright's launch(). In Puppeteer, it looks like this:

import puppeteer from 'puppeteer';
const browser = await puppeteer.launch({ headless: false }); // opens a visible browser window

What is possible with Puppeteer?

Almost anything a user can do on a web page can be replicated programmatically with Puppeteer. For instance, here's how to take a screenshot of a page without touching the browser UI:

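import puppeteer from 'puppeteer';

(async () => {
    const browser = await puppeteer.launch(); // headless by default
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 720 });
    const websiteUrl = 'https://developers.apify.com/academy/puppeteer-playwright';
    // Wait until there have been no network connections for at least 500 ms
    await page.goto(websiteUrl, { waitUntil: 'networkidle0' });
    // The image type is inferred from the file extension (.jpg -> JPEG)
    await page.screenshot({ path: 'academy_screenshot.jpg' });
    await browser.close(); // without this, the Node.js process keeps running
})();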

What is Playwright?

Playwright is an open-source Node.js library and browser automation framework built and maintained by Microsoft. Back in 2020, Playwright was brand new and still uncharted territory for most web automation folks. At the time of writing, it has 43K stars on GitHub, remains popular, and is still evolving. Here's what makes Playwright stand out:

  • Requires zero setup. As with Puppeteer, you can set up a Playwright project in minutes with a single command: npm init playwright@latest.
  • Cross-browser support. If you need a range of browsers to choose from, Playwright is your best bet: it supports Chromium, Firefox, and WebKit. Each Playwright release is built for specific, recent versions of these browsers and downloads them for you, so you don't need to worry about version mismatches.
  • Swiss Army knife for languages. Languages are another area where Playwright plays the versatility card: TypeScript, JavaScript, Python, Java, .NET (C#), you name it.
  • Community factor. Playwright's Ambassadors, Slack, YouTube, and Stack Overflow are there to help with your questions, bug reports, and official updates. You can also follow the Playwright team on Twitter.
  • Still a WIP. Playwright can control Chromium-based browsers (such as Chrome and Edge), Firefox, and WebKit with a single API, but that API is still evolving (albeit rapidly), which may prompt some to call Playwright a jack of all trades, master of none. That said, the Playwright team is very responsive to open issues and keeps shipping new features, so staying tuned pays off.
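The cross-browser point is easiest to see in code: Playwright exposes `chromium`, `firefox`, and `webkit` browser types that share the same API, so one script can loop over all three. This is a minimal sketch; the helper name `titlesPerBrowser` is hypothetical, and the browser types are passed in rather than imported so the function stays self-contained.

```javascript
// Hypothetical helper: run the same steps in every browser engine and
// collect each page title, keyed by engine name.
// `browserTypes` are Playwright BrowserType objects, e.g. [chromium, firefox, webkit].
async function titlesPerBrowser(browserTypes, url) {
  const titles = {};
  for (const browserType of browserTypes) {
    const browser = await browserType.launch();
    try {
      const page = await browser.newPage();
      await page.goto(url);
      // name() returns 'chromium', 'firefox', or 'webkit'
      titles[browserType.name()] = await page.title();
    } finally {
      await browser.close();
    }
  }
  return titles;
}
```

A possible usage, after `import { chromium, firefox, webkit } from 'playwright';`: `console.log(await titlesPerBrowser([chromium, firefox, webkit], 'https://example.com'));`. Each engine's browser binaries need to be installed first (for example, with npx playwright install).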

Is Playwright built on Puppeteer?

No. Playwright is a separate, more recent Node.js library for web automation, launched in 2020. However, it was developed by some of the same engineers who created its predecessor, Puppeteer. That's why many features look similar or enhanced, and why it's easy to assume Playwright is simply built on top of Puppeteer. The similarity has a practical upside: migrating your code from Puppeteer to Playwright takes relatively little effort if you ever need to. You can read more about migration further below.

So is Playwright > Puppeteer? Let's zoom in on their main differences.

Quick rundown of main differences

[Table: quick rundown of the main differences between Playwright and Puppeteer]

Which is better: Playwright or Puppeteer?

The natural question here is: better for what exactly? Here are factors to take into account when choosing between Puppeteer and Playwright.

☑️ Priorities

Is speed your priority?

Puppeteer adds very little performance overhead on top of the browser itself, so it's fast. But so is Playwright. So this was a trick question: you can choose either one if you want your solution to be fast.

Is reliability your priority?

Puppeteer used to be ahead on this one. In a few cases, Playwright's main advantage (being cross-browser) backfired with noticeable issues in Firefox and WebKit. Back then, Playwright's reliance on patched builds of those browsers made it seem less reliable than Puppeteer. Those cross-browser issues have long been fixed, though, reinstating Playwright as a reliable choice.

Have you ever wished Puppeteer did things differently?

Remember, the team that built Playwright had already gained plenty of experience creating Puppeteer. Starting Playwright from a clean slate let them avoid many of the Puppeteer API's setbacks and improve features without risk, especially fundamental ones that couldn't be changed in Puppeteer without breaking its API.

For instance, they made Playwright's page.click wait by default until the target element is visible and ready to be clicked. Adding that behavior to Puppeteer's existing click would have taken far more effort than building it into Playwright from scratch. This doesn't mean Puppeteer won't get similar improvements later on, but it does make Playwright an easier choice.
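The difference described above can be sketched side by side. Both helpers below are hypothetical names that take an already-open page; the Puppeteer version uses the classic page API, where you typically wait for the element yourself, while the Playwright version relies on the click's built-in waiting.

```javascript
// Hypothetical helper, Puppeteer (classic page API): page.click() does not
// wait for the element to appear, so wait for it explicitly first.
async function clickWithPuppeteer(page, selector) {
  await page.waitForSelector(selector, { visible: true });
  await page.click(selector);
}

// Hypothetical helper, Playwright: the click itself waits until the element
// is attached, visible, stable, enabled, and able to receive events
// (or until the action timeout is reached).
async function clickWithPlaywright(page, selector) {
  await page.locator(selector).click(); // page.click(selector) auto-waits the same way
}
```

The Playwright version has one line less, but the bigger win is that every action gets the same waiting behavior without you having to remember it.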

⌨️ Languages

Are you writing your code in Python, C#, or Java?

Then Playwright is your choice; Puppeteer officially supports only JavaScript and TypeScript.

🌐 Browsers

Are you testing or web scraping? Is browser versatility important for you?

Then Playwright should be your library of choice, as it supports all three major browser engines: Chromium, Firefox, and WebKit. That's really good news for web scrapers: if one browser doesn't get the job done, you can get unstuck by quickly switching to another. The same goes for testers, since there's no need to adapt a test to each browser; you write it once and run it across every supported browser.
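For scrapers, "switching to another browser" can even be automated. Below is a minimal sketch of a fallback loop, assuming Playwright browser types are passed in; the helper name `scrapeWithFallback` and the `scrape` callback (your own extraction logic) are hypothetical.

```javascript
// Hypothetical helper: try each browser engine in turn and return the first
// successful result. `browserTypes` are Playwright BrowserType objects
// (e.g. [chromium, firefox, webkit]); `scrape` is your own extraction function.
async function scrapeWithFallback(browserTypes, url, scrape) {
  const failures = [];
  for (const browserType of browserTypes) {
    let browser;
    try {
      browser = await browserType.launch();
      const page = await browser.newPage();
      await page.goto(url);
      return { engine: browserType.name(), data: await scrape(page) };
    } catch (err) {
      failures.push(`${browserType.name()}: ${err.message}`);
    } finally {
      // Runs before the result is returned, so no browser is left open
      if (browser) await browser.close();
    }
  }
  throw new Error(`All browsers failed:\n${failures.join('\n')}`);
}
```

Because the engines share one API, the `scrape` callback doesn't need to know which browser it's running in.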

On the other hand, many testers arrive at the same conclusion: testing every possible scenario in more than one browser often isn't worth the effort. Most people around the world use Chrome anyway, so consider this side of the story too. And if you're working on something like workflow automation, multi-browser support might not be on your priority list at all.

✨Unique features

Are you looking to automate repetitive code?

One example is Playwright's auto-waiting, which automatically waits for elements to be ready before acting on them. It isn't AI-powered, but it also doesn't make you set timeouts manually: before each action, Playwright runs a set of actionability checks (for example, that the element is visible, stable, and enabled) and waits only as long as needed. This removes repetitive code (such as waiting for buttons to appear on a page) while still letting you tune the timeouts when you need to. Compared to Puppeteer, Playwright also offers richer ways of selecting elements, such as text selectors and locators based on roles or labels (getByRole, getByLabel).
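Here is a minimal sketch of those user-facing locators combined with auto-waiting. The helper name `logIn`, the field labels, the button name, and the confirmation text are all hypothetical and would need to match the page you're automating.

```javascript
// Hypothetical helper: fill in and submit a login form with Playwright's
// user-facing locators. Each action auto-waits for its element, so there
// are no manual timeouts. Labels, button name, and message are hypothetical.
async function logIn(page, email, password) {
  await page.getByLabel('Email').fill(email);
  await page.getByLabel('Password').fill(password);
  await page.getByRole('button', { name: 'Sign in' }).click();
  // waitFor() defaults to waiting until the element is visible
  await page.getByText('Welcome back').waitFor();
}
```

Locating elements by label, role, and visible text also tends to make scripts less brittle than CSS selectors tied to class names.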

Last but not least, Playwright stays ahead with convenient tools like codegen (npx playwright codegen), which records the actions you perform in the browser and writes them out as code.

Image taken from playwright.dev docs

Which brings us to our conclusion. Choosing between Playwright and Puppeteer depends on your use case; both of these libraries are powerful, easy to use, and can help you with your projects, be it for testing or browser automation.

However, in almost all cases, when you have to choose between the two, Playwright is the better choice. There's only one major reason to use Puppeteer instead: you already know it and have a lot of code written in it. Otherwise, pick Playwright, as it's equal or better on virtually all fronts.

Migration from Puppeteer to Playwright

Is it easy to switch from Puppeteer to Playwright?

Even if you've already chosen Puppeteer, it's not very difficult to switch to the dark side and migrate to Playwright. Playwright's syntax is very similar to its predecessor's, with minor differences in a few places (for example, when launching the browser). Compare for yourself:

Migration cheat sheet

[Image: Puppeteer-to-Playwright migration cheat sheet]
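To make the comparison concrete, here is a sketch of the earlier Puppeteer screenshot example ported to Playwright, with comments marking each changed call. It is an illustration rather than an official migration guide: the function name `screenshotPage` is hypothetical, and the browser type (for example, `chromium` from 'playwright') is passed in so the function doesn't depend on a particular import style.

```javascript
// Hypothetical helper: Playwright port of the Puppeteer screenshot example.
// `browserType` is a Playwright BrowserType such as chromium.
async function screenshotPage(browserType, url, path) {
  const browser = await browserType.launch(); // was: puppeteer.launch()
  try {
    const page = await browser.newPage(); // unchanged
    await page.setViewportSize({ width: 1280, height: 720 }); // was: page.setViewport(...)
    await page.goto(url, { waitUntil: 'networkidle' }); // was: waitUntil: 'networkidle0'
    await page.screenshot({ path }); // unchanged; type inferred from the extension
  } finally {
    await browser.close();
  }
}
```

A possible usage: `import { chromium } from 'playwright';` followed by `await screenshotPage(chromium, 'https://developers.apify.com/academy/puppeteer-playwright', 'academy_screenshot.jpg');`. Swapping `chromium` for `firefox` or `webkit` is the only change needed to run it in another engine.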

page.waitForNavigation and page.waitForSelector keep their names in Playwright (although newer Playwright versions deprecate waitForNavigation in favor of page.waitForURL). However, in many cases you won't need them at all, thanks to one of Playwright's upgrades: auto-waiting. Rather than spelling out exactly how and how long to wait, you let Playwright wait for all the necessary UI elements to be ready, and only then does it perform the requested action. How cool is that!
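As an illustration of the waiting changes described above, here is a sketch of following a link in each library. Both helper names are hypothetical; the Puppeteer version uses the common pattern of starting waitForNavigation before the click, while the Playwright version clicks (with auto-waiting) and then waits for the expected URL.

```javascript
// Hypothetical helper, Puppeteer: start waiting for the navigation *before*
// clicking, otherwise the navigation may finish before you begin waiting.
async function followLinkWithPuppeteer(page, selector) {
  await Promise.all([page.waitForNavigation(), page.click(selector)]);
}

// Hypothetical helper, Playwright: the click auto-waits for the link to be
// actionable, and waitForURL() resolves once the page has navigated to a
// matching URL (and, by default, fired its load event).
async function followLinkWithPlaywright(page, selector, expectedUrl) {
  await page.locator(selector).click();
  await page.waitForURL(expectedUrl); // string, glob pattern, RegExp, or predicate
}
```

Waiting for a specific URL also documents what the click is supposed to achieve, which makes failures easier to diagnose.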

Web scraping with Puppeteer and Playwright

While Puppeteer and Playwright are great for web testing, they offer plenty of other use cases. Their ability to control real browsers (which is essential for extracting data from dynamic pages) has made these libraries, along with Selenium, incredibly popular in the web scraping community.
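To show why a real browser matters for dynamic pages, here is a minimal scraping sketch. The helper name `scrapeTexts` is hypothetical; it relies on `page.goto`, `page.waitForSelector`, and `page.$$eval`, which exist with compatible shapes on both Puppeteer and Playwright pages, so the same helper should work with either library.

```javascript
// Hypothetical helper: load a page and return the trimmed text of every
// element matching `selector`. Works with a Puppeteer or a Playwright page.
async function scrapeTexts(page, url, selector) {
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  // On dynamic pages, content may be rendered by JavaScript after the
  // initial HTML arrives, so wait for at least one matching element.
  await page.waitForSelector(selector);
  // $$eval runs the callback inside the browser on all matching elements
  return page.$$eval(selector, (elements) => elements.map((el) => el.textContent.trim()));
}
```

For example, `await scrapeTexts(page, 'https://example.com', 'h1')` would return the page's headings as an array of strings.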

We at Apify are no exception, which is why two of our most powerful scrapers are built on Puppeteer and Playwright. Our Puppeteer Scraper and Playwright Scraper give developers a ready-made scraper boilerplate with just enough control to shape it to their needs, so they can extract data from almost any website without building a scraper from scratch. Give them a try if you're interested and have 20 free minutes on your hands, because that's about how long it takes to build your scraper!
