Playwright Crawler

Playwright Crawler (PlaywrightCrawler) is a Crawlee crawler type that launches a real Chromium, Firefox, or WebKit browser via Microsoft's Playwright library to render web pages with full JavaScript execution. PlaywrightCrawler handles the cases CheerioCrawler cannot: single-page applications built with React, Vue, Angular, or Svelte; dynamically loaded content triggered by scrolling, clicking, or AJAX requests; pages requiring login, cookie consent, or multi-step form interactions; and sites with JavaScript-based anti-bot challenges that detect headless browsers.

PlaywrightCrawler matters because the modern web is increasingly JavaScript-heavy. Major platforms like LinkedIn, Twitter/X, Instagram, Airbnb, and many e-commerce sites render much of their content through JavaScript frameworks. Without a real browser, that content is often missing from the HTML response altogether. PlaywrightCrawler gives your actor the same rendering capabilities as a real user's browser: running JavaScript, loading AJAX data, handling redirects, and rendering the final DOM for extraction.

The tradeoff is resource consumption. A PlaywrightCrawler actor typically uses 1-4 GB of memory (compared to 256-512 MB for CheerioCrawler) and processes 1-5 pages per second (compared to 10-50 for Cheerio). Compute costs per page are therefore typically 5-10x higher, and sometimes far more. A compute unit (CU) is 1 GB of memory used for one hour, so a 10,000-page crawl at 4 GB memory taking 1 hour consumes approximately 4 CU ($1.00), versus roughly 0.1 CU ($0.025) for the same crawl with CheerioCrawler, a 40x difference. These costs compound at scale: running that 10,000-page crawl daily adds up to about $30/month with Playwright versus $0.75/month with Cheerio.
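The cost figures above can be reproduced with a few lines of arithmetic. The sketch below is only a rough model, not an official pricing calculator: `estimateRunCost` is a hypothetical helper, the $0.25 per compute unit rate is inferred from the "4 compute units ($1.00)" figure, and the CheerioCrawler memory and duration are assumed values chosen to match the 0.1 CU total, since the text does not state them.

```javascript
// Rough cost model built from the figures in the text.
// Assumption: one compute unit (CU) = 1 GB of memory used for 1 hour.
// Assumption: $0.25 per CU, the rate implied by "4 compute units ($1.00)".
const PRICE_PER_CU_USD = 0.25;

// Hypothetical helper, not part of Crawlee or the Apify SDK.
function estimateRunCost({ memoryGb, hours, pricePerCu = PRICE_PER_CU_USD }) {
    const computeUnits = memoryGb * hours;
    return { computeUnits, costUsd: computeUnits * pricePerCu };
}

// PlaywrightCrawler example from the text: 4 GB of memory for 1 hour.
const playwrightRun = estimateRunCost({ memoryGb: 4, hours: 1 });

// CheerioCrawler: the text only gives the total (0.1 CU). 0.5 GB for
// 12 minutes (0.2 hours) is one assumed combination that produces it.
const cheerioRun = estimateRunCost({ memoryGb: 0.5, hours: 0.2 });

// Running each crawl once a day for 30 days.
const DAYS_PER_MONTH = 30;
const playwrightMonthlyUsd = playwrightRun.costUsd * DAYS_PER_MONTH;
const cheerioMonthlyUsd = cheerioRun.costUsd * DAYS_PER_MONTH;

console.log({ playwrightRun, cheerioRun, playwrightMonthlyUsd, cheerioMonthlyUsd });
```

Under these assumptions the Playwright run comes to 4 CU (about $1.00 per run, about $30 over 30 daily runs), while the Cheerio run comes to about 0.1 CU ($0.025 per run, about $0.75 over the same period). Substituting your own memory setting, run duration, and CU price gives a quick first estimate before committing to a browser-based crawl.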
To build a PlaywrightCrawler actor:

```javascript
import { PlaywrightCrawler } from 'crawlee';
import { Actor } from 'apify';

Actor.main(async () => {
    const crawler = new PlaywrightCrawler({
        requestHandler: async ({ page, request, enqueueLinks }) => {
            // Wait until the JavaScript-rendered product cards are in the DOM.
            await page.waitForSelector('div.product-card', { timeout: 30000 });

            // Extract title and price from every card, inside the browser context.
            const products = await page.$$eval('div.product-card', (cards) =>
                cards.map((card) => ({
                    title: card.querySelector('h2')?.textContent?.trim(),
                    price: card.querySelector('.price')?.textContent?.trim(),
                })),
            );

            await Actor.pushData(products);

            // Follow pagination links to the next listing page.
            await enqueueLinks({ selector: 'a.next-page' });
        },
        headless: true,
        maxConcurrency: 5,
        navigationTimeoutSecs: 60,
    });

    await crawler.addRequests([{ url: 'https://example.com/products' }]);
    await crawler.run();
});
```

The page parameter is a full Playwright Page object with access to all Playwright APIs: page.click(), page.fill(), page.waitForSelector(), page.screenshot(), and more.

The most common mistake with PlaywrightCrawler is using it by default for all scraping. Always try CheerioCrawler first: you will often be surprised how much data is available in the raw HTML even on seemingly JavaScript-heavy sites. Only upgrade to Playwright if data is genuinely missing from the initial HTML response.

Another mistake is not setting navigationTimeoutSecs appropriately. The default may be too short for slow-loading SPAs that make multiple AJAX calls before rendering content. Set it to 60 seconds or higher for unreliable targets.

Not blocking unnecessary resources is a costly oversight. Most scraping tasks do not need images, CSS, fonts, or tracking scripts. Use page.route() to block them, registering the route before navigation (for example in a preNavigationHooks function) so that it also applies to the initial page load:

```javascript
await page.route('**/*.{png,jpg,jpeg,gif,css,woff,woff2}', (route) => route.abort());
```

This can reduce page load time by 50-70% and cut bandwidth (and proxy costs) significantly.

A further optimization is to use page.waitForSelector() instead of page.waitForTimeout().
Waiting a fixed number of milliseconds is fragile: the page might load faster (wasting time) or slower (missing data). Always wait for the specific DOM element that contains your target data.

For actors that scrape both listing pages and detail pages, consider a hybrid approach: use CheerioCrawler for server-rendered listing pages and PlaywrightCrawler only for detail pages that require JavaScript. Crawlee's request labels and router handlers help keep the two page types separate, giving you the cost savings of Cheerio where possible and the rendering power of Playwright where necessary.

Related concepts: Cheerio Crawler, Crawlee, Proxy, Compute Unit, Request Queue, Actor.
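One way to picture the hybrid approach is as a routing decision made per request label. The sketch below is a minimal, dependency-free illustration rather than Crawlee's own mechanism: the label names (`LISTING`, `DETAIL`), the `CRAWLER_FOR_LABEL` table, and the `planCrawl` helper are all hypothetical. It only shows how labelled requests could be split into one batch for a CheerioCrawler and one for a PlaywrightCrawler.

```javascript
// Hypothetical mapping from request label to the crawler type that handles it.
const CRAWLER_FOR_LABEL = new Map([
    ['LISTING', 'cheerio'], // server-rendered listing pages: plain HTTP is enough
    ['DETAIL', 'playwright'], // detail pages that need JavaScript rendering
]);

// Illustrative helper, not a Crawlee API. Splits requests into one batch
// per crawler type based on each request's `label`.
function planCrawl(requests) {
    const plan = { cheerio: [], playwright: [] };
    for (const request of requests) {
        // Unlabelled or unknown pages go to the browser: costlier, but it
        // will not silently miss JavaScript-rendered content.
        const target = CRAWLER_FOR_LABEL.get(request.label) ?? 'playwright';
        plan[target].push(request);
    }
    return plan;
}

const plan = planCrawl([
    { url: 'https://example.com/products', label: 'LISTING' },
    { url: 'https://example.com/products/1', label: 'DETAIL' },
    { url: 'https://example.com/about' },
]);

console.log(plan.cheerio.length, plan.playwright.length); // prints: 1 2
```

Each batch could then be handed to its own crawler instance, for example through addRequests(). In a real actor the detail URLs are usually discovered while crawling the listings, so they would be collected during the Cheerio pass and fed to the Playwright crawler afterwards.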

Related Terms