My actor is getting blocked or returning 403 errors
HTTP 403 (Forbidden) and 429 (Too Many Requests) errors mean the target website's anti-bot system has detected your scraper and is blocking its requests. This is one of the most common issues in web scraping, and the fix depends on which anti-bot measures the site uses and what your current setup looks like. The following layered approach starts with the simplest fixes and escalates to more advanced techniques.

Layer one: add proxies if you are not using them. Scraping without proxies means all your requests come from a single IP address, which is trivial for websites to detect and block. Datacenter proxies rotate your IP across a pool of addresses and work for most basic websites. Configure proxies in your Crawlee crawler with the proxyConfiguration option.

Layer two: upgrade to residential proxies. If you are already using datacenter proxies but still getting blocked, the target site likely checks whether the IP belongs to a datacenter or to a residential ISP. Residential proxies route your traffic through real residential IP addresses, which are much harder for anti-bot systems to distinguish from regular users. Sites such as Amazon, LinkedIn, Google, Zillow, and most social media platforms require residential proxies.

Layer three: switch from CheerioCrawler to PlaywrightCrawler or PuppeteerCrawler. Cheerio makes raw HTTP requests without executing JavaScript, which is fast but does not pass browser fingerprint checks. Modern anti-bot systems such as Cloudflare, PerimeterX, and DataDome check for JavaScript execution, browser APIs, canvas fingerprints, and WebGL rendering. PlaywrightCrawler runs a real browser that passes these checks.
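The layers above can be sketched with Crawlee. This is a minimal example, not a drop-in solution: the proxy URLs below are placeholders for whatever your provider gives you, and the extraction logic is elided.

```javascript
import { PlaywrightCrawler, ProxyConfiguration } from 'crawlee';

// Placeholder proxy URLs -- substitute your own provider's endpoints.
const proxyConfiguration = new ProxyConfiguration({
    proxyUrls: [
        'http://user:pass@proxy-1.example.com:8000',
        'http://user:pass@proxy-2.example.com:8000',
    ],
});

const crawler = new PlaywrightCrawler({
    // Requests are rotated across the proxy pool (layer one),
    // and a real browser handles fingerprint checks (layer three).
    proxyConfiguration,
    async requestHandler({ request, page, log }) {
        log.info(`Scraping ${request.url}`);
        const title = await page.title();
        // ...extract and save data here...
    },
});

await crawler.run(['https://example.com']);
```

On the Apify platform you can instead obtain the configuration from the platform's proxy service, for example `await Actor.createProxyConfiguration({ groups: ['RESIDENTIAL'] })` from the apify package, which also covers layer two (residential IPs) without managing proxy URLs yourself.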
Additional techniques to layer on top of proxies and browser crawling:

- Reduce your request rate with the maxRequestsPerMinute Crawlee option. Start with 20-30 requests per minute and adjust based on results.
- Add random delays between requests so your traffic does not arrive at a machine-like rhythm.
- Rotate User-Agent headers to mimic different browsers.
- Set realistic viewport sizes and screen resolutions in Playwright.
- Handle CAPTCHAs with third-party solving services when they appear.

Some particularly aggressive websites require all of these measures simultaneously. If you are still getting blocked after implementing everything above, the target site may be using advanced bot detection that requires session management, cookie handling, or specialized scraping approaches. For related debugging advice, see the questions about how to debug failed actor runs and what happens when an actor fails.
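The throttling techniques can be sketched as follows. This assumes Crawlee's PlaywrightCrawler; the specific numbers are conservative starting points to tune against your results, not recommendations.

```javascript
import { PlaywrightCrawler, sleep } from 'crawlee';

const crawler = new PlaywrightCrawler({
    maxRequestsPerMinute: 20, // conservative starting rate; raise it gradually
    maxConcurrency: 2,        // keep parallelism low while testing
    async requestHandler({ request, page, log }) {
        // Random 1-3 second pause so requests do not arrive on a fixed beat.
        await sleep(1000 + Math.random() * 2000);
        log.info(`Scraping ${request.url}`);
        // ...extract data...
    },
});

await crawler.run(['https://example.com']);
```

If the site still blocks you at this rate, lower maxRequestsPerMinute further before reaching for heavier measures like CAPTCHA-solving services.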