Proxy

Apify Proxy is an intermediary service that routes your actor's web requests through different IP addresses, so that target websites cannot easily identify and block your scraper through IP-based rate limiting or geo-restrictions. Proxy rotation is essential for any serious web scraping operation because many websites monitor incoming request patterns and will block, or serve a CAPTCHA to, any single IP address that sends more than a few dozen requests in a short period.

Apify offers three proxy tiers, each designed for a different level of anti-bot resistance.

Datacenter proxies are the cheapest and fastest option (starting at approximately $0.10 per GB) and use IP addresses from cloud data centers. They work well for sites without anti-bot protection: internal tools, APIs, small business websites, and content sites that do not invest in bot detection.

Residential proxies use real IP addresses from consumer ISP connections, which makes them hard to distinguish from regular home users. They cost more (approximately $1-12 per GB depending on geography) but are essential for sites with aggressive anti-bot systems like Amazon, LinkedIn, Google, Zillow, and major social media platforms.

SERP proxies are specialized for search engine result pages (Google, Bing, DuckDuckGo) and handle the particular challenges of scraping search engines at scale, including CAPTCHA solving and result page parsing.

To configure proxies in a Crawlee-based actor:

    import { Actor } from 'apify';
    import { CheerioCrawler } from 'crawlee';

    // Apify Proxy options (groups, countryCode) are passed to Actor.createProxyConfiguration()
    const proxyConfiguration = await Actor.createProxyConfiguration({ groups: ['RESIDENTIAL'], countryCode: 'US' });
    const crawler = new CheerioCrawler({ proxyConfiguration, requestHandler: async ({ $, request }) => { ... } });

The groups parameter selects the proxy tier, and countryCode targets a specific geography (useful for geo-restricted content). Crawlee rotates IPs across requests automatically and handles proxy authentication behind the scenes.
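As an illustration of how the tier choice maps onto the groups and countryCode options, here is a minimal sketch of a helper that builds the options object. The function name proxyOptionsForTier is hypothetical, and two details are assumptions not stated above: that 'GOOGLE_SERP' is the group name for SERP proxies, and that omitting groups selects datacenter proxies. Check both against your Apify account before relying on them.

```javascript
// Hypothetical helper: turns a tier name into a { groups, countryCode } options object.
// Assumptions (not confirmed by the text above):
//   - 'GOOGLE_SERP' is the group name used for SERP proxies
//   - leaving `groups` unset selects datacenter proxies
function proxyOptionsForTier(tier, countryCode) {
  const options = {};

  switch (tier) {
    case 'datacenter':
      // No groups: assumed to fall back to datacenter proxies.
      break;
    case 'residential':
      options.groups = ['RESIDENTIAL'];
      break;
    case 'serp':
      options.groups = ['GOOGLE_SERP'];
      break;
    default:
      throw new Error(`Unknown proxy tier: ${tier}`);
  }

  if (countryCode !== undefined) {
    // Expect a two-letter uppercase country code such as 'US'.
    if (!/^[A-Z]{2}$/.test(countryCode)) {
      throw new Error(`Invalid country code: ${countryCode}`);
    }
    options.countryCode = countryCode;
  }

  return options;
}

console.log(proxyOptionsForTier('residential', 'US'));
console.log(proxyOptionsForTier('datacenter'));
```

The resulting object has the same shape as the options used in the configuration example above, so the tier can be driven by actor input instead of being hard-coded.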
For manual proxy usage outside Crawlee:

    const proxyConfiguration = await Actor.createProxyConfiguration({ groups: ['RESIDENTIAL'] });
    const proxyUrl = await proxyConfiguration.newUrl();

You can also use Apify Proxy as a standard HTTP proxy with the URL format http://auto:<password>@proxy.apify.com:8000.

A common mistake is using datacenter proxies on sites that require residential IPs. Your actor will fail with 403 Forbidden errors, CAPTCHA pages, or empty responses, and you will waste compute credits debugging what appears to be a code problem but is actually a proxy problem. Always check the target site's anti-bot level before choosing a proxy tier.

Another mistake is not enabling session management with residential proxies. Apify Proxy supports sticky sessions, where the same IP is reused for a sequence of requests (e.g., login then navigate), configured with sessionId in the proxy URL. Without sessions, each request can get a different IP, which breaks stateful interactions like authentication flows.

Proxy costs are billed per GB of data transferred through the proxy, separate from compute unit charges. A typical scraping operation transfers 0.5-5 MB per page depending on content. Scraping 10,000 pages at 2 MB average transfers about 20 GB, which through residential proxies costs approximately $20-240 depending on geography. Monitor proxy usage in the Apify Console under Usage to track data transfer and costs per actor.

When choosing between proxy tiers, start with datacenter proxies and upgrade only if you encounter blocks. This saves significant cost: by the per-GB prices quoted above, residential proxies are roughly 10-120x more expensive than datacenter proxies. For sites you know require residential IPs (the major platforms listed above), skip the trial and go straight to residential to avoid wasting compute on failed runs.

Related concepts: Cheerio Crawler, Playwright Crawler, Crawlee, Compute Unit, Actor Run.
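To make the sticky-session idea concrete, the following is a rough sketch of building a proxy URL by hand. It rests on several assumptions that are not spelled out in this entry: that the proxy is reachable at proxy.apify.com on port 8000, that options are passed as comma-separated parameters in the username (groups-…, session-…, country-…), and that session IDs are limited to 50 characters from letters, digits, ".", "_" and "~". The helper name buildProxyUrl is hypothetical.

```javascript
// Assumed connection details; verify against your Apify Proxy settings.
const PROXY_HOST = 'proxy.apify.com';
const PROXY_PORT = 8000;

// Hypothetical helper: builds a proxy URL whose username carries the proxy options.
// Assumed username format: "groups-A+B,session-ID,country-CC", or "auto" when no options are given.
function buildProxyUrl({ password, groups = [], sessionId, countryCode }) {
  if (!password) {
    throw new Error('A proxy password is required');
  }

  const params = [];
  if (groups.length > 0) {
    params.push(`groups-${groups.join('+')}`);
  }
  if (sessionId !== undefined) {
    // Assumed constraint: 1-50 characters from [A-Za-z0-9._~].
    if (!/^[\w.~]{1,50}$/.test(sessionId)) {
      throw new Error(`Invalid session ID: ${sessionId}`);
    }
    params.push(`session-${sessionId}`);
  }
  if (countryCode) {
    params.push(`country-${countryCode}`);
  }

  const username = params.length > 0 ? params.join(',') : 'auto';
  return `http://${username}:${encodeURIComponent(password)}@${PROXY_HOST}:${PROXY_PORT}`;
}

// Reusing the same sessionId for the login request and the follow-up requests
// is what keeps them on the same IP.
const loginFlowUrl = buildProxyUrl({
  password: 'p4ss',
  groups: ['RESIDENTIAL'],
  sessionId: 'login_flow_1',
  countryCode: 'US',
});
console.log(loginFlowUrl);
```

Passing the same URL to every request in a login-then-navigate sequence should keep those requests on one IP, while generating a fresh sessionId per job spreads unrelated jobs across different IPs.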

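The cost arithmetic in the billing discussion above can be checked with a short calculation. This sketch assumes 1 GB = 1000 MB (which matches the 10,000 pages at 2 MB = 20 GB figure) and uses the approximate per-GB prices quoted in this entry; the function name estimateProxyCostUsd is hypothetical, and actual prices depend on your plan.

```javascript
// Hypothetical helper: estimates proxy transfer cost in USD, rounded to cents.
// Assumes 1 GB = 1000 MB, consistent with the figures quoted in this entry.
function estimateProxyCostUsd(pages, avgPageMb, pricePerGbUsd) {
  const gigabytes = (pages * avgPageMb) / 1000;
  const cost = gigabytes * pricePerGbUsd;
  return Math.round(cost * 100) / 100;
}

const pages = 10000;
const avgPageMb = 2;

// Approximate per-GB prices taken from the text above.
console.log('datacenter:', estimateProxyCostUsd(pages, avgPageMb, 0.10));
console.log('residential (low):', estimateProxyCostUsd(pages, avgPageMb, 1));   // 20
console.log('residential (high):', estimateProxyCostUsd(pages, avgPageMb, 12)); // 240
```

Running the same estimate with your own page count and measured average page size gives a quick upper bound before committing to a residential run.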
Related Terms