Request Queue
An Apify Request Queue is a managed, persistent queue that stores and prioritizes the URLs a web scraping actor will crawl. It is the backbone of any Crawlee-based crawler, handling URL deduplication (so the same page is never crawled twice), automatic retries for failed requests (with configurable retry counts and exponential backoff), and FIFO ordering for predictable crawl behavior. When your crawler discovers new links on a page, it adds them to the request queue, and the queue feeds them back to the crawler in order.

Request queues matter because managing crawl state is one of the hardest problems in web scraping at scale. Without a request queue, you would need to manually track which URLs have been visited, handle retries for transient failures (network timeouts, 503 errors, rate limiting), and implement persistence so that a crashed crawl can resume instead of starting over. The Apify Request Queue solves all of these problems out of the box, allowing you to focus on extraction logic rather than infrastructure.

To use a request queue with Crawlee:

```javascript
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

const crawler = new CheerioCrawler({
    requestHandler: async ({ $, request, enqueueLinks }) => {
        const data = { title: $('h1').text(), url: request.url };
        await Actor.pushData(data);
        // Discovered links matching the selector are added to the queue,
        // with duplicates filtered out automatically.
        await enqueueLinks({ selector: 'a.next-page' });
    },
});

await crawler.addRequests([{ url: 'https://example.com/page/1' }]);
await crawler.run();

await Actor.exit();
```

The `enqueueLinks` helper automatically adds discovered URLs to the request queue with deduplication. You can also add URLs manually:

```javascript
await crawler.requestQueue.addRequest({
    url: 'https://example.com/specific-page',
    userData: { category: 'electronics' },
});
```
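To make the deduplication and FIFO guarantees concrete, here is a minimal plain-JavaScript sketch of the idea; this is an illustration, not the Apify API, and the `SimpleRequestQueue` class and its method names are invented for this example:

```javascript
// Illustrative sketch (not the Apify API): a FIFO queue that
// silently drops requests whose uniqueKey was already enqueued.
class SimpleRequestQueue {
    constructor() {
        this.seen = new Set(); // uniqueKeys ever enqueued
        this.pending = [];     // FIFO buffer of requests
    }

    addRequest(request) {
        const key = request.uniqueKey ?? request.url;
        if (this.seen.has(key)) return false; // duplicate: ignored
        this.seen.add(key);
        this.pending.push(request);
        return true;
    }

    fetchNextRequest() {
        // Oldest request first; null when the queue is drained.
        return this.pending.shift() ?? null;
    }
}

const queue = new SimpleRequestQueue();
queue.addRequest({ url: 'https://example.com/page/1' });
queue.addRequest({ url: 'https://example.com/page/2' });
queue.addRequest({ url: 'https://example.com/page/1' }); // duplicate, dropped
// Fetch order is FIFO: page/1, then page/2, then null.
```

The real queue adds persistence, retries, and locking on top, but the dedup-by-key and first-in-first-out behavior shown here is the same contract.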
For direct queue manipulation without Crawlee:

```javascript
import { Actor } from 'apify';

const queue = await Actor.openRequestQueue();
await queue.addRequest({ url: 'https://example.com', uniqueKey: 'home-page' });

const request = await queue.fetchNextRequest();
// ...process the request...
await queue.markRequestHandled(request);
```

The `uniqueKey` property controls deduplication: requests with the same `uniqueKey` are treated as duplicates and silently ignored.

Common mistakes include not setting `maxRequestRetries` appropriately. The default is 3 retries, which works for most sites, but some targets have aggressive rate limiting that calls for 5-10 retries with longer backoff. Set this in the crawler options: `new CheerioCrawler({ maxRequestRetries: 5 })`. Another mistake is not using the `forefront` option to prioritize important URLs. By default, new requests go to the back of the queue (FIFO). If you discover a high-value URL mid-crawl that should be processed immediately, use `await queue.addRequest({ url }, { forefront: true })` to push it to the front.

A critical advantage of Apify Request Queues is persistence across actor restarts. If your actor crashes, times out, or is stopped manually mid-crawl, the queue remembers exactly which URLs were already processed and which are still pending. When you restart the actor (or enable automatic restart on failure), the crawler picks up exactly where it left off. This is essential for large crawls that process hundreds of thousands of pages and may take hours to complete; without persistence, a crash at 90% completion would mean starting over from scratch.

Request queues also support request locking for parallel processing: when multiple crawler instances run simultaneously (using Apify's autoscaling), the queue ensures each URL is assigned to exactly one crawler instance, preventing duplicate processing. Monitor queue statistics (total requests, handled, pending, retries) via the API at `GET /v2/request-queues/{queueId}` or in the Apify Console.
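The retry-with-exponential-backoff behavior the queue applies automatically can be sketched in plain JavaScript. The `fetchWithRetries` function, its parameters, and the injected `doFetch` callback are hypothetical names for illustration only, not part of the Apify SDK:

```javascript
// Illustrative sketch (not the Apify SDK): one initial attempt plus up to
// maxRetries retries, doubling the wait between attempts each time.
async function fetchWithRetries(url, { maxRetries = 3, baseDelayMs = 500, doFetch }) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
            return await doFetch(url);
        } catch (err) {
            if (attempt === maxRetries) throw err; // retries exhausted
            const delay = baseDelayMs * 2 ** attempt; // 500ms, 1s, 2s, ...
            await new Promise((resolve) => setTimeout(resolve, delay));
        }
    }
}
```

With `maxRetries = 3` this makes at most 4 attempts in total, mirroring the default of 3 retries described above; raising the retry count or base delay is the knob to turn for aggressively rate-limited targets.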
Related concepts: Crawlee, Cheerio Crawler, Playwright Crawler, Proxy, Actor Run.