Actor Testing Best Practices
To test an Apify actor, define input/output test cases in a JSON fixture, run them with the ApifyForge test runner before every deploy, and set assertions on output shape, field counts, and error rates. The regression suite catches breaking changes by comparing current output against a saved baseline. This guide covers the full testing workflow from local validation to CI/CD integration.
Actors fail silently. A website changes its HTML structure, your scraper returns empty datasets, and your users find out before you do. Testing is the only defense against this. Unlike traditional software testing where you control the inputs and outputs, actor testing must account for external dependencies: websites that change, APIs that rate-limit, and data that varies between runs. This guide covers a practical testing strategy that catches real problems without creating a maintenance burden. Every technique here comes from hard-won experience managing 250+ actors in production.
Step 1: Local testing with the Apify CLI
Start by testing locally before you push anything. The Apify CLI creates a local storage/ directory that mimics the cloud environment, so you can validate output without deploying.
# Run the actor locally with a test input file
apify run --input-file test.json
After the run completes, inspect the output in storage/datasets/default/. Each result is saved as a separate JSON file. Check these things manually on your first run:
- Field presence — Does every record have all the fields you promised in your README?
- Field types — Are numbers actually numbers, not strings? Are URLs valid?
- Data quality — Are values populated, or are they empty strings and nulls?
- Result count — Does maxResults: 5 actually return 5 results (or fewer, with a log explaining why)?
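Once the first manual pass is done, the same checks can be scripted. The sketch below is one possible approach and is not part of the Apify CLI: summarizeRecords and its requiredFields argument are hypothetical names, and it assumes the records have already been loaded from storage/datasets/default/ into an array.

```javascript
// Summarize a batch of output records: total count, missing fields, empty values.
// `summarizeRecords` is a hypothetical helper, not an Apify API.
function summarizeRecords(records, requiredFields) {
    const summary = { count: records.length, missing: {}, empty: {} };
    for (const field of requiredFields) {
        summary.missing[field] = 0;
        summary.empty[field] = 0;
    }
    for (const record of records) {
        for (const field of requiredFields) {
            if (!(field in record)) {
                summary.missing[field] += 1; // field absent entirely
            } else if (record[field] === null || record[field] === '') {
                summary.empty[field] += 1; // field present but unpopulated
            }
        }
    }
    return summary;
}

// Example with two in-memory records; real records would come from the JSON files.
const sample = [
    { title: 'Widget', url: 'https://example.com/widget', price: 9.99 },
    { title: '', url: 'https://example.com/gadget' },
];
console.log(summarizeRecords(sample, ['title', 'url', 'price']));
```

Comparing summary.count against the maxResults value in your test input covers the result-count check as well.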
Common pitfall: The local environment does not have Apify proxies available. If your actor uses proxyConfiguration, local runs will either fail or use your direct IP. Add a graceful fallback at the top of your actor:
import { Actor } from 'apify';

Actor.main(async () => {
    // Fall back to an empty object so a run without input does not crash below
    const input = (await Actor.getInput()) ?? {};
    // Gracefully handle missing proxy in local environment
    let proxyConfiguration;
    try {
        proxyConfiguration = await Actor.createProxyConfiguration(input.proxyConfig);
    } catch (error) {
        console.warn('Proxy not available (local run?). Using direct connection.');
        proxyConfiguration = undefined;
    }
    // Pass proxyConfiguration to your crawler; it handles undefined gracefully
});
Step 2: Schema validation
Your dataset schema (dataset_schema.json, usually stored at .actor/dataset_schema.json) is a contract with your users. It tells them what fields to expect, what types those fields have, and which are guaranteed to be present. When your actor's output does not match the schema, Apify flags it with a maintenance warning that tanks your quality score.
{
  "actorSpecification": 1,
  "fields": {
    "title": {
      "type": "string",
      "required": true,
      "description": "Product title"
    },
    "price": {
      "type": "number",
      "required": true,
      "description": "Product price in USD"
    },
    "url": {
      "type": "string",
      "required": true,
      "description": "Product page URL",
      "format": "uri"
    },
    "inStock": {
      "type": "boolean",
      "required": false,
      "description": "Whether the product is currently in stock"
    },
    "rating": {
      "type": "number",
      "required": false,
      "description": "Average customer rating (0-5)"
    }
  }
}
Write a validation function that checks every output record against your schema before pushing:
function validateResult(result, index) {
    const errors = [];
    if (!result.title || typeof result.title !== 'string') {
        errors.push('Result ' + index + ': missing or invalid "title"');
    }
    // typeof already covers undefined; NaN is rejected because it signals a failed parse
    if (typeof result.price !== 'number' || Number.isNaN(result.price)) {
        errors.push('Result ' + index + ': missing or invalid "price"');
    }
    // Check the type first so a non-string url cannot throw on startsWith
    if (typeof result.url !== 'string' || !result.url.startsWith('http')) {
        errors.push('Result ' + index + ': missing or invalid "url"');
    }
    if (errors.length > 0) {
        console.warn('Validation warnings:\n' + errors.join('\n'));
        return false;
    }
    return true;
}

// Use in your actor: filter out invalid results before pushing
const validResults = results.filter((r, i) => validateResult(r, i));
await Actor.pushData(validResults);
if (validResults.length < results.length) {
    const dropped = results.length - validResults.length;
    console.warn('Dropped ' + dropped + ' invalid results');
}
Real-world tip: Websites change gradually. A field that was always present might become optional on 5% of pages, then 20%, then 50%. By validating every result and logging warnings, you catch drift early, before Apify's maintenance system catches it for you.
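To make that gradual drift visible as a number, you can track how often each field is actually populated. This is a minimal sketch under stated assumptions: fieldFillRates and findDriftingFields are hypothetical helper names, and the 0.95 threshold is an arbitrary example value, not an Apify rule.

```javascript
// For each expected field, compute the fraction of records where it is populated.
// `fieldFillRates` and `findDriftingFields` are hypothetical helpers for illustration.
function fieldFillRates(records, fields) {
    const rates = {};
    for (const field of fields) {
        const filled = records.filter(
            (r) => r[field] !== undefined && r[field] !== null && r[field] !== ''
        ).length;
        // An empty batch counts as 0 so that it is always flagged
        rates[field] = records.length === 0 ? 0 : filled / records.length;
    }
    return rates;
}

// Return the fields whose fill rate fell below the threshold (a value from 0 to 1).
function findDriftingFields(rates, threshold) {
    return Object.keys(rates).filter((field) => rates[field] < threshold);
}

const rates = fieldFillRates(
    [
        { title: 'A', price: 10 },
        { title: 'B', price: null },
        { title: 'C' },
        { title: 'D', price: 12 },
    ],
    ['title', 'price']
);
console.log(rates); // { title: 1, price: 0.5 }
console.log(findDriftingFields(rates, 0.95)); // [ 'price' ]
```

Logging the rates on every run gives you a trend to watch, which is more useful than a single pass/fail result when a site changes slowly.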
Step 3: Structured test cases
Move beyond ad-hoc testing by defining structured test cases. Each test case has a name, an input, and a set of assertions. This makes tests repeatable and shareable.
{
  "testCases": [
    {
      "name": "Basic keyword search",
      "input": { "keyword": "web scraping", "maxResults": 5 },
      "assertions": {
        "minResults": 3,
        "maxResults": 5,
        "requiredFields": ["title", "url", "description"],
        "fieldTypes": { "title": "string", "url": "string" },
        "maxDurationSeconds": 60
      }
    },
    {
      "name": "Empty keyword returns helpful error",
      "input": { "keyword": "", "maxResults": 5 },
      "assertions": {
        "expectError": true,
        "errorMessageContains": "keyword"
      }
    },
    {
      "name": "Max results cap is respected",
      "input": { "keyword": "test", "maxResults": 3 },
      "assertions": {
        "minResults": 1,
        "maxResults": 3,
        "requiredFields": ["title", "url"]
      }
    },
    {
      "name": "Large request does not timeout",
      "input": { "keyword": "javascript", "maxResults": 100 },
      "assertions": {
        "minResults": 50,
        "maxDurationSeconds": 300
      }
    }
  ]
}
Tip: Always include a negative test case: one that provides invalid input and verifies that the actor returns a helpful error rather than crashing silently or returning garbage data.
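To show what these assertions mean in practice, here is a minimal sketch of an evaluator for the fixture format above. It does not describe how the ApifyForge test runner works internally. The function name evaluateAssertions and the outcome shape ({ results, durationSeconds, error }) are assumptions made for this example.

```javascript
// Evaluate one test case's assertions against the outcome of a run.
// Assumed outcome shape: { results: [...], durationSeconds: number, error: string | null }.
function evaluateAssertions(outcome, assertions) {
    const failures = [];
    const { results = [], durationSeconds = 0, error = null } = outcome;

    // Negative test cases: the run is supposed to fail with a helpful message
    if (assertions.expectError) {
        if (!error) {
            failures.push('expected an error but the run succeeded');
        } else if (
            assertions.errorMessageContains &&
            !error.includes(assertions.errorMessageContains)
        ) {
            failures.push('error does not mention "' + assertions.errorMessageContains + '"');
        }
        return failures;
    }
    if (error) {
        return ['unexpected error: ' + error];
    }
    if (assertions.minResults !== undefined && results.length < assertions.minResults) {
        failures.push('got ' + results.length + ' results, expected at least ' + assertions.minResults);
    }
    if (assertions.maxResults !== undefined && results.length > assertions.maxResults) {
        failures.push('got ' + results.length + ' results, expected at most ' + assertions.maxResults);
    }
    if (
        assertions.maxDurationSeconds !== undefined &&
        durationSeconds > assertions.maxDurationSeconds
    ) {
        failures.push('run took ' + durationSeconds + 's, limit is ' + assertions.maxDurationSeconds + 's');
    }
    results.forEach((record, i) => {
        for (const field of assertions.requiredFields || []) {
            if (record[field] === undefined || record[field] === null) {
                failures.push('result ' + i + ' is missing "' + field + '"');
            }
        }
        for (const [field, type] of Object.entries(assertions.fieldTypes || {})) {
            const value = record[field];
            if (value !== undefined && value !== null && typeof value !== type) {
                failures.push('result ' + i + ': "' + field + '" is ' + typeof value + ', expected ' + type);
            }
        }
    });
    return failures;
}

// An empty array means the test case passed.
console.log(
    evaluateAssertions(
        { results: [{ title: 'A', url: 'https://example.com' }], durationSeconds: 12, error: null },
        { minResults: 1, maxResults: 5, requiredFields: ['title', 'url'] }
    )
);
```

Returning a list of failure messages rather than a boolean keeps the report readable when several assertions fail in the same run.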
Step 4: Cloud staging
Local testing catches code bugs, but cloud staging catches deployment bugs. Environment differences, missing environment variables, Docker build issues, proxy configuration differences, and network-level blocks only surface when the actor runs on Apify infrastructure. The gap between "works on my machine" and "works on Apify" is real and has burned every actor developer at least once.
# Push to Apify (builds the Docker image in the cloud)
apify push
# Run the actor in the cloud with test input
apify call --input-file test.json
After the cloud run, verify:
- Build log is clean — no warnings about deprecated packages or missing files
- Run succeeded — status is SUCCEEDED, not FAILED or TIMED-OUT
- Output matches local — same fields, same types, similar (not necessarily identical) values
- Memory and CPU — check the run stats for unexpected spikes that indicate performance issues
- Charges are correct — if PPE is enabled, verify the event count matches the result count. See the PPE Pricing guide (/learn/ppe-pricing) for pricing validation details.
Common pitfall: Actors that work locally but fail in the cloud often have hardcoded file paths, missing dependencies (devDependencies used in production code), or proxy configuration errors. The Dockerfile is the most common source of cloud-only failures.
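The "output matches local" check lends itself to automation, because it compares structure rather than values. The following is a sketch under stated assumptions: outputShape and compareShapes are hypothetical helpers, and both record arrays are assumed to have been downloaded already (for example from the local storage/ directory and from the cloud dataset).

```javascript
// Map each field name to the set of value types seen across all records.
function outputShape(records) {
    const shape = new Map();
    for (const record of records) {
        for (const [field, value] of Object.entries(record)) {
            // typeof reports null and arrays as 'object', so label them explicitly
            const type = value === null ? 'null' : Array.isArray(value) ? 'array' : typeof value;
            if (!shape.has(field)) shape.set(field, new Set());
            shape.get(field).add(type);
        }
    }
    return shape;
}

// List structural differences between a local run and a cloud run.
function compareShapes(localRecords, cloudRecords) {
    const local = outputShape(localRecords);
    const cloud = outputShape(cloudRecords);
    const differences = [];
    for (const [field, types] of local) {
        if (!cloud.has(field)) {
            differences.push(`"${field}" is missing from the cloud output`);
            continue;
        }
        const localTypes = [...types].sort().join('|');
        const cloudTypes = [...cloud.get(field)].sort().join('|');
        if (localTypes !== cloudTypes) {
            differences.push(`"${field}" is ${localTypes} locally but ${cloudTypes} in the cloud`);
        }
    }
    for (const field of cloud.keys()) {
        if (!local.has(field)) {
            differences.push(`"${field}" appears only in the cloud output`);
        }
    }
    return differences;
}

console.log(compareShapes([{ title: 'A', price: 9.99 }], [{ title: 'A', price: '9.99' }]));
// [ '"price" is number locally but string in the cloud' ]
```

Values are deliberately ignored, since live data is expected to differ between two runs.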
Step 5: Regression testing
Regression testing catches changes in actor output over time. Save the output from a known-good run and compare future runs against it. You are not looking for identical output — data changes — but you are looking for structural changes: missing fields, type changes, and unexpected nulls.
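The three structural properties named above (field presence, types, and null rates) can be captured in a small signature object. This is a minimal sketch that does not depend on any Apify API. The names buildSignature and diffSignatures and the default tolerance of 0.1 are assumptions chosen for illustration.

```javascript
// Build a per-field signature: value types seen, and the share of null or missing values.
// `buildSignature` and `diffSignatures` are hypothetical names for this sketch.
function buildSignature(records) {
    const fields = new Set(records.flatMap((r) => Object.keys(r)));
    const signature = {};
    for (const field of fields) {
        const values = records.map((r) => r[field]);
        const present = values.filter((v) => v !== undefined && v !== null);
        signature[field] = {
            types: [...new Set(present.map((v) => typeof v))].sort(),
            nullRate: (values.length - present.length) / values.length,
        };
    }
    return signature;
}

// Compare the current signature against a saved baseline.
// nullTolerance is the allowed increase in null rate (0.1 = ten percentage points).
function diffSignatures(baseline, current, nullTolerance = 0.1) {
    const problems = [];
    for (const [field, base] of Object.entries(baseline)) {
        if (!Object.prototype.hasOwnProperty.call(current, field)) {
            problems.push('missing field: ' + field);
            continue;
        }
        const now = current[field];
        // Skip the type check when every current value is null; the null check reports that
        if (now.types.length > 0 && base.types.join('|') !== now.types.join('|')) {
            problems.push('type change in ' + field + ': ' + base.types.join('|') + ' -> ' + now.types.join('|'));
        }
        if (now.nullRate - base.nullRate > nullTolerance) {
            problems.push('more nulls in ' + field + ': ' + base.nullRate + ' -> ' + now.nullRate);
        }
    }
    return problems;
}

const baseline = buildSignature([{ title: 'A', price: 10 }, { title: 'B', price: 12 }]);
const current = buildSignature([{ title: 'A', price: '10' }, { title: 'B', price: null }]);
console.log(diffSignatures(baseline, current));
```

In this example the comparison reports both a type change and a higher null rate for price. The signature is plain JSON, so it can be saved in a key-value store in the same way as the field list in the example that follows.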
import { Actor } from 'apify';

Actor.main(async () => {
    const store = await Actor.openKeyValueStore('test-baselines');
    const input = await Actor.getInput();
    // runScraper is a placeholder for your actor's own scraping logic
    const results = await runScraper(input);
    const currentFields = Object.keys(results[0] || {}).sort();
    // Load baseline fields from previous run
    const baselineFields = await store.getValue('baseline-fields');
    let regressionFound = false;
    if (baselineFields) {
        const missingFields = baselineFields.filter(f => !currentFields.includes(f));
        const newFields = currentFields.filter(f => !baselineFields.includes(f));
        if (missingFields.length > 0) {
            regressionFound = true;
            console.error('REGRESSION: Missing fields: ' + missingFields.join(', '));
        }
        if (newFields.length > 0) {
            console.log('New fields detected: ' + newFields.join(', '));
        }
    }
    // Update the baseline only after a clean, non-empty run. Otherwise a regression
    // would overwrite the baseline and go unreported on the next comparison.
    if (!regressionFound && currentFields.length > 0) {
        await store.setValue('baseline-fields', currentFields);
    }
    await Actor.pushData(results);
});
Step 6: Pre-push hooks and CI/CD
The most reliable testing is testing you cannot skip. Set up a pre-push Git hook that runs your schema validation and core test cases before allowing a push.
#!/bin/bash
# .git/hooks/pre-push
echo "Running actor tests before push..."
# Validate input schema is valid JSON
node -e "JSON.parse(require('fs').readFileSync('.actor/input_schema.json'))" || exit 1
# Run local tests
npm test || exit 1
echo "All tests passed. Pushing..."
For teams and larger portfolios, integrate the test runner into your CI/CD pipeline. Run the full regression suite on every pull request and block merges when tests fail. The ApifyForge Deploy Guard tool can run your test cases against cloud builds and report pass/fail status. See the Managing Multiple Actors guide (/learn/managing-multiple-actors) for fleet-level CI/CD strategies.
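The JSON.parse one-liner in the hook only proves that the file parses. A slightly stricter check could look like the sketch below. checkInputSchema is a hypothetical helper, and the rules it applies (a top-level object, a "properties" object, every "required" name defined in "properties") are a small subset chosen for illustration, not the full Apify input schema specification.

```javascript
// Check the text of an input schema file and return a list of problems.
// `checkInputSchema` is a hypothetical helper; its checks are intentionally minimal.
function checkInputSchema(schemaText) {
    let schema;
    try {
        schema = JSON.parse(schemaText);
    } catch (err) {
        return ['not valid JSON: ' + err.message];
    }
    if (schema === null || typeof schema !== 'object' || Array.isArray(schema)) {
        return ['top level must be a JSON object'];
    }
    const properties = schema.properties;
    if (properties === null || typeof properties !== 'object' || Array.isArray(properties)) {
        return ['"properties" must be an object'];
    }
    const problems = [];
    const required = Array.isArray(schema.required) ? schema.required : [];
    for (const name of required) {
        if (!Object.prototype.hasOwnProperty.call(properties, name)) {
            problems.push('required field "' + name + '" is not defined in "properties"');
        }
    }
    return problems;
}

// Example: "keyword" is required but was never defined.
console.log(
    checkInputSchema(
        JSON.stringify({ type: 'object', properties: { maxResults: { type: 'integer' } }, required: ['keyword'] })
    )
);
```

In a hook script you would pass it the contents of .actor/input_schema.json and exit with a non-zero status when the returned list is not empty.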
Debugging failed runs
When a run fails, follow this diagnostic sequence:
- Check the run log in the Apify Console — most failures leave a clear error message
- Check the input — was the input valid JSON? Were required fields present?
- Check the build — did the Docker build succeed? Are all dependencies installed?
- Check the target — has the website or API you are calling changed its structure?
- Check memory — did the run exceed its memory limit? Increase memory allocation or reduce batch sizes.
Real-world tip from 250+ actors: 80% of production failures fall into three buckets: the target website changed its HTML (fix the selectors), the actor ran out of memory (increase allocation or paginate), or a dependency updated with breaking changes (pin your versions in package.json). The remaining 20% are genuinely weird edge cases.
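The advice to pin versions can be checked mechanically. The sketch below flags any dependency whose version spec is not an exact version. findUnpinnedDependencies is a hypothetical helper, and the package versions in the example are invented for illustration.

```javascript
// Return dependencies whose version spec is a range, tag, or URL rather than an exact version.
// `findUnpinnedDependencies` is a hypothetical helper name.
function findUnpinnedDependencies(packageJson) {
    // Matches exact versions such as 1.2.3, 1.2.3-beta.1, or 1.2.3+build.5
    const exactVersion = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;
    const deps = packageJson.dependencies || {};
    return Object.entries(deps)
        .filter(([, spec]) => !exactVersion.test(spec))
        .map(([name, spec]) => name + '@' + spec);
}

console.log(
    findUnpinnedDependencies({
        dependencies: { apify: '^3.1.0', cheerio: '1.0.0', 'got-scraping': '~4.0.0' },
    })
);
// [ 'apify@^3.1.0', 'got-scraping@~4.0.0' ]
```

Committing a lockfile and installing from it in the Docker build is another way to get reproducible installs. This check is useful mainly when you want package.json itself to state exact versions.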
Testing checklist for every deploy
Before every apify push, run through this checklist:
- [ ] Local run completes with test.json input
- [ ] Output has all required fields with correct types
- [ ] Input validation rejects bad input with helpful errors
- [ ] No hardcoded paths or environment-specific values
- [ ] package.json has all production dependencies (not just devDependencies)
- [ ] Dockerfile builds cleanly
- [ ] README documents all input fields and output fields
- [ ] PPE charging matches the number of results delivered
Following this checklist consistently prevents the vast majority of production issues. See the Store SEO guide (/learn/store-seo) for how testing impacts your quality score.
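For the checklist item on input validation, a minimal sketch could look like the code below. It assumes an actor with the keyword and maxResults input fields used in the Step 3 test cases. The function name validateInput and the exact rules are assumptions, not a prescribed Apify pattern.

```javascript
// Validate actor input and return human-readable error messages.
// The keyword and maxResults fields mirror the Step 3 test cases; the rules are assumptions.
function validateInput(input) {
    if (input === null || typeof input !== 'object' || Array.isArray(input)) {
        return ['input must be a JSON object'];
    }
    const errors = [];
    if (typeof input.keyword !== 'string' || input.keyword.trim() === '') {
        errors.push('"keyword" is required and must be a non-empty string');
    }
    if (input.maxResults !== undefined) {
        if (!Number.isInteger(input.maxResults) || input.maxResults < 1) {
            errors.push('"maxResults" must be a positive integer');
        }
    }
    return errors;
}

console.log(validateInput({ keyword: '', maxResults: 5 }));
// [ '"keyword" is required and must be a non-empty string' ]
```

Inside the actor, throwing an Error that joins these messages when the list is not empty makes the run fail with a message that names the offending field, which is what the negative test case in Step 3 expects.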