Quality

Actor Testing Best Practices

To test an Apify actor, define input/output test cases in a JSON fixture, run them with the ApifyForge test runner before every deploy, and set assertions on output shape, field counts, and error rates. The regression suite catches breaking changes by comparing current output against a saved baseline. This guide covers the full testing workflow from local validation to CI/CD integration.

By Ryan Clinton. Last updated: March 27, 2026

Actors fail silently. A website changes its HTML structure, your scraper returns empty datasets, and your users find out before you do. Testing is the only defense against this. Unlike traditional software testing where you control the inputs and outputs, actor testing must account for external dependencies: websites that change, APIs that rate-limit, and data that varies between runs. This guide covers a practical testing strategy that catches real problems without creating a maintenance burden. Every technique here comes from hard-won experience managing 250+ actors in production.

Step 1: Local testing with the Apify CLI

Start by testing locally before you push anything. The Apify CLI creates a local storage/ directory that mimics the cloud environment, so you can validate output without deploying.

```bash
# Run the actor locally with a test input file
apify run --input-file test.json
```

After the run completes, inspect the output in storage/datasets/default/. Each result is saved as a separate JSON file. Check these things manually on your first run:

Field presence — Does every record have all the fields you promised in your README?
Field types — Are numbers actually numbers, not strings? Are URLs valid?
Data quality — Are values populated, or are they empty strings and nulls?
Result count — Does maxResults: 5 actually return 5 results (or fewer, with a log explaining why)?
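These manual checks are easy to script once you know what to look for. The sketch below assumes the default local layout (one JSON file per item in storage/datasets/default/, possibly alongside a __metadata__.json file). loadLocalDataset and checkLocalOutput are hypothetical helpers, and the expected field types are something you supply to match your README.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: load every item file from a local dataset directory, in file-name order
function loadLocalDataset(dir = 'storage/datasets/default') {
    return fs.readdirSync(dir)
        .filter((name) => name.endsWith('.json') && !name.startsWith('__')) // skip metadata files
        .sort()
        .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), 'utf8')));
}

// Hypothetical helper: check presence, emptiness, and JS type of each expected field,
// plus the result cap. `expectedTypes` maps field names to typeof strings, e.g. { price: 'number' }.
function checkLocalOutput(items, expectedTypes, maxResults) {
    const problems = [];
    if (items.length > maxResults) {
        problems.push(`Expected at most ${maxResults} results, got ${items.length}`);
    }
    items.forEach((item, i) => {
        for (const [field, type] of Object.entries(expectedTypes)) {
            const value = item[field];
            if (value === undefined) {
                problems.push(`Item ${i}: missing "${field}"`);
            } else if (value === null || value === '') {
                problems.push(`Item ${i}: empty "${field}"`);
            } else if (typeof value !== type) {
                problems.push(`Item ${i}: "${field}" should be ${type}, got ${typeof value}`);
            }
        }
    });
    return problems;
}
```

After a local run with maxResults: 5, you might call checkLocalOutput(loadLocalDataset(), { title: 'string', url: 'string', price: 'number' }, 5) and print whatever it returns. An empty array means none of these checks found a problem.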

Common pitfall: Apify Proxy may not be available locally (for example, if you have not logged in with apify login). If your actor uses proxyConfiguration, local runs will then either fail or go out through your own IP address. Add a graceful fallback at the top of your actor:

```javascript
import { Actor } from 'apify';

Actor.main(async () => {
    // getInput() returns null when no input is provided, so fall back to an empty object
    const input = (await Actor.getInput()) ?? {};

    // Gracefully handle a missing proxy in the local environment
    let proxyConfiguration;
    try {
        proxyConfiguration = await Actor.createProxyConfiguration(input.proxyConfig);
    } catch (error) {
        console.warn(`Proxy not available (local run?): ${error.message}. Using direct connection.`);
        proxyConfiguration = undefined;
    }

    // Pass proxyConfiguration to your crawler; the option is optional, so undefined means no proxy
});
```

Step 2: Schema validation

Your dataset schema, stored at .actor/dataset_schema.json, is a contract with your users. It tells them what fields to expect, what types those fields are, and which are guaranteed to be present. When your actor output does not match the schema, Apify flags it with a maintenance warning that tanks your quality score.

```json
{
    "actorSpecification": 1,
    "fields": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "title": {
                "type": "string",
                "description": "Product title"
            },
            "price": {
                "type": "number",
                "description": "Product price in USD"
            },
            "url": {
                "type": "string",
                "format": "uri",
                "description": "Product page URL"
            },
            "inStock": {
                "type": "boolean",
                "description": "Whether the product is currently in stock"
            },
            "rating": {
                "type": "number",
                "minimum": 0,
                "maximum": 5,
                "description": "Average customer rating (0-5)"
            }
        },
        "required": ["title", "price", "url"]
    },
    "views": {}
}
```

Here fields is a standard JSON Schema (draft-07) object, so required fields are listed once in the required array rather than flagged field by field.

Write a validation function that checks every output record against your schema before pushing:

```javascript
// Validate one scraped record; returns true if the required fields exist with the right types
function validateResult(result, index) {
    const errors = [];

    if (typeof result.title !== 'string' || result.title === '') {
        errors.push('Result ' + index + ': missing or invalid "title"');
    }
    if (typeof result.price !== 'number' || Number.isNaN(result.price)) {
        errors.push('Result ' + index + ': missing or invalid "price"');
    }
    if (typeof result.url !== 'string' || !result.url.startsWith('http')) {
        errors.push('Result ' + index + ': missing or invalid "url"');
    }

    if (errors.length > 0) {
        console.warn('Validation warnings:\n' + errors.join('\n'));
        return false;
    }
    return true;
}

// Use inside Actor.main(), where `results` holds the records your scraper produced:
// filter out invalid results before pushing
const validResults = results.filter((r, i) => validateResult(r, i));
await Actor.pushData(validResults);

if (validResults.length < results.length) {
    const dropped = results.length - validResults.length;
    console.warn('Dropped ' + dropped + ' invalid results');
}
```
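Hard-coding each check means the validator can drift away from the schema it is supposed to enforce. As an alternative, here is a minimal sketch that derives the checks from a JSON Schema-style description (a properties map plus a required list) mirroring the product fields above. validateAgainstSchema and productFields are hypothetical names, only simple scalar types are handled, and null is treated like a missing value as a simplification; for full JSON Schema support, a dedicated validator library such as Ajv is the usual choice.

```javascript
// Hypothetical JSON Schema-style field description mirroring the product example
const productFields = {
    properties: {
        title: { type: 'string' },
        price: { type: 'number' },
        url: { type: 'string' },
        inStock: { type: 'boolean' },
        rating: { type: 'number' },
    },
    required: ['title', 'price', 'url'],
};

// Hypothetical helper: return readable errors for one record.
// Supports only string, number, integer, and boolean; null counts as missing.
function validateAgainstSchema(record, schema, index) {
    const errors = [];
    for (const field of schema.required ?? []) {
        if (record[field] === undefined || record[field] === null) {
            errors.push(`Result ${index}: missing required "${field}"`);
        }
    }
    for (const [field, spec] of Object.entries(schema.properties ?? {})) {
        const value = record[field];
        if (value === undefined || value === null) continue; // presence is handled above
        const ok =
            spec.type === 'integer' ? Number.isInteger(value)
            : spec.type === 'number' ? typeof value === 'number' && Number.isFinite(value)
            : typeof value === spec.type;
        if (!ok) {
            errors.push(`Result ${index}: "${field}" should be ${spec.type}`);
        }
    }
    return errors;
}
```

Because the rules live in data, adding a field to the schema object updates the validator without touching the checking code.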

Real-world tip: Websites change gradually. A field that was always present might become optional on 5% of pages, then 20%, then 50%. By validating every result and logging warnings, you catch drift early — before Apify's maintenance system catches it for you.
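One way to catch that gradual drift is to measure, per batch, how often each field is actually populated and warn when the rate falls below a threshold. This is a minimal sketch: fieldFillRates and lowFillWarnings are hypothetical helpers, and the 0.95 default threshold is an illustrative assumption, not a platform value.

```javascript
// Hypothetical helper: share of records in which each field holds a non-empty value
function fieldFillRates(records, fields) {
    const rates = {};
    for (const field of fields) {
        const filled = records.filter(
            (r) => r[field] !== undefined && r[field] !== null && r[field] !== '',
        ).length;
        rates[field] = records.length === 0 ? 0 : filled / records.length;
    }
    return rates;
}

// Hypothetical helper: describe fields whose fill rate dropped below the threshold
function lowFillWarnings(records, fields, threshold = 0.95) {
    return Object.entries(fieldFillRates(records, fields))
        .filter(([, rate]) => rate < threshold)
        .map(([field, rate]) => `"${field}" populated in ${(rate * 100).toFixed(1)}% of results`);
}
```

Logging these warnings on every run gives you a time series in the run logs, which is what reveals a slide from 95% to 80% to 50% well before the field disappears entirely.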

Step 3: Structured test cases

Move beyond ad-hoc testing by defining structured test cases. Each test case has a name, an input, and a set of assertions. This makes tests repeatable and shareable.

```json
{
    "testCases": [
        {
            "name": "Basic keyword search",
            "input": { "keyword": "web scraping", "maxResults": 5 },
            "assertions": {
                "minResults": 3,
                "maxResults": 5,
                "requiredFields": ["title", "url", "description"],
                "fieldTypes": { "title": "string", "url": "string" },
                "maxDurationSeconds": 60
            }
        },
        {
            "name": "Empty keyword returns helpful error",
            "input": { "keyword": "", "maxResults": 5 },
            "assertions": {
                "expectError": true,
                "errorMessageContains": "keyword"
            }
        },
        {
            "name": "Max results cap is respected",
            "input": { "keyword": "test", "maxResults": 3 },
            "assertions": {
                "minResults": 1,
                "maxResults": 3,
                "requiredFields": ["title", "url"]
            }
        },
        {
            "name": "Large request does not timeout",
            "input": { "keyword": "javascript", "maxResults": 100 },
            "assertions": {
                "minResults": 50,
                "maxDurationSeconds": 300
            }
        }
    ]
}
```
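The ApifyForge test runner evaluates these assertions for you. To make one plausible reading of each key concrete, here is a hypothetical evaluator. It assumes a finished run has been summarized as { items, durationSeconds, error }; that shape belongs to this sketch only and is not an Apify API.

```javascript
// Hypothetical evaluator: check one test case's assertions against a summarized run.
// `run` shape (an assumption of this sketch): { items: [...], durationSeconds: number, error: string | null }
function evaluateAssertions(assertions, run) {
    const failures = [];
    const a = assertions;

    // Negative cases: the run must fail, ideally with a message naming the bad input
    if (a.expectError) {
        if (!run.error) {
            failures.push('expected an error, but the run succeeded');
        } else if (a.errorMessageContains && !run.error.includes(a.errorMessageContains)) {
            failures.push(`error message does not mention "${a.errorMessageContains}"`);
        }
        return failures;
    }
    if (run.error) {
        return [`unexpected error: ${run.error}`];
    }

    const count = run.items.length;
    if (a.minResults !== undefined && count < a.minResults) {
        failures.push(`expected at least ${a.minResults} results, got ${count}`);
    }
    if (a.maxResults !== undefined && count > a.maxResults) {
        failures.push(`expected at most ${a.maxResults} results, got ${count}`);
    }
    if (a.maxDurationSeconds !== undefined && run.durationSeconds > a.maxDurationSeconds) {
        failures.push(`took ${run.durationSeconds}s, limit is ${a.maxDurationSeconds}s`);
    }
    run.items.forEach((item, i) => {
        for (const field of a.requiredFields ?? []) {
            if (item[field] === undefined || item[field] === null) {
                failures.push(`item ${i}: missing "${field}"`);
            }
        }
        for (const [field, type] of Object.entries(a.fieldTypes ?? {})) {
            if (item[field] !== undefined && typeof item[field] !== type) {
                failures.push(`item ${i}: "${field}" is ${typeof item[field]}, expected ${type}`);
            }
        }
    });
    return failures;
}
```

Returning a list of failures instead of stopping at the first one means a single run reports everything that broke.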

Tip: Always include a negative test case — one that provides invalid input and verifies the actor returns a helpful error rather than crashing silently or returning garbage data.
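For a negative case like the empty-keyword test to pass, the actor has to reject bad input explicitly. A minimal sketch, assuming the keyword/maxResults input from the test cases: validateInput is a hypothetical helper that returns readable problems. The comment at the end shows one way to use it with the Apify SDK's Actor.fail(), which ends the run as failed with the given status message.

```javascript
// Hypothetical validator for the { keyword, maxResults } input used in the test cases.
// Returns human-readable problems; an empty list means the input is usable.
function validateInput(input) {
    if (!input || typeof input !== 'object') {
        return ['Input must be a JSON object'];
    }
    const problems = [];
    if (typeof input.keyword !== 'string' || input.keyword.trim() === '') {
        problems.push('Input "keyword" is required and must be a non-empty string');
    }
    if (input.maxResults !== undefined &&
        (!Number.isInteger(input.maxResults) || input.maxResults < 1)) {
        problems.push('Input "maxResults" must be a positive integer');
    }
    return problems;
}

// Inside Actor.main() you would stop early with a readable status message:
//   const problems = validateInput(await Actor.getInput());
//   if (problems.length > 0) await Actor.fail(problems.join('; '));
```

Naming the offending field in the message is what lets an assertion like errorMessageContains: "keyword" distinguish a helpful rejection from an unrelated crash.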

Step 4: Cloud staging

Local testing catches code bugs, but cloud staging catches deployment bugs. Environment differences, missing environment variables, Docker build issues, proxy configuration differences, and network-level blocks only surface when the actor runs on Apify infrastructure. The gap between "works on my machine" and "works on Apify" is real and has burned every actor developer at least once.

```bash
# Push to Apify (builds the Docker image in the cloud)
apify push

# Run the actor in the cloud with test input
apify call --input-file test.json
```

After the cloud run, verify:

Build log is clean — no warnings about deprecated packages or missing files
Run succeeded — status is SUCCEEDED, not FAILED or TIMED-OUT
Output matches local — same fields, same types, similar (not necessarily identical) values
Memory and CPU — check the run stats for unexpected spikes that indicate performance issues
Charges are correct — if PPE is enabled, verify the event count matches the result count. See the PPE Pricing guide (/learn/ppe-pricing) for pricing validation details.
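The "output matches local" check can be partly automated by comparing field names and value types, not values, between a local and a cloud dataset. A minimal sketch, assuming you have already loaded both item arrays (for example from storage/datasets/default/ and from the cloud run's default dataset); shapeOf and compareShapes are hypothetical helpers.

```javascript
// Hypothetical helper: summarize a dataset as field -> sorted list of observed types
function shapeOf(items) {
    const shape = {};
    for (const item of items) {
        for (const [field, value] of Object.entries(item)) {
            const type = value === null ? 'null' : Array.isArray(value) ? 'array' : typeof value;
            (shape[field] ??= new Set()).add(type);
        }
    }
    return Object.fromEntries(
        Object.entries(shape).map(([field, types]) => [field, [...types].sort()]),
    );
}

// Hypothetical helper: report fields present on only one side or with different observed types
function compareShapes(localItems, cloudItems) {
    const local = shapeOf(localItems);
    const cloud = shapeOf(cloudItems);
    const diffs = [];
    for (const field of new Set([...Object.keys(local), ...Object.keys(cloud)])) {
        if (!(field in cloud)) {
            diffs.push(`"${field}" only in local output`);
        } else if (!(field in local)) {
            diffs.push(`"${field}" only in cloud output`);
        } else if (local[field].join() !== cloud[field].join()) {
            diffs.push(`"${field}" types differ: local ${local[field].join('|')}, cloud ${cloud[field].join('|')}`);
        }
    }
    return diffs;
}
```

Comparing types rather than values tolerates normal data variation between runs while still flagging, say, a price that arrives as a string only in the cloud.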

Common pitfall: Actors that work locally but fail in the cloud often have hardcoded file paths, missing dependencies (devDependencies used in production code), or proxy configuration errors. The Dockerfile is the most common source of cloud-only failures.
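The devDependencies case is cheap to catch before pushing. Apify's Node.js Dockerfile templates typically install with npm install --omit=dev, so anything listed only under devDependencies is absent in the cloud image. A rough sketch, assuming you provide the list of packages your production code imports (the runtimeImports argument is a hypothetical input; eslint-plugin-import's no-extraneous-dependencies rule is one tool that can find these automatically):

```javascript
// Hypothetical helper: flag runtime packages that package.json lists only under
// devDependencies, or not at all. Leave Node built-ins (node:fs, ...) out of runtimeImports.
function findMisplacedDeps(pkg, runtimeImports) {
    const prod = pkg.dependencies ?? {};
    const dev = pkg.devDependencies ?? {};
    const issues = [];
    for (const name of runtimeImports) {
        if (name in prod) continue;
        issues.push(name in dev
            ? `${name} is only in devDependencies`
            : `${name} is not listed in package.json`);
    }
    return issues;
}
```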

Step 5: Regression testing

Regression testing catches changes in actor output over time. Save the output from a known-good run and compare future runs against it. You are not looking for identical output — data changes — but you are looking for structural changes: missing fields, type changes, and unexpected nulls.

```javascript
import { Actor } from 'apify';

Actor.main(async () => {
    // A named key-value store persists across runs, so the baseline survives between them
    const store = await Actor.openKeyValueStore('test-baselines');
    const input = (await Actor.getInput()) ?? {};

    // runScraper() stands in for your own scraping logic
    const results = await runScraper(input);
    const currentFields = Object.keys(results[0] ?? {}).sort();

    // Load baseline fields saved by a previous known-good run (null on the first run)
    const baselineFields = await store.getValue('baseline-fields');
    let missingFields = [];

    if (baselineFields) {
        missingFields = baselineFields.filter((f) => !currentFields.includes(f));
        const newFields = currentFields.filter((f) => !baselineFields.includes(f));

        if (missingFields.length > 0) {
            console.error('REGRESSION: Missing fields: ' + missingFields.join(', '));
        }
        if (newFields.length > 0) {
            console.log('New fields detected: ' + newFields.join(', '));
        }
    }

    // Only promote a non-empty, non-regressed run to baseline; otherwise the next
    // run would be compared against broken output and the regression would vanish
    if (missingFields.length === 0 && results.length > 0) {
        await store.setValue('baseline-fields', currentFields);
    }
    await Actor.pushData(results);
});
```
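The example above tracks only field names, taken from the first record. To also catch the type changes and unexpected nulls mentioned earlier, a baseline can store a per-field profile built from all records. A minimal sketch: buildProfile and diffProfiles are hypothetical helpers, and the 0.1 tolerance for null-rate increases is an arbitrary assumption to tune per actor. The profile is plain JSON, so it can be saved with store.setValue() in place of the field list.

```javascript
// Hypothetical helper: per-field profile of observed types and null/missing rate
function buildProfile(records) {
    const fields = new Set(records.flatMap((r) => Object.keys(r)));
    const profile = {};
    for (const field of fields) {
        const types = new Set();
        let nulls = 0;
        for (const r of records) {
            const v = r[field];
            if (v === undefined || v === null) nulls += 1;
            else types.add(Array.isArray(v) ? 'array' : typeof v);
        }
        profile[field] = { types: [...types].sort(), nullRate: nulls / records.length };
    }
    return profile;
}

// Hypothetical helper: compare a current profile against a saved baseline profile
function diffProfiles(baseline, current, nullTolerance = 0.1) {
    const issues = [];
    for (const [field, base] of Object.entries(baseline)) {
        const cur = current[field];
        if (!cur) {
            issues.push(`missing field "${field}"`);
            continue;
        }
        const newTypes = cur.types.filter((t) => !base.types.includes(t));
        if (newTypes.length > 0) {
            issues.push(`"${field}" has new type(s): ${newTypes.join(', ')}`);
        }
        if (cur.nullRate - base.nullRate > nullTolerance) {
            issues.push(`"${field}" null rate rose from ${base.nullRate.toFixed(2)} to ${cur.nullRate.toFixed(2)}`);
        }
    }
    return issues;
}
```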

Step 6: Pre-push hooks and CI/CD

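The most reliable testing is testing you cannot skip. Set up a pre-push Git hook that runs your schema validation and core test cases before Git allows a push. Git runs this hook only on git push, not on apify push, so run the same checks before deploying with the CLI as well.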

```bash
#!/bin/bash
# .git/hooks/pre-push (make it executable: chmod +x .git/hooks/pre-push)
echo "Running actor tests before push..."

# Validate that the input schema is valid JSON
node -e "JSON.parse(require('fs').readFileSync('.actor/input_schema.json', 'utf8'))" || exit 1

# Run local tests
npm test || exit 1

echo "All tests passed. Pushing..."
```
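Parsing the file only proves it is valid JSON. A slightly stronger check also confirms the top-level keys the Apify input schema format requires (a title, type set to "object", schemaVersion set to 1, and a properties object) and that every name in required is actually defined. checkInputSchema is a hypothetical helper covering only a small part of the specification; Apify performs its own, fuller validation when the actor is built.

```javascript
// Hypothetical helper: light structural checks on a parsed input schema (not a full validator)
function checkInputSchema(schema) {
    const errors = [];
    if (typeof schema.title !== 'string') errors.push('missing top-level "title"');
    if (schema.type !== 'object') errors.push('"type" must be "object"');
    if (schema.schemaVersion !== 1) errors.push('"schemaVersion" must be 1');
    if (!schema.properties || typeof schema.properties !== 'object') {
        errors.push('missing "properties" object');
        return errors;
    }
    for (const name of schema.required ?? []) {
        if (!(name in schema.properties)) {
            errors.push(`required field "${name}" is not defined in properties`);
        }
    }
    for (const [name, prop] of Object.entries(schema.properties)) {
        if (!prop.type) errors.push(`property "${name}" has no "type"`);
    }
    return errors;
}
```

In the hook, you could load .actor/input_schema.json with JSON.parse(fs.readFileSync(...)), pass it to this function, print any errors, and exit with a non-zero code when the list is not empty.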

For teams and larger portfolios, integrate the test runner into your CI/CD pipeline. Run the full regression suite on every pull request and block merges when tests fail. The ApifyForge Deploy Guard tool can run your test cases against cloud builds and report pass/fail status. See the Managing Multiple Actors guide (/learn/managing-multiple-actors) for fleet-level CI/CD strategies.
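In CI, what actually blocks a merge is the process exit code. A minimal sketch of a suite loop: runSuite is a hypothetical helper, and runCase is a function you supply that resolves to a list of failure messages for one test case (it might start the actor with the apify-client package's client.actor(id).call(input), fetch items with client.dataset(run.defaultDatasetId).listItems(), and then evaluate the assertions).

```javascript
// Hypothetical suite runner: run each test case, collect failures, return an exit code.
// `runCase(testCase)` is supplied by you and resolves to an array of failure messages.
async function runSuite(testCases, runCase) {
    let failed = 0;
    for (const testCase of testCases) {
        let failures;
        try {
            failures = await runCase(testCase);
        } catch (error) {
            failures = [`crashed: ${error.message}`]; // a thrown error counts as a failure
        }
        if (failures.length === 0) {
            console.log(`PASS ${testCase.name}`);
        } else {
            failed += 1;
            console.log(`FAIL ${testCase.name}\n  ${failures.join('\n  ')}`);
        }
    }
    console.log(`${testCases.length - failed}/${testCases.length} test cases passed`);
    return failed === 0 ? 0 : 1;
}

// Typical entry point: runSuite(cases, runCase).then((code) => { process.exitCode = code; });
```

Cases run one at a time here, which keeps cloud usage predictable; running them in parallel is faster but harder to reason about when a target rate-limits.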

Debugging failed runs

When a run fails, follow this diagnostic sequence:

Check the run log in the Apify Console — most failures leave a clear error message
Check the input — was the input valid JSON? Were required fields present?
Check the build — did the Docker build succeed? Are all dependencies installed?
Check the target — has the website or API you are calling changed its structure?
Check memory — did the run exceed its memory limit? Increase memory allocation or reduce batch sizes.

Real-world tip from 250+ actors: 80% of production failures fall into three buckets: the target website changed its HTML (fix the selectors), the actor ran out of memory (increase allocation or paginate), or a dependency updated with breaking changes (pin your versions in package.json). The remaining 20% are genuinely weird edge cases.
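To act on the "pin your versions" advice, a quick check can list production dependencies declared with a range instead of an exact version. findUnpinned is a hypothetical helper; it deliberately treats anything other than a plain x.y.z version (optionally with a prerelease tag) as unpinned.

```javascript
// Hypothetical helper: list dependencies whose version spec is a range (^, ~, *, latest, >=, ...)
function findUnpinned(pkg) {
    const exact = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/;
    return Object.entries(pkg.dependencies ?? {})
        .filter(([, spec]) => !exact.test(spec))
        .map(([name, spec]) => `${name}@${spec}`);
}
```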

Testing checklist for every deploy

Before every apify push, run through this checklist:

[ ] Local run completes with test.json input
[ ] Output has all required fields with correct types
[ ] Input validation rejects bad input with helpful errors
[ ] No hardcoded paths or environment-specific values
[ ] package.json has all production dependencies (not just devDependencies)
[ ] Dockerfile builds cleanly
[ ] README documents all input fields and output fields
[ ] PPE charging matches the number of results delivered

Following this checklist consistently prevents the vast majority of production issues. See the Store SEO guide (/learn/store-seo) for how testing impacts your quality score.


Related guides

Beginner

Getting Started with Apify Actors

To build an Apify actor, install Node.js 18+ and the Apify CLI, scaffold a project with apify create, write your logic inside Actor.main(), define an input_schema.json, and deploy with apify push. This guide walks through every step from zero to a published Apify Store listing.

Essential

Apify PPE Pricing Explained: Pay Per Event Model, Strategy, and Code Examples

Pay Per Event (PPE) is Apify's usage-based monetization model for actors on the Apify Store. Developers set a price per event (typically $0.001 to $0.50), call Actor.addChargeForEvent() in their code, and keep 80% of revenue while Apify takes 20%. This ApifyForge guide covers the 80/20 revenue split, actor.json configuration, charging code patterns, the 14-day price change rule, and pricing strategy by actor type.

Revenue

How to Monetize Your Actors

To monetize Apify actors, start with Pay Per Event pricing at $0.01-$0.25 per result, then layer on tiered pricing for power users, free-tier funnels to drive adoption, and MCP server bundles that combine multiple actors into a single subscription. ApifyForge analytics tracks revenue per actor so you know which strategies work. This guide covers each revenue model with real pricing examples.

Growth

Store SEO Optimization

Apify Store search ranks actors by title match, README keyword density, category tags, run volume, and a quality score out of 100. To rank higher, write a README that opens with a plain-language description of what the actor does, include target keywords in the first 100 words, set accurate categories in actor.json, and maintain a success rate above 95%. This guide breaks down every ranking factor and shows how ApifyForge tracks your score.

Scale

Managing Multiple Actors

To manage 10, 50, or 200+ Apify actors, use the ApifyForge fleet dashboard to monitor health, revenue, and quality scores across your entire portfolio in one view. Group actors by category, run bulk updates on pricing and metadata, set up failure alerts, and track maintenance pulse to catch stale actors before users complain. This guide covers fleet management workflows at every scale.

Essential

Cost Planning Tools: Calculator, Plan Advisor & Proxy Analyzer

How to use ApifyForge's cost planning tools to estimate actor run costs, choose the right Apify subscription plan, and pick the most cost-effective proxy type for each scraper.

Essential

AI Agent Tools: Pipeline Preflight, LLM Optimizer & Integration Templates

How to use ApifyForge's AI agent tools to debug MCP server connections, design multi-actor pipelines, optimize actor output for LLM token efficiency, and generate integration templates.

Quality

Schema Tools: Diff, Registry & Input Guard

How to use ApifyForge's schema tools to compare actor output schemas, browse the field registry, and test actor inputs before running — preventing wasted credits and broken pipelines.

Essential

Compliance Scanner, Actor Recommender & Comparisons

How to use ApifyForge's compliance risk scanner to assess legal exposure, the actor recommender to find the best tool for your task, and head-to-head comparisons to evaluate competing actors.

Quality

The ApifyForge Testing Suite

Four cloud-powered testing tools for Apify actors: Output Guard, Deploy Guard, Cloud Staging, and Regression Suite. How they work together and when to use each one.

Essential

The Complete ApifyForge Tool Suite

All 15 developer tools in one guide: testing, schema analysis, cost planning, compliance scanning, LLM optimization, pipeline building, and privacy reporting. What each tool does, when to use it, and how they work together.

Beginner

What Is an Apify Actor?

An Apify actor is a serverless cloud program that runs on the Apify platform. It accepts JSON input, executes a task (scraping, data processing, API calls, or AI tool serving), and produces structured output in datasets, key-value stores, or request queues. Actors are packaged as Docker containers and can be run via API, scheduled, or chained together.

Essential

What Are MCP Servers on Apify?

MCP (Model Context Protocol) servers are Apify actors that run in standby mode and expose tools via an HTTP endpoint for AI assistants like Claude Desktop, Cursor, and Windsurf. They connect large language models to real-world data sources (APIs, databases, web scrapers, and intelligence feeds) so AI agents can take actions beyond text generation.

Beginner

How to Choose the Right Apify Actor

With over 3,000 actors on the Apify Store, choosing the right one for your task requires evaluating success rates, run history, pricing, maintenance frequency, and input schema quality. This guide provides a decision framework for selecting actors based on measurable quality metrics, plus tools to automate the comparison process.

Scale

How to Manage a Large Apify Actor Portfolio

Managing 10 Apify actors is straightforward. Managing 50 requires dashboards and cost tracking. Managing 200+ demands automated regression testing, schema validation, revenue analytics, and failure alerting. This guide covers the tools, processes, and hard-won lessons from scaling an Apify actor portfolio.