
Actor Testing Best Practices

Use the ApifyForge test runner and regression suite to validate actors before every deploy. Define test cases, set assertions, and integrate with CI/CD.

By Ryan Clinton | Last updated: March 19, 2026

Actors fail silently. A website changes its HTML structure, your scraper returns empty datasets, and your users find out before you do. Testing is the only defense against this. Unlike traditional software testing where you control the inputs and outputs, actor testing must account for external dependencies: websites that change, APIs that rate-limit, and data that varies between runs. This guide covers a practical testing strategy that catches real problems without creating a maintenance burden. Every technique here comes from hard-won experience managing 250+ actors in production.

Step 1: Local testing with the Apify CLI

Start by testing locally before you push anything. The Apify CLI creates a local storage/ directory that mimics the cloud environment, so you can validate output without deploying.

# Run the actor locally with a test input file
# (--input-file takes a path; --input expects an inline JSON string)
apify run --input-file test.json

After the run completes, inspect the output in storage/datasets/default/. Each result is saved as a separate JSON file. Check these things manually on your first run:

1. **Field presence:** Does every record have all the fields you promised in your README?
2. **Field types:** Are numbers actually numbers, not strings? Are URLs valid?
3. **Data quality:** Are values populated, or are they empty strings and nulls?
4. **Result count:** Does maxResults: 5 actually return 5 results (or fewer, with a log explaining why)?
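Once the manual pass feels repetitive, the same four checks can be scripted. The sketch below is one possible shape, not part of the Apify CLI: `checkRecords`, `expectedTypes`, and the sample records are hypothetical names and data chosen for illustration.

```javascript
// Hypothetical spec: field name -> expected `typeof` result.
const expectedTypes = { title: 'string', url: 'string', price: 'number' };

// Returns a list of human-readable problems; an empty list means the records look fine.
function checkRecords(records, expected, maxResults) {
    const problems = [];
    records.forEach((record, index) => {
        for (const [field, type] of Object.entries(expected)) {
            const value = record[field];
            if (value === undefined) {
                problems.push(`record ${index}: "${field}" is missing`);
            } else if (value === null || value === '') {
                problems.push(`record ${index}: "${field}" is empty`);
            } else if (typeof value !== type) {
                problems.push(`record ${index}: "${field}" is ${typeof value}, expected ${type}`);
            }
        }
    });
    // Result count: the actor should never return more than the requested cap.
    if (maxResults !== undefined && records.length > maxResults) {
        problems.push(`got ${records.length} records, more than maxResults ${maxResults}`);
    }
    return problems;
}

// Made-up sample data: the second record has an empty title and a string price.
const sample = [
    { title: 'Widget', url: 'https://example.com/widget', price: 9.99 },
    { title: '', url: 'https://example.com/gadget', price: '19.99' },
];
console.log(checkRecords(sample, expectedTypes, 5));
```

To run it against a real local run, parse each JSON file in storage/datasets/default/ and pass the resulting array as `records`.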

**Common pitfall:** The local environment does not have Apify proxies available. If your actor uses proxyConfiguration, local runs will either fail or use your direct IP. Add a graceful fallback at the top of your actor:

import { Actor } from 'apify';

Actor.main(async () => {
    // getInput() returns null when no input is provided, so default to {}
    const input = (await Actor.getInput()) ?? {};

    // Gracefully handle missing proxy in local environment
    let proxyConfiguration;
    try {
        proxyConfiguration = await Actor.createProxyConfiguration(input.proxyConfig);
    } catch (error) {
        console.warn(`Proxy not available (local run?): ${error.message}. Using direct connection.`);
        proxyConfiguration = undefined;
    }

    // Pass proxyConfiguration to your crawler; it accepts undefined and connects directly
});

Step 2: Schema validation

Your dataset schema (stored at .actor/dataset_schema.json) is a contract with your users. It tells them what fields to expect, what types those fields are, and which are guaranteed to be present. When your actor output does not match the schema, Apify flags it with a maintenance warning that tanks your quality score.

{
    "actorSpecification": 1,
    "fields": {
        "type": "object",
        "properties": {
            "title": {
                "type": "string",
                "description": "Product title"
            },
            "price": {
                "type": "number",
                "description": "Product price in USD"
            },
            "url": {
                "type": "string",
                "format": "uri",
                "description": "Product page URL"
            },
            "inStock": {
                "type": "boolean",
                "description": "Whether the product is currently in stock"
            },
            "rating": {
                "type": "number",
                "description": "Average customer rating (0-5)"
            }
        },
        "required": ["title", "price", "url"]
    },
    "views": {}
}

Write a validation function that checks every output record against your schema before pushing:

function validateResult(result, index) {
    const errors = [];

    // Required string field: must be a non-empty string
    if (typeof result.title !== 'string' || result.title.trim() === '') {
        errors.push('Result ' + index + ': missing or invalid "title"');
    }
    // Required number field: Number.isFinite also rejects NaN and numeric strings
    if (!Number.isFinite(result.price)) {
        errors.push('Result ' + index + ': missing or invalid "price"');
    }
    // Required URL field: check the type first so a non-string value cannot throw
    if (typeof result.url !== 'string' || !/^https?:\/\//.test(result.url)) {
        errors.push('Result ' + index + ': missing or invalid "url"');
    }

    if (errors.length > 0) {
        console.warn('Validation warnings:\n' + errors.join('\n'));
        return false;
    }
    return true;
}

// Use in your actor: filter out invalid results before pushing.
// `results` is the array of records your scraper collected.
const validResults = results.filter((r, i) => validateResult(r, i));
await Actor.pushData(validResults);

if (validResults.length < results.length) {
    const dropped = results.length - validResults.length;
    console.warn('Dropped ' + dropped + ' invalid results');
}

**Real-world tip:** Websites change gradually. A field that was always present might become optional on 5% of pages, then 20%, then 50%. By validating every result and logging warnings, you catch drift early — before Apify's maintenance system catches it for you.
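One way to make that drift visible is to track, per run, what share of results is missing each field and warn once it crosses a threshold. This is a minimal sketch: `missingRates`, `driftWarnings`, the 5% threshold, and the sample batch are illustrative assumptions, not part of any Apify API.

```javascript
// Share of results (0..1) in which each field is absent, null, or an empty string.
function missingRates(results, fields) {
    const rates = {};
    for (const field of fields) {
        const missing = results.filter(
            (r) => r[field] === undefined || r[field] === null || r[field] === '',
        ).length;
        // With no results there is nothing to measure, so report 0.
        rates[field] = results.length === 0 ? 0 : missing / results.length;
    }
    return rates;
}

// Turns the rates into warnings for every field above the threshold.
function driftWarnings(rates, threshold) {
    return Object.entries(rates)
        .filter(([, rate]) => rate > threshold)
        .map(([field, rate]) => `${field} missing in ${(rate * 100).toFixed(1)}% of results`);
}

// Made-up batch: price is null once and absent once.
const batch = [
    { title: 'A', price: 1 },
    { title: 'B', price: null },
    { title: 'C', price: 3 },
    { title: 'D' },
];
const rates = missingRates(batch, ['title', 'price']);
console.log(driftWarnings(rates, 0.05)); // [ 'price missing in 50.0% of results' ]
```

Logging these rates on every run gives you a trend line, so a field creeping from 5% to 20% missing shows up in the logs well before it reaches 50%.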

Step 3: Structured test cases

Move beyond ad-hoc testing by defining structured test cases. Each test case has a name, an input, and a set of assertions. This makes tests repeatable and shareable.

{
    "testCases": [
        {
            "name": "Basic keyword search",
            "input": { "keyword": "web scraping", "maxResults": 5 },
            "assertions": {
                "minResults": 3,
                "maxResults": 5,
                "requiredFields": ["title", "url", "description"],
                "fieldTypes": { "title": "string", "url": "string" },
                "maxDurationSeconds": 60
            }
        },
        {
            "name": "Empty keyword returns helpful error",
            "input": { "keyword": "", "maxResults": 5 },
            "assertions": {
                "expectError": true,
                "errorMessageContains": "keyword"
            }
        },
        {
            "name": "Max results cap is respected",
            "input": { "keyword": "test", "maxResults": 3 },
            "assertions": {
                "minResults": 1,
                "maxResults": 3,
                "requiredFields": ["title", "url"]
            }
        },
        {
            "name": "Large request does not time out",
            "input": { "keyword": "javascript", "maxResults": 100 },
            "assertions": {
                "minResults": 50,
                "maxDurationSeconds": 300
            }
        }
    ]
}

**Tip:** Always include a negative test case — one that provides invalid input and verifies the actor returns a helpful error rather than crashing silently or returning garbage data.
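The assertion keys used in these test cases can be evaluated by a small function. The sketch below assumes a hypothetical `outcome` object of the form `{ results, durationSeconds, error }` that your own test harness would produce after a run; it is not the ApifyForge test runner's implementation.

```javascript
// Checks one test case's assertions against a run outcome.
// Returns a list of failure messages; an empty list means the test passed.
function evaluateAssertions(assertions, outcome) {
    const failures = [];
    const { results = [], durationSeconds = 0, error = null } = outcome;

    // Negative test cases: the run is expected to fail with a helpful message.
    if (assertions.expectError) {
        if (!error) {
            failures.push('expected an error, but the run succeeded');
        } else if (assertions.errorMessageContains
            && !error.message.includes(assertions.errorMessageContains)) {
            failures.push(`error message does not mention "${assertions.errorMessageContains}"`);
        }
        return failures;
    }
    if (error) {
        return [`unexpected error: ${error.message}`];
    }

    if (assertions.minResults !== undefined && results.length < assertions.minResults) {
        failures.push(`expected at least ${assertions.minResults} results, got ${results.length}`);
    }
    if (assertions.maxResults !== undefined && results.length > assertions.maxResults) {
        failures.push(`expected at most ${assertions.maxResults} results, got ${results.length}`);
    }
    for (const field of assertions.requiredFields ?? []) {
        const index = results.findIndex((r) => r[field] === undefined || r[field] === null);
        if (index !== -1) failures.push(`result ${index} is missing "${field}"`);
    }
    for (const [field, type] of Object.entries(assertions.fieldTypes ?? {})) {
        const index = results.findIndex((r) => r[field] !== undefined && typeof r[field] !== type);
        if (index !== -1) failures.push(`result ${index}: "${field}" is not a ${type}`);
    }
    if (assertions.maxDurationSeconds !== undefined
        && durationSeconds > assertions.maxDurationSeconds) {
        failures.push(`run took ${durationSeconds}s, limit is ${assertions.maxDurationSeconds}s`);
    }
    return failures;
}

// Made-up outcome: too few results, and the second one lacks a url.
const exampleAssertions = {
    minResults: 3,
    maxResults: 5,
    requiredFields: ['title', 'url'],
    fieldTypes: { title: 'string' },
    maxDurationSeconds: 60,
};
const exampleOutcome = {
    results: [{ title: 'a', url: 'https://example.com/a' }, { title: 'b' }],
    durationSeconds: 12,
};
console.log(evaluateAssertions(exampleAssertions, exampleOutcome));
```

Returning messages instead of throwing on the first failure means one run reports every broken assertion at once.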

Step 4: Cloud staging

Local testing catches code bugs, but cloud staging catches deployment bugs. Environment differences, missing environment variables, Docker build issues, proxy configuration differences, and network-level blocks only surface when the actor runs on Apify infrastructure. The gap between "works on my machine" and "works on Apify" is real and has burned every actor developer at least once.

# Push to Apify (builds the Docker image in the cloud)
apify push

# Run the actor in the cloud with test input
# (--input-file takes a path; --input expects an inline JSON string)
apify call --input-file test.json

After the cloud run, verify:

1. **Build log is clean:** no warnings about deprecated packages or missing files
2. **Run succeeded:** status is SUCCEEDED, not FAILED or TIMED-OUT
3. **Output matches local:** same fields, same types, similar (not necessarily identical) values
4. **Memory and CPU:** check the run stats for unexpected spikes that indicate performance issues
5. **Charges are correct:** if PPE is enabled, verify the event count matches the result count. See the PPE Pricing guide (/learn/ppe-pricing) for pricing validation details.

**Common pitfall:** Actors that work locally but fail in the cloud often have hardcoded file paths, missing dependencies (devDependencies used in production code), or proxy configuration errors. The Dockerfile is the most common source of cloud-only failures.
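The "output matches local" check compares shape, not values, so it can be automated by reducing each run to a map of field names and types. A minimal sketch, assuming you have already exported both datasets as arrays of records; `shapeOf` and `compareShapes` are hypothetical helper names.

```javascript
// Reduces a list of records to { field: Set of type names seen for that field }.
function shapeOf(records) {
    const shape = {};
    for (const record of records) {
        for (const [field, value] of Object.entries(record)) {
            let type = typeof value;
            if (value === null) type = 'null';
            else if (Array.isArray(value)) type = 'array';
            if (!shape[field]) shape[field] = new Set();
            shape[field].add(type);
        }
    }
    return shape;
}

// Lists structural differences between a local run and a cloud run.
function compareShapes(localRecords, cloudRecords) {
    const local = shapeOf(localRecords);
    const cloud = shapeOf(cloudRecords);
    const differences = [];
    for (const field of Object.keys(local)) {
        if (!cloud[field]) {
            differences.push(`"${field}" is present locally but missing in cloud output`);
            continue;
        }
        const localTypes = [...local[field]].sort().join('|');
        const cloudTypes = [...cloud[field]].sort().join('|');
        if (localTypes !== cloudTypes) {
            differences.push(`"${field}" is ${localTypes} locally but ${cloudTypes} in the cloud`);
        }
    }
    for (const field of Object.keys(cloud)) {
        if (!local[field]) differences.push(`"${field}" appears only in cloud output`);
    }
    return differences;
}

// Made-up runs: the cloud run returns price as a string and drops url.
const localRun = [{ title: 'A', price: 9.5, url: 'https://example.com/a' }];
const cloudRun = [{ title: 'A', price: '9.50' }];
console.log(compareShapes(localRun, cloudRun));
```

An empty result means both runs expose the same fields with the same types, which is the property that matters here; the values themselves are allowed to differ.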

Step 5: Regression testing

Regression testing catches changes in actor output over time. Save the output from a known-good run and compare future runs against it. You are not looking for identical output — data changes — but you are looking for structural changes: missing fields, type changes, and unexpected nulls.

import { Actor } from 'apify';

Actor.main(async () => {
    const store = await Actor.openKeyValueStore('test-baselines');
    const input = await Actor.getInput();

    // runScraper is a placeholder for your own scraping logic
    const results = await runScraper(input);

    // Collect field names from every record, not just the first one
    const currentFields = [...new Set(results.flatMap((r) => Object.keys(r)))].sort();

    // Load baseline fields from the last known-good run
    const baselineFields = await store.getValue('baseline-fields');
    let missingFields = [];

    if (baselineFields) {
        missingFields = baselineFields.filter(f => !currentFields.includes(f));
        const newFields = currentFields.filter(f => !baselineFields.includes(f));

        if (missingFields.length > 0) {
            console.error('REGRESSION: Missing fields: ' + missingFields.join(', '));
        }
        if (newFields.length > 0) {
            console.log('New fields detected: ' + newFields.join(', '));
        }
    }

    // Only update the baseline after a healthy run, so a regression
    // cannot overwrite the known-good field list
    if (results.length > 0 && missingFields.length === 0) {
        await store.setValue('baseline-fields', currentFields);
    }
    await Actor.pushData(results);
});
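Field names are only one kind of structural change; type changes and unexpected nulls can be tracked the same way by storing a per-field type profile as the baseline. The following is a sketch under assumptions: `typeProfile`, `findTypeRegressions`, and the 20% null-rate threshold are illustrative choices, and the profiles would be saved and loaded from the key-value store like the field list.

```javascript
// Builds { field: { typeName: count } } from a list of records.
function typeProfile(results) {
    const profile = {};
    for (const record of results) {
        for (const [field, value] of Object.entries(record)) {
            const type = value === null ? 'null' : typeof value;
            profile[field] = profile[field] || {};
            profile[field][type] = (profile[field][type] || 0) + 1;
        }
    }
    return profile;
}

// Most common non-null type for a field, or 'null' if every value was null.
function dominantType(counts) {
    const entries = Object.entries(counts).filter(([type]) => type !== 'null');
    if (entries.length === 0) return 'null';
    return entries.sort((a, b) => b[1] - a[1])[0][0];
}

// Fraction of values for a field that were null.
function nullRate(counts) {
    const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
    return total === 0 ? 0 : (counts.null || 0) / total;
}

// Compares a current profile against a baseline profile.
function findTypeRegressions(baseline, current, maxNullRate = 0.2) {
    const regressions = [];
    for (const [field, counts] of Object.entries(current)) {
        const base = baseline[field];
        if (!base) continue; // new field: nothing to compare against
        const before = dominantType(base);
        const after = dominantType(counts);
        if (before !== after) {
            regressions.push(`"${field}" changed from ${before} to ${after}`);
        }
        const rate = nullRate(counts);
        if (rate > maxNullRate && rate > nullRate(base)) {
            regressions.push(`"${field}" is null in ${Math.round(rate * 100)}% of results`);
        }
    }
    return regressions;
}

// Made-up runs: price turns into a string, rating starts coming back null.
const baselineProfile = typeProfile([{ price: 10, rating: 4.5 }, { price: 12, rating: 4 }]);
const currentProfile = typeProfile([{ price: '10', rating: null }, { price: '12', rating: 4 }]);
console.log(findTypeRegressions(baselineProfile, currentProfile));
```

Comparing the null rate against the baseline as well as the threshold avoids flagging fields that have always been sparse.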

Step 6: Pre-push hooks and CI/CD

The most reliable testing is testing you cannot skip. Set up a pre-push Git hook that runs your schema validation and core test cases before allowing a push.

#!/bin/bash
# .git/hooks/pre-push
echo "Running actor tests before push..."

# Validate input schema is valid JSON
node -e "JSON.parse(require('fs').readFileSync('.actor/input_schema.json'))" || exit 1

# Run local tests
npm test || exit 1

echo "All tests passed. Pushing..."

For teams and larger portfolios, integrate the test runner into your CI/CD pipeline. Run the full regression suite on every pull request and block merges when tests fail. The ApifyForge Test Runner tool can run your test cases against cloud builds and report pass/fail status. See the Managing Multiple Actors guide (/learn/managing-multiple-actors) for fleet-level CI/CD strategies.
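CI systems decide pass or fail from the process exit code, so whatever runs your test cases needs to turn per-test results into a report and a non-zero exit code on failure. A minimal sketch of that last step, assuming a hypothetical `outcomes` array of `{ name, failures }` objects produced by your own harness:

```javascript
// Builds a plain-text report and an exit code from per-test outcomes.
function summarizeTestRun(outcomes) {
    const failed = outcomes.filter((o) => o.failures.length > 0);
    const lines = outcomes.map(
        (o) => `${o.failures.length === 0 ? 'PASS' : 'FAIL'} ${o.name}`,
    );
    // List the individual failure messages under the summary lines.
    for (const outcome of failed) {
        for (const failure of outcome.failures) {
            lines.push(`  ${outcome.name}: ${failure}`);
        }
    }
    return { exitCode: failed.length === 0 ? 0 : 1, report: lines.join('\n') };
}

// Made-up outcomes for two of the test cases defined earlier.
const summary = summarizeTestRun([
    { name: 'Basic keyword search', failures: [] },
    { name: 'Max results cap is respected', failures: ['expected at most 3 results, got 4'] },
]);
console.log(summary.report);
// In a CI script you would finish with: process.exitCode = summary.exitCode;
```

Setting `process.exitCode` rather than calling `process.exit()` lets pending log output flush before the process ends.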

Debugging failed runs

When a run fails, follow this diagnostic sequence:

1. **Check the run log** in the Apify Console. Most failures leave a clear error message.
2. **Check the input:** was the input valid JSON? Were required fields present?
3. **Check the build:** did the Docker build succeed? Are all dependencies installed?
4. **Check the target:** has the website or API you are calling changed its structure?
5. **Check memory:** did the run exceed its memory limit? Increase memory allocation or reduce batch sizes.

**Real-world tip from 250+ actors:** 80% of production failures fall into three buckets: the target website changed its HTML (fix the selectors), the actor ran out of memory (increase allocation or paginate), or a dependency updated with breaking changes (pin your versions in package.json). The remaining 20% are genuinely weird edge cases.
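A first-pass triage along these three buckets can be scripted when you manage many actors. The sketch below is a rough heuristic only: the `run` object shape (`status`, `log`, `itemCount`) and the log patterns are assumptions for illustration, and real logs will need patterns tuned to your own actors.

```javascript
// Guesses which bucket a failed or suspicious run belongs to.
function classifyFailure(run) {
    const log = (run.log || '').toLowerCase();

    // Node.js reports heap exhaustion with "JavaScript heap out of memory".
    if (log.includes('heap out of memory')) return 'memory';

    // A missing or incompatible package often surfaces as a module resolution error.
    if (log.includes('cannot find module')) return 'dependency';

    // A run that succeeds but returns nothing usually means the selectors no longer match.
    if (run.status === 'SUCCEEDED' && run.itemCount === 0) return 'target-changed';

    return 'unknown';
}

console.log(classifyFailure({ status: 'SUCCEEDED', log: '', itemCount: 0 })); // target-changed
```

Anything classified as `unknown` still needs the manual diagnostic sequence above.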

Testing checklist for every deploy

Before every apify push, run through this checklist:

- [ ] Local run completes with test.json input
- [ ] Output has all required fields with correct types
- [ ] Input validation rejects bad input with helpful errors
- [ ] No hardcoded paths or environment-specific values
- [ ] package.json has all production dependencies (not just devDependencies)
- [ ] Dockerfile builds cleanly
- [ ] README documents all input fields and output fields
- [ ] PPE charging matches the number of results delivered
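The dependency item on this checklist lends itself to automation. The sketch below assumes you can supply the list of modules your production code imports (`importedModules` is a hypothetical input you would collect yourself); it also flags version ranges, since unpinned dependencies are one of the failure buckets described above. Package versions in the example are placeholders.

```javascript
// Flags runtime imports that are not in `dependencies` and version ranges that are not pinned.
function auditDependencies(pkg, importedModules) {
    const deps = pkg.dependencies || {};
    const devDeps = pkg.devDependencies || {};
    const issues = [];

    for (const name of importedModules) {
        if (deps[name]) continue;
        issues.push(devDeps[name]
            ? `"${name}" is imported at runtime but listed only in devDependencies`
            : `"${name}" is imported but not listed in package.json`);
    }
    for (const [name, range] of Object.entries(deps)) {
        // ^ and ~ allow newer versions to be installed at build time.
        if (/^[\^~]/.test(range)) {
            issues.push(`"${name}" uses the range ${range}; pin an exact version`);
        }
    }
    return issues;
}

// Made-up package.json contents with placeholder version numbers.
const examplePkg = {
    dependencies: { apify: '^3.0.0', cheerio: '1.0.0' },
    devDependencies: { 'got-scraping': '4.0.0' },
};
console.log(auditDependencies(examplePkg, ['apify', 'cheerio', 'got-scraping']));
```

A check like this fits naturally into the pre-push hook, next to the input schema validation.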

Following this checklist consistently prevents the vast majority of production issues. See the Store SEO guide (/learn/store-seo) for how testing impacts your quality score.

Related guides

- **Getting Started with Apify Actors** (Beginner): A complete walkthrough from zero to your first deployed actor. Covers project structure, Actor.main(), input schema, Dockerfile, and your first Apify Store listing.
- **Understanding PPE Pricing** (Essential): How Pay Per Event works, how to set prices that attract users while covering costs, and common pricing mistakes that leave money on the table.
- **How to Monetize Your Actors** (Revenue): Revenue strategies beyond basic PPE. Tiered pricing, free-tier funnels, bundling actors into MCP servers, and tracking revenue with ApifyForge analytics.
- **Store SEO Optimization** (Growth): How Apify Store search works, what metadata matters, and how to write READMEs that rank. Includes the quality score breakdown and how ApifyForge tracks it.
- **Managing Multiple Actors** (Scale): Fleet management strategies for 10, 50, or 200+ actors. Bulk operations, shared configs, maintenance monitoring, and the ApifyForge dashboard workflow.
- **Cost Planning Tools: Calculator, Plan Advisor & Proxy Analyzer** (Essential): How to use ApifyForge's cost planning tools to estimate actor run costs, choose the right Apify subscription plan, and pick the most cost-effective proxy type for each scraper.
- **AI Agent Tools: MCP Debugger, Pipeline Builder & LLM Optimizer** (Essential): How to use ApifyForge's AI agent tools to debug MCP server connections, design multi-actor pipelines, optimize actor output for LLM token efficiency, and generate integration templates.
- **Schema Tools: Diff, Registry & Input Tester** (Quality): How to use ApifyForge's schema tools to compare actor output schemas, browse the field registry, and test actor inputs before running, preventing wasted credits and broken pipelines.
- **Compliance Scanner, Actor Recommender & Comparisons** (Essential): How to use ApifyForge's compliance risk scanner to assess legal exposure, the actor recommender to find the best tool for your task, and head-to-head comparisons to evaluate competing actors.
- **The ApifyForge Testing Suite** (Quality): Five cloud-powered testing tools for Apify actors: Schema Validator, Test Runner, Cloud Staging, Regression Suite, and MCP Debugger. How they work together and when to use each one.
- **The Complete ApifyForge Tool Suite** (Essential): All 14 developer tools in one guide: testing, schema analysis, cost planning, compliance scanning, LLM optimization, and pipeline building. What each tool does, when to use it, and how they work together.