
How to Test Apify Actors Before Publishing (5-Level Workflow)

The testing workflow I use across my Apify actor portfolio to avoid maintenance flags. Five levels, from local runs to pre-push hooks.

Ryan Clinton

The problem: Most Apify actors get a maintenance flag within days of publishing because the developer ran it once locally, saw JSON output, and assumed it would work on the platform. Environmental differences between local machines and Apify's Docker containers — Node.js versions, memory limits, proxy routing — silently break actors that passed every local test.

ApifyForge's five-level testing workflow eliminates maintenance flags across its entire Apify actor portfolio. The workflow covers local runs with default inputs, input schema validation, automated test suites with Jest, cloud staging on the actual Apify platform, and pre-push hooks that block bad deploys. Before implementing this system, ApifyForge averaged 2-3 maintenance flags per week; after, months passed without a single flag. The free ApifyForge Schema Validator and Test Runner tools automate the most error-prone steps.

Key takeaways:

  • Always test with default inputs first — Apify's automated health checks use your schema defaults, and failure here guarantees a maintenance flag
  • Separate business logic from Actor.main() so you can unit test scraping and parsing functions without mocking the Apify environment
  • Cloud staging is mandatory — local tests cannot catch Docker build failures, memory limit violations, or proxy routing issues
  • Pre-push hooks catch roughly 12% of issues that would otherwise ship to production
  • Validate output schema on every run, not just input schema — "successful" runs with missing fields are worse than crashes

Testing an Apify actor before publishing requires a five-level workflow: local runs, schema validation, automated test suites, cloud staging, and pre-push hooks. ApifyForge provides free tools for each level, and this guide covers the exact process ApifyForge uses to keep its actor portfolio at near-zero maintenance flags on the Apify Store.

I watch this happen constantly. A developer builds an actor, runs it once, sees JSON in the terminal, pushes to the Store, and walks away. Two days later: maintenance flag.

The gap between "works on my machine" and "works on Apify" is bigger than you'd think. Node.js version differences, missing environment variables, Docker memory limits, proxy routing quirks, input schema edge cases — any of these will break an actor that ran fine locally. A 2024 study from the IEEE International Conference on Software Testing found that 68% of deployment failures stem from environmental differences between development and production (IEEE ICST 2024). That number tracks with what I've seen firsthand.

I manage over 320 actors on the Apify Store. Before I built a systematic testing workflow, ApifyForge was eating 2-3 maintenance flags per week. After implementing the five-level approach below, I went months without a single flag. Here's the exact process.

What Does Testing an Apify Actor Actually Involve?

Testing an Apify actor means validating that it runs successfully with default inputs, handles edge cases without crashing, produces schema-conforming output, and behaves identically on the Apify platform as it does locally. It's more than "run it once" — it's a five-level process covering local execution, schema validation, automated tests, cloud staging, and pre-push hooks.

That definition sounds formal. In practice, it means: can this thing survive contact with real users who will send it empty inputs, Unicode strings, and URLs that 404? If yes, publish. If no, keep testing.

Most Apify actors don't get flagged because of bugs in the happy path. They get flagged because nobody tested the unhappy paths. According to Apify's Store documentation, actors need to maintain success rates above 95% for preferential search ranking (Apify docs). One bad edge case that crashes 10% of runs will quietly bury your actor in search results — and kill your PPE revenue.

How Do You Test an Apify Actor Locally?

Run apify run with your default inputs. This simulates the Apify platform environment locally, loads your input schema, reads from storage/key_value_stores/default/INPUT.json, and stores output in the local storage/ directory. It's the fastest way to catch obvious crashes.

Here's what I actually check after every local run:

Exit code. Did it exit cleanly? Non-zero means a crash. This is pass/fail — there's no "well, it mostly worked."

Output shape. Check storage/datasets/default/ for results. Are all expected fields present? Are types correct? If your actor promises a phone field and it's returning null for 40% of results, users will notice before you do.
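
A quick way to run that check is a spot-check script. This is a minimal sketch, assuming the Apify CLI's default local dataset path and the phone field from the example above:

import { readdirSync, readFileSync } from 'fs';

// The Apify CLI writes each local dataset item as its own JSON file
const dir = 'storage/datasets/default';
const items = readdirSync(dir)
    .filter((file) => file.endsWith('.json'))
    .map((file) => JSON.parse(readFileSync(`${dir}/${file}`, 'utf8')));

// Count items where a promised field came back empty
const missing = items.filter((item) => item.phone == null).length;
console.log(`${missing}/${items.length} items missing "phone"`);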

Memory. Even locally, watch for memory creep. Add this to your actor:

import { log } from 'apify';

// Log resident set size at checkpoints; compare values between
// batches to spot unbounded growth
function logMemory(label) {
    const usage = process.memoryUsage();
    log.info(`Memory [${label}]: RSS=${Math.round(usage.rss / 1048576)}MB`);
}

If RSS grows continuously without leveling off, you've got a leak. Fix it before publishing — memory-related crashes on the platform are the hardest bugs for users to diagnose, and they generate the angriest support messages.

The Most Important Test You Can Run

Test with default inputs. Full stop.

Apify uses your default inputs for automated health checks. If your actor can't run with the defaults defined in your input schema, it will get flagged. I learned this the hard way across my first 50 actors. Now it's the very first test I run for every actor in the ApifyForge portfolio.

Create storage/key_value_stores/default/INPUT.json with values that exactly match your schema defaults. Run apify run with no arguments. This is what Apify's health checker does. If it fails here, nothing else matters.
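
For example, if your schema's only default is a query of "web scraping", the file is just that object (hypothetical values):

{
    "query": "web scraping"
}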

Edge Cases That Break Production Actors

Beyond defaults, test these. They're where real failures hide:

# Empty input — must not crash
apify run --input='{}'

# Only required fields
apify run --input='{"url": "https://example.com"}'

# Invalid URL — should exit gracefully, not throw
apify run --input='{"url": "not-a-url"}'

# Unicode — surprisingly common failure point
apify run --input='{"query": "café reseña"}'

Each should produce exit code 0. The actor doesn't have to return results for every case, but it must never crash. Google's Site Reliability Engineering handbook puts it well: "A system that crashes is worse than a system that returns an error" (Google SRE Book).

Why Do Actors Pass Local Testing But Fail on the Platform?

The Apify platform runs actors inside Docker containers with strict memory limits, specific Node.js versions, and network configurations that differ from your local machine. An actor using 512MB locally will crash if you selected the 256MB memory tier. Native npm modules that compile fine on macOS may fail in the platform's Linux containers.

This is the #1 reason cloud staging exists. I'll get to that in a minute.

But first — your input schema. It's the single biggest source of "works locally, breaks on platform" bugs.

Schema Issues That Silently Break Actors

Missing defaults on required fields. If your schema requires a field but doesn't provide a default, Apify's health check can't run your actor:

{
    "properties": {
        "query": {
            "title": "Search Query",
            "type": "string"
        }
    },
    "required": ["query"]
}

Fix: add "default": "web scraping" and "prefill": "web scraping". Always.

Type mismatches. A default of "10" (string) on a field typed as integer works in some environments, breaks in others. Use 10, not "10".

Case-sensitive enums. Default "JSON" doesn't match enum ["json", "csv", "xml"]. Case matters. I've seen this exact bug ship to production more times than I'd like to admit.
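
Put together, a version of the schema from above that passes all three checks might look like this (the maxResults and format fields are illustrative additions):

{
    "properties": {
        "query": {
            "title": "Search Query",
            "type": "string",
            "default": "web scraping",
            "prefill": "web scraping"
        },
        "maxResults": {
            "title": "Max Results",
            "type": "integer",
            "default": 10
        },
        "format": {
            "title": "Output Format",
            "type": "string",
            "enum": ["json", "csv", "xml"],
            "default": "json"
        }
    },
    "required": ["query"]
}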

The Schema Validator on ApifyForge catches all of these automatically, including Apify-specific rules that generic JSON Schema validators miss. I run it on every actor before every push — across the whole portfolio, that adds up to thousands of validations per month.

How Should You Structure Actor Code for Testability?

Separate your business logic from the Apify SDK. Put scraping, parsing, and data transformation into standalone functions that don't depend on Actor.main(). This lets you write unit tests without mocking the entire Apify environment.

Hard to test:

import { Actor } from 'apify';
import * as cheerio from 'cheerio';

await Actor.main(async () => {
    const input = await Actor.getInput();
    const response = await fetch(input.url);
    const html = await response.text();
    const $ = cheerio.load(html);
    const results = [];
    $('h2').each((i, el) => {
        results.push({ title: $(el).text(), index: i });
    });
    await Actor.pushData(results);
});

Everything is crammed into Actor.main(). There is no standalone parsing function to call, so you can't test the parsing logic without spinning up the full Apify environment.

Easy to test:

// scraper.js — pure functions, no Actor dependency
import * as cheerio from 'cheerio';

export function parseHeadings(html) {
    const $ = cheerio.load(html);
    const results = [];
    $('h2').each((i, el) => {
        results.push({ title: $(el).text().trim(), index: i });
    });
    return results;
}

// main.js — thin orchestration layer
import { Actor, log } from 'apify';
import { parseHeadings } from './scraper.js';

await Actor.main(async () => {
    const input = await Actor.getInput() || {};
    const url = input.url || 'https://example.com';
    const html = await (await fetch(url)).text();
    const results = parseHeadings(html);
    await Actor.pushData(results);
    log.info(`Found ${results.length} headings`);
});

Now you can test parseHeadings with plain Jest, no mocks needed. Martin Fowler's "Inversion of Control" principle applies directly here — push dependencies to the edges, keep the core logic pure (martinfowler.com).

This pattern matters even more when you're building actors that feed into MCP servers. An MCP server that wraps a poorly-tested actor inherits all its bugs. ApifyForge runs MCP intelligence servers, and every single one depends on tested, reliable actors underneath.

Writing Tests That Actually Catch Bugs

Here's a stripped-down Jest suite. Notice it's not testing every possible scenario — it's testing the scenarios that actually break in production:

import { parseHeadings } from './scraper.js';

describe('parseHeadings', () => {
    test('extracts h2 elements from valid HTML', () => {
        const html = '<h2>First</h2><h2>Second</h2>';
        const results = parseHeadings(html);
        expect(results).toHaveLength(2);
        expect(results[0].title).toBe('First');
    });

    test('returns empty array for no h2 elements', () => {
        expect(parseHeadings('<h1>Only h1</h1>')).toEqual([]);
    });

    test('handles empty string', () => {
        expect(parseHeadings('')).toEqual([]);
    });

    test('trims whitespace', () => {
        const results = parseHeadings('<h2>  Spaced  </h2>');
        expect(results[0].title).toBe('Spaced');
    });

    test('handles malformed HTML without throwing', () => {
        expect(() => parseHeadings('<h2>Unclosed')).not.toThrow();
    });
});

Five tests. Covers the happy path, the empty path, whitespace, and malformed input. That's it. I don't write 30 tests per function — I write the 5 that catch the bugs I've actually seen across the portfolio.

Validating Output Shape

If your actor promises certain fields in its dataset schema, enforce that in tests. Users build integrations on top of your output structure. When a field disappears, their pipelines break silently:

const REQUIRED_FIELDS = ['title', 'url', 'timestamp'];

function validateOutput(items) {
    return items.flatMap((item, i) =>
        REQUIRED_FIELDS
            .filter(f => item[f] === undefined)
            .map(f => `Item ${i}: missing "${f}"`)
    );
}
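
Wiring that into Jest takes one more test. A sketch, assuming validateOutput is exported from a hypothetical local validate.js module:

import { validateOutput } from './validate.js'; // hypothetical module path

test('sample output contains every required field', () => {
    // In practice, load items from storage/datasets/default after a local run
    const items = [
        { title: 'Example', url: 'https://example.com', timestamp: '2026-03-01T00:00:00Z' },
    ];
    expect(validateOutput(items)).toEqual([]);
});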

The Test Runner on ApifyForge automates output validation against your declared schema. But even a hand-rolled check like the one above catches the most common failure: a field that exists in local test data but is missing from real-world scrapes.

What Is Cloud Staging and Why Is It Necessary?

Cloud staging means running your actor on the actual Apify platform — inside a Docker container with real memory limits, real proxies, and real storage APIs — before making it public. Local testing can't catch Docker build failures, memory limit violations, or proxy routing issues that only appear in production.

The process is simple:

apify push

Then trigger a run with default inputs (a CLI sketch follows the list) and check five things:

  1. Build succeeds. Common failure: native npm modules that compile on macOS but fail in the platform's Linux containers.
  2. Memory stays within tier. If you selected 256MB and the run peaks at 300MB, it'll get OOM-killed (exit code 137).
  3. Execution time is reasonable. Long runs cost users money. I target under 60 seconds for default inputs.
  4. Proxy works. If your actor uses Apify proxies, verify residential vs. datacenter routing. This literally cannot be tested locally.
  5. Output is complete. Check dataset item counts. A run that "succeeds" with zero output is worse than a crash — at least crashes are visible.
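
Triggering that staging run doesn't require the Console. A minimal sketch with the Apify CLI, assuming you're already logged in via apify login:

# Run the pushed actor remotely on the Apify platform, using the
# input from your local default key-value store (INPUT.json)
apify call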

I wrote about the monitoring side of this in how I track actor reliability at scale. Testing catches bugs before publish. Monitoring catches the ones that slip through.

How Do Pre-Push Hooks Prevent Bad Deploys?

Pre-push hooks are scripts that run automatically before apify push, blocking the deploy if any validation fails. They're your last line of defense — the automated check that catches what you forgot to test manually.

Here's a minimal version:

// pre-push-check.js
import { readFileSync, existsSync } from 'fs';

let hasErrors = false;
let schema = {};

// Schema must exist and parse as valid JSON
try {
    schema = JSON.parse(readFileSync('.actor/input_schema.json', 'utf8'));
} catch (err) {
    console.error(`FAIL: cannot read .actor/input_schema.json (${err.message})`);
    hasErrors = true;
}

// Every required field needs a default or prefill; checking === undefined
// keeps legitimate falsy defaults like 0 or false from being flagged
for (const field of (schema.required || [])) {
    const prop = schema.properties?.[field];
    if (prop?.default === undefined && prop?.prefill === undefined) {
        console.error(`FAIL: Required field "${field}" has no default`);
        hasErrors = true;
    }
}

// actor.json exists
if (!existsSync('.actor/actor.json')) {
    console.error('FAIL: .actor/actor.json not found');
    hasErrors = true;
}

if (hasErrors) process.exit(1);
console.log('All pre-push checks passed.');

Wire it into package.json:

{
    "scripts": {
        "start": "node src/main.js",
        "test": "jest",
        "validate": "node pre-push-check.js",
        "predeploy": "npm run validate && npm test",
        "deploy": "apify push"
    }
}

Note the deploy script: npm runs predeploy automatically before it, so deploying with npm run deploy (rather than calling apify push directly) blocks the push whenever validation or tests fail.

This isn't fancy. It doesn't need to be. It just needs to run every time, and block deploys when something's wrong. Across the ApifyForge portfolio, pre-push hooks catch roughly 12% of issues that would have otherwise shipped — based on our internal logs from the last 6 months.

Debugging Failures: The Common Patterns

When a test fails — and it will — here's the diagnostic shortcut:

Exit Code         | What Happened                  | Fix
137               | Out of memory (OOM kill)       | Increase memory tier or fix the leak
1                 | Uncaught exception             | Add try/catch, check the stack trace
0 + empty output  | Logic error or input mismatch  | Verify field names match between schema and code
Build failure     | Dependency issue               | Pin exact versions, check for native modules
Timeout           | Hung network request           | Add AbortSignal.timeout(30000) to all fetches
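
That last fix is worth spelling out. A minimal sketch, assuming Node 18+ where fetch and AbortSignal.timeout are built in:

// Abort the request if the server hangs for more than 30 seconds;
// the fetch rejects with a TimeoutError instead of hanging the run
const response = await fetch(url, { signal: AbortSignal.timeout(30000) });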

The exit code 137 pattern is the sneakiest. Your actor "works" locally because your machine has 16GB of RAM. On the platform, it gets 256MB and is killed by the kernel's OOM killer. Always check memory usage in cloud staging before publishing.

For deeper diagnostics, use the Apify API to pull run details:

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const run = await client.run(runId).get();
console.log('Memory:', Math.round(run.stats?.memAvgBytes / 1048576), 'MB');
console.log('Duration:', Math.round(run.stats?.durationMillis / 1000), 'sec');

I covered the full failure tracking approach — including how to see failures from other users running your actors — in tracking actor failures across all users.

The Pre-Publish Checklist

Before every publish or major update:

  • Actor runs locally with default inputs (apify run)
  • Actor handles empty input {} without crashing
  • Actor handles invalid/malformed input gracefully
  • Input schema validates (required fields have defaults, types match, enums are case-correct)
  • Output structure is consistent across runs, matching your dataset schema
  • Build succeeds on Apify platform
  • Cloud test run produces expected output
  • Memory usage stays within selected tier
  • Execution time is under 60 seconds for defaults
  • PPE pricing is configured correctly

At portfolio scale, I can't run this checklist manually. The ApifyForge Test Runner and Schema Validator handle most of it. But even if you're publishing your first actor, running through this list once will save you from the maintenance flag that hits 48 hours after a careless publish.

The Honest Take

The best actors on the Apify Store aren't the ones that never break. They're the ones where the developer catches the break before anyone else notices.

I've been building actors for two years. I've published over 320. The single biggest difference between actors that earn steady revenue and actors that get flagged and forgotten? Testing. Not brilliant algorithms, not fancy features — just testing the boring stuff. Default inputs, edge cases, schema validation, cloud staging.

If you build lead generation tools like the Website Contact Scraper or Waterfall Contact Enrichment, your users depend on consistent data. If you build compliance screening actors that feed into MCP servers like the Financial Crime Screening MCP, the stakes are even higher. Bad data from a poorly-tested actor cascades through every system downstream.

Test first. Publish second. Your users — and your revenue — will thank you.

Frequently asked questions

How long should testing take before publishing an Apify actor?

For a simple scraper, the full five-level workflow takes 30-60 minutes: 10 minutes for local runs, 5 minutes for schema validation, 15 minutes for writing basic tests, 10 minutes for cloud staging, and 5 minutes to set up pre-push hooks. For complex actors with multiple data sources, allow 2-3 hours. The time investment pays back immediately by preventing maintenance flags that take days to recover from.

Do I need to write unit tests for every Apify actor?

Not for every function, but you should write 3-5 tests per actor covering the happy path, empty input, invalid input, whitespace handling, and malformed HTML. These five scenarios catch the majority of real-world failures. Focus on testing the business logic functions you separated from Actor.main(), not the Apify SDK integration layer.

What is exit code 137 and how do I fix it?

Exit code 137 means the Linux kernel's OOM (Out of Memory) killer terminated your process. Your actor used more memory than the selected Apify memory tier allows. Fix it by either increasing the memory tier (256MB to 1024MB) or reducing memory usage in your code — common culprits are loading entire datasets into memory, not streaming large responses, or memory leaks from unclosed browser pages.

Can I test Apify proxy configuration locally?

No. Apify proxy routing only works on the platform. You cannot test residential vs. datacenter proxy behavior from your local machine. This is one of the primary reasons cloud staging exists — push your actor to Apify with apify push and run it there to verify proxy configuration works correctly before publishing.

How do I know if my input schema will pass Apify's automated health check?

Run the ApifyForge Schema Validator on your input_schema.json before every push. The three most common schema issues that trigger health check failures are: required fields without default values, type mismatches between default values and declared types (e.g., "10" string on an integer field), and case-sensitive enum mismatches where the default does not match the enum list exactly.

Limitations

  • Local testing cannot fully replicate the Apify platform environment. Docker memory limits, specific Node.js versions, network configurations, and proxy routing differ between local machines and Apify's infrastructure. Cloud staging is required to catch these differences.
  • The five-level workflow adds development time. For simple actors, the testing overhead is 30-60 minutes. For complex multi-source actors, it can add 2-3 hours. This time is justified by avoiding maintenance flags but is a real cost.
  • Pre-push hooks only catch static issues. Schema validation and file existence checks run without executing the actor. They cannot detect runtime failures, memory leaks, or data quality issues that only appear during actual execution.
  • Output schema validation requires known-good test data. You need representative test inputs that produce predictable output to validate schema conformance. For actors scraping external websites, the target site's content can change at any time, making deterministic testing difficult.

Last updated: March 2026

Ryan Clinton publishes Apify actors as ryanclinton and builds developer tools at ApifyForge.
