How to Test Your Apify Actor Before Publishing
Here is a pattern we see constantly: a developer builds an actor, runs it once locally, sees output, pushes it to the store, and moves on. Two days later, it is flagged as under maintenance.
The gap between "it works on my machine" and "it works reliably on Apify" is wider than most developers realize. Different Node.js versions, missing environment variables, network restrictions, memory limits, input edge cases — all of these can break an actor that ran fine locally.
We manage over 250 actors on the Apify Store. Before we implemented systematic testing, we were getting 2-3 maintenance flags per week. After implementing the five-level testing workflow below, we have gone months without a single flag. This guide walks through the complete approach that catches problems before they reach users.
Level 1: Local Testing with apify run
The Apify CLI includes a local runner that simulates the Apify platform environment. This is your first line of defense.
# Initialize your actor project (if not already)
apify init
# Run with default inputs
apify run
# Run with specific inputs
apify run --input='{"url": "https://example.com", "maxResults": 10}'
What `apify run` does:
- Loads your `INPUT_SCHEMA.json`
- Reads input from `storage/key_value_stores/default/INPUT.json`
- Executes your actor in a simulated Apify environment
- Stores output in the local `storage/` directory
What to Check After a Local Run
- Exit code — did the process exit cleanly (code 0)? A non-zero exit code means the actor crashed.
- Output — check `storage/datasets/default/` for results. Are they there? Is the structure correct? Are all expected fields present?
- Logs — did anything unexpected appear in the console? Warnings about deprecated APIs, uncaught promise rejections, or memory warnings?
- Key-value store — if your actor writes to the store, verify the data at `storage/key_value_stores/default/`
Testing Default Inputs (The Most Important Test)
This deserves special emphasis. Apify uses your default inputs for health checks. If your actor cannot run with defaults, it will eventually get flagged.
Create a file at storage/key_value_stores/default/INPUT.json that matches your schema's default values exactly. Then run apify run with no arguments. This simulates what Apify's health check does.
{
"url": "https://example.com",
"maxResults": 5,
"outputFormat": "json"
}
Every field that has a default value in your INPUT_SCHEMA.json should appear here with that same value. This is the single most important test you can run.
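Because the defaults file and the schema drift apart easily, it can help to generate INPUT.json from the schema itself. A minimal sketch; the inline schema is illustrative, and a real project would read INPUT_SCHEMA.json from disk:

```javascript
// Sketch: derive the health-check INPUT.json from a schema's default values.
// The inline schema is illustrative; in a real project, read INPUT_SCHEMA.json from disk.
function defaultsFromSchema(schema) {
  const input = {};
  for (const [key, prop] of Object.entries(schema.properties ?? {})) {
    if (prop.default !== undefined) input[key] = prop.default;
  }
  return input;
}

const schema = {
  properties: {
    url: { type: 'string', default: 'https://example.com' },
    maxResults: { type: 'integer', default: 5 },
    outputFormat: { type: 'string', default: 'json' },
  },
};

// Write this object to storage/key_value_stores/default/INPUT.json
console.log(JSON.stringify(defaultsFromSchema(schema), null, 2));
```

Generating the file this way means the health-check input can never silently diverge from the schema's defaults.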
Testing Edge Cases Locally
Beyond default inputs, test these scenarios that commonly cause failures in production:
# Empty input — should not crash
apify run --input='{}'
# Minimal input — only required fields
apify run --input='{"url": "https://example.com"}'
# Maximum values — stress test limits
apify run --input='{"url": "https://example.com", "maxResults": 1000}'
# Invalid URL — should handle gracefully
apify run --input='{"url": "not-a-url"}'
# Unicode input — common source of bugs
apify run --input='{"query": "cafe\u0301 rese\u00f1a"}'
Each of these should produce a clean exit (code 0). The actor does not have to produce results for every case, but it must never crash.
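Running this matrix by hand gets tedious. Here is a sketch of a tiny harness that checks exit codes automatically; the `node -e` placeholder stands in for the real `apify run` invocation, which is an assumption about your local setup:

```javascript
// Sketch: run each edge-case input and verify a clean exit (code 0).
// In a real project, call runCase('apify', ['run', `--input=${input}`]).
import { spawnSync } from 'node:child_process';

function runCase(command, args) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  return result.status; // 0 means a clean exit
}

const EDGE_CASES = ['{}', '{"url": "https://example.com"}', '{"url": "not-a-url"}'];

for (const input of EDGE_CASES) {
  // Placeholder command so the sketch runs anywhere; swap in `apify run` locally.
  const status = runCase('node', ['-e', 'process.exit(0)']);
  console.log(`input ${input}: exit ${status}`);
}
```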
Level 2: Input Schema Validation
Your input schema is more than documentation — it is a contract. Errors in the schema cause confusing UI bugs, validation failures, and broken health checks.
Common Schema Issues That Pass Locally But Fail on the Platform
Missing Defaults for Required Fields
{
"properties": {
"query": {
"title": "Search Query",
"type": "string"
}
},
"required": ["query"]
}
This schema requires query but provides no default. Apify's health check cannot run your actor without it. Always add default and prefill values for required fields.
Fixed version:
{
"properties": {
"query": {
"title": "Search Query",
"type": "string",
"description": "The search term to look for",
"default": "web scraping",
"prefill": "web scraping",
"editor": "textfield"
}
},
"required": ["query"]
}
Type Mismatches
{
"properties": {
"maxResults": {
"title": "Max Results",
"type": "integer",
"default": "10"
}
}
}
The type says integer but the default is a string "10". Some environments handle this silently; others do not. Be explicit: use 10 not "10".
Invalid Enum Values
{
"properties": {
"format": {
"type": "string",
"enum": ["json", "csv", "xml"],
"default": "JSON"
}
}
}
The default "JSON" does not match any enum value (they are lowercase). Case matters.
Array Fields Without Proper Defaults
{
"properties": {
"urls": {
"title": "URLs to scrape",
"type": "array",
"editor": "stringList"
}
},
"required": ["urls"]
}
This requires urls but has no default array. Fix: add "default": ["https://example.com"] or make the field optional and handle the empty case in code.
Automated Schema Validation Script
Here is a validation script that catches all of the above issues:
import { readFileSync } from 'fs';
function validateSchema(schemaPath) {
const schema = JSON.parse(readFileSync(schemaPath, 'utf8'));
const errors = [];
const warnings = [];
// Check structure
if (!schema.properties) {
errors.push('Schema has no properties defined');
return { errors, warnings };
}
// Check required fields have defaults
const required = schema.required || [];
for (const field of required) {
const prop = schema.properties[field];
if (!prop) {
errors.push(`Required field "${field}" not in properties`);
continue;
}
if (prop.default === undefined && prop.prefill === undefined) {
errors.push(`Required field "${field}" has no default or prefill`);
}
}
// Check type consistency
for (const [key, prop] of Object.entries(schema.properties)) {
if (prop.default !== undefined) {
const actualType = Array.isArray(prop.default) ? 'array' : typeof prop.default;
const expectedType = prop.type === 'integer' ? 'number' : prop.type;
if (actualType !== expectedType) {
errors.push(
`Field "${key}": default type "${actualType}" != schema type "${prop.type}"`
);
}
}
// Check enum consistency
if (prop.enum && prop.default !== undefined) {
if (!prop.enum.includes(prop.default)) {
errors.push(
`Field "${key}": default "${prop.default}" not in enum [${prop.enum}]`
);
}
}
// Check min/max don't exclude default
if (prop.type === 'integer' || prop.type === 'number') {
if (prop.minimum !== undefined && prop.default < prop.minimum) {
errors.push(
`Field "${key}": default ${prop.default} < minimum ${prop.minimum}`
);
}
if (prop.maximum !== undefined && prop.default > prop.maximum) {
errors.push(
`Field "${key}": default ${prop.default} > maximum ${prop.maximum}`
);
}
}
// Warn about missing descriptions
if (!prop.description) {
warnings.push(`Field "${key}" has no description`);
}
}
return { errors, warnings };
}
// Usage
const result = validateSchema('INPUT_SCHEMA.json');
if (result.errors.length > 0) {
console.error('SCHEMA ERRORS:');
result.errors.forEach(e => console.error(` ERROR: ${e}`));
process.exit(1);
}
if (result.warnings.length > 0) {
result.warnings.forEach(w => console.warn(` WARN: ${w}`));
}
console.log('Schema validation passed');
The Schema Validator on ApifyForge automates this and checks additional platform-specific requirements that are hard to validate manually.
Level 3: Automated Test Suites
Local manual testing catches obvious bugs. Automated tests catch regressions and edge cases. When you manage many actors, automated tests are the only way to maintain quality across the portfolio.
Structuring Testable Actor Code
The most important testing principle for Apify actors: extract your logic into testable functions. Do not put everything inside Actor.main(). Pull out the scraping, processing, and output logic so you can test them independently.
Hard to test:
await Actor.main(async () => {
const input = await Actor.getInput();
const response = await fetch(input.url);
const html = await response.text();
const $ = cheerio.load(html);
const results = [];
$('h2').each((i, el) => {
results.push({ title: $(el).text(), index: i });
});
await Actor.pushData(results);
});
Easy to test:
// scraper.js — pure logic, no Actor dependency
import * as cheerio from 'cheerio';
export function parseHeadings(html) {
const $ = cheerio.load(html);
const results = [];
$('h2').each((i, el) => {
results.push({ title: $(el).text().trim(), index: i });
});
return results;
}
export async function fetchPage(url) {
const response = await fetch(url, {
signal: AbortSignal.timeout(30000)
});
if (!response.ok) {
throw new Error(`HTTP ${response.status}`);
}
return response.text();
}
// main.js — thin orchestration layer
import { Actor, log } from 'apify';
import { fetchPage, parseHeadings } from './scraper.js';
await Actor.main(async () => {
const input = await Actor.getInput() || {};
const url = input.url || 'https://example.com';
try {
const html = await fetchPage(url);
const results = parseHeadings(html);
await Actor.pushData(results);
log.info(`Found ${results.length} headings`);
} catch (error) {
log.error(`Failed: ${error.message}`);
}
});
Writing the Test Suite
Here is a comprehensive test structure using Jest:
import { parseHeadings, fetchPage } from './scraper.js';
describe('parseHeadings', () => {
test('extracts h2 elements from valid HTML', () => {
const html = '<html><body><h2>First</h2><h2>Second</h2></body></html>';
const results = parseHeadings(html);
expect(results).toHaveLength(2);
expect(results[0]).toEqual({ title: 'First', index: 0 });
expect(results[1]).toEqual({ title: 'Second', index: 1 });
});
test('returns empty array for HTML without h2 elements', () => {
const html = '<html><body><h1>Only h1</h1><p>text</p></body></html>';
const results = parseHeadings(html);
expect(results).toEqual([]);
});
test('handles empty HTML', () => {
const results = parseHeadings('');
expect(results).toEqual([]);
});
test('trims whitespace from heading text', () => {
const html = '<h2> Spaced Out </h2>';
const results = parseHeadings(html);
expect(results[0].title).toBe('Spaced Out');
});
test('handles Unicode characters', () => {
const html = '<h2>Café Reseña</h2>';
const results = parseHeadings(html);
expect(results[0].title).toBe('Café Reseña');
});
test('handles malformed HTML gracefully', () => {
const html = '<h2>Unclosed heading<h2>Another one</h2>';
const results = parseHeadings(html);
// Should not throw — Cheerio handles malformed HTML
expect(Array.isArray(results)).toBe(true);
});
});
describe('fetchPage', () => {
test('throws on invalid URL', async () => {
await expect(fetchPage('not-a-url'))
.rejects.toThrow();
});
test('throws on HTTP errors', async () => {
await expect(fetchPage('https://httpstat.us/404'))
.rejects.toThrow('HTTP 404');
});
test('fetches valid pages', async () => {
const html = await fetchPage('https://example.com');
expect(html).toContain('Example Domain');
});
});
Key Testing Patterns for Actors
- Test edge cases aggressively. Empty inputs, huge inputs, malformed URLs, non-English characters, rate limit responses. These are where production failures hide.
- Test output structure. Users depend on consistent output shapes. If a field disappears, their downstream integrations break. Write a test that validates every expected field exists and has the correct type.
- Test with realistic data. Do not just test with `example.com`. Use real-world URLs that represent your typical use case.
- Mock external dependencies for unit tests. Use real external calls only for integration tests. This keeps your test suite fast and reliable.
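As an example of the last point, a fetch-based helper like fetchPage can be unit-tested without touching the network by stubbing the global fetch. A sketch; the helper is restated so the snippet runs on its own, and the stub implements only the two response members it uses:

```javascript
// Sketch: stub the global fetch so unit tests never hit the network.
// fetchPage is restated here so the example is self-contained.
async function fetchPage(url) {
  const response = await fetch(url, { signal: AbortSignal.timeout(30000) });
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.text();
}

const originalFetch = globalThis.fetch;
globalThis.fetch = async () => ({
  ok: true,
  status: 200,
  text: async () => '<h2>Mocked Heading</h2>',
});

const html = await fetchPage('https://example.com');
console.log(html); // the stubbed body, no network involved

globalThis.fetch = originalFetch; // always restore the real fetch
```

The same pattern works inside Jest via jest.spyOn, but plain global swapping keeps the idea visible.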
Output Schema Validation
If your actor promises a specific output format, enforce it in tests:
const EXPECTED_OUTPUT_SCHEMA = {
required: ['title', 'url', 'timestamp'],
properties: {
title: { type: 'string' },
url: { type: 'string' },
timestamp: { type: 'string' },
description: { type: 'string' }, // optional
}
};
function validateOutput(items) {
const errors = [];
for (let i = 0; i < items.length; i++) {
for (const field of EXPECTED_OUTPUT_SCHEMA.required) {
if (items[i][field] === undefined) {
errors.push(`Item ${i}: missing required field "${field}"`);
}
}
for (const [field, spec] of Object.entries(EXPECTED_OUTPUT_SCHEMA.properties)) {
if (items[i][field] !== undefined && typeof items[i][field] !== spec.type) {
errors.push(`Item ${i}: field "${field}" expected ${spec.type}, got ${typeof items[i][field]}`);
}
}
}
return errors;
}
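Exercised against sample items, the validator flags both missing fields and type drift. It is restated below so the sketch runs on its own; the sample items are illustrative:

```javascript
// Sketch: exercising the output validator above with sample items.
// The validator is restated so the snippet is self-contained.
const EXPECTED_OUTPUT_SCHEMA = {
  required: ['title', 'url', 'timestamp'],
  properties: {
    title: { type: 'string' },
    url: { type: 'string' },
    timestamp: { type: 'string' },
  },
};

function validateOutput(items) {
  const errors = [];
  items.forEach((item, i) => {
    for (const field of EXPECTED_OUTPUT_SCHEMA.required) {
      if (item[field] === undefined) errors.push(`Item ${i}: missing required field "${field}"`);
    }
    for (const [field, spec] of Object.entries(EXPECTED_OUTPUT_SCHEMA.properties)) {
      if (item[field] !== undefined && typeof item[field] !== spec.type) {
        errors.push(`Item ${i}: field "${field}" expected ${spec.type}, got ${typeof item[field]}`);
      }
    }
  });
  return errors;
}

const good = { title: 'Page', url: 'https://example.com', timestamp: '2024-01-01T00:00:00Z' };
const bad = { title: 42, url: 'https://example.com' }; // wrong type, missing timestamp

const goodErrors = validateOutput([good]);
const badErrors = validateOutput([bad]);
console.log(goodErrors.length, badErrors.length);
```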
Level 4: Cloud Staging
Local testing cannot catch everything. The Apify platform has specific behaviors around memory management, Docker container lifecycle, proxy routing, and storage that differ from your local environment.
Before publishing or pushing a major update:
# Push to Apify (this creates a new build)
apify push
Then trigger a test run via the API or Console and verify:
- Build succeeds — your Dockerfile and dependencies resolve correctly on the platform. Common failure: a dependency that installs fine on your machine but fails in the Docker container (native modules, platform-specific packages).
- Memory usage — check the run details for memory consumption. Local runs do not enforce memory limits; Apify does. If your actor uses 512MB locally but you selected 256MB memory, it will crash.
- Execution time — is it within reasonable bounds? Long runs cost users money. We aim for under 60 seconds for most actors with default inputs.
- Proxy behavior — if your actor uses Apify proxies, verify they work as expected (residential vs. datacenter, geo-targeting). Proxy behavior cannot be tested locally.
- Storage — are datasets, key-value stores, and request queues created and populated correctly? Check that output counts match expectations.
Memory Profiling Tip
Add memory logging to catch leaks before they cause platform crashes:
import { log } from 'apify';

function logMemory(label) {
const usage = process.memoryUsage();
log.info(`Memory [${label}]: RSS=${Math.round(usage.rss / 1048576)}MB, Heap=${Math.round(usage.heapUsed / 1048576)}MB`);
}
// Call at key points in your actor
logMemory('start');
// ... process data ...
logMemory('after-processing');
// ... push results ...
logMemory('after-push');
If RSS grows continuously without leveling off, you have a memory leak. Fix it before publishing — a memory-related crash on the platform is one of the hardest bugs for users to diagnose.
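To see what a leak looks like in those numbers, here is a deliberately leaky sketch: buffers retained in an array keep RSS climbing instead of leveling off.

```javascript
// Sketch: a deliberate leak. Buffers retained in an array keep RSS growing.
const leaked = [];

function rssMB() {
  return Math.round(process.memoryUsage().rss / 1048576);
}

const before = rssMB();
for (let i = 0; i < 50; i++) {
  leaked.push(Buffer.alloc(1024 * 1024)); // retain 1MB per iteration
}
const after = rssMB();

console.log(`RSS grew from ${before}MB to ${after}MB`); // drop the references and it levels off
```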
Level 5: Pre-Push Hooks
The final layer of defense: prevent bad code from reaching the platform in the first place.
Create a pre-push validation script that runs automatically:
// pre-push-check.js
import { readFileSync, existsSync } from 'fs';
const checks = [];
let hasErrors = false;
// 1. Validate INPUT_SCHEMA.json exists and parses
if (!existsSync('INPUT_SCHEMA.json')) {
checks.push({ status: 'FAIL', message: 'INPUT_SCHEMA.json not found' });
hasErrors = true;
} else {
try {
const schema = JSON.parse(readFileSync('INPUT_SCHEMA.json', 'utf8'));
// Check required fields have defaults
const required = schema.required || [];
for (const field of required) {
const prop = schema.properties?.[field];
if (prop?.default === undefined && prop?.prefill === undefined) {
checks.push({
status: 'FAIL',
message: `Required field "${field}" has no default value`
});
hasErrors = true;
}
}
if (!hasErrors) {
checks.push({ status: 'PASS', message: 'Schema validation' });
}
} catch (e) {
checks.push({ status: 'FAIL', message: `Schema parse error: ${e.message}` });
hasErrors = true;
}
}
// 2. Check package.json has start script
try {
const pkg = JSON.parse(readFileSync('package.json', 'utf8'));
if (!pkg.scripts?.start) {
checks.push({ status: 'FAIL', message: 'No start script in package.json' });
hasErrors = true;
} else {
checks.push({ status: 'PASS', message: 'Start script exists' });
}
} catch {
checks.push({ status: 'FAIL', message: 'Cannot read package.json' });
hasErrors = true;
}
// 3. Check .actor/actor.json exists
if (!existsSync('.actor/actor.json')) {
checks.push({ status: 'FAIL', message: '.actor/actor.json not found' });
hasErrors = true;
} else {
checks.push({ status: 'PASS', message: 'Actor config exists' });
}
// 4. Check README exists and has content
if (!existsSync('README.md')) {
checks.push({ status: 'WARN', message: 'No README.md found' });
} else {
const readme = readFileSync('README.md', 'utf8');
if (readme.length < 200) {
checks.push({ status: 'WARN', message: 'README.md is very short (< 200 chars)' });
} else {
checks.push({ status: 'PASS', message: 'README.md exists with content' });
}
}
// Report
console.log('\nPre-push validation:');
for (const check of checks) {
const icon = check.status === 'PASS' ? 'OK' : check.status === 'WARN' ? '!!' : 'XX';
console.log(` [${icon}] ${check.message}`);
}
if (hasErrors) {
console.error('\nPre-push checks FAILED. Fix errors before pushing.');
process.exit(1);
}
console.log('\nAll checks passed.');
Wire this into your workflow by adding it to your package.json:
{
"scripts": {
"start": "node src/main.js",
"test": "jest",
"validate": "node pre-push-check.js",
"predeploy": "npm run validate && npm test"
}
}
The Complete Testing Checklist
Before every publish or major update, verify every item:
- [ ] Actor runs locally with default inputs (`apify run`)
- [ ] Actor runs with empty input `{}` without crashing
- [ ] Actor runs with minimal required input
- [ ] Actor handles invalid/malformed input gracefully
- [ ] `INPUT_SCHEMA.json` validates (all required fields have defaults, types match, enums are correct)
- [ ] Output structure is consistent across multiple runs
- [ ] All expected output fields are present and correctly typed
- [ ] Edge cases tested (empty results, Unicode, large inputs)
- [ ] Build succeeds on Apify platform
- [ ] Cloud test run produces output
- [ ] Memory usage is within selected tier
- [ ] Execution time is reasonable for default inputs
- [ ] README matches current functionality
- [ ] PPE pricing is configured correctly
How We Run This at Scale
With 250+ actors, we cannot run this checklist manually for each one. We have an automated test pipeline that runs the schema validation and default-input test for every actor after deployment. Actors that fail get flagged immediately, and we fix them before Apify's health checker notices.
ApifyForge bundles the Schema Validator and Test Runner into a streamlined workflow that handles most of this checklist automatically. But even if you roll your own tooling, the important thing is that you test systematically — not just "run it once and hope."
Debugging Failed Runs
When a test does fail, here is how to diagnose the issue quickly:
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
async function debugFailedRun(runId) {
const run = await client.run(runId).get();
const runLog = await client.run(runId).log().get();
console.log('Run status:', run.status);
console.log('Exit code:', run.exitCode);
console.log('Memory used:', Math.round(run.stats?.memAvgBytes / 1048576), 'MB');
console.log('Duration:', Math.round(run.stats?.durationMillis / 1000), 'sec');
console.log('\nLast 50 lines of log:');
console.log(runLog.split('\n').slice(-50).join('\n'));
}
Common failure patterns and their fixes:
| Symptom | Likely Cause | Fix |
|---|---|---|
| Exit code 137 | Out of memory (OOM kill) | Increase memory tier or optimize memory usage |
| Exit code 1 + uncaught exception | Unhandled error in code | Add try/catch around the failing section |
| Timeout with no output | Hung network request | Add timeouts to all fetch/request calls |
| Build fails | Dependency issue | Check package.json versions, try pinning exact versions |
| Empty output, success status | Logic error or input mismatch | Check that input field names match between schema and code |
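If you triage failed runs often, the table can also live in code. A sketch of a helper that mirrors the rows above; the function name and signature are our own invention:

```javascript
// Sketch: a triage helper mirroring the failure table above.
function diagnose(exitCode, hasOutput, buildSucceeded = true) {
  if (!buildSucceeded) return 'Dependency issue: check package.json versions, try pinning exact versions';
  if (exitCode === 137) return 'Out of memory (OOM kill): increase memory tier or optimize memory usage';
  if (exitCode === 1) return 'Unhandled error in code: add try/catch around the failing section';
  if (exitCode === 0 && !hasOutput) return 'Logic error or input mismatch: check that input field names match between schema and code';
  return 'Inspect the last lines of the run log';
}

console.log(diagnose(137, false)); // OOM advice
console.log(diagnose(0, false));   // empty-output advice
```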
The best actors on the Apify Store are not the ones that never break. They are the ones whose developers catch the break before anyone else notices. Invest in testing now, and you will save yourself from maintenance flags, bad reviews, and lost revenue later.
Related resources:
- Test Runner — automated test suites for actors
- Cloud Staging — test in production before publishing
- Regression Suite — detect regressions between builds
- Compare SEO Tools — see how tested actors compare on reliability