Managing a Large Apify Actor Portfolio — Lessons from 250+ Actors
When you have five actors on the Apify Store, management is easy. You remember what each one does, you check them occasionally, and life is good.
When you have 250+, everything changes. Actors break silently. Input schemas drift out of sync with code. Maintenance flags appear on actors you forgot existed. Pricing gets inconsistent. README files become stale. And deploying updates becomes a full-time job.
We have been managing a portfolio of over 250 actors for the past year. Here is what we learned — the hard way — about keeping a large actor fleet healthy, profitable, and growing.
The Breaking Point: When Manual Management Dies
Around the 50-actor mark, we hit the first wall. Manual management was no longer viable. We were spending more time maintaining existing actors than building new ones. Specific problems:
- No visibility into fleet health. Which actors were failing? Which ones had not been run in weeks? We had to check each one individually in the Apify Console.
- Inconsistent metadata. Some actors had great SEO descriptions, others had placeholder text. Categories were wrong. Icons were missing.
- No bulk operations. Changing the pay-per-event (PPE) pricing on 30 actors meant clicking through 30 settings pages.
- No deployment pipeline. Every push was manual. Testing before push was optional (and usually skipped).
- No revenue tracking. We had no idea which actors were making money and which were costing us compute with zero return.
Sound familiar? If you manage even 10+ actors, you have probably hit some of these.
The Numbers That Forced Us to Automate
Here is what our manual management looked like at 100 actors:
| Task | Time Per Actor | Total Time |
|---|---|---|
| Health check (review runs) | 3 min | 5 hours weekly |
| README updates | 15 min | 25 hours monthly |
| Pricing review | 5 min | 8 hours monthly |
| Deployment | 10 min | varies |
| Bug investigation | 20 min | varies |
We were spending 20+ hours per week just on maintenance. That is a full half-time job that produces zero new actors and zero new revenue. Something had to change.
What We Automated First
Health Monitoring
The single highest-value automation we built was a health check script. It pulls the status of every actor via the Apify API and flags anything that is broken, under maintenance, or has not been run recently.
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
async function checkFleetHealth() {
const actors = await client.actors().list({ my: true });
const report = {
total: actors.items.length,
healthy: [],
warning: [],
critical: [],
stale: [],
};
for (const actor of actors.items) {
// desc: true so the newest build comes first (the API lists ascending by default)
const builds = await client.actor(actor.id).builds().list({ limit: 1, desc: true });
const lastBuild = builds.items[0];
// desc: true so these are the 10 most recent runs, not the 10 oldest
const runs = await client.actor(actor.id).runs().list({ limit: 10, desc: true });
const recentRuns = runs.items || [];
const failures = recentRuns.filter(r => r.status === 'FAILED').length;
const successRate = recentRuns.length > 0
? Math.round(((recentRuns.length - failures) / recentRuns.length) * 100)
: null;
const daysSinceLastBuild = lastBuild?.finishedAt
? Math.floor((Date.now() - new Date(lastBuild.finishedAt)) / 86400000)
: Infinity;
const daysSinceLastRun = recentRuns[0]?.startedAt
? Math.floor((Date.now() - new Date(recentRuns[0].startedAt)) / 86400000)
: Infinity;
const entry = {
name: actor.name,
id: actor.id,
successRate,
failures,
daysSinceLastBuild,
daysSinceLastRun
};
if (failures >= 5) {
report.critical.push(entry);
} else if (daysSinceLastRun > 30) {
report.stale.push(entry);
} else if (failures >= 2 || successRate < 90) {
report.warning.push(entry);
} else {
report.healthy.push(entry);
}
}
const healthScore = Math.round(
(report.healthy.length / report.total) * 100
);
console.log(`Fleet Health: ${healthScore}%`);
console.log(` Healthy: ${report.healthy.length}`);
console.log(` Warning: ${report.warning.length}`);
console.log(` Critical: ${report.critical.length}`);
console.log(` Stale: ${report.stale.length}`);
return report;
}
We run this daily. It catches problems before Apify's maintenance system does, giving us time to fix things proactively. Read more about our monitoring approach.
Bulk Metadata Updates
When you need to update the category, SEO description, or pricing on dozens of actors, doing it through the UI is painful. We wrote scripts to update actor metadata via the API:
// Set PPE pricing across multiple actors
const actorConfigs = [
{ id: 'abc123', event: 'result-scraped', price: 0.10 },
{ id: 'def456', event: 'page-scraped', price: 0.05 },
{ id: 'ghi789', event: 'lead-enriched', price: 0.15 },
// ... 50 more actors
];
for (const config of actorConfigs) {
await client.actor(config.id).update({
pricingInfos: [{
pricingModel: 'PAY_PER_EVENT',
pricingPerEvent: {
actorChargeEvents: {
[config.event]: {
eventTitle: config.event.replace(/-/g, ' '),
eventPriceUsd: config.price,
}
}
}
}]
});
console.log(`Updated pricing for ${config.id}: $${config.price}/${config.event}`);
}
This turned a full-day task into a 30-second script run. We used similar approaches for setting categories, updating descriptions, and managing publication status.
Bulk Category and Description Updates
// Update categories and SEO descriptions in bulk
const metadataUpdates = [
{
id: 'abc123',
categories: ['LEAD_GENERATION', 'AI'],
seoTitle: 'Website Contact Scraper — Extract Emails & Phone Numbers',
seoDescription: 'Scrape contact information from any website...'
},
// ... more actors
];
for (const update of metadataUpdates) {
await client.actor(update.id).update({
categories: update.categories,
seoTitle: update.seoTitle,
seoDescription: update.seoDescription,
});
console.log(`Updated metadata for ${update.id}`);
}
Deployment Pipeline
Early on, we pushed actors manually with apify push. That works fine for one actor. For 250, we needed something better.
Our deployment flow:
- Pre-push validation — schema check, default input test, lint
- Push to Apify — automated via script
- Post-push verification — trigger a test run, verify it succeeds
- Metadata sync — ensure store listing matches the latest code
async function deployAndVerify(actorId, defaultInput) {
// Step 1: Trigger a test run with default inputs after push
const run = await client.actor(actorId).call(defaultInput, {
memory: 256,
timeout: 60
});
// Step 2: Verify run succeeded
if (run.status === 'SUCCEEDED') {
console.log(`Deploy verified: ${actorId}`);
// Step 3: Check output exists
const dataset = await client.dataset(run.defaultDatasetId).listItems();
if (dataset.items.length > 0) {
console.log(` Output: ${dataset.items.length} items`);
return { success: true, items: dataset.items.length };
} else {
console.warn(` Warning: Run succeeded but produced no output`);
return { success: true, items: 0 };
}
} else {
console.error(`Deploy FAILED: ${actorId} — status: ${run.status}`);
return { success: false, status: run.status };
}
}
The key insight: treat actors like microservices. Each one has its own directory, its own tests, and its own deployment config. The pipeline handles them uniformly.
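The uniformity check behind that pipeline can be sketched as a pure function. Given a map from actor directory name to the files it contains (built by walking actors/ with fs in the real script), it flags anything missing required files. The required list mirrors our layout convention; auditActorDirs is a hypothetical helper, not an Apify API:

```javascript
// Files every actor directory must contain. This mirrors our own layout
// convention; it is not an Apify requirement.
const REQUIRED_FILES = ['.actor/actor.json', 'package.json', 'README.md'];

// fileLists maps actor-directory name -> array of relative file paths.
// In the real pipeline this map is built by walking actors/ with fs.
function auditActorDirs(fileLists) {
  const problems = {};
  for (const [dir, files] of Object.entries(fileLists)) {
    const missing = REQUIRED_FILES.filter((f) => !files.includes(f));
    if (missing.length > 0) problems[dir] = missing;
  }
  return problems; // empty object = every directory is uniform
}
```

Fail the deploy when the returned object is non-empty and the pipeline never pushes a half-assembled actor.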
The Five Biggest Pitfalls at Scale
1. Stale Default Inputs
Your actor's default input is its first impression and its health check target. When you update your code but forget to update the default input, things break. At scale, this is the number one cause of maintenance flags.
Solution: Store default inputs as JSON fixtures alongside your code. Validate them against the schema on every push. If the schema changes, the fixture must change too.
my-actor/
src/
main.js
INPUT_SCHEMA.json
test-inputs/
default.json # Must match schema defaults exactly
edge-case-empty.json # Empty object {}
edge-case-large.json # Max values for all fields
edge-case-unicode.json # Non-ASCII characters
.actor/
actor.json
package.json
README.md
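The validate-on-push step can be a small pure check: compare each schema default against the fixture, and flag fixture fields the schema no longer declares. A minimal sketch; findFixtureDrift is a hypothetical helper, not part of the Apify SDK:

```javascript
// Compare a default-input fixture against the defaults declared in
// INPUT_SCHEMA.json. Returns mismatches; an empty array means in sync.
// findFixtureDrift is a hypothetical helper, not an Apify API.
function findFixtureDrift(schema, fixture) {
  const props = schema.properties || {};
  const drift = [];
  for (const [key, prop] of Object.entries(props)) {
    if (prop.default === undefined) continue;
    // JSON comparison handles arrays/objects as well as scalars
    if (JSON.stringify(fixture[key]) !== JSON.stringify(prop.default)) {
      drift.push({ field: key, expected: prop.default, actual: fixture[key] });
    }
  }
  for (const key of Object.keys(fixture)) {
    // Fixture fields the schema no longer declares count as drift too
    if (!(key in props)) {
      drift.push({ field: key, expected: undefined, actual: fixture[key] });
    }
  }
  return drift;
}
```

Wire it into the pre-push step by loading both files with readFileSync and failing the push whenever the array is non-empty.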
2. Inconsistent Error Handling
When you have actors built over months (or by different team members), error handling varies wildly. Some actors crash on bad input. Others swallow errors silently. Others log errors but still exit with status "SUCCEEDED," confusing users.
Solution: Establish a standard template early. We standardized on a pattern that handles input validation, network errors, and empty results consistently. Every new actor starts from this template:
import { Actor, log } from 'apify';
await Actor.main(async () => {
// 1. Input validation with defaults
const input = await Actor.getInput() || {};
const url = input.url || 'https://example.com';
const maxResults = Math.min(input.maxResults || 100, 1000);
// 2. Status reporting
await Actor.setStatusMessage('Starting...');
// 3. Core logic wrapped in try/catch
try {
const results = await scrape(url, maxResults);
await Actor.pushData(results);
await Actor.setStatusMessage(`Done: ${results.length} results`);
log.info(`Completed successfully with ${results.length} results`);
} catch (error) {
log.error(`Failed: ${error.message}`);
await Actor.pushData([{
error: error.message,
url,
timestamp: new Date().toISOString()
}]);
await Actor.setStatusMessage(`Error: ${error.message}`);
// Re-throw so the run finishes as FAILED instead of a misleading SUCCEEDED
throw error;
}
});
3. README Drift
Your README is your sales page on the Apify Store. When your actor evolves but the README does not, users get confused and your conversion rate drops. At 250 actors, README maintenance is a real burden.
Solution: Generate READMEs from structured data where possible. Your input schema, pricing config, and example outputs can all feed into a template:
function generateInputDocs(schema) {
let md = '## Input Parameters\n\n';
md += '| Parameter | Type | Required | Default | Description |\n';
md += '|---|---|---|---|---|\n';
for (const [key, prop] of Object.entries(schema.properties || {})) {
const required = (schema.required || []).includes(key) ? 'Yes' : 'No';
const defaultVal = prop.default !== undefined
? `\`${JSON.stringify(prop.default)}\``
: '-';
md += `| ${prop.title || key} | ${prop.type} | ${required} | ${defaultVal} | ${prop.description || '-'} |\n`;
}
return md;
}
We still write the introductory sections by hand (that is where SEO optimization matters most), but the input documentation, pricing tables, and examples are generated. This ensures they never drift from the actual code.
4. The Publishing Rate Limit
Apify limits how many actors you can make public per day (currently around 5). If you have built a batch of 20 actors, it takes four days to publish them all. We did not plan for this initially and ended up with actors sitting unpublished for weeks.
Solution: Script your publishing and run it daily. Our publish script checks which actors are still private, publishes up to the daily limit, and tracks progress:
async function publishBatch(limit = 5) {
const actors = await client.actors().list({ my: true });
const privateActors = actors.items.filter(a => !a.isPublic);
console.log(`Found ${privateActors.length} unpublished actors`);
let published = 0;
for (const actor of privateActors) {
if (published >= limit) {
console.log(`Reached daily limit of ${limit}`);
break;
}
try {
await client.actor(actor.id).update({ isPublic: true });
console.log(`Published: ${actor.name}`);
published++;
} catch (error) {
console.error(`Failed to publish ${actor.name}: ${error.message}`);
}
}
console.log(`Published ${published}. ${privateActors.length - published} remaining.`);
}
Set it and forget it. Run it daily and your backlog clears itself.
5. Not Tracking Revenue Per Actor
When all your actors share a single Apify account, it is hard to tell which ones are making money and which ones are dead weight. Without this data, you cannot make good decisions about where to invest your time.
Solution: Pull run and revenue data from the Apify API regularly. Track cost-to-serve (compute consumed) vs. revenue (PPE income) per actor:
async function getActorROI(actorId, daysBack = 30) {
const runs = await client.actor(actorId).runs().list({
limit: 1000,
desc: true
});
const since = new Date(Date.now() - daysBack * 86400000);
const recentRuns = runs.items.filter(r =>
new Date(r.startedAt) > since
);
let totalComputeUsd = 0;
let totalChargedEvents = 0;
for (const run of recentRuns) {
totalComputeUsd += run.usageTotalUsd || 0;
if (run.chargedEventCounts) {
for (const count of Object.values(run.chargedEventCounts)) {
totalChargedEvents += count;
}
}
}
return {
actorId,
runs: recentRuns.length,
computeCostUsd: totalComputeUsd.toFixed(4),
chargedEvents: totalChargedEvents,
};
}
This data drives quarterly portfolio reviews where we decide which actors to invest in, which to deprecate, and which to reprice. Actors that cost more in compute than they earn in PPE either get optimized or retired.
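The review itself can be driven by a simple rule over those numbers. A sketch only: the thresholds and the revenueUsd field (charged events times event price) are illustrative assumptions, not our exact criteria:

```javascript
// Decide what to do with an actor from its 30-day stats. Thresholds are
// illustrative; revenueUsd is assumed to be chargedEvents * event price.
function classifyActor({ runs, computeCostUsd, revenueUsd }) {
  if (runs === 0) return 'retire';                        // no demand at all
  if (revenueUsd >= computeCostUsd * 2) return 'invest';  // healthy margin
  if (revenueUsd >= computeCostUsd) return 'keep';        // roughly break-even
  return 'optimize-or-reprice';                           // earns less than it costs
}
```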
Portfolio Organization: Directory Structure
At 250+ actors, physical file organization matters enormously. Here is the structure we settled on:
project-root/
actors/ # Standalone actors (scrapers, enrichers)
website-contact-scraper/
src/
INPUT_SCHEMA.json
.actor/actor.json
package.json
README.md
email-pattern-finder/
...
mcps/ # MCP intelligence servers
seo-audit-mcp/
...
fleet-analytics-mcp/
...
wrappers/ # Legacy thin wrappers (no longer building new ones)
...
scripts/ # Automation scripts
check-health.js
publish-batch.js
set-pricing.js
update-metadata.js
docs/ # Documentation
ACTOR-REGISTRY.md
NEXT-MCP-BUILDS.md
Each actor directory is self-contained. You can push any individual actor without affecting others. Scripts operate across all actor directories uniformly.
Naming Conventions That Scale
At 5 actors, naming does not matter. At 250, it is critical for your sanity. Our conventions:
- Actor names: [target]-[action]-[qualifier] (e.g., website-contact-scraper, google-maps-email-extractor)
- MCP names: [domain]-[function]-mcp (e.g., seo-audit-intelligence-mcp)
- Event names: [noun]-[past-tense-verb] (e.g., result-scraped, lead-enriched, report-generated)
- Input fields: camelCase, descriptive (e.g., maxResults, outputFormat, includeMetadata)
Consistent naming makes searching trivial and reduces cognitive load when switching between actors.
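These conventions are easy to enforce in the same pre-push script. A sketch; the regexes are our interpretation of the patterns above, not an Apify requirement:

```javascript
// Patterns encoding the naming conventions above. Our interpretation,
// not an Apify rule; adjust to taste.
const NAME_PATTERNS = {
  actor: /^[a-z0-9]+(-[a-z0-9]+)+$/,        // target-action-qualifier
  mcp: /^[a-z0-9]+(-[a-z0-9]+)*-mcp$/,      // domain-function-mcp
  event: /^[a-z0-9]+(-[a-z0-9]+)*ed$/,      // noun-past-tense-verb
  inputField: /^[a-z][a-zA-Z0-9]*$/,        // camelCase, no separators
};

function checkName(kind, name) {
  return NAME_PATTERNS[kind].test(name);
}
```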
Scaling Lessons: What We Would Tell Our Past Selves
- Automate monitoring before you automate building. Knowing what is broken is more valuable than shipping faster. Start with the health check script on day one.
- Standardize early. Templates, naming conventions, directory structure — lock these down before actor 20, not actor 200. Retrofitting 200 actors to match a new convention is painful.
- Track everything. Build counts, failure rates, revenue per actor, time since last update. Data drives good decisions. We review these metrics monthly and make portfolio decisions based on them.
- Plan for rate limits. Publishing, API calls, build queues — Apify has limits on all of them. Build your automation to work within them gracefully. Adding a brief delay between API calls saves you from 429 errors.
- Treat your portfolio as a product. Individual actors are features. The portfolio is the product. Manage it accordingly — with roadmaps, prioritization, and regular health reviews.
- Kill actors that do not perform. We retired 30+ actors that were getting zero runs and costing compute on health checks. A smaller, healthier portfolio outperforms a large, neglected one.
- Invest in tooling early. Every hour you spend building automation saves 10 hours of manual work over the next month. The ROI is immediate and compounds over time.
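The rate-limit advice above can be sketched as a throttled wrapper: space calls apart and back off exponentially on HTTP 429. The delay values and the statusCode check (which matches apify-client error objects by assumption) are illustrative:

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay before retry attempt N, doubling each time (values illustrative).
function backoffDelayMs(attempt, baseMs = 500) {
  return baseMs * 2 ** attempt;
}

// Run an API call with spacing between calls and retries on 429.
// error.statusCode is assumed to carry the HTTP status, as in apify-client.
async function throttled(fn, { baseMs = 500, retries = 3 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      const result = await fn();
      await sleep(baseMs); // space successive calls apart
      return result;
    } catch (error) {
      if (error.statusCode !== 429 || attempt >= retries) throw error;
      await sleep(backoffDelayMs(attempt, baseMs)); // exponential backoff
    }
  }
}
```

In the bulk scripts above, wrapping each update in throttled(() => client.actor(id).update(...)) keeps a 250-actor loop under the limits.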
The Tools We Built (and Now Share)
When we hit these problems, we built scripts. Lots of scripts. Shell scripts, Node scripts, Python scripts. They worked, but they were fragile, poorly documented, and hard for new team members to use.
That frustration is exactly why we built ApifyForge.
The Fleet Analytics dashboard gives you the bird's-eye view we always wanted — actor health, failure rates, revenue, and staleness across your entire portfolio in one place. The Schema Validator catches input schema issues before they cause maintenance flags. The Test Runner automates pre-push validation. The SEO Auditor ensures your store listings are optimized for discovery.
Where to Start
If you are just hitting the scaling wall, here is the priority order:
- Week 1: Set up automated health monitoring. Even a simple script that runs daily and emails you a report is transformative.
- Week 2: Standardize your actor template. Make sure every new actor follows the same pattern for error handling, input validation, and output structure.
- Week 3: Build bulk metadata scripts. Start with pricing since it directly impacts revenue.
- Week 4: Implement a deployment pipeline with pre-push validation and post-push verification.
Managing 250+ actors is not glamorous work. But the developers who do it well — who keep their actors healthy, their metadata fresh, and their pricing optimized — are the ones making real money on the Apify Store.
Related resources:
- Actor Portfolio Management — full use-case guide
- Cost Calculator — estimate and optimize PPE costs
- Compare Lead Generation Actors — benchmark your actors against competitors
- Compliance Scanner — check scraping compliance for your actors