Schema Registry
Central registry for actor output schemas. Store, version, and share dataset schemas across your actor portfolio. Detect schema drift between builds.
Pricing
Pay Per Event model. You only pay for what you use.
| Event | Description | Price |
|---|---|---|
| schema-registered | Charged per schema registration. | $0.25 |
Example: 100 events = $25.00 · 1,000 events = $250.00
Documentation
Actor output schema registry for Apify lets you search and browse every dataset field across your entire actor fleet in one run. Instead of opening each actor's build page to check what it outputs, this tool builds a cross-fleet index and answers "which of my actors produce an email field?" in seconds.
Built for Apify developers who manage multiple actors, this tool scans up to 500 actors, indexes every field from each actor's dataset schema definition, and returns either a full registry or targeted search results. It requires no browser, no proxies, and no manual clicking — just a single API-driven scan of your account's latest builds.
What data can you extract?
| Data Point | Source | Example |
|---|---|---|
| 📊 Total actors scanned | Account API scan | 52 actors in account |
| ✅ Actors with schemas | Build schema detection | 34 actors have dataset definitions |
| 🔢 Unique field count | Cross-fleet field index | 412 unique field names indexed |
| 🔍 Search match count | Field name substring match | 6 matches for "email" |
| 📋 Field name | Actor build definition | email, contactEmail, emailAddress |
| 🏷️ Field type | Schema type annotation | string, number, boolean, array |
| 📝 Field description | Schema description text | "Primary contact email address" |
| 🎭 Actor full name | Account actor list | ryanclinton/website-contact-scraper |
| 🔑 Actor ID | Account actor list | abc123XYZ789 |
| 📁 Category filter applied | Input parameter | LEAD_GENERATION |
| 🕐 Registry timestamp | Run completion | 2026-03-20T14:32:00.000Z |
Why use Actor Schema Registry?
Managing more than a handful of actors means losing track of what each one outputs. Finding a specific field means opening the Apify console, navigating to each actor, locating the build, and checking the schema definition — manually, one actor at a time. With 20+ actors, that process takes 30-45 minutes and you still might miss variants like contactEmail vs emailAddress.
This actor automates the entire process: one run scans your full fleet, builds a searchable index, and tells you exactly which actors output the field you need — along with the field type and actor ID for immediate API use.
- Scheduling — run weekly to keep your schema index fresh as you build and update actors
- API access — trigger registry scans from Python, JavaScript, or any HTTP client to power internal tooling
- Monitoring — get Slack or email alerts when runs fail or the schema count changes unexpectedly
- Integrations — pipe registry output into Google Sheets, Notion, or webhooks to maintain living documentation
- Low cost — $0.20 per scan covers your entire fleet with no per-actor charges
Features
- Fleet-wide schema indexing: calls the Apify Builds API for every actor in your account, reads `actorDefinition.storages.dataset.fields` from each latest build, and assembles a unified index in a single run
- Field name search: case-insensitive substring matching across all indexed field names, so searching `email` matches `email`, `contactEmail`, `emailAddress`, and any other variant
- Sorted search results: matching fields ranked by the number of actors that contain them, so your most-used fields appear first
- Full browse mode: omit `searchField` to get a complete per-actor registry showing every field name, type, and description for each actor that has a dataset schema defined
- Category filtering: supply an Apify Store category string (e.g., `LEAD_GENERATION`, `SEO_TOOLS`) to narrow the scan to a subset of actors before indexing
- Type reporting: extracts each field's JSON Schema type annotation (`string`, `number`, `boolean`, `array`, `object`), or reports `unknown` when the type is not declared
- Timeout protection: all API calls use `AbortSignal.timeout(30000)`, so a slow or unresponsive build endpoint never hangs the run indefinitely
- Error resilience: actors with missing builds, failed build fetches, or no dataset schema are silently skipped rather than causing the run to fail
- Minimal footprint: runs in 128 MB of memory, makes only Apify API calls (no external HTTP), and produces a single compact report object
- Pay-per-event pricing: charges only on successful schema-search completion; no charge if the run errors before producing output
Use cases for actor output schema registry
Actor pipeline design
When you need to chain actors together — for example, feeding one actor's output into another's input — you need to know the exact field names both actors use. The schema registry answers this immediately: search for the output field name from actor A, confirm it matches the expected input field of actor B, and build your pipeline with confidence. Combine with Pipeline Builder to automate the entire workflow design step.
Schema documentation and auditing
Growing actor portfolios quickly become undocumented. This actor generates a structured inventory of every dataset field across your fleet, usable as living documentation for your team. Run it weekly on a schedule and push the output to a Google Sheet to maintain an always-current field reference without any manual work.
Data integration planning
Before connecting an actor's output to a downstream system — HubSpot, a data warehouse, a Zapier workflow — you need to know the exact field names and types the actor produces. The schema registry gives you this in a structured JSON format you can parse programmatically rather than reading through each actor's README.
Actor quality audit
Use the full browse mode to spot actors in your fleet that have no dataset schema defined. Actors without schemas produce undocumented output that is harder to integrate and harder to validate. The actorsWithSchema vs totalActors gap in the report immediately surfaces which actors need schema annotations added. Pair with Schema Validator to enforce field contracts on those actors.
Finding overlapping outputs across actors
When multiple actors in your fleet potentially produce similar data (e.g., several lead generation actors all extracting email addresses), searching for email shows every actor that outputs that field, the exact field name variant each uses, and the actor IDs. This is the fastest way to identify redundancy, inconsistent naming, or opportunities to standardize output fields across your portfolio.
Developer onboarding
When a new developer joins your team and needs to understand which actors produce which data, the schema registry gives them a complete field inventory in one run instead of requiring them to read through dozens of READMEs and build pages.
How to search actor output schemas
- Go to the actor input panel: navigate to the Actor Schema Registry page on Apify and open the input panel. No configuration is required to get started.
- Enter a field name to search: type a field name like `email`, `price`, or `rating` in the Search Field box. Leave it blank to get the full registry of all actors and their fields. Optionally add a category like `LEAD_GENERATION` to narrow the scan.
- Click Start and wait: the actor scans all actors in your account, fetches their latest build schemas, and builds the index. A typical run covering 50 actors completes in under 60 seconds.
- Download results: open the Dataset tab, download as JSON or CSV, or read the single report object directly from the run output.
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `searchField` | string | No | (none) | Field name to search for across all actor dataset schemas. Case-insensitive substring match: `email` matches `email`, `emailAddress`, `contactEmail`. Omit to get the full registry. |
| `category` | string | No | (none) | Apify Store category to filter actors before scanning. Examples: `LEAD_GENERATION`, `SEO_TOOLS`, `SOCIAL_MEDIA`. Omit to scan all actors. |
Input examples
Search for all actors that output an email field:
```json
{
  "searchField": "email"
}
```
Browse the full schema registry for lead generation actors only:
```json
{
  "category": "LEAD_GENERATION"
}
```
Search within a specific category:
```json
{
  "searchField": "price",
  "category": "ECOMMERCE"
}
```
Input tips
- Omit both fields for a full inventory: running with no inputs returns the complete registry of every actor with a dataset schema, which is the best starting point for documentation or auditing.
- Use short substrings for broader matches: searching `url` will match `url`, `pageUrl`, `sourceUrl`, `profileUrl`, giving you the full picture of URL-type fields across your fleet.
- Use category filtering on large accounts: if you have 200+ actors, adding a category cuts the number of build fetches and speeds up the run noticeably.
- Pipe results into Schema Diff: once you find two actors that both output an `email` field, pass their IDs to Schema Diff to see how their complete schemas compare.
Output example
```json
{
  "totalActors": 52,
  "actorsWithSchema": 34,
  "totalFields": 412,
  "searchField": "email",
  "searchResults": [
    {
      "field": "email",
      "type": "string",
      "foundIn": [
        { "actor": "ryanclinton/website-contact-scraper", "actorId": "tF3mNxKpWqR8vBzL" },
        { "actor": "ryanclinton/google-maps-email-extractor", "actorId": "gM9cYjSdR2xKpNwV" },
        { "actor": "ryanclinton/event-lead-extractor", "actorId": "hQ5rXbLnD4tPmCwZ" },
        { "actor": "ryanclinton/b2b-lead-qualifier", "actorId": "kR7sZdMqN3yJvBxT" }
      ]
    },
    {
      "field": "emailAddress",
      "type": "string",
      "foundIn": [
        { "actor": "ryanclinton/waterfall-contact-enrichment", "actorId": "pN2wVxLkR6cDtYqB" },
        { "actor": "ryanclinton/email-pattern-finder", "actorId": "mJ4cBsXrT8nKpLwV" }
      ]
    },
    {
      "field": "contactEmail",
      "type": "string",
      "foundIn": [
        { "actor": "ryanclinton/company-deep-research", "actorId": "vL6nTrKxB9mQsYdP" }
      ]
    }
  ],
  "registry": null,
  "registryAt": "2026-03-20T14:32:17.841Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `totalActors` | number | Count of actors scanned (after the category filter is applied) |
| `actorsWithSchema` | number | Count of actors that had a readable dataset schema in their latest build |
| `totalFields` | number | Count of unique field names found across all schemas |
| `searchField` | string or null | The search term supplied in input, or null if browse mode was used |
| `searchResults` | array or undefined | Present when `searchField` was provided. Array of matching field objects sorted by occurrence count, descending. |
| `searchResults[].field` | string | Exact field name as declared in the actor's schema |
| `searchResults[].type` | string | JSON Schema type of the field (`string`, `number`, `boolean`, `array`, `object`, `unknown`) |
| `searchResults[].foundIn` | array | List of actors that contain this field |
| `searchResults[].foundIn[].actor` | string | Full actor name in `username/actorname` format |
| `searchResults[].foundIn[].actorId` | string | Apify actor ID, usable directly in API calls |
| `registry` | array or undefined | Present when no `searchField` was provided. Per-actor listing of all fields. |
| `registry[].actorName` | string | Full actor name in `username/actorname` format |
| `registry[].actorId` | string | Apify actor ID |
| `registry[].fields` | array | All dataset fields declared in the actor's schema |
| `registry[].fields[].name` | string | Field name |
| `registry[].fields[].type` | string | JSON Schema type |
| `registry[].fields[].description` | string or undefined | Field description, if declared in the schema |
| `registryAt` | string | ISO 8601 timestamp of when the registry was built |
How much does it cost to search actor output schemas?
Actor Schema Registry uses pay-per-event pricing — you pay $0.20 per schema search. Platform compute costs are included. The actor charges once per successful run, regardless of how many actors it scans.
| Scenario | Runs | Cost per run | Total cost |
|---|---|---|---|
| Quick test | 1 | $0.20 | $0.20 |
| Daily schema search | 7 | $0.20 | $1.40/week |
| Weekly documentation refresh | 4 | $0.20 | $0.80/month |
| Team of 5 developers | 20 | $0.20 | $4.00/month |
| Automated CI pipeline checks | 100 | $0.20 | $20.00/month |
You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached.
There is no comparable self-service tool for schema discovery across Apify actor fleets. The alternative is manual — opening each actor's build page in the console — which takes 30-60 seconds per actor. For a 50-actor fleet, that is 25-50 minutes of manual work replaced by a $0.20 run.
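The cost and time figures above are simple multiplication. The minimal Python sketch below reproduces them, assuming the flat $0.20-per-run price quoted in this section and the 30-60 seconds per actor manual estimate; both are hard-coded inputs, not values read from any API, and the function names are hypothetical.

```python
PRICE_PER_RUN_USD = 0.20             # flat per-run price quoted in this section (assumption)
MANUAL_SECONDS_PER_ACTOR = (30, 60)  # manual-inspection estimate quoted above (assumption)


def monthly_cost(runs: int, price_per_run: float = PRICE_PER_RUN_USD) -> float:
    """Cost depends only on the number of runs, not on fleet size."""
    return round(runs * price_per_run, 2)


def manual_minutes(actor_count: int) -> tuple:
    """Low and high estimate for checking every build page by hand."""
    low, high = MANUAL_SECONDS_PER_ACTOR
    return (actor_count * low / 60, actor_count * high / 60)


if __name__ == "__main__":
    print(monthly_cost(4))     # 0.8 (weekly refresh)
    print(monthly_cost(100))   # 20.0 (CI checks)
    print(manual_minutes(50))  # (25.0, 50.0)
```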
Search actor output schemas using the API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the registry scan and wait for it to finish
run = client.actor("ryanclinton/actor-schema-registry").call(run_input={
    "searchField": "email",
    "category": "LEAD_GENERATION"
})

# The run pushes a single report item to its default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"Scanned {item['totalActors']} actors, found schema in {item['actorsWithSchema']}")
    if item.get("searchResults"):
        for result in item["searchResults"]:
            actors = ", ".join(r["actor"] for r in result["foundIn"])
            print(f"  Field '{result['field']}' ({result['type']}) found in: {actors}")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Start the registry scan and wait for it to finish
const run = await client.actor("ryanclinton/actor-schema-registry").call({
    searchField: "email",
    category: "LEAD_GENERATION"
});

// The run pushes a single report item to its default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`Scanned ${item.totalActors} actors, ${item.actorsWithSchema} with schemas`);
    if (item.searchResults) {
        for (const result of item.searchResults) {
            const actors = result.foundIn.map(r => r.actor).join(", ");
            console.log(`  '${result.field}' (${result.type}) → ${actors}`);
        }
    }
}
```
cURL
```bash
# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~actor-schema-registry/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"searchField": "email", "category": "LEAD_GENERATION"}'

# Once the run has finished, fetch the report
# (replace DATASET_ID with data.defaultDatasetId from the run response above)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```
How Actor Schema Registry works
Phase 1: Actor fleet enumeration
The actor calls GET /v2/acts?token=...&limit=500&my=true to retrieve all actors owned by the token's account, up to 500 at a time. If a category filter was supplied, the list is filtered in memory against each actor's categories array before any build fetches begin. This means the category filter happens client-side against the actor metadata, not as an API-level query parameter.
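A minimal standard-library Python sketch of this phase follows. It assumes, as described above, that the list response wraps actors in `data.items` and that each item carries a `categories` array. `actors_list_url`, `fetch_my_actors`, and `filter_by_category` are hypothetical illustration names, not the actor's own code.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_BASE = "https://api.apify.com/v2"


def actors_list_url(token: str, limit: int = 500) -> str:
    """Build the list-actors URL with the parameters described above."""
    query = urllib.parse.urlencode({"token": token, "limit": limit, "my": "true"})
    return f"{API_BASE}/acts?{query}"


def fetch_my_actors(token: str) -> list:
    """Fetch actors owned by the token's account (assumed shape: data.items)."""
    with urllib.request.urlopen(actors_list_url(token), timeout=30) as resp:
        payload = json.load(resp)
    return payload["data"]["items"]


def filter_by_category(actors: list, category: Optional[str]) -> list:
    """Client-side filter: keep actors whose categories array contains the value.

    The comparison is exact and case-sensitive; no category keeps every actor.
    """
    if not category:
        return actors
    return [a for a in actors if category in (a.get("categories") or [])]


if __name__ == "__main__":
    sample = [  # illustrative metadata, not real API output
        {"name": "website-contact-scraper", "categories": ["LEAD_GENERATION"]},
        {"name": "seo-audit", "categories": ["SEO_TOOLS"]},
        {"name": "never-categorized"},
    ]
    print([a["name"] for a in filter_by_category(sample, "LEAD_GENERATION")])  # ['website-contact-scraper']
```

Because the filter runs after the list call, it reduces the number of build fetches but not the list request itself, which matches the limitation noted further down.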
Phase 2: Build schema extraction
For each actor in the filtered list, the code reads actor.taggedBuilds?.latest?.buildId from the actor metadata. If no latest build exists (the actor has never been built), it is skipped. For actors with a build ID, the actor calls GET /v2/acts/{actorId}/builds/{buildId} and reads the data.actorDefinition.storages.dataset.fields object from the response. This object contains the actor's declared output schema. Each field entry is an object with optional type, title, and description properties following JSON Schema conventions. All fetch calls use AbortSignal.timeout(30000) to enforce a 30-second timeout per request. Any build fetch that fails or times out is silently skipped — the run continues with remaining actors.
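The extraction step can be sketched in Python as below. It assumes the build response nests the schema at `data.actorDefinition.storages.dataset.fields` and the actor metadata exposes `id` and `taggedBuilds.latest.buildId`, as described above. The helper names are hypothetical, and `urllib`'s `timeout=30` stands in for `AbortSignal.timeout(30000)`.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API_BASE = "https://api.apify.com/v2"


def latest_build_id(actor: dict) -> Optional[str]:
    """Equivalent of actor.taggedBuilds?.latest?.buildId; None if never built."""
    latest = (actor.get("taggedBuilds") or {}).get("latest") or {}
    return latest.get("buildId")


def extract_dataset_fields(build_response: dict) -> Optional[dict]:
    """Read data.actorDefinition.storages.dataset.fields, or None if absent."""
    node = build_response.get("data") or {}
    for key in ("actorDefinition", "storages", "dataset"):
        node = node.get(key) or {}
    fields = node.get("fields")
    return fields if isinstance(fields, dict) else None


def fetch_build_fields(actor: dict, token: str) -> Optional[dict]:
    """Fetch the latest build and return its declared dataset fields.

    Returns None (so the caller skips the actor) when there is no latest build,
    the request fails or exceeds the 30-second timeout, or no schema is declared.
    """
    build_id = latest_build_id(actor)
    if build_id is None:
        return None
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{actor['id']}/builds/{build_id}?{query}"
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return extract_dataset_fields(json.load(resp))
    except (OSError, ValueError):
        # HTTP errors, network errors, timeouts and invalid JSON are skipped silently
        return None
```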
Phase 3: Index construction
Two data structures are built together as the schemas are processed. The registry array accumulates one entry per actor that had a parseable schema, containing the actor name, actor ID, and the full array of field objects. The fieldIndex map inverts this structure: keys are field names, values are arrays of { actor, actorId, type } objects. This inverted index is what powers the fast field name search.
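Here is a Python sketch of building both structures in one loop. It assumes each actor's `fields` value maps field names to objects with optional `type` and `description` keys, per Phase 2. `build_indexes` is a hypothetical name, and falling back to `"unknown"` mirrors the type-reporting behaviour described in Limitations.

```python
from collections import defaultdict


def build_indexes(schemas):
    """Build the per-actor registry and the inverted field index in one pass.

    `schemas` is a list of (actor_name, actor_id, fields) tuples, where `fields`
    maps field name -> {"type": ..., "description": ...} (both keys optional).
    """
    registry = []
    field_index = defaultdict(list)
    for actor_name, actor_id, fields in schemas:
        entries = []
        for name, spec in fields.items():
            spec = spec if isinstance(spec, dict) else {}
            field_type = spec.get("type") or "unknown"  # undeclared types -> "unknown"
            entry = {"name": name, "type": field_type}
            if "description" in spec:
                entry["description"] = spec["description"]
            entries.append(entry)
            # Inverted index: field name -> every actor that declares it
            field_index[name].append({"actor": actor_name, "actorId": actor_id, "type": field_type})
        registry.append({"actorName": actor_name, "actorId": actor_id, "fields": entries})
    return registry, dict(field_index)


if __name__ == "__main__":
    registry, index = build_indexes([
        ("ryanclinton/website-contact-scraper", "tF3mNxKpWqR8vBzL", {"email": {"type": "string"}, "url": {}}),
        ("ryanclinton/b2b-lead-qualifier", "kR7sZdMqN3yJvBxT", {"email": {"type": "string"}}),
    ])
    print(len(registry), len(index))  # 2 actors with schema, 2 unique field names
```

In this sketch, `len(registry)` corresponds to `actorsWithSchema` and `len(index)` to `totalFields` (unique field names).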
Phase 4: Search or browse output
When searchField is provided, the code filters fieldIndex keys using a case-insensitive includes() check against the lowercased search term. Matching entries are mapped to search result objects and sorted descending by foundIn.length — fields present in more actors appear first. The final report object sets searchResults and leaves registry as undefined. In browse mode (no searchField), the report sets registry with the full per-actor listing and leaves searchResults as undefined. The report is pushed to the dataset as a single item via Actor.pushData().
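The search and report step is sketched below in Python, assuming an index shaped like the `fieldIndex` described in Phase 3. The function names are hypothetical. The actor does not document which type it reports when one field name has different types in different actors, so this sketch takes the first occurrence's type as an explicit assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def search_fields(field_index: dict, search_term: str) -> list:
    """Case-insensitive substring search over field names, most widespread first."""
    term = search_term.lower()
    results = []
    for field, occurrences in field_index.items():
        if term in field.lower():
            results.append({
                "field": field,
                # Assumption: report the type from the first actor that declares the field
                "type": occurrences[0]["type"],
                "foundIn": [{"actor": o["actor"], "actorId": o["actorId"]} for o in occurrences],
            })
    # Descending by number of actors; sort() is stable, so ties keep index order
    results.sort(key=lambda r: len(r["foundIn"]), reverse=True)
    return results


def build_report(total_actors: int, registry: list, field_index: dict,
                 search_term: Optional[str] = None) -> dict:
    """Assemble the single report item in search mode or browse mode."""
    report = {
        "totalActors": total_actors,
        "actorsWithSchema": len(registry),
        "totalFields": len(field_index),
        "searchField": search_term,
    }
    if search_term:
        report["searchResults"] = search_fields(field_index, search_term)
    else:
        report["registry"] = registry
    now = datetime.now(timezone.utc)
    report["registryAt"] = now.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return report
```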
Tips for best results
- Run in browse mode first. Start your first run with no inputs to get a complete inventory. This shows you both how many actors have schemas and which ones are missing definitions entirely.
- Use short search terms for discovery. Searching `url` returns all URL-variant fields across your fleet. Use longer terms like `contactEmail` when you know the exact field name you are looking for.
- Combine category filter with search. If you have actors across many categories, adding `category: "ECOMMERCE"` before searching for `price` eliminates noise from non-commerce actors and speeds up the run.
- Schedule weekly for documentation. A $0.20 weekly scheduled run keeps a living record of your fleet's output fields. Connect the output to a Google Sheet via Apify integrations to maintain auto-updating documentation.
- Use actor IDs from results directly. The `actorId` field in every result is the exact ID you need for Apify API calls, so no additional lookup is required. Copy it directly into your API requests or Pipeline Builder configurations.
- Run before pipeline design. Before building a multi-actor pipeline, run the schema registry to confirm which actors produce the fields you need and what those fields are typed as. This prevents integration failures caused by field name assumptions.
- Track `actorsWithSchema` vs `totalActors` as a quality metric. A large gap between these two numbers means many actors in your fleet have undocumented output. Use this as a signal to add `storages.dataset` definitions to your actor builds.
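To track that gap over time, you can compute it from the report item you download (for example with the API snippets earlier). This is a minimal Python sketch with hypothetical helper names. It only uses the two counts in the report, because the registry lists actors that have a schema, not the ones that lack one.

```python
def schema_coverage(report: dict) -> float:
    """Share of scanned actors whose latest build declares a dataset schema."""
    total = report.get("totalActors") or 0
    if total == 0:
        return 0.0
    return (report.get("actorsWithSchema") or 0) / total


def actors_missing_schema(report: dict) -> int:
    """How many scanned actors have no readable dataset schema."""
    return (report.get("totalActors") or 0) - (report.get("actorsWithSchema") or 0)


if __name__ == "__main__":
    report = {"totalActors": 52, "actorsWithSchema": 34}  # figures from the output example
    print(f"{schema_coverage(report):.0%} coverage, {actors_missing_schema(report)} actors to annotate")
    # 65% coverage, 18 actors to annotate
```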
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Schema Diff | Use Schema Registry to find all actors outputting a target field, then pass two actor IDs to Schema Diff to compare their complete dataset schemas side by side. |
| Pipeline Builder | Registry identifies which actors produce the output fields you need; Pipeline Builder chains those actors into an automated multi-step workflow. |
| Schema Validator | Registry surfaces actors with missing schemas; Schema Validator tests whether an actor's actual output matches its declared schema definition. |
| Actor Quality Audit | Run Schema Registry to find actors without dataset schemas, then feed those actor IDs into Quality Audit for a full compliance review. |
| B2B Lead Gen Suite | Verify that all lead generation actors in your fleet output consistent field names (e.g., email vs emailAddress) before connecting them to a shared CRM pipeline. |
| Website Contact Scraper | Use Schema Registry to confirm the exact field names this actor outputs before writing downstream processing code. |
| Waterfall Contact Enrichment | Search the registry for email variants to identify which enrichment actors produce which field names, ensuring consistent field mapping in your enrichment pipeline. |
Limitations
- Maximum 500 actors per scan. The Apify actors list API is called with `limit=500`. Accounts with more than 500 actors will only have the first 500 scanned.
- Only actors with a latest build are scanned. Actors that have been created but never built are skipped silently. Build the actor at least once to make its schema discoverable.
- Only dataset schemas are indexed. The actor reads `actorDefinition.storages.dataset.fields` only. Key-value store schemas, request queue schemas, and other storage types are not indexed.
- Schema must be declared in the build. If an actor outputs data but has not declared a `storages.dataset` block in its actor definition, it will appear in the `totalActors` count but not in `actorsWithSchema`. Many actors on the Apify Store omit formal schema declarations.
- Field types depend on schema quality. If an actor's schema does not include a `type` annotation for a field, the type will be reported as `unknown`. This is a property of the source schema, not a bug in this actor.
- Only the latest build is read. The actor fetches only the latest tagged build per actor. Historical builds and their schemas are not accessible.
- Category filtering is client-side. Category filtering happens after fetching the full actor list. It does not reduce the number of API calls to the actors list endpoint.
- Actor must be owned by the token's account. This actor uses `my=true` on the actors list call, so it only scans actors you own. It cannot scan another user's actors or actors from a shared organization account unless the token belongs to that account.
Integrations
- Zapier: trigger an actor schema search when a new actor is deployed, then post the field inventory to a Slack channel or Notion page
- Make: schedule weekly schema registry runs and pipe the output into a Google Sheet to maintain a living schema documentation table
- Google Sheets: export the full registry as a spreadsheet where each row is one actor-field combination, giving your team a searchable field reference
- Apify API: call the actor programmatically from CI/CD pipelines to verify schema presence before deploying new actor versions
- Webhooks: notify your team's Slack or Teams channel when a schema registry run completes or when the `actorsWithSchema` count drops below a threshold
- LangChain / LlamaIndex: feed the schema registry output as structured context into an LLM workflow to let AI assistants answer questions about which actors in your fleet produce specific data types
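If you want the one-row-per-actor-field layout described for Google Sheets without a dedicated integration, you can flatten a browse-mode report yourself and import the CSV. This is a minimal Python sketch using only the `csv` module. The column names are a hypothetical choice, and the report shape follows the Output fields table.

```python
import csv
import io

COLUMNS = ["actorName", "actorId", "field", "type", "description"]


def registry_rows(report: dict):
    """Yield one row per actor-field combination from a browse-mode report."""
    for entry in report.get("registry") or []:
        for field in entry.get("fields", []):
            yield {
                "actorName": entry["actorName"],
                "actorId": entry["actorId"],
                "field": field["name"],
                "type": field.get("type", "unknown"),
                "description": field.get("description") or "",
            }


def registry_to_csv(report: dict) -> str:
    """Render the rows as CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(registry_rows(report))
    return buffer.getvalue()
```

A search-mode report has no `registry`, so it produces a header-only CSV. Run in browse mode for this export.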
Troubleshooting
- `actorsWithSchema` is much lower than `totalActors`: this is expected for many Apify actor portfolios. Most actors do not declare a formal `storages.dataset` block in their actor definition. To fix this, add a `storages.dataset.fields` declaration to your actor's `actor.json` or `.actor/actor.json` and rebuild. After rebuilding, a fresh registry scan will include the newly declared fields.
- Run returns 0 actors: verify that the `APIFY_TOKEN` environment variable is available. When running on the Apify platform this is set automatically; if running locally, make sure the token is configured. Also confirm the token has access to the actors you expect: the scan uses `my=true`, which returns only actors owned by the token's account.
- Category filter returns fewer actors than expected: category values are case-sensitive strings that must match the Apify Store categories exactly (`LEAD_GENERATION`, `SEO_TOOLS`, `SOCIAL_MEDIA`, `ECOMMERCE`, etc.). If an actor you expect is missing from filtered results, check its category assignment in the Apify console.
- Run times out on a large fleet: each build fetch carries a 30-second timeout. For accounts with many actors and slow build API responses, total run time grows proportionally. Applying a `category` filter reduces the number of build fetches and cuts run time. If the issue persists, contact support with run sharing enabled (see below).
- Search returns no results for a field you know exists: confirm the actor has been built (a latest build must exist) and that its actor definition includes a `storages.dataset.fields` block. If the field is declared but the actor was built before the schema was added, rebuild the actor to update the stored build definition.
Responsible use
- This actor only accesses actors and build data owned by the API token's account.
- It does not access any public or third-party actor schemas without authorization.
- Schema data belongs to the actor developer — treat extracted field definitions as internal intellectual property.
- For guidance on Apify API usage limits and fair use, see Apify's documentation.
FAQ
How does Actor Schema Registry find actor output fields?
It calls the Apify Builds API for each actor's latest build and reads the actorDefinition.storages.dataset.fields object. This is the formal dataset schema declaration, separate from the actor's README or output examples. Only actors that have this block defined in their build will appear in the actorsWithSchema count.
How many actors can the schema registry scan in one run?
Up to 500 actors per run. The Apify actors list API is queried with limit=500&my=true. If your account has more than 500 actors, only the first 500 returned by the API will be scanned.
Does searching actor schemas require running the actors themselves?
No. The registry reads static schema definitions from build metadata only. No actor runs are triggered, no scraping happens, and no external websites are accessed. The scan is purely API-driven against your Apify account data.
What is the difference between search mode and browse mode?
When you provide a searchField, the actor returns only the fields that match your search term, sorted by how many actors contain them. When searchField is omitted, the actor returns the complete registry — one entry per actor, listing all of that actor's declared fields. Browse mode is better for auditing and documentation; search mode is better when you know what field you are looking for.
How accurate is the field type information?
Type accuracy depends entirely on the quality of the source schema. Fields declared with a type property (e.g., "type": "string") are reported accurately. Fields with no type declaration are reported as unknown. The registry does not validate or infer types from actual actor output — it only reads what is declared in the schema definition.
How long does a schema registry run take?
A 50-actor fleet typically completes in under 60 seconds. Each actor requires one build API call with a 30-second timeout. Total time scales roughly linearly with actor count. Applying a category filter reduces the number of actors processed and speeds up the run proportionally.
Can I use the schema registry to search another developer's actors?
No. The actor uses my=true on the actors list call, which returns only actors owned by the API token's account. To search another developer's actors, you would need a token that belongs to their account.
How is the schema registry different from browsing the Apify console manually?
The console requires you to open each actor individually, navigate to the build page, and inspect the actor definition. At 30-60 seconds per actor, a 50-actor fleet takes roughly 25-50 minutes of manual navigation. This actor completes the same task in under 60 seconds and returns structured, searchable JSON output.
Can I schedule the schema registry to run automatically?
Yes. Use Apify's built-in scheduler to run this actor on any interval: daily, weekly, or monthly. The output can be connected to Google Sheets or webhooks to maintain automatically updated schema documentation.
What happens if an actor has no latest build?
Actors with no taggedBuilds.latest.buildId are skipped silently. They are counted in totalActors but not in actorsWithSchema. Build the actor at least once to make its schema available for indexing.
Does the schema registry support searching by field type?
Not currently. Search is by field name only (case-insensitive substring match). You can filter results by type after downloading the output if you need type-based filtering.
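A minimal Python sketch of that post-download filtering follows. It works on either report shape from the Output fields table. `fields_of_type` is a hypothetical helper. In search mode, each result carries a single `type` value for the field, so the filter relies on that value rather than on per-actor types.

```python
def fields_of_type(report: dict, wanted_type: str) -> list:
    """Return (actor, field) pairs whose declared type equals wanted_type.

    Works on either report shape: searchResults (search mode) or registry (browse mode).
    """
    pairs = []
    for result in report.get("searchResults") or []:
        if result.get("type") == wanted_type:
            pairs.extend((a["actor"], result["field"]) for a in result.get("foundIn", []))
    for entry in report.get("registry") or []:
        for field in entry.get("fields", []):
            if field.get("type") == wanted_type:
                pairs.append((entry["actorName"], field["name"]))
    return pairs
```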
Is it legal to scan actor schemas using the API?
Yes. You are accessing your own account's data using your own API token. The Apify API is designed for programmatic account access. Scanning your own actors' build metadata is explicitly supported and within normal API usage.
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.