LLM Output Optimizer
Analyze any Apify actor's output schema for LLM pipelines. Scores every field by token cost, recommends which fields to drop, keep, or truncate, and generates an optimized schema that typically cuts LLM token spend by 40-70%.
Pricing
Pay Per Event model. You only pay for what you use.
| Event | Description | Price |
|---|---|---|
| llm-optimization | Charged per optimization analysis. | $0.35 |
Example: 100 events = $35.00 · 1,000 events = $350.00
Documentation
LLM Output Optimizer analyzes any Apify actor's output schema and tells you exactly which fields to keep, drop, or truncate before feeding data into an LLM pipeline. It reads from the actor's most recent successful run, scores every field by information density, and produces a token-reduction report in seconds. Typical savings range from 40% to 70% of your LLM token budget.
The actor fetches a configurable sample of output items from the target actor's latest dataset, runs per-field token estimation using a character-based model (~4 characters per token for English text), classifies each field as high-value, medium-value, or low-value, and generates a recommended optimized schema. No re-running the target actor is required — the analysis reads from existing output data.
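Under that character-based model, the per-field token estimate amounts to the following (a minimal sketch of the heuristic, not the actor's actual code):

```python
import json

def estimate_tokens(values, chars_per_token=4):
    """Approximate token cost: serialize each non-null value to JSON and
    divide the total character count by ~4 (English-text heuristic)."""
    total_chars = sum(len(json.dumps(v)) for v in values if v is not None)
    return total_chars // chars_per_token

# Ten ~40-character serialized values come out to roughly 100 tokens
print(estimate_tokens(["x" * 38] * 10))  # → 100
```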
What data can you extract?
| Data Point | Source | Example |
|---|---|---|
| Original token count | Full output analysis | 4,200 tokens |
| Optimized token count | After applying recommendations | 1,680 tokens |
| Savings percentage | Calculated reduction | 60% |
| Field name | Per-field analysis | rawHtml |
| Field value classification | Pattern + length heuristics | low / medium / high |
| Recommended action | Per field | drop / keep / truncate |
| Null ratio | Missing value rate | 0.83 |
| Average field length | Characters per value | 4,500 |
| Token cost per field | Estimated from character count | 2,800 tokens |
| Optimized schema | Recommended field list | ["url", "emails", "phones", "name"] |
| Recommendations | Actionable suggestions with savings | Drop 3 low-value fields — saves 2,520 tokens (60%) |
| Analysis timestamp | ISO 8601 | 2026-03-20T14:32:00.000Z |
Why use LLM Output Optimizer?
Most Apify actors return far more data than an LLM actually needs. A web scraper might return rawHtml, sourceHtml, pageContent, internal timestamps, debug hashes, and a dozen other fields that consume thousands of tokens per record — at real cost. If you are passing actor output to GPT-4, Claude, or Gemini, you are likely paying for 3-5x more tokens than your prompts require.
Manually auditing an output schema requires pulling a dataset, inspecting field distributions, estimating token costs, and writing custom field filters. For a 30-field schema across 10 sample records, this takes 30-60 minutes per actor. This actor does it in under 10 seconds for $0.35.
- Scheduling — schedule weekly re-analysis to catch schema drift as actors are updated
- API access — trigger from Python, JavaScript, or any HTTP client as part of your pipeline build process
- Proxy rotation — not required; the actor calls the Apify REST API directly using your token
- Monitoring — get Slack or email alerts when the optimization run fails or a target actor produces empty output
- Integrations — connect results to Zapier, Make, or Google Sheets to track token savings across your full actor portfolio
Features
- Character-based token estimation — approximates token count at ~4 characters per token, matching GPT-4 tokenizer behavior for English text without requiring a tokenizer library
- 15+ low-value field pattern matching — automatically flags fields matching patterns including `_id`, `_at`, `timestamp`, `scraped`, `crawled`, `hash`, `checksum`, `internal`, `debug`, `raw`, `html`, `rawHtml`, `sourceHtml`, and `pageContent`
- 11+ high-value field pattern matching — protects fields matching `name`, `title`, `url`, `email`, `phone`, `price`, `rating`, `address`, `description`, `summary`, `category`, and `status` from being dropped
- Null ratio analysis — computes the proportion of null or missing values per field; fields with >80% null values are automatically flagged for removal
- Long-field truncation detection — fields averaging more than 1,000 characters are flagged as truncation candidates, with optimized token cost estimated at 20% of original (equivalent to a 200-character truncation)
- Token cost sorted output — field analysis is sorted by token cost descending so the highest-impact optimizations appear first
- Optimized schema generation — produces a final `optimizedSchema` field list containing only fields recommended as `keep` or `truncate`
- Quantified recommendations — each recommendation includes exact token counts and percentage savings, not just qualitative advice
- Configurable sample size — analyze 5 to 100 records depending on how representative you need the sample to be
- Read-only analysis — never modifies the target actor, its settings, or its output; reads only from existing dataset items
Use cases for LLM output optimization
AI pipeline cost reduction
Teams building RAG systems, AI agents, or document Q&A pipelines on top of Apify scrapers need to minimize token throughput. Before connecting a scraper to GPT-4o or Claude 3.5, run the optimizer to identify which fields are safe to drop. A single optimization pass on a 40-field scraper output can cut per-record token cost from 3,000 to 800 tokens — reducing downstream LLM costs by 70% across millions of records.
Actor schema auditing before production
Developers preparing an actor for production use can use this tool to audit the output schema for unnecessary bloat. A quick analysis reveals whether fields like pageContent, rawHtml, or internal debug fields made it into the output — fields that add token cost with no downstream value in most AI workflows.
LLM-ready dataset preparation
Data teams building fine-tuning datasets or evaluation benchmarks from scraped data need clean, dense schemas. The optimizer identifies sparse fields (high null ratios) and verbose fields (long raw content) that should be excluded from training examples to keep dataset quality high and token costs low.
Multi-actor pipeline optimization
When chaining multiple actors — for example, a Google Maps scraper feeding into a contact enrichment actor feeding into an LLM summarizer — each stage multiplies token costs. Running the optimizer on each actor in the chain surfaces the cumulative savings opportunity before the pipeline goes live.
Actor portfolio token budgeting
Developers managing a portfolio of 10+ actors and running LLM workflows on their combined output can use the optimizer to benchmark token efficiency across the portfolio. The structured output makes it straightforward to compare token density scores and prioritize which actors need schema cleanup.
How to optimize actor output for LLM pipelines
1. Find your target actor ID — Go to the Apify Console, open the actor you want to analyze, and copy the actor ID from the URL or Settings tab. It looks like `ryanclinton/website-contact-scraper` or a numeric ID like `BHzefUZlZRKWxkTch`. The actor must have at least one successful run with output data.
2. Configure the sample size — The default of 10 items is suitable for most actors. Increase to 25-50 if the actor has a large schema with many sparse fields, to get a more representative null ratio estimate.
3. Run the actor — Click "Start" and wait. Most analyses complete in 5-15 seconds. The actor fetches the schema from the target actor's most recent successful run — it does not re-execute the target actor.
4. Review the report — Download the JSON result from the Dataset tab. The `fieldAnalysis` array is sorted by token cost descending — start at the top. Apply the `optimizedSchema` as a field allowlist in your LLM pipeline or use the `recommendations` array to guide manual schema cleanup.
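In downstream code, the recommended schema works as a simple allowlist (a sketch; the field names mirror the example output in this README):

```python
def apply_allowlist(record: dict, allowlist: list) -> dict:
    """Keep only the fields named in the optimizer's optimizedSchema list."""
    allowed = set(allowlist)
    return {k: v for k, v in record.items() if k in allowed}

optimized_schema = ["url", "emails", "phones", "name"]
record = {
    "url": "https://example.com",
    "emails": ["info@example.com"],
    "rawHtml": "<html>...</html>",  # dropped before the record reaches the LLM
    "scrapedAt": "2026-03-20",      # dropped
}
print(apply_allowlist(record, optimized_schema))
```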
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `targetActorId` | string | Yes | — | Apify actor ID to analyze. Accepts `username/actor-name` format (e.g., `ryanclinton/website-contact-scraper`) or a numeric actor ID. The actor must have at least one SUCCEEDED run. |
| `sampleSize` | integer | No | 10 | Number of output items to fetch from the target actor's latest dataset for analysis. Higher values give more accurate null ratios. Recommended range: 5–50. |
Input examples
Analyze a scraper before connecting to an LLM:

```json
{
  "targetActorId": "ryanclinton/website-contact-scraper"
}
```

Larger sample for a sparse schema:

```json
{
  "targetActorId": "ryanclinton/google-maps-email-extractor",
  "sampleSize": 25
}
```

Minimal run using numeric actor ID:

```json
{
  "targetActorId": "BHzefUZlZRKWxkTch",
  "sampleSize": 5
}
```
Input tips
- Run the target actor first — the optimizer reads from the most recent SUCCEEDED run. If no successful run exists, the actor will return an error with a clear message.
- Use the default sample size to start — 10 items captures the schema shape and null patterns for most actors. Only increase if you have reason to believe field sparsity varies significantly across records.
- Prefer `username/actor-name` format — it is more readable and easier to verify than a numeric ID, and both formats are supported.
- Re-run after actor updates — actor schemas change when actors are updated. A monthly re-analysis detects new low-value fields introduced by upstream changes.
Output example
```json
{
  "actorName": "ryanclinton/website-contact-scraper",
  "actorId": "ryanclinton/website-contact-scraper",
  "sampleSize": 10,
  "originalTokens": 4200,
  "optimizedTokens": 1512,
  "savingsPercent": 64,
  "fieldAnalysis": [
    {
      "field": "rawHtml",
      "tokens": 2800,
      "value": "low",
      "action": "drop",
      "nullRatio": 0,
      "avgLength": 4500
    },
    {
      "field": "pageContent",
      "tokens": 520,
      "value": "low",
      "action": "drop",
      "nullRatio": 0,
      "avgLength": 840
    },
    {
      "field": "description",
      "tokens": 180,
      "value": "high",
      "action": "truncate",
      "nullRatio": 0.1,
      "avgLength": 1240
    },
    {
      "field": "emails",
      "tokens": 120,
      "value": "high",
      "action": "keep",
      "nullRatio": 0.2,
      "avgLength": 180
    },
    {
      "field": "url",
      "tokens": 45,
      "value": "high",
      "action": "keep",
      "nullRatio": 0,
      "avgLength": 65
    },
    {
      "field": "phones",
      "tokens": 38,
      "value": "high",
      "action": "keep",
      "nullRatio": 0.3,
      "avgLength": 55
    },
    {
      "field": "domain",
      "tokens": 22,
      "value": "medium",
      "action": "keep",
      "nullRatio": 0,
      "avgLength": 30
    },
    {
      "field": "scrapedAt",
      "tokens": 18,
      "value": "low",
      "action": "drop",
      "nullRatio": 0,
      "avgLength": 24
    }
  ],
  "optimizedSchema": ["description", "emails", "url", "phones", "domain"],
  "recommendations": [
    "Drop 3 low-value fields — saves 3,338 tokens (79%)",
    "Truncate 1 long field to 200 chars — reduces token count significantly"
  ],
  "analyzedAt": "2026-03-20T14:32:00.000Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `actorName` | string | Display name of the analyzed actor in `username/actorName` format |
| `actorId` | string | The actor ID provided as input |
| `sampleSize` | integer | Number of items analyzed from the dataset |
| `originalTokens` | integer | Total estimated token count across all fields in the sample |
| `optimizedTokens` | integer | Estimated token count after applying recommended drops and truncations |
| `savingsPercent` | integer | Percentage reduction from original to optimized token count |
| `fieldAnalysis` | array | Per-field analysis objects, sorted by token cost descending |
| `fieldAnalysis[].field` | string | Field name from the actor output schema |
| `fieldAnalysis[].tokens` | integer | Estimated token cost for this field across all sample items |
| `fieldAnalysis[].value` | string | Classification: high, medium, or low |
| `fieldAnalysis[].action` | string | Recommendation: keep, drop, or truncate |
| `fieldAnalysis[].nullRatio` | float | Proportion of items where this field is null or missing (0.0–1.0) |
| `fieldAnalysis[].avgLength` | integer | Average character length of values for this field |
| `optimizedSchema` | array | List of field names recommended to keep (keep + truncate fields only) |
| `recommendations` | array | Human-readable action items with quantified token savings |
| `analyzedAt` | string | ISO 8601 timestamp of when the analysis was completed |
| `error` | string | Present only on failure; describes why the analysis could not complete |
How much does it cost to optimize actor output for LLMs?
LLM Output Optimizer uses pay-per-event pricing — you pay $0.35 per analysis. Platform compute costs are included.
| Scenario | Analyses | Cost per analysis | Total cost |
|---|---|---|---|
| Quick test | 1 | $0.35 | $0.35 |
| Audit 5 actors | 5 | $0.35 | $1.75 |
| Audit 20 actors | 20 | $0.35 | $7.00 |
| Full portfolio review | 50 | $0.35 | $17.50 |
| Continuous monitoring (monthly) | 100 | $0.35 | $35.00 |
You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached.
A single analysis costs less than a minute of GPT-4o inference time. For a 30-field actor producing 60% savings, the $0.35 cost pays back on the very first LLM call using the optimized schema. Apify's free tier includes $5 of monthly credits — enough for 14 analyses at no charge.
Optimize actor output using the API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/actor-llm-optimizer").call(run_input={
    "targetActorId": "ryanclinton/website-contact-scraper",
    "sampleSize": 10
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"Actor: {item['actorName']}")
    print(f"Token savings: {item['savingsPercent']}% ({item['originalTokens']} → {item['optimizedTokens']} tokens)")
    print(f"Optimized schema: {item['optimizedSchema']}")
    for rec in item.get("recommendations", []):
        print(f"  - {rec}")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/actor-llm-optimizer").call({
    targetActorId: "ryanclinton/website-contact-scraper",
    sampleSize: 10
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`Actor: ${item.actorName}`);
    console.log(`Token savings: ${item.savingsPercent}% (${item.originalTokens} → ${item.optimizedTokens} tokens)`);
    console.log(`Optimized schema: ${JSON.stringify(item.optimizedSchema)}`);
    item.recommendations?.forEach(rec => console.log(`  - ${rec}`));
}
```
cURL
```bash
# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~actor-llm-optimizer/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"targetActorId": "ryanclinton/website-contact-scraper", "sampleSize": 10}'

# Fetch results (replace DATASET_ID with the defaultDatasetId from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```
How LLM Output Optimizer works
Phase 1: Target actor and dataset resolution
The actor accepts a `targetActorId` in either `username/actor-name` or numeric ID format. It normalizes the ID by converting the `/` separator to `~` for Apify REST API URL compatibility. It then calls `GET /v2/acts/{actorId}` to verify the actor exists and retrieve its display name. Next, it queries `GET /v2/acts/{actorId}/runs` with `limit=1`, `desc=true`, and `status=SUCCEEDED` to retrieve the most recent successful run. The dataset ID is extracted from `runs.items[0].defaultDatasetId`.
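The resolution flow above can be sketched with stdlib HTTP calls (an illustration against the documented Apify v2 endpoints, not the actor's source; `normalize_actor_id` and `get_json` are helper names introduced here, and production code should add error handling and retries):

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.apify.com/v2"

def normalize_actor_id(actor_id: str) -> str:
    """The REST API expects '~' instead of '/' between username and actor name."""
    return actor_id.replace("/", "~")

def get_json(path: str, **params) -> dict:
    """Minimal GET helper for the Apify REST API."""
    url = f"{BASE}{path}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def resolve_latest_dataset(actor_id: str, token: str) -> str:
    """Return the dataset ID of the target actor's most recent SUCCEEDED run."""
    api_id = normalize_actor_id(actor_id)
    actor = get_json(f"/acts/{api_id}", token=token)  # verify the actor exists
    runs = get_json(f"/acts/{api_id}/runs", token=token,
                    limit=1, desc="true", status="SUCCEEDED")  # latest success
    items = runs["data"]["items"]
    if not items:
        raise RuntimeError(f"No SUCCEEDED runs for {actor['data']['name']}")
    return items[0]["defaultDatasetId"]
```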
Phase 2: Per-field classification and token estimation
The actor fetches up to `sampleSize` items from the dataset using `GET /v2/datasets/{datasetId}/items`. It enumerates all unique field names across all sample items (not just the first record, to handle sparse schemas). For each field, it computes three metrics: (1) estimated token cost, by joining all values as JSON strings and dividing total character count by 4; (2) null ratio, the proportion of items where the field is absent or null; (3) average character length per non-null value.
Each field is then classified using a two-step pattern-matching approach. If the lowercase field name contains any of the low-value patterns (`_id`, `_at`, `timestamp`, `scraped`, `crawled`, `hash`, `checksum`, `internal`, `debug`, `raw`, `html`, `rawHtml`, `sourceHtml`, `pageContent`), the field is classified as `low`. If it matches any of the high-value patterns (`name`, `title`, `url`, `email`, `phone`, `price`, `rating`, `address`, `description`, `summary`, `category`, `status`), it is classified as `high`. Otherwise, if the average serialized value length exceeds 500 characters, the field is classified as `low`; otherwise `medium`.
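The classification step can be sketched as follows (an illustrative re-implementation of the heuristic described above, not the actor's actual source):

```python
# Pattern lists as documented above, matched against lowercased field names
LOW_PATTERNS = ["_id", "_at", "timestamp", "scraped", "crawled", "hash",
                "checksum", "internal", "debug", "raw", "html",
                "rawhtml", "sourcehtml", "pagecontent"]
HIGH_PATTERNS = ["name", "title", "url", "email", "phone", "price", "rating",
                 "address", "description", "summary", "category", "status"]

def classify_field(field_name: str, avg_length: float) -> str:
    """Two-step heuristic: name patterns first, then serialized value length."""
    lowered = field_name.lower()
    if any(p in lowered for p in LOW_PATTERNS):
        return "low"   # e.g. rawHtml, scrapedAt, debugHash
    if any(p in lowered for p in HIGH_PATTERNS):
        return "high"  # e.g. emails, description, url
    return "low" if avg_length > 500 else "medium"

print(classify_field("rawHtml", 4500))  # → low
print(classify_field("emails", 180))    # → high
print(classify_field("domain", 30))     # → medium
```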
Phase 3: Action assignment and savings calculation
Each field receives one of three recommended actions. Fields classified as `low` are assigned `drop`. Fields with a null ratio above 0.80 are assigned `drop` regardless of classification. Fields with an average character length above 1,000 receive `truncate` (with optimized token cost estimated at 20% of original, representing a ~200-character truncation). All other fields receive `keep`.
The `optimizedTokens` count sums token costs for all `keep` fields plus 20% of token costs for `truncate` fields. The savings percentage is `(1 - optimizedTokens / originalTokens) * 100`. The `fieldAnalysis` array is sorted by token cost descending so the highest-impact fields appear first. Three categories of recommendations are generated where applicable: a drop recommendation with aggregate token savings and percentage, a truncation recommendation, and a high-null-rate warning for fields with >50% null values that were not already dropped.
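The savings arithmetic reduces to a few lines (a minimal sketch using the field shapes from the output example, with invented sample numbers):

```python
def summarize(field_analysis):
    """Optimized token count and savings percent from per-field actions.
    Truncated fields are costed at 20% of their original tokens."""
    original = sum(f["tokens"] for f in field_analysis)
    optimized = sum(
        f["tokens"] if f["action"] == "keep"
        else f["tokens"] * 0.2 if f["action"] == "truncate"
        else 0  # dropped fields contribute nothing
        for f in field_analysis
    )
    return int(optimized), round((1 - optimized / original) * 100)

fields = [
    {"field": "rawHtml", "tokens": 2800, "action": "drop"},
    {"field": "description", "tokens": 180, "action": "truncate"},
    {"field": "emails", "tokens": 120, "action": "keep"},
]
print(summarize(fields))  # → (156, 95)
```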
Phase 4: Output and PPE charge
The full report is pushed to the Apify dataset as a single record. If the actor is running under pay-per-event pricing, a single `llm-optimization` event is charged at $0.35. The charge call checks `eventChargeLimitReached` and logs a warning if the spending limit was hit before the charge completed.
Tips for best results
- Run the target actor at least once before optimizing. The optimizer reads from the most recent SUCCEEDED run. If you just deployed an actor and have not run it yet, run it with a representative input first so the output schema is populated.
- Increase `sampleSize` for sparse schemas. If an actor returns data where many fields are only populated for specific record types (e.g., a scraper that extracts different data from different page types), a sample of 5-10 may not capture the full null distribution. Use `sampleSize` 25-50 for more accurate null ratios.
- Treat `truncate` recommendations as context-dependent. The optimizer flags fields averaging more than 1,000 characters for truncation. Whether 200 characters is sufficient depends on your LLM task — for classification tasks it often is, but for summarization or extraction tasks you may want to keep more. Use the `avgLength` value to calibrate.
- Use `optimizedSchema` as a field allowlist, not a deletion list. In your downstream code, use the `optimizedSchema` array to select only the fields you need rather than trying to delete individual fields. This is more maintainable as the upstream actor schema evolves.
- Re-analyze after upstream actor updates. Apify actors are updated regularly. A field like `sourceHtml` might be added in a new version and silently inflate your token costs. Schedule a monthly re-analysis to detect schema drift.
- Combine with B2B Lead Gen Suite for AI enrichment pipelines. If you are feeding enriched lead data into an LLM for scoring or qualification, optimizing the enriched output schema first can cut per-lead LLM costs significantly before the pipeline scales.
- Check the `error` field in the output before processing. If the target actor ID is wrong, has no successful runs, or has an empty dataset, the actor returns a structured error record rather than failing silently. Always check `item.error` in your consuming code before reading the analysis fields.
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Website Contact Scraper | Analyze contact scraper output to drop rawHtml and pageContent before feeding contacts to an LLM enrichment step — typical 60%+ savings. |
| Google Maps Email Extractor | Optimize the business profile output schema before passing records to a lead scoring LLM to reduce cost per scored lead. |
| Company Deep Research | Deep research reports contain verbose text fields — use the optimizer to identify which fields to include in LLM prompts versus store separately. |
| Waterfall Contact Enrichment | Enriched contact records contain many source-specific metadata fields; optimize before passing to a CRM-sync or AI qualification step. |
| B2B Lead Gen Suite | Audit the full pipeline output schema end-to-end before connecting to an AI lead scorer — maximizes cost efficiency at scale. |
| Website Content to Markdown | Markdown conversion actors can include metadata fields; optimize to identify which metadata adds value versus padding tokens. |
| Trustpilot Review Analyzer | Review data often includes raw review text and structured sentiment fields — use the optimizer to decide which representation is more token-efficient for downstream LLM processing. |
Limitations
- Reads only from the most recent SUCCEEDED run. If the target actor's latest successful run has a schema that differs from its current output format (e.g., after an actor update), the analysis reflects the old schema. Re-run the target actor first to get a fresh dataset.
- Does not analyze nested objects. The optimizer analyzes top-level fields only. If an actor returns deeply nested objects (e.g., `metadata.internal.debugHash`), only the top-level field name is analyzed — nested low-value fields inside a kept object are not flagged.
- Token estimation is approximate. The ~4 characters per token heuristic works well for English prose but overestimates tokens for numeric data and underestimates for non-Latin scripts. Treat savings percentages as directional, not precise.
- Pattern matching is heuristic, not semantic. A field named `status` is classified as high-value even if its values are internal status codes with no LLM utility. Review the `fieldAnalysis` output before applying recommendations blindly.
- Does not suggest field transformations. The optimizer recommends drop or truncate but does not suggest how to transform values — for example, converting a full address string into structured components, or extracting the domain from a URL. These optimizations require task-specific logic.
- Requires the target actor to be accessible with your token. The actor uses the `APIFY_TOKEN` environment variable injected by the Apify platform. The target actor must be owned by the same account or be a public actor. Private actors from other accounts cannot be analyzed.
- `sampleSize` above 100 is untested. The Apify dataset API supports large item counts, but very large samples increase run time and memory usage. Keep `sampleSize` under 100 for reliable performance within the 512 MB memory limit.
- No diff between schema versions. The optimizer produces a snapshot analysis, not a changelog. It cannot tell you whether a schema has changed since the last analysis — use a dedicated schema monitoring tool for that use case.
Integrations
- Zapier — trigger an optimization analysis automatically after a target actor run completes, then send the savings report to Slack or email
- Make — build workflows that analyze actor output schemas on a schedule and log results to a Google Sheet for portfolio tracking
- Google Sheets — push optimization reports to a spreadsheet to track token savings trends across your actor portfolio over time
- Apify API — integrate into CI/CD pipelines to validate schema token efficiency before deploying new actor versions to production
- Webhooks — chain the optimizer as a post-run step after any actor, automatically alerting when a schema change causes token costs to increase
- LangChain / LlamaIndex — use the `optimizedSchema` output to dynamically filter actor data before passing to LangChain document loaders or LlamaIndex data connectors
Troubleshooting
- "No recent runs found" error — The target actor has no SUCCEEDED runs in its run history. Run the target actor at least once with a valid input that produces output, then re-run the optimizer. Note that FAILED or TIMED-OUT runs are not used.
- "Actor not found" error — The `targetActorId` value is incorrect. Verify the actor ID by navigating to the actor in the Apify Console — the URL contains the correct slug in `username/actor-name` format. If copying from an actor's API settings, use the slug format rather than the internal UUID where possible.
- Output shows 0% savings — All fields in the sample matched high-value patterns or had average lengths under 500 characters. This means the actor's output is already dense and well-structured. Review the `fieldAnalysis` array to confirm — if all fields are correctly classified as `high` or `medium`, no optimization is needed.
- Very high savings estimate (>90%) — This typically means the actor includes a `rawHtml` or `pageContent` field that dominates token cost. Verify the recommendation makes sense for your use case. For some LLM tasks (HTML extraction, element classification), raw HTML may be necessary despite its token cost.
- Analysis returns fewer fields than expected — The optimizer enumerates fields present in the sample items. If some fields only appear in a subset of records and your `sampleSize` is too small, those fields may not appear in the analysis. Increase `sampleSize` to 25-50 to capture sparse fields.
Responsible use
- This actor only accesses Apify dataset output that belongs to actors authorized under your Apify API token.
- Do not use this actor to analyze output from actors you do not have explicit permission to access.
- Token savings recommendations are heuristic; review all `drop` recommendations before applying them to production pipelines to ensure no business-critical data is discarded.
- For guidance on responsible AI data pipeline construction, see Apify's platform documentation.
FAQ
How accurate is the LLM token savings estimate?
The optimizer uses the ~4 characters per token approximation, which closely matches GPT-4's cl100k_base tokenizer for English prose. For structured data like JSON numbers, URLs, and short strings, actual token counts may differ by 10-20%. Treat the savings percentage as directional — it will consistently identify your highest-cost fields even if the exact numbers vary slightly.
Does LLM Output Optimizer re-run the target actor?
No. The optimizer reads from the existing output of the target actor's most recent successful run. It never triggers a new run, never modifies the target actor, and never incurs compute charges on the target actor. The only cost is the $0.35 analysis fee.
How many fields can LLM Output Optimizer analyze in one run?
The optimizer handles any number of top-level fields. It enumerates all unique field names across all sampled items, so even sparse schemas with 50+ fields are fully analyzed. The default `sampleSize` of 10 is sufficient to identify field classifications for most actors.
Can I analyze a private actor that belongs to another user?
No. The actor uses the APIFY_TOKEN environment variable provided by the Apify platform. You can only analyze actors that are accessible under your own account — your own actors plus any public actors on the Apify Store.
How is LLM Output Optimizer different from reading the actor's output schema manually?
Manual inspection tells you what fields exist but not their token cost, null distribution, or relative information density. The optimizer quantifies token cost per field, ranks fields by cost, computes null ratios across a sample, and generates specific action recommendations with savings percentages — in 10 seconds versus 30-60 minutes of manual analysis.
What types of fields are automatically flagged for removal?
Fields whose names contain any of these patterns are classified as low-value and recommended for dropping: `_id`, `_at`, `timestamp`, `scraped`, `crawled`, `hash`, `checksum`, `internal`, `debug`, `raw`, `html`, `rawHtml`, `sourceHtml`, `pageContent`. Additionally, any field (regardless of name) with more than 80% null values is flagged for removal.
Can I schedule LLM Output Optimizer to run periodically?
Yes. Use Apify's built-in scheduler to run the optimizer weekly or monthly against your key actors. This detects schema drift — cases where an actor update adds new verbose fields that inflate your LLM costs. The structured output makes it straightforward to track savings trends over time.
What happens if the target actor has an empty dataset?
The optimizer returns a structured error record: `{"error": "Latest run produced an empty dataset. Nothing to optimize."}`. This can happen if the actor ran successfully but produced no output items — for example, if the search returned zero results. Re-run the target actor with an input that produces data before analyzing.
How is this different from just filtering fields in my code?
You can absolutely filter fields manually — but you first need to know which fields are worth filtering and how much each one costs. The optimizer answers those questions. Think of it as a profiler for your LLM data pipeline: it tells you where the token budget is going so you can make informed decisions rather than guessing.
Is it legal to analyze actor output data with this tool?
Yes. The optimizer analyzes output data from actors running under your own Apify account. You are reading your own data. No external websites are accessed during the analysis. For guidance on the legality of the data collected by the target actors themselves, see Apify's guide on web scraping legality.
How long does a typical LLM optimization analysis take?
Most analyses complete in 5-15 seconds. The actor makes three sequential API calls (actor lookup, runs lookup, dataset fetch) plus local computation. The dataset fetch time is the main variable — larger `sampleSize` values or very wide schemas (50+ fields) take slightly longer.
Can I use the optimizedSchema output directly in my LangChain or LlamaIndex pipeline?
Yes. The optimizedSchema field is a plain JSON array of field name strings. You can use it directly as a field allowlist when constructing Document objects in LangChain or as a metadata filter in LlamaIndex. See the Apify LangChain integration docs for connection examples.
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.
Related actors
Bulk Email Verifier
Verify email deliverability at scale. MX record validation, SMTP mailbox checks, disposable and role-based detection, catch-all flagging, and confidence scoring. No external API costs.
GitHub Repository Search
Search GitHub repositories by keyword, language, topic, stars, forks. Sort by stars, forks, or recently updated. Returns metadata, topics, license, owner info, URLs. Free API, optional token for higher limits.
Website Content to Markdown
Convert any website to clean Markdown for RAG pipelines, LLM training, and AI apps. Crawls pages, strips boilerplate, preserves headings, tables, and code blocks. GFM support.
Website Tech Stack Detector
Detect 100+ web technologies on any website. Identifies CMS, frameworks, analytics, marketing tools, chat widgets, CDNs, payment systems, hosting, and more. Batch-analyze multiple sites with version detection and confidence scoring.
Ready to try LLM Output Optimizer?
Start for free on Apify. No credit card required.
Open on Apify Store