DEVELOPER TOOLS · AI

LLM Output Optimizer

Analyze any Apify actor's output schema for LLM pipelines. Identifies which fields to keep, drop, or truncate before feeding actor data to ChatGPT, Claude, or other LLM workflows.

Try on Apify Store

$0.25 per event · 1 user (30d) · 3 runs (30d)

Maintenance Pulse: 90/100 (actively maintained)
Last build: today · Last version: 1d ago · Builds (30d): 8 · Issue response: N/A

Cost Estimate

Estimate by the number of llm-optimization events you need. Example: 100 events = $25.00.

Pricing

Pay Per Event model. You only pay for what you use.

Event | Description | Price
llm-optimization | Charged per optimization analysis. | $0.25

Example: 100 events = $25.00 · 1,000 events = $250.00

Documentation

LLM Output Optimizer analyzes any Apify actor's output schema and tells you exactly which fields to keep, drop, or truncate before feeding data into an LLM pipeline. It reads from the actor's most recent successful run, scores every field by information density, and produces a token-reduction report in seconds. Typical savings range from 40% to 70% of your LLM token budget.

The actor fetches a configurable sample of output items from the target actor's latest dataset, runs per-field token estimation using a character-based model (~4 characters per token for English text), classifies each field as high-value, medium-value, or low-value, and generates a recommended optimized schema. No re-running the target actor is required — the analysis reads from existing output data.
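
The character-based estimate described above is easy to reproduce. This sketch uses a hypothetical `estimate_tokens` helper to illustrate the heuristic; it is not the actor's actual internals:

```python
import json

def estimate_tokens(value) -> int:
    """Rough token estimate: serialize to JSON, divide character count by 4."""
    return len(json.dumps(value, ensure_ascii=False)) // 4

item = {"url": "https://example.com", "rawHtml": "<html>" + "x" * 394 + "</html>"}
# Short structured fields cost a few tokens; raw HTML dominates the budget.
costs = {field: estimate_tokens(value) for field, value in item.items()}
```

Running the estimate per field like this immediately shows which fields dominate the token budget.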

What data can you extract?

Data Point | Source | Example
Original token count | Full output analysis | 4,200 tokens
Optimized token count | After applying recommendations | 1,680 tokens
Savings percentage | Calculated reduction | 60%
Field name | Per-field analysis | rawHtml
Field value classification | Pattern + length heuristics | low / medium / high
Recommended action | Per field | drop / keep / truncate
Null ratio | Missing value rate | 0.83
Average field length | Characters per value | 4,500
Token cost per field | Estimated from character count | 2,800 tokens
Optimized schema | Recommended field list | ["url", "emails", "phones", "name"]
Recommendations | Actionable suggestions with savings | Drop 3 low-value fields — saves 2,520 tokens (60%)
Analysis timestamp | ISO 8601 | 2026-03-20T14:32:00.000Z

Why use LLM Output Optimizer?

Most Apify actors return far more data than an LLM actually needs. A web scraper might return rawHtml, sourceHtml, pageContent, internal timestamps, debug hashes, and a dozen other fields that consume thousands of tokens per record — at real cost. If you are passing actor output to GPT-4, Claude, or Gemini, you are likely paying for 3-5x more tokens than your prompts require.

Manually auditing an output schema requires pulling a dataset, inspecting field distributions, estimating token costs, and writing custom field filters. For a 30-field schema across 10 sample records, this takes 30-60 minutes per actor. This actor does it in under 10 seconds for $0.25.

  • Scheduling — schedule weekly re-analysis to catch schema drift as actors are updated
  • API access — trigger from Python, JavaScript, or any HTTP client as part of your pipeline build process
  • Proxy rotation — not required; the actor calls the Apify REST API directly using your token
  • Monitoring — get Slack or email alerts when the optimization run fails or a target actor produces empty output
  • Integrations — connect results to Zapier, Make, or Google Sheets to track token savings across your full actor portfolio

Features

  • Character-based token estimation — approximates token count at ~4 characters per token, matching GPT-4 tokenizer behavior for English text without requiring a tokenizer library
  • Low-value field pattern matching — automatically flags fields matching 14 patterns: _id, _at, timestamp, scraped, crawled, hash, checksum, internal, debug, raw, html, rawHtml, sourceHtml, and pageContent
  • High-value field pattern matching — protects fields matching 12 patterns: name, title, url, email, phone, price, rating, address, description, summary, category, and status from being dropped
  • Null ratio analysis — computes the proportion of null or missing values per field; fields with >80% null values are automatically flagged for removal
  • Long-field truncation detection — fields averaging more than 1,000 characters are flagged as truncation candidates, with optimized token cost estimated at 20% of original (equivalent to a 200-character truncation)
  • Token cost sorted output — field analysis is sorted by token cost descending so the highest-impact optimizations appear first
  • Optimized schema generation — produces a final optimizedSchema field list containing only fields recommended as keep or truncate
  • Quantified recommendations — each recommendation includes exact token counts and percentage savings, not just qualitative advice
  • Configurable sample size — analyze 5 to 100 records depending on how representative you need the sample to be
  • Read-only analysis — never modifies the target actor, its settings, or its output; reads only from existing dataset items

Use cases for LLM output optimization

AI pipeline cost reduction

Teams building RAG systems, AI agents, or document Q&A pipelines on top of Apify scrapers need to minimize token throughput. Before connecting a scraper to GPT-4o or Claude 3.5, run the optimizer to identify which fields are safe to drop. A single optimization pass on a 40-field scraper output can cut per-record token cost from 3,000 to 800 tokens — reducing downstream LLM costs by 70% across millions of records.

Actor schema auditing before production

Developers preparing an actor for production use can use this tool to audit the output schema for unnecessary bloat. A quick analysis reveals whether fields like pageContent, rawHtml, or internal debug fields made it into the output — fields that add token cost with no downstream value in most AI workflows.

LLM-ready dataset preparation

Data teams building fine-tuning datasets or evaluation benchmarks from scraped data need clean, dense schemas. The optimizer identifies sparse fields (high null ratios) and verbose fields (long raw content) that should be excluded from training examples to keep dataset quality high and token costs low.

Multi-actor pipeline optimization

When chaining multiple actors — for example, a Google Maps scraper feeding into a contact enrichment actor feeding into an LLM summarizer — each stage multiplies token costs. Running the optimizer on each actor in the chain surfaces the cumulative savings opportunity before the pipeline goes live.

Actor portfolio token budgeting

Developers managing a portfolio of 10+ actors and running LLM workflows on their combined output can use the optimizer to benchmark token efficiency across the portfolio. The structured output makes it straightforward to compare token density scores and prioritize which actors need schema cleanup.

How to optimize actor output for LLM pipelines

  1. Find your target actor ID — Go to the Apify Console, open the actor you want to analyze, and copy the actor ID from the URL or Settings tab. It looks like ryanclinton/website-contact-scraper or a numeric ID like BHzefUZlZRKWxkTch. The actor must have at least one successful run with output data.

  2. Configure the sample size — The default of 10 items is suitable for most actors. Increase to 25-50 if the actor has a large schema with many sparse fields, to get a more representative null ratio estimate.

  3. Run the actor — Click "Start" and wait. Most analyses complete in 5-15 seconds. The actor fetches the schema from the target actor's most recent successful run — it does not re-execute the target actor.

  4. Review the report — Download the JSON result from the Dataset tab. The fieldAnalysis array is sorted by token cost descending — start at the top. Apply the optimizedSchema as a field allowlist in your LLM pipeline or use the recommendations array to guide manual schema cleanup.

Input parameters

Parameter | Type | Required | Default | Description
targetActorId | string | Yes | (none) | Apify actor ID to analyze. Accepts username/actor-name format (e.g., ryanclinton/website-contact-scraper) or a numeric actor ID. The actor must have at least one SUCCEEDED run.
sampleSize | integer | No | 10 | Number of output items to fetch from the target actor's latest dataset for analysis. Higher values give more accurate null ratios. Recommended range: 5–50.

Input examples

Analyze a scraper before connecting to an LLM:

{
  "targetActorId": "ryanclinton/website-contact-scraper"
}

Larger sample for a sparse schema:

{
  "targetActorId": "ryanclinton/google-maps-email-extractor",
  "sampleSize": 25
}

Minimal run using numeric actor ID:

{
  "targetActorId": "BHzefUZlZRKWxkTch",
  "sampleSize": 5
}

Input tips

  • Run the target actor first — the optimizer reads from the most recent SUCCEEDED run. If no successful run exists, the actor will return an error with a clear message.
  • Use the default sample size to start — 10 items captures the schema shape and null patterns for most actors. Only increase if you have reason to believe field sparsity varies significantly across records.
  • Prefer username/actor-name format — it is more readable and easier to verify than a numeric ID, and both formats are supported.
  • Re-run after actor updates — actor schemas change when actors are updated. A monthly re-analysis detects new low-value fields introduced by upstream changes.

Output example

{
  "actorName": "ryanclinton/website-contact-scraper",
  "actorId": "ryanclinton/website-contact-scraper",
  "sampleSize": 10,
  "originalTokens": 4200,
  "optimizedTokens": 1512,
  "savingsPercent": 64,
  "fieldAnalysis": [
    {
      "field": "rawHtml",
      "tokens": 2800,
      "value": "low",
      "action": "drop",
      "nullRatio": 0,
      "avgLength": 4500
    },
    {
      "field": "pageContent",
      "tokens": 520,
      "value": "low",
      "action": "drop",
      "nullRatio": 0,
      "avgLength": 840
    },
    {
      "field": "description",
      "tokens": 180,
      "value": "high",
      "action": "truncate",
      "nullRatio": 0.1,
      "avgLength": 1240
    },
    {
      "field": "emails",
      "tokens": 120,
      "value": "high",
      "action": "keep",
      "nullRatio": 0.2,
      "avgLength": 180
    },
    {
      "field": "url",
      "tokens": 45,
      "value": "high",
      "action": "keep",
      "nullRatio": 0,
      "avgLength": 65
    },
    {
      "field": "phones",
      "tokens": 38,
      "value": "high",
      "action": "keep",
      "nullRatio": 0.3,
      "avgLength": 55
    },
    {
      "field": "domain",
      "tokens": 22,
      "value": "medium",
      "action": "keep",
      "nullRatio": 0,
      "avgLength": 30
    },
    {
      "field": "scrapedAt",
      "tokens": 18,
      "value": "low",
      "action": "drop",
      "nullRatio": 0,
      "avgLength": 24
    }
  ],
  "optimizedSchema": ["description", "emails", "url", "phones", "domain"],
  "recommendations": [
    "Drop 3 low-value fields — saves 3,338 tokens (79%)",
    "Truncate 1 long fields to 200 chars — reduces token count significantly"
  ],
  "analyzedAt": "2026-03-20T14:32:00.000Z"
}

Output fields

Field | Type | Description
actorName | string | Display name of the analyzed actor in username/actorName format
actorId | string | The actor ID provided as input
sampleSize | integer | Number of items analyzed from the dataset
originalTokens | integer | Total estimated token count across all fields in the sample
optimizedTokens | integer | Estimated token count after applying recommended drops and truncations
savingsPercent | integer | Percentage reduction from original to optimized token count
fieldAnalysis | array | Per-field analysis objects, sorted by token cost descending
fieldAnalysis[].field | string | Field name from the actor output schema
fieldAnalysis[].tokens | integer | Estimated token cost for this field across all sample items
fieldAnalysis[].value | string | Classification: high, medium, or low
fieldAnalysis[].action | string | Recommendation: keep, drop, or truncate
fieldAnalysis[].nullRatio | float | Proportion of items where this field is null or missing (0.0–1.0)
fieldAnalysis[].avgLength | integer | Average character length of values for this field
optimizedSchema | array | List of field names recommended to keep (keep + truncate fields only)
recommendations | array | Human-readable action items with quantified token savings
analyzedAt | string | ISO 8601 timestamp of when the analysis was completed
error | string | Present only on failure; describes why the analysis could not complete

How much does it cost to optimize actor output for LLMs?

LLM Output Optimizer uses pay-per-event pricing — you pay $0.25 per analysis. Platform compute costs are included.

Scenario | Analyses | Cost per analysis | Total cost
Quick test | 1 | $0.25 | $0.25
Audit 5 actors | 5 | $0.25 | $1.25
Audit 20 actors | 20 | $0.25 | $5.00
Full portfolio review | 50 | $0.25 | $12.50
Continuous monitoring (monthly) | 100 | $0.25 | $25.00

You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached.

A single analysis costs less than a minute of GPT-4o inference time. For a 30-field actor producing 60% savings, the $0.25 cost pays back on the very first LLM call using the optimized schema. Apify's free tier includes $5 of monthly credits — enough for 20 analyses at no charge.

Optimize actor output using the API

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/actor-llm-optimizer").call(run_input={
    "targetActorId": "ryanclinton/website-contact-scraper",
    "sampleSize": 10
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"Actor: {item['actorName']}")
    print(f"Token savings: {item['savingsPercent']}% ({item['originalTokens']} → {item['optimizedTokens']} tokens)")
    print(f"Optimized schema: {item['optimizedSchema']}")
    for rec in item.get("recommendations", []):
        print(f"  - {rec}")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/actor-llm-optimizer").call({
    targetActorId: "ryanclinton/website-contact-scraper",
    sampleSize: 10
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`Actor: ${item.actorName}`);
    console.log(`Token savings: ${item.savingsPercent}% (${item.originalTokens} → ${item.optimizedTokens} tokens)`);
    console.log(`Optimized schema: ${JSON.stringify(item.optimizedSchema)}`);
    item.recommendations?.forEach(rec => console.log(`  - ${rec}`));
}

cURL

# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~actor-llm-optimizer/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"targetActorId": "ryanclinton/website-contact-scraper", "sampleSize": 10}'

# Fetch results (replace DATASET_ID with the defaultDatasetId from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

How LLM Output Optimizer works

Phase 1: Target actor and dataset resolution

The actor accepts a targetActorId in either username/actor-name or numeric ID format. It normalizes the ID by converting the / separator to ~ for Apify REST API URL compatibility. It then calls GET /v2/acts/{actorId} to verify the actor exists and retrieve its display name. Next, it queries GET /v2/acts/{actorId}/runs with limit=1, desc=true, and status=SUCCEEDED to retrieve the most recent successful run. The dataset ID is extracted from runs.items[0].defaultDatasetId.
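
Phase 1 can be sketched with the standard library. The endpoints match the ones named above, but `resolve_latest_dataset`, its helper, and the error message are illustrative, not the actor's actual code:

```python
import json
import urllib.parse
import urllib.request

API = "https://api.apify.com/v2"
TOKEN = "YOUR_API_TOKEN"  # the platform injects this as APIFY_TOKEN

def _get(path: str, **params) -> dict:
    """Fetch one Apify REST API endpoint and unwrap the 'data' envelope."""
    query = urllib.parse.urlencode({"token": TOKEN, **params})
    with urllib.request.urlopen(f"{API}{path}?{query}") as resp:
        return json.load(resp)["data"]

def resolve_latest_dataset(target_actor_id: str):
    """Return (display name, dataset ID of the most recent SUCCEEDED run)."""
    actor_id = target_actor_id.replace("/", "~")  # REST URLs use ~ as the separator
    actor = _get(f"/acts/{actor_id}")
    runs = _get(f"/acts/{actor_id}/runs", limit=1, desc="true", status="SUCCEEDED")["items"]
    if not runs:
        raise ValueError("No recent runs found: run the target actor first.")
    return actor["name"], runs[0]["defaultDatasetId"]
```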

Phase 2: Per-field classification and token estimation

The actor fetches up to sampleSize items from the dataset using GET /v2/datasets/{datasetId}/items. It enumerates all unique field names across all sample items (not just the first record, to handle sparse schemas). For each field, it computes three metrics: (1) estimated token cost by joining all values as JSON strings and dividing total character count by 4; (2) null ratio as the proportion of items where the field is absent or null; (3) average character length per non-null value.
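
The three per-field metrics can be computed in one pass over the sample. `field_metrics` is an illustrative sketch of the logic described above, not the actor's exact implementation:

```python
import json

def field_metrics(items: list[dict]) -> dict[str, dict]:
    """Per-field token cost, null ratio, and average serialized value length."""
    fields = {f for item in items for f in item}  # union across all items, not just the first
    out = {}
    for f in fields:
        values = [item.get(f) for item in items]
        present = [v for v in values if v is not None]
        chars = sum(len(json.dumps(v, ensure_ascii=False)) for v in present)
        out[f] = {
            "tokens": chars // 4,
            "nullRatio": round(1 - len(present) / len(items), 2),
            "avgLength": chars // len(present) if present else 0,
        }
    return out

sample = [{"url": "https://a.com", "email": None}, {"url": "https://b.com"}]
metrics = field_metrics(sample)
```

Enumerating the field union across all items is what keeps sparse fields (present in only some records) visible in the report.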

Each field is then classified using a two-step pattern matching approach. If the lowercase field name contains any of 14 low-value patterns (_id, _at, timestamp, scraped, crawled, hash, checksum, internal, debug, raw, html, rawHtml, sourceHtml, pageContent), the field is classified as low. If it matches any of 12 high-value patterns (name, title, url, email, phone, price, rating, address, description, summary, category, status), it is classified as high. Otherwise, if the average serialized value length exceeds 500 characters, the field is classified as low; otherwise medium.
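
The two-step classification reduces to substring checks on the lowercase field name. The pattern lists below are slightly abbreviated (`raw` and `html` already cover `rawHtml` and `sourceHtml` as substrings), so treat this as a sketch of the rule set rather than the exact one:

```python
LOW_PATTERNS = ["_id", "_at", "timestamp", "scraped", "crawled", "hash",
                "checksum", "internal", "debug", "raw", "html", "pagecontent"]
HIGH_PATTERNS = ["name", "title", "url", "email", "phone", "price", "rating",
                 "address", "description", "summary", "category", "status"]

def classify(field: str, avg_length: int) -> str:
    """Two-step heuristic: low-value patterns first, then high-value, then length."""
    f = field.lower()
    if any(p in f for p in LOW_PATTERNS):
        return "low"
    if any(p in f for p in HIGH_PATTERNS):
        return "high"
    return "low" if avg_length > 500 else "medium"
```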

Phase 3: Action assignment and savings calculation

Each field receives one of three recommended actions. Fields classified as low are assigned drop. Fields with a null ratio above 0.80 are assigned drop regardless of classification. Fields with average character length above 1,000 characters receive truncate (with optimized token cost estimated at 20% of original, representing a ~200-character truncation). All other fields receive keep.

The optimizedTokens count sums token costs for all keep fields plus 20% of token costs for truncate fields. The savings percentage is (1 - optimizedTokens / originalTokens) * 100. The fieldAnalysis array is sorted by token cost descending so the highest-impact fields appear first. Three categories of recommendations are generated where applicable: a drop recommendation with aggregate token savings and percentage, a truncation recommendation, and a high-null-rate warning for fields with >50% null values that were not already dropped.
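
Under the rules above, action assignment and the savings arithmetic look roughly like this; `assign_action` and `savings` are hypothetical names used for illustration:

```python
def assign_action(value: str, null_ratio: float, avg_length: int) -> str:
    """Map classification + metrics to drop / truncate / keep."""
    if value == "low" or null_ratio > 0.80:
        return "drop"
    if avg_length > 1000:
        return "truncate"
    return "keep"

def savings(field_analysis: list[dict]) -> tuple[int, int, int]:
    """Return (original, optimized, savings percent); truncated fields keep 20% of their cost."""
    original = sum(f["tokens"] for f in field_analysis)
    optimized = sum(
        f["tokens"] if f["action"] == "keep"
        else f["tokens"] // 5 if f["action"] == "truncate"
        else 0
        for f in field_analysis
    )
    return original, optimized, round((1 - optimized / original) * 100)
```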

Phase 4: Output and PPE charge

The full report is pushed to the Apify dataset in a single record. If the actor is running under pay-per-event pricing, a single llm-optimization event is charged at $0.25. The charge call checks eventChargeLimitReached and logs a warning if the spending limit was hit before the charge completed.

Tips for best results

  1. Run the target actor at least once before optimizing. The optimizer reads from the most recent SUCCEEDED run. If you just deployed an actor and have not run it yet, run it with a representative input first so the output schema is populated.

  2. Increase sampleSize for sparse schemas. If an actor returns data where many fields are only populated for specific record types (e.g., a scraper that extracts different data from different page types), a sample of 5-10 may not capture the full null distribution. Use sampleSize 25-50 for more accurate null ratios.

  3. Treat truncate recommendations as context-dependent. The optimizer flags fields averaging more than 1,000 characters for truncation. Whether 200 characters is sufficient depends on your LLM task — for classification tasks it often is, but for summarization or extraction tasks you may want to keep more. Use the avgLength value to calibrate.

  4. Use optimizedSchema as a field allowlist, not a deletion list. In your downstream code, use the optimizedSchema array to select only the fields you need rather than trying to delete individual fields. This is more maintainable as the upstream actor schema evolves.

  5. Re-analyze after upstream actor updates. Apify actors are updated regularly. A field like sourceHtml might be added in a new version and silently inflate your token costs. Schedule a monthly re-analysis to detect schema drift.

  6. Combine with B2B Lead Gen Suite for AI enrichment pipelines. If you are feeding enriched lead data into an LLM for scoring or qualification, optimizing the enriched output schema first can cut per-lead LLM costs significantly before the pipeline scales.

  7. Check the error field in the output before processing. If the target actor ID is wrong, has no successful runs, or has an empty dataset, the actor returns a structured error record rather than failing silently. Always check item.error in your consuming code before reading the analysis fields.
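
The allowlist approach from tip 4 is a one-liner per record. This sketch assumes plain dict records; `apply_allowlist` is an illustrative helper, not part of the actor:

```python
def apply_allowlist(items: list[dict], optimized_schema: list[str]) -> list[dict]:
    """Keep only allowlisted fields; new upstream fields are dropped automatically."""
    allowed = set(optimized_schema)
    return [{k: v for k, v in item.items() if k in allowed} for item in items]

records = [{"url": "https://a.com", "emails": ["x@a.com"], "rawHtml": "<html>...</html>"}]
slim = apply_allowlist(records, ["url", "emails", "phones"])
# rawHtml is gone; a phones field missing from a record is simply absent, not an error
```

Because unknown fields are dropped by default, a schema change in the upstream actor cannot silently inflate your token costs.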

Combine with other Apify actors

Actor | How to combine
Website Contact Scraper | Analyze contact scraper output to drop rawHtml and pageContent before feeding contacts to an LLM enrichment step — typical 60%+ savings.
Google Maps Email Extractor | Optimize the business profile output schema before passing records to a lead scoring LLM to reduce cost per scored lead.
Company Deep Research | Deep research reports contain verbose text fields — use the optimizer to identify which fields to include in LLM prompts versus store separately.
Waterfall Contact Enrichment | Enriched contact records contain many source-specific metadata fields; optimize before passing to a CRM-sync or AI qualification step.
B2B Lead Gen Suite | Audit the full pipeline output schema end-to-end before connecting to an AI lead scorer — maximizes cost efficiency at scale.
Website Content to Markdown | Markdown conversion actors can include metadata fields; optimize to identify which metadata adds value versus padding tokens.
Trustpilot Review Analyzer | Review data often includes raw review text and structured sentiment fields — use the optimizer to decide which representation is more token-efficient for downstream LLM processing.

Limitations

  • Reads only from the most recent SUCCEEDED run. If the target actor's latest successful run has a schema that differs from its current output format (e.g., after an actor update), the analysis reflects the old schema. Re-run the target actor first to get a fresh dataset.
  • Does not analyze nested objects. The optimizer analyzes top-level fields only. If an actor returns deeply nested objects (e.g., metadata.internal.debugHash), only the top-level field name is analyzed — nested low-value fields inside a kept object are not flagged.
  • Token estimation is approximate. The ~4 characters per token heuristic works well for English prose but overestimates tokens for numeric data and underestimates for non-Latin scripts. Treat savings percentages as directional, not precise.
  • Pattern matching is heuristic, not semantic. A field named status is classified as high-value even if its values are internal status codes with no LLM utility. Review the fieldAnalysis output before applying recommendations blindly.
  • Does not suggest field transformations. The optimizer recommends drop or truncate but does not suggest how to transform values — for example, converting a full address string into structured components, or extracting the domain from a URL. These optimizations require task-specific logic.
  • Requires the target actor to be accessible with your token. The actor uses the APIFY_TOKEN environment variable injected by the Apify platform. The target actor must be owned by the same account or be a public actor. Private actors from other accounts cannot be analyzed.
  • sampleSize above 100 is untested. The Apify dataset API supports large item counts, but very large samples increase run time and memory usage. Keep sampleSize under 100 for reliable performance within the 512 MB memory limit.
  • No diff between schema versions. The optimizer produces a snapshot analysis, not a changelog. It cannot tell you whether a schema has changed since the last analysis — use a dedicated schema monitoring tool for that use case.

Integrations

  • Zapier — trigger an optimization analysis automatically after a target actor run completes, then send the savings report to Slack or email
  • Make — build workflows that analyze actor output schemas on a schedule and log results to a Google Sheet for portfolio tracking
  • Google Sheets — push optimization reports to a spreadsheet to track token savings trends across your actor portfolio over time
  • Apify API — integrate into CI/CD pipelines to validate schema token efficiency before deploying new actor versions to production
  • Webhooks — chain the optimizer as a post-run step after any actor, automatically alerting when a schema change causes token costs to increase
  • LangChain / LlamaIndex — use the optimizedSchema output to dynamically filter actor data before passing to LangChain document loaders or LlamaIndex data connectors

Troubleshooting

  • "No recent runs found" error — The target actor has no SUCCEEDED runs in its run history. Run the target actor at least once with a valid input that produces output, then re-run the optimizer. Note that FAILED or TIMED-OUT runs are not used.

  • "Actor not found" error — The targetActorId value is incorrect. Verify the actor ID by navigating to the actor in the Apify Console — the URL contains the correct slug in username/actor-name format. If copying from an actor's API settings, use the slug format rather than the internal UUID where possible.

  • Output shows 0% savings — All fields in the sample matched high-value patterns or had average lengths under 500 characters. This means the actor's output is already dense and well-structured. Review the fieldAnalysis array to confirm — if all fields are correctly classified as high or medium, no optimization is needed.

  • Very high savings estimate (>90%) — This typically means the actor includes a rawHtml or pageContent field that dominates token cost. Verify the recommendation makes sense for your use case. For some LLM tasks (HTML extraction, element classification), raw HTML may be necessary despite its token cost.

  • Analysis returns fewer fields than expected — The optimizer enumerates fields present in the sample items. If some fields only appear in a subset of records and your sampleSize is too small, those fields may not appear in the analysis. Increase sampleSize to 25-50 to capture sparse fields.

Responsible use

  • This actor only accesses Apify dataset output that belongs to actors authorized under your Apify API token.
  • Do not use this actor to analyze output from actors you do not have explicit permission to access.
  • Token savings recommendations are heuristic; review all drop recommendations before applying them to production pipelines to ensure no business-critical data is discarded.
  • For guidance on responsible AI data pipeline construction, see Apify's platform documentation.

FAQ

How accurate is the LLM token savings estimate? The optimizer uses the ~4 characters per token approximation, which closely matches GPT-4's cl100k_base tokenizer for English prose. For structured data like JSON numbers, URLs, and short strings, actual token counts may differ by 10-20%. Treat the savings percentage as directional — it will consistently identify your highest-cost fields even if the exact numbers vary slightly.

Does LLM Output Optimizer re-run the target actor? No. The optimizer reads from the existing output of the target actor's most recent successful run. It never triggers a new run, never modifies the target actor, and never incurs compute charges on the target actor. The only cost is the $0.25 analysis fee.

How many fields can LLM Output Optimizer analyze in one run? The optimizer handles any number of top-level fields. It enumerates all unique field names across all sampled items, so even sparse schemas with 50+ fields are fully analyzed. The default sampleSize of 10 is sufficient to identify field classifications for most actors.

Can I analyze a private actor that belongs to another user? No. The actor uses the APIFY_TOKEN environment variable provided by the Apify platform. You can only analyze actors that are accessible under your own account — your own actors plus any public actors on the Apify Store.

How is LLM Output Optimizer different from reading the actor's output schema manually? Manual inspection tells you what fields exist but not their token cost, null distribution, or relative information density. The optimizer quantifies token cost per field, ranks fields by cost, computes null ratios across a sample, and generates specific action recommendations with savings percentages — in 10 seconds versus 30-60 minutes of manual analysis.

What types of fields are automatically flagged for removal? Fields whose names contain any of these patterns are classified as low-value and recommended for dropping: _id, _at, timestamp, scraped, crawled, hash, checksum, internal, debug, raw, html, rawHtml, sourceHtml, pageContent. Additionally, any field (regardless of name) with more than 80% null values is flagged for removal.

Can I schedule LLM Output Optimizer to run periodically? Yes. Use Apify's built-in scheduler to run the optimizer weekly or monthly against your key actors. This detects schema drift — cases where an actor update adds new verbose fields that inflate your LLM costs. The structured output makes it straightforward to track savings trends over time.

What happens if the target actor has an empty dataset? The optimizer returns a structured error record: {"error": "Latest run produced an empty dataset. Nothing to optimize."}. This can happen if the actor ran successfully but produced no output items — for example, if the search returned zero results. Re-run the target actor with an input that produces data before analyzing.

How is this different from just filtering fields in my code? You can absolutely filter fields manually — but you first need to know which fields are worth filtering and how much each one costs. The optimizer answers those questions. Think of it as a profiler for your LLM data pipeline: it tells you where the token budget is going so you can make informed decisions rather than guessing.

Is it legal to analyze actor output data with this tool? Yes. The optimizer analyzes output data from actors running under your own Apify account. You are reading your own data. No external websites are accessed during the analysis. For guidance on the legality of the data collected by the target actors themselves, see Apify's guide on web scraping legality.

How long does a typical LLM optimization analysis take? Most analyses complete in 5-15 seconds. The actor makes three sequential API calls (actor lookup, runs lookup, dataset fetch) plus local computation. The dataset fetch time is the main variable — larger sampleSize values or very wide schemas (50+ fields) take slightly longer.

Can I use the optimizedSchema output directly in my LangChain or LlamaIndex pipeline? Yes. The optimizedSchema field is a plain JSON array of field name strings. You can use it directly as a field allowlist when constructing Document objects in LangChain or as a metadata filter in LlamaIndex. See the Apify LangChain integration docs for connection examples.

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.

How it works

  1. Configure — Set your parameters in the Apify Console or pass them via API.
  2. Run — Click Start, trigger via API, webhook, or set up a schedule.
  3. Get results — Download as JSON, CSV, or Excel. Integrate with 1,000+ apps.


Ready to try LLM Output Optimizer?

Start for free on Apify. No credit card required.

Open on Apify Store