LLM Output Optimizer
Analyze any Apify actor's output schema for LLM pipelines. Scores every field by token cost, recommends which fields to drop, keep, or truncate, and generates an optimized schema that typically cuts LLM token spend by 40-70%.
Pricing
Pay Per Event model. You only pay for what you use.
| Event | Description | Price |
|---|---|---|
| llm-optimization | Charged per optimization analysis. | $0.35 |
Example: 100 events = $35.00 · 1,000 events = $350.00
Documentation
LLM Output Optimizer analyzes any Apify actor's output schema and tells you exactly which fields to keep, drop, or truncate before feeding data into an LLM pipeline. It reads from the actor's most recent successful run, scores every field by information density, and produces a token-reduction report in seconds. Typical savings range from 40% to 70% of your LLM token budget.
The actor fetches a configurable sample of output items from the target actor's latest dataset, runs per-field token estimation using a character-based model (~4 characters per token for English text), classifies each field as high-value, medium-value, or low-value, and generates a recommended optimized schema. No re-running the target actor is required — the analysis reads from existing output data.
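Under that character-based model, the per-field token estimate amounts to the following (a minimal sketch of the heuristic, not the actor's actual code):

```python
import json

def estimate_tokens(values, chars_per_token=4):
    """Approximate token cost: serialize each non-null value to JSON and
    divide the total character count by ~4 (English-text heuristic)."""
    total_chars = sum(len(json.dumps(v)) for v in values if v is not None)
    return total_chars // chars_per_token

# Ten ~40-character serialized values come out to roughly 100 tokens
print(estimate_tokens(["x" * 38] * 10))  # → 100
```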
What data can you extract?
| Data Point | Source | Example |
|---|---|---|
| Original token count | Full output analysis | 4,200 tokens |
| Optimized token count | After applying recommendations | 1,680 tokens |
| Savings percentage | Calculated reduction | 60% |
| Field name | Per-field analysis | rawHtml |
| Field value classification | Pattern + length heuristics | low / medium / high |
| Recommended action | Per field | drop / keep / truncate |
| Null ratio | Missing value rate | 0.83 |
| Average field length | Characters per value | 4,500 |
| Token cost per field | Estimated from character count | 2,800 tokens |
| Optimized schema | Recommended field list | ["url", "emails", "phones", "name"] |
| Recommendations | Actionable suggestions with savings | Drop 3 low-value fields — saves 2,520 tokens (60%) |
| Analysis timestamp | ISO 8601 | 2026-03-20T14:32:00.000Z |
Why use LLM Output Optimizer?
Most Apify actors return far more data than an LLM actually needs. A web scraper might return rawHtml, sourceHtml, pageContent, internal timestamps, debug hashes, and a dozen other fields that consume thousands of tokens per record — at real cost. If you are passing actor output to GPT-4, Claude, or Gemini, you are likely paying for 3-5x more tokens than your prompts require.
Manually auditing an output schema requires pulling a dataset, inspecting field distributions, estimating token costs, and writing custom field filters. For a 30-field schema across 10 sample records, this takes 30-60 minutes per actor. This actor does it in under 10 seconds for $0.35.
- Scheduling — schedule weekly re-analysis to catch schema drift as actors are updated
- API access — trigger from Python, JavaScript, or any HTTP client as part of your pipeline build process
- Proxy rotation — not required; the actor calls the Apify REST API directly using your token
- Monitoring — get Slack or email alerts when the optimization run fails or a target actor produces empty output
- Integrations — connect results to Zapier, Make, or Google Sheets to track token savings across your full actor portfolio
Features
- Character-based token estimation — approximates token count at ~4 characters per token, matching GPT-4 tokenizer behavior for English text without requiring a tokenizer library
- 15+ low-value field pattern matching — automatically flags fields matching patterns including `_id`, `_at`, `timestamp`, `scraped`, `crawled`, `hash`, `checksum`, `internal`, `debug`, `raw`, `html`, `rawHtml`, `sourceHtml`, and `pageContent`
- 11+ high-value field pattern matching — protects fields matching `name`, `title`, `url`, `email`, `phone`, `price`, `rating`, `address`, `description`, `summary`, `category`, and `status` from being dropped
- Null ratio analysis — computes the proportion of null or missing values per field; fields with >80% null values are automatically flagged for removal
- Long-field truncation detection — fields averaging more than 1,000 characters are flagged as truncation candidates, with optimized token cost estimated at 20% of original (equivalent to a 200-character truncation)
- Token cost sorted output — field analysis is sorted by token cost descending so the highest-impact optimizations appear first
- Optimized schema generation — produces a final `optimizedSchema` field list containing only fields recommended as `keep` or `truncate`
- Quantified recommendations — each recommendation includes exact token counts and percentage savings, not just qualitative advice
- Configurable sample size — analyze 5 to 100 records depending on how representative you need the sample to be
- Read-only analysis — never modifies the target actor, its settings, or its output; reads only from existing dataset items
Use cases for LLM output optimization
AI pipeline cost reduction
Teams building RAG systems, AI agents, or document Q&A pipelines on top of Apify scrapers need to minimize token throughput. Before connecting a scraper to GPT-4o or Claude 3.5, run the optimizer to identify which fields are safe to drop. A single optimization pass on a 40-field scraper output can cut per-record token cost from 3,000 to 800 tokens — reducing downstream LLM costs by 70% across millions of records.
Actor schema auditing before production
Developers preparing an actor for production use can use this tool to audit the output schema for unnecessary bloat. A quick analysis reveals whether fields like pageContent, rawHtml, or internal debug fields made it into the output — fields that add token cost with no downstream value in most AI workflows.
LLM-ready dataset preparation
Data teams building fine-tuning datasets or evaluation benchmarks from scraped data need clean, dense schemas. The optimizer identifies sparse fields (high null ratios) and verbose fields (long raw content) that should be excluded from training examples to keep dataset quality high and token costs low.
Multi-actor pipeline optimization
When chaining multiple actors — for example, a Google Maps scraper feeding into a contact enrichment actor feeding into an LLM summarizer — each stage multiplies token costs. Running the optimizer on each actor in the chain surfaces the cumulative savings opportunity before the pipeline goes live.
Actor portfolio token budgeting
Developers managing a portfolio of 10+ actors and running LLM workflows on their combined output can use the optimizer to benchmark token efficiency across the portfolio. The structured output makes it straightforward to compare token density scores and prioritize which actors need schema cleanup.
How to optimize actor output for LLM pipelines
1. Find your target actor ID — Go to the Apify Console, open the actor you want to analyze, and copy the actor ID from the URL or Settings tab. It looks like `ryanclinton/website-contact-scraper` or a numeric ID like `BHzefUZlZRKWxkTch`. The actor must have at least one successful run with output data.
2. Configure the sample size — The default of 10 items is suitable for most actors. Increase to 25-50 if the actor has a large schema with many sparse fields, to get a more representative null ratio estimate.
3. Run the actor — Click "Start" and wait. Most analyses complete in 5-15 seconds. The actor fetches the schema from the target actor's most recent successful run — it does not re-execute the target actor.
4. Review the report — Download the JSON result from the Dataset tab. The `fieldAnalysis` array is sorted by token cost descending — start at the top. Apply the `optimizedSchema` as a field allowlist in your LLM pipeline or use the `recommendations` array to guide manual schema cleanup.
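In downstream code, the recommended schema works as a simple allowlist (a sketch; the field names mirror the example output in this README):

```python
def apply_allowlist(record: dict, allowlist: list) -> dict:
    """Keep only the fields named in the optimizer's optimizedSchema list."""
    allowed = set(allowlist)
    return {k: v for k, v in record.items() if k in allowed}

optimized_schema = ["url", "emails", "phones", "name"]
record = {
    "url": "https://example.com",
    "emails": ["info@example.com"],
    "rawHtml": "<html>...</html>",  # dropped before the record reaches the LLM
    "scrapedAt": "2026-03-20",      # dropped
}
print(apply_allowlist(record, optimized_schema))
```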
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `targetActorId` | string | Yes | — | Apify actor ID to analyze. Accepts `username/actor-name` format (e.g., `ryanclinton/website-contact-scraper`) or a numeric actor ID. The actor must have at least one SUCCEEDED run. |
| `sampleSize` | integer | No | 10 | Number of output items to fetch from the target actor's latest dataset for analysis. Higher values give more accurate null ratios. Recommended range: 5–50. |
Input examples
Analyze a scraper before connecting to an LLM:

```json
{
  "targetActorId": "ryanclinton/website-contact-scraper"
}
```

Larger sample for a sparse schema:

```json
{
  "targetActorId": "ryanclinton/google-maps-email-extractor",
  "sampleSize": 25
}
```

Minimal run using numeric actor ID:

```json
{
  "targetActorId": "BHzefUZlZRKWxkTch",
  "sampleSize": 5
}
```
Input tips
- Run the target actor first — the optimizer reads from the most recent SUCCEEDED run. If no successful run exists, the actor will return an error with a clear message.
- Use the default sample size to start — 10 items captures the schema shape and null patterns for most actors. Only increase if you have reason to believe field sparsity varies significantly across records.
- Prefer `username/actor-name` format — it is more readable and easier to verify than a numeric ID, and both formats are supported.
- Re-run after actor updates — actor schemas change when actors are updated. A monthly re-analysis detects new low-value fields introduced by upstream changes.
Output example
```json
{
  "actorName": "ryanclinton/website-contact-scraper",
  "actorId": "ryanclinton/website-contact-scraper",
  "sampleSize": 10,
  "originalTokens": 4200,
  "optimizedTokens": 1512,
  "savingsPercent": 64,
  "fieldAnalysis": [
    {
      "field": "rawHtml",
      "tokens": 2800,
      "value": "low",
      "action": "drop",
      "nullRatio": 0,
      "avgLength": 4500
    },
    {
      "field": "pageContent",
      "tokens": 520,
      "value": "low",
      "action": "drop",
      "nullRatio": 0,
      "avgLength": 840
    },
    {
      "field": "description",
      "tokens": 180,
      "value": "high",
      "action": "truncate",
      "nullRatio": 0.1,
      "avgLength": 1240
    },
    {
      "field": "emails",
      "tokens": 120,
      "value": "high",
      "action": "keep",
      "nullRatio": 0.2,
      "avgLength": 180
    },
    {
      "field": "url",
      "tokens": 45,
      "value": "high",
      "action": "keep",
      "nullRatio": 0,
      "avgLength": 65
    },
    {
      "field": "phones",
      "tokens": 38,
      "value": "high",
      "action": "keep",
      "nullRatio": 0.3,
      "avgLength": 55
    },
    {
      "field": "domain",
      "tokens": 22,
      "value": "medium",
      "action": "keep",
      "nullRatio": 0,
      "avgLength": 30
    },
    {
      "field": "scrapedAt",
      "tokens": 18,
      "value": "low",
      "action": "drop",
      "nullRatio": 0,
      "avgLength": 24
    }
  ],
  "optimizedSchema": ["description", "emails", "url", "phones", "domain"],
  "recommendations": [
    "Drop 3 low-value fields — saves 3,338 tokens (79%)",
    "Truncate 1 long field to 200 chars — reduces token count significantly"
  ],
  "analyzedAt": "2026-03-20T14:32:00.000Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `actorName` | string | Display name of the analyzed actor in `username/actorName` format |
| `actorId` | string | The actor ID provided as input |
| `sampleSize` | integer | Number of items analyzed from the dataset |
| `originalTokens` | integer | Total estimated token count across all fields in the sample |
| `optimizedTokens` | integer | Estimated token count after applying recommended drops and truncations |
| `savingsPercent` | integer | Percentage reduction from original to optimized token count |
| `fieldAnalysis` | array | Per-field analysis objects, sorted by token cost descending |
| `fieldAnalysis[].field` | string | Field name from the actor output schema |
| `fieldAnalysis[].tokens` | integer | Estimated token cost for this field across all sample items |
| `fieldAnalysis[].value` | string | Classification: high, medium, or low |
| `fieldAnalysis[].action` | string | Recommendation: keep, drop, or truncate |
| `fieldAnalysis[].nullRatio` | float | Proportion of items where this field is null or missing (0.0–1.0) |
| `fieldAnalysis[].avgLength` | integer | Average character length of values for this field |
| `optimizedSchema` | array | List of field names recommended to keep (keep + truncate fields only) |
| `recommendations` | array | Human-readable action items with quantified token savings |
| `analyzedAt` | string | ISO 8601 timestamp of when the analysis was completed |
| `error` | string | Present only on failure; describes why the analysis could not complete |
How much does it cost to optimize actor output for LLMs?
LLM Output Optimizer uses pay-per-event pricing — you pay $0.35 per analysis. Platform compute costs are included.
| Scenario | Analyses | Cost per analysis | Total cost |
|---|---|---|---|
| Quick test | 1 | $0.35 | $0.35 |
| Audit 5 actors | 5 | $0.35 | $1.75 |
| Audit 20 actors | 20 | $0.35 | $7.00 |
| Full portfolio review | 50 | $0.35 | $17.50 |
| Continuous monitoring (monthly) | 100 | $0.35 | $35.00 |
You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached.
A single analysis costs less than a minute of GPT-4o inference time. For a 30-field actor producing 60% savings, the $0.35 cost pays back on the very first LLM call using the optimized schema. Apify's free tier includes $5 of monthly credits — enough for 14 analyses at no charge.
Optimize actor output using the API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/actor-llm-optimizer").call(run_input={
    "targetActorId": "ryanclinton/website-contact-scraper",
    "sampleSize": 10
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"Actor: {item['actorName']}")
    print(f"Token savings: {item['savingsPercent']}% ({item['originalTokens']} → {item['optimizedTokens']} tokens)")
    print(f"Optimized schema: {item['optimizedSchema']}")
    for rec in item.get("recommendations", []):
        print(f"  - {rec}")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/actor-llm-optimizer").call({
    targetActorId: "ryanclinton/website-contact-scraper",
    sampleSize: 10
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`Actor: ${item.actorName}`);
    console.log(`Token savings: ${item.savingsPercent}% (${item.originalTokens} → ${item.optimizedTokens} tokens)`);
    console.log(`Optimized schema: ${JSON.stringify(item.optimizedSchema)}`);
    item.recommendations?.forEach(rec => console.log(`  - ${rec}`));
}
```
cURL
```bash
# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~actor-llm-optimizer/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"targetActorId": "ryanclinton/website-contact-scraper", "sampleSize": 10}'

# Fetch results (replace DATASET_ID with the defaultDatasetId from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```
How LLM Output Optimizer works
Phase 1: Target actor and dataset resolution
The actor accepts a `targetActorId` in either `username/actor-name` or numeric ID format. It normalizes the ID by converting the `/` separator to `~` for Apify REST API URL compatibility. It then calls `GET /v2/acts/{actorId}` to verify the actor exists and retrieve its display name. Next, it queries `GET /v2/acts/{actorId}/runs` with `limit=1`, `desc=true`, and `status=SUCCEEDED` to retrieve the most recent successful run. The dataset ID is extracted from `runs.items[0].defaultDatasetId`.
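The resolution flow above can be sketched with stdlib HTTP calls (an illustration against the documented Apify v2 endpoints, not the actor's source; `normalize_actor_id` and `get_json` are helper names introduced here, and production code should add error handling and retries):

```python
import json
import urllib.parse
import urllib.request

BASE = "https://api.apify.com/v2"

def normalize_actor_id(actor_id: str) -> str:
    """The REST API expects '~' instead of '/' between username and actor name."""
    return actor_id.replace("/", "~")

def get_json(path: str, **params) -> dict:
    """Minimal GET helper for the Apify REST API."""
    url = f"{BASE}{path}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def resolve_latest_dataset(actor_id: str, token: str) -> str:
    """Return the dataset ID of the target actor's most recent SUCCEEDED run."""
    api_id = normalize_actor_id(actor_id)
    actor = get_json(f"/acts/{api_id}", token=token)  # verify the actor exists
    runs = get_json(f"/acts/{api_id}/runs", token=token,
                    limit=1, desc="true", status="SUCCEEDED")  # latest success
    items = runs["data"]["items"]
    if not items:
        raise RuntimeError(f"No SUCCEEDED runs for {actor['data']['name']}")
    return items[0]["defaultDatasetId"]
```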
Phase 2: Per-field classification and token estimation
The actor fetches up to `sampleSize` items from the dataset using `GET /v2/datasets/{datasetId}/items`. It enumerates all unique field names across all sample items (not just the first record, to handle sparse schemas). For each field, it computes three metrics: (1) estimated token cost, by joining all values as JSON strings and dividing total character count by 4; (2) null ratio, the proportion of items where the field is absent or null; (3) average character length per non-null value.
Each field is then classified using a two-step pattern-matching approach. If the lowercase field name contains any of the low-value patterns (`_id`, `_at`, `timestamp`, `scraped`, `crawled`, `hash`, `checksum`, `internal`, `debug`, `raw`, `html`, `rawHtml`, `sourceHtml`, `pageContent`), the field is classified as `low`. If it matches any of the high-value patterns (`name`, `title`, `url`, `email`, `phone`, `price`, `rating`, `address`, `description`, `summary`, `category`, `status`), it is classified as `high`. Otherwise, if the average serialized value length exceeds 500 characters, the field is classified as `low`; otherwise `medium`.
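The classification step can be sketched as follows (an illustrative re-implementation of the heuristic described above, not the actor's actual source):

```python
# Pattern lists as documented above, matched against lowercased field names
LOW_PATTERNS = ["_id", "_at", "timestamp", "scraped", "crawled", "hash",
                "checksum", "internal", "debug", "raw", "html",
                "rawhtml", "sourcehtml", "pagecontent"]
HIGH_PATTERNS = ["name", "title", "url", "email", "phone", "price", "rating",
                 "address", "description", "summary", "category", "status"]

def classify_field(field_name: str, avg_length: float) -> str:
    """Two-step heuristic: name patterns first, then serialized value length."""
    lowered = field_name.lower()
    if any(p in lowered for p in LOW_PATTERNS):
        return "low"   # e.g. rawHtml, scrapedAt, debugHash
    if any(p in lowered for p in HIGH_PATTERNS):
        return "high"  # e.g. emails, description, url
    return "low" if avg_length > 500 else "medium"

print(classify_field("rawHtml", 4500))  # → low
print(classify_field("emails", 180))    # → high
print(classify_field("domain", 30))     # → medium
```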
Phase 3: Action assignment and savings calculation
Each field receives one of three recommended actions. Fields classified as `low` are assigned `drop`. Fields with a null ratio above 0.80 are assigned `drop` regardless of classification. Fields with an average character length above 1,000 receive `truncate` (with optimized token cost estimated at 20% of original, representing a ~200-character truncation). All other fields receive `keep`.
The `optimizedTokens` count sums token costs for all `keep` fields plus 20% of token costs for `truncate` fields. The savings percentage is `(1 - optimizedTokens / originalTokens) * 100`. The `fieldAnalysis` array is sorted by token cost descending so the highest-impact fields appear first. Three categories of recommendations are generated where applicable: a drop recommendation with aggregate token savings and percentage, a truncation recommendation, and a high-null-rate warning for fields with >50% null values that were not already dropped.
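The savings arithmetic reduces to a few lines (a minimal sketch using the field shapes from the output example, with invented sample numbers):

```python
def summarize(field_analysis):
    """Optimized token count and savings percent from per-field actions.
    Truncated fields are costed at 20% of their original tokens."""
    original = sum(f["tokens"] for f in field_analysis)
    optimized = sum(
        f["tokens"] if f["action"] == "keep"
        else f["tokens"] * 0.2 if f["action"] == "truncate"
        else 0  # dropped fields contribute nothing
        for f in field_analysis
    )
    return int(optimized), round((1 - optimized / original) * 100)

fields = [
    {"field": "rawHtml", "tokens": 2800, "action": "drop"},
    {"field": "description", "tokens": 180, "action": "truncate"},
    {"field": "emails", "tokens": 120, "action": "keep"},
]
print(summarize(fields))  # → (156, 95)
```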
Phase 4: Output and PPE charge
The full report is pushed to the Apify dataset as a single record. If the actor is running under pay-per-event pricing, a single `llm-optimization` event is charged at $0.35. The charge call checks `eventChargeLimitReached` and logs a warning if the spending limit was hit before the charge completed.
Tips for best results
- Run the target actor at least once before optimizing. The optimizer reads from the most recent SUCCEEDED run. If you just deployed an actor and have not run it yet, run it with a representative input first so the output schema is populated.
- Increase `sampleSize` for sparse schemas. If an actor returns data where many fields are only populated for specific record types (e.g., a scraper that extracts different data from different page types), a sample of 5-10 may not capture the full null distribution. Use `sampleSize` 25-50 for more accurate null ratios.
- Treat `truncate` recommendations as context-dependent. The optimizer flags fields averaging more than 1,000 characters for truncation. Whether 200 characters is sufficient depends on your LLM task — for classification tasks it often is, but for summarization or extraction tasks you may want to keep more. Use the `avgLength` value to calibrate.
- Use `optimizedSchema` as a field allowlist, not a deletion list. In your downstream code, use the `optimizedSchema` array to select only the fields you need rather than trying to delete individual fields. This is more maintainable as the upstream actor schema evolves.
- Re-analyze after upstream actor updates. Apify actors are updated regularly. A field like `sourceHtml` might be added in a new version and silently inflate your token costs. Schedule a monthly re-analysis to detect schema drift.
- Combine with B2B Lead Gen Suite for AI enrichment pipelines. If you are feeding enriched lead data into an LLM for scoring or qualification, optimizing the enriched output schema first can cut per-lead LLM costs significantly before the pipeline scales.
- Check the `error` field in the output before processing. If the target actor ID is wrong, has no successful runs, or has an empty dataset, the actor returns a structured error record rather than failing silently. Always check `item.error` in your consuming code before reading the analysis fields.
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Website Contact Scraper | Analyze contact scraper output to drop rawHtml and pageContent before feeding contacts to an LLM enrichment step — typical 60%+ savings. |
| Google Maps Email Extractor | Optimize the business profile output schema before passing records to a lead scoring LLM to reduce cost per scored lead. |
| Company Deep Research | Deep research reports contain verbose text fields — use the optimizer to identify which fields to include in LLM prompts versus store separately. |
| Waterfall Contact Enrichment | Enriched contact records contain many source-specific metadata fields; optimize before passing to a CRM-sync or AI qualification step. |
| B2B Lead Gen Suite | Audit the full pipeline output schema end-to-end before connecting to an AI lead scorer — maximizes cost efficiency at scale. |
| Website Content to Markdown | Markdown conversion actors can include metadata fields; optimize to identify which metadata adds value versus padding tokens. |
| Trustpilot Review Analyzer | Review data often includes raw review text and structured sentiment fields — use the optimizer to decide which representation is more token-efficient for downstream LLM processing. |
Limitations
- Reads only from the most recent SUCCEEDED run. If the target actor's latest successful run has a schema that differs from its current output format (e.g., after an actor update), the analysis reflects the old schema. Re-run the target actor first to get a fresh dataset.
- Does not analyze nested objects. The optimizer analyzes top-level fields only. If an actor returns deeply nested objects (e.g., `metadata.internal.debugHash`), only the top-level field name is analyzed — nested low-value fields inside a kept object are not flagged.
- Token estimation is approximate. The ~4 characters per token heuristic works well for English prose but overestimates tokens for numeric data and underestimates for non-Latin scripts. Treat savings percentages as directional, not precise.
- Pattern matching is heuristic, not semantic. A field named `status` is classified as high-value even if its values are internal status codes with no LLM utility. Review the `fieldAnalysis` output before applying recommendations blindly.
- Does not suggest field transformations. The optimizer recommends drop or truncate but does not suggest how to transform values — for example, converting a full address string into structured components, or extracting the domain from a URL. These optimizations require task-specific logic.
- Requires the target actor to be accessible with your token. The actor uses the `APIFY_TOKEN` environment variable injected by the Apify platform. The target actor must be owned by the same account or be a public actor. Private actors from other accounts cannot be analyzed.
- `sampleSize` above 100 is untested. The Apify dataset API supports large item counts, but very large samples increase run time and memory usage. Keep `sampleSize` under 100 for reliable performance within the 512 MB memory limit.
- No diff between schema versions. The optimizer produces a snapshot analysis, not a changelog. It cannot tell you whether a schema has changed since the last analysis — use a dedicated schema monitoring tool for that use case.
Integrations
- Zapier — trigger an optimization analysis automatically after a target actor run completes, then send the savings report to Slack or email
- Make — build workflows that analyze actor output schemas on a schedule and log results to a Google Sheet for portfolio tracking
- Google Sheets — push optimization reports to a spreadsheet to track token savings trends across your actor portfolio over time
- Apify API — integrate into CI/CD pipelines to validate schema token efficiency before deploying new actor versions to production
- Webhooks — chain the optimizer as a post-run step after any actor, automatically alerting when a schema change causes token costs to increase
- LangChain / LlamaIndex — use the `optimizedSchema` output to dynamically filter actor data before passing to LangChain document loaders or LlamaIndex data connectors
Troubleshooting
- "No recent runs found" error — The target actor has no SUCCEEDED runs in its run history. Run the target actor at least once with a valid input that produces output, then re-run the optimizer. Note that FAILED or TIMED-OUT runs are not used.
- "Actor not found" error — The `targetActorId` value is incorrect. Verify the actor ID by navigating to the actor in the Apify Console — the URL contains the correct slug in `username/actor-name` format. If copying from an actor's API settings, use the slug format rather than the internal UUID where possible.
- Output shows 0% savings — All fields in the sample matched high-value patterns or had average lengths under 500 characters. This means the actor's output is already dense and well-structured. Review the `fieldAnalysis` array to confirm — if all fields are correctly classified as `high` or `medium`, no optimization is needed.
- Very high savings estimate (>90%) — This typically means the actor includes a `rawHtml` or `pageContent` field that dominates token cost. Verify the recommendation makes sense for your use case. For some LLM tasks (HTML extraction, element classification), raw HTML may be necessary despite its token cost.
- Analysis returns fewer fields than expected — The optimizer enumerates fields present in the sample items. If some fields only appear in a subset of records and your `sampleSize` is too small, those fields may not appear in the analysis. Increase `sampleSize` to 25-50 to capture sparse fields.
Responsible use
- This actor only accesses Apify dataset output that belongs to actors authorized under your Apify API token.
- Do not use this actor to analyze output from actors you do not have explicit permission to access.
- Token savings recommendations are heuristic; review all `drop` recommendations before applying them to production pipelines to ensure no business-critical data is discarded.
- For guidance on responsible AI data pipeline construction, see Apify's platform documentation.
FAQ
How accurate is the LLM token savings estimate?
The optimizer uses the ~4 characters per token approximation, which closely matches GPT-4's cl100k_base tokenizer for English prose. For structured data like JSON numbers, URLs, and short strings, actual token counts may differ by 10-20%. Treat the savings percentage as directional — it will consistently identify your highest-cost fields even if the exact numbers vary slightly.
Does LLM Output Optimizer re-run the target actor?
No. The optimizer reads from the existing output of the target actor's most recent successful run. It never triggers a new run, never modifies the target actor, and never incurs compute charges on the target actor. The only cost is the $0.35 analysis fee.
How many fields can LLM Output Optimizer analyze in one run?
The optimizer handles any number of top-level fields. It enumerates all unique field names across all sampled items, so even sparse schemas with 50+ fields are fully analyzed. The default `sampleSize` of 10 is sufficient to identify field classifications for most actors.
Can I analyze a private actor that belongs to another user?
No. The actor uses the APIFY_TOKEN environment variable provided by the Apify platform. You can only analyze actors that are accessible under your own account — your own actors plus any public actors on the Apify Store.
How is LLM Output Optimizer different from reading the actor's output schema manually?
Manual inspection tells you what fields exist but not their token cost, null distribution, or relative information density. The optimizer quantifies token cost per field, ranks fields by cost, computes null ratios across a sample, and generates specific action recommendations with savings percentages — in 10 seconds versus 30-60 minutes of manual analysis.
What types of fields are automatically flagged for removal?
Fields whose names contain any of these patterns are classified as low-value and recommended for dropping: `_id`, `_at`, `timestamp`, `scraped`, `crawled`, `hash`, `checksum`, `internal`, `debug`, `raw`, `html`, `rawHtml`, `sourceHtml`, `pageContent`. Additionally, any field (regardless of name) with more than 80% null values is flagged for removal.
Can I schedule LLM Output Optimizer to run periodically?
Yes. Use Apify's built-in scheduler to run the optimizer weekly or monthly against your key actors. This detects schema drift — cases where an actor update adds new verbose fields that inflate your LLM costs. The structured output makes it straightforward to track savings trends over time.
What happens if the target actor has an empty dataset?
The optimizer returns a structured error record: `{"error": "Latest run produced an empty dataset. Nothing to optimize."}`. This can happen if the actor ran successfully but produced no output items — for example, if the search returned zero results. Re-run the target actor with an input that produces data before analyzing.
How is this different from just filtering fields in my code?
You can absolutely filter fields manually — but you first need to know which fields are worth filtering and how much each one costs. The optimizer answers those questions. Think of it as a profiler for your LLM data pipeline: it tells you where the token budget is going so you can make informed decisions rather than guessing.
Is it legal to analyze actor output data with this tool?
Yes. The optimizer analyzes output data from actors running under your own Apify account. You are reading your own data. No external websites are accessed during the analysis. For guidance on the legality of the data collected by the target actors themselves, see Apify's guide on web scraping legality.
How long does a typical LLM optimization analysis take?
Most analyses complete in 5-15 seconds. The actor makes three sequential API calls (actor lookup, runs lookup, dataset fetch) plus local computation. The dataset fetch time is the main variable — larger `sampleSize` values or very wide schemas (50+ fields) take slightly longer.
Can I use the optimizedSchema output directly in my LangChain or LlamaIndex pipeline?
Yes. The optimizedSchema field is a plain JSON array of field name strings. You can use it directly as a field allowlist when constructing Document objects in LangChain or as a metadata filter in LlamaIndex. See the Apify LangChain integration docs for connection examples.
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.
Related actors
Bulk Email Verifier
Verify email deliverability at scale. MX record validation, SMTP mailbox checks, disposable and role-based detection, catch-all flagging, and confidence scoring. No external API costs.
GitHub Repository Search
Search GitHub repositories by keyword, language, topic, stars, forks. Sort by stars, forks, or recently updated. Returns metadata, topics, license, owner info, URLs. Free API, optional token for higher limits.
Website Content to Markdown
Convert any website to clean Markdown for RAG pipelines, LLM training, and AI apps. Crawls pages, strips boilerplate, preserves headings, tables, and code blocks. GFM support.
Website Tech Stack Detector
Detect 100+ web technologies on any website. Identifies CMS, frameworks, analytics, marketing tools, chat widgets, CDNs, payment systems, hosting, and more. Batch-analyze multiple sites with version detection and confidence scoring.
Ready to try LLM Output Optimizer?
Start for free on Apify. No credit card required.
Open on Apify Store