CSV Lead Processor
**CSV lead processing** for sales teams and agencies who already have a list — Apollo exports, LinkedIn Sales Navigator CSVs, trade show downloads, or any other lead file. Upload your CSV, map your column headers to a standard format, and receive a clean, enriched, verified dataset ready for your CRM or outreach tool. No coding required.
The actor handles the full data pipeline in one run: it parses your file, normalizes field values, deduplicates by company domain, fills in missing email addresses by scraping company websites and detecting email format patterns, verifies deliverability against live mail servers, and exports an enriched CSV back to you. What takes a half-day of manual work in Excel and Hunter.io is done in minutes.
What data can you extract?
| Data point | Source | Example |
|---|---|---|
| 📋 Company name | CSV column | Meridian Software Group |
| 🌐 Website | CSV column or derived from domain | https://meridiansoftware.io |
| 🔗 Domain | Extracted from website URL | meridiansoftware.io |
| 👤 Contact first name | CSV column | Danielle |
| 👤 Contact last name | CSV column | Okafor |
| 📧 Primary email | CSV, scraper, or pattern generator | danielle.okafor@meridiansoftware.io |
| 📞 Phone number | CSV column | +1 (415) 882-3301 |
| 💼 Job title | CSV column | VP of Engineering |
| 🔗 LinkedIn URL | CSV column (normalized) | https://linkedin.com/company/meridian-software-group |
| 🏢 Industry | CSV column | Enterprise Software |
| 👥 Employee count | CSV column | 51-200 |
| 📍 City / State / Country | CSV column | Austin / TX / United States |
| 📧 Enriched emails | Website contact scraper | info@meridiansoftware.io; hello@meridiansoftware.io |
| 🔍 Email pattern | Pattern finder sub-actor | {first}.{last}@meridiansoftware.io |
| ✅ Email verified | Bulk email verifier (MX + SMTP) | true |
| 📊 Email status | Bulk email verifier | valid |
| 📊 Email confidence | Bulk email verifier (0-100) | 94 |
Why use CSV Lead Processor?
Your lead list is only as good as the contact data inside it. A typical Apollo or LinkedIn export has incomplete email coverage — anywhere from 30-60% of rows may be missing a deliverable email address. Cleaning that by hand means opening each company website, guessing email formats, validating in a separate tool, and re-importing to your CRM. A list of 500 leads can easily consume a full workday.
This actor automates the entire pipeline. Feed it any CSV, tell it which column is "Company Name" and which is "Email", and it handles the rest: normalizing inconsistent formatting, removing duplicate companies, calling a website scraper for missing emails, falling back to email pattern detection when the scraper finds nothing, and running every email through live mail server verification. You get back a single enriched file with confidence scores you can act on immediately.
Beyond the data itself, the Apify platform adds:
- Scheduling — run weekly on a refreshed export to keep enrichment current as your pipeline grows
- API access — trigger runs from Python, JavaScript, or any HTTP client and pull results programmatically
- Proxy infrastructure — the website contact scraper sub-actor uses Apify's residential proxy pool, reducing block rates on company sites
- Monitoring — configure Slack or email alerts when a run fails or processes fewer rows than expected
- Integrations — connect output to Zapier, Make, Google Sheets, HubSpot, or any webhook destination
Features
- Flexible CSV ingestion — accepts files via public URL or base64-encoded payload; supports comma, semicolon, and tab (TSV) delimiters with automatic UTF-8 BOM stripping so Excel exports parse cleanly
- Case-insensitive column mapping — map any header name to a canonical field; headers like "Company Name", "company name", and "COMPANY NAME" all match without configuration changes
- Domain extraction and normalization — automatically strips the scheme (`https://`), `www.` prefix, and trailing paths to produce a clean domain string; derives `website` from `domain` and vice versa when only one is present
- Email format validation — discards any email value that fails a basic `name@domain.tld` shape before processing; malformed values from source files never propagate downstream
- LinkedIn URL normalization — bare handles (e.g. `meridian-software-group`) are expanded to full `https://linkedin.com/company/` URLs automatically
- Domain-level deduplication — keeps only the first row per company domain by default, preventing the same company from being enriched and verified multiple times
- Two-stage email enrichment — first calls Website Contact Scraper (up to 3 pages per domain) on all domains lacking an email; any domain still without an email then goes to Email Pattern Finder for format detection and name-based generation
- Batch sub-actor calls — all domains without emails are sent in a single batch to each sub-actor, not one call per row; this makes enrichment dramatically faster and cheaper for large files
- Pattern-based email generation — when a format like `{first}.{last}@domain.com` is detected with confidence, it is applied to the contact's name to produce a candidate email
- Bulk email verification — calls Bulk Email Verifier with DNS MX checks and SMTP probing; results include `emailStatus` (`valid`, `risky`, `invalid`, `unknown`, `disposable`) and a 0-100 confidence score
- Downloadable output CSV — writes a UTF-8 with BOM CSV to the actor's Key-Value Store and returns the direct download URL in the summary record; ready to import into any CRM
- Configurable output columns — choose which canonical fields appear in the output CSV and in what order; enrichment columns are always appended
- Spending limit enforcement — the actor checks the pay-per-event limit after each row push and stops cleanly if your budget cap is reached, so you are never charged more than you authorize
- Row cap for testing — set `maxRows` to process a subset of a large file before committing to a full run
- Streaming CSV parser — uses an async generator over a `csv-parse` stream so even multi-megabyte files are processed without memory pressure; a 512 MB memory allocation handles files up to tens of thousands of rows
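To illustrate the domain extraction and normalization feature, here is a minimal sketch. The helper names are hypothetical, not the actor's actual source:

```javascript
// Hypothetical sketch of domain normalization: strip the scheme,
// the "www." prefix, and any path/query to leave a bare domain.
function normalizeDomain(value) {
  if (!value) return null;
  let s = String(value).trim().toLowerCase();
  s = s.replace(/^[a-z][a-z0-9+.-]*:\/\//, ""); // strip scheme, e.g. https://
  s = s.replace(/^www\./, "");                  // strip www. prefix
  s = s.split(/[\/?#]/)[0];                     // drop path, query, fragment
  return s.includes(".") ? s : null;            // must at least look like a domain
}

// Deriving website from domain when only one of the two is present:
function deriveWebsite(domain) {
  return domain ? `https://${domain}` : null;
}

console.log(normalizeDomain("https://www.meridiansoftware.io/about?x=1"));
// meridiansoftware.io
```

The same normalized string is what the deduplication step keys on, which is why inconsistent `http://` vs `https://www.` values in a source CSV still collapse to one company.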
Use cases for CSV lead processing
Sales prospecting list cleanup
Sales development reps at B2B companies frequently export from Apollo, ZoomInfo, or LinkedIn Sales Navigator and find that 40% of rows have no email, and another 20% have emails that bounce. Running those exports through this actor before loading into Outreach, Salesloft, or HubSpot means the sequences your team sets up actually reach real inboxes. A 500-row export cleaned and enriched here typically surfaces 150-200 net-new deliverable contacts.
Marketing agency client deliverables
Agencies managing lead generation campaigns for multiple clients receive raw lists from various sources — event registrations, trade show scans, inbound form submissions. This actor standardizes any format into a consistent schema, enriches missing contact data, and returns a verified CSV the client can load directly into their CRM. One actor run replaces a multi-tool workflow involving manual Excel cleanup, Hunter.io searches, and NeverBounce verification.
Recruiting and talent sourcing outreach
Recruiters working from company target lists exported from LinkedIn often have company names and websites but no individual contact emails. The enrichment step scrapes company websites for HR and talent acquisition contacts, and the pattern finder generates candidate emails from the recruiter's target contact names. The verified results feed directly into recruiting outreach sequences.
Data enrichment for event-sourced leads
Trade show badge scans and conference registration exports typically include name, company, and phone, but rarely email. This actor's enrichment pipeline fills those gaps systematically: the website scraper visits the company site, and the pattern finder generates email candidates using the contact name. Combined with verification, you know which generated emails are safe to send before the event follow-up window closes.
CRM hygiene and re-engagement campaigns
Existing CRM records go stale. Companies change domains, contacts move roles, and email addresses churn. Exporting a segment of cold or bounced contacts as a CSV and running it through this actor with verification enabled quickly identifies which records still have valid delivery paths and which need to be marked as undeliverable, without touching your CRM's automation triggers.
Lead list arbitrage and resale
Data brokers and list vendors who buy raw contact data for resale use this actor to standardize inconsistent formats from multiple sources into a single canonical schema, deduplicate by domain, and append verification scores before packaging. The output CSV is formatted for direct import by the end customer.
How to process and enrich a CSV lead list
1. Upload your CSV — paste a public URL (Google Drive shareable link, S3 signed URL, Dropbox direct link) into the `csvUrl` field, or base64-encode your file and paste the encoded string into `csvBase64`. No file size limit is enforced by the actor.
2. Set your column mapping — in the `columnMapping` field, add one entry per column you want to keep. The key is your CSV header exactly as it appears (case-insensitive matching is applied automatically), and the value is the canonical field name. For an Apollo export: `"Company": "companyName"`, `"Website": "website"`, `"First Name": "firstName"`, `"Last Name": "lastName"`, `"Email": "email"`, `"Title": "title"`. Any unmapped columns are ignored.
3. Choose enrichment and verification — check `enrichEmails` to automatically find emails for rows that have a website but no email address. Check `verifyEmails` to run all emails through DNS MX and SMTP probing. Either option can be used independently.
4. Click Start and wait — a 100-row CSV with no enrichment takes under 30 seconds. Enabling enrichment for 100 domains adds roughly 2-5 minutes. Enabling verification adds 1-3 minutes. Download your enriched CSV from the Key-Value Store URL in the summary record, or pull the structured dataset directly from the Dataset tab.
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `csvUrl` | string | One of `csvUrl` / `csvBase64` required | — | Public URL of the CSV file to download. Supports HTTP and HTTPS. Retried up to 3 times with exponential backoff on transient failures. |
| `csvBase64` | string | One of `csvUrl` / `csvBase64` required | — | Base64-encoded CSV content. Use when you cannot provide a public URL (e.g. uploading directly from a script). |
| `columnMapping` | object | Required | See default | JSON object mapping CSV header strings to canonical field names. Keys are your CSV column headers (case-insensitive). Values must be one of the 17 canonical field names listed below. |
| `enrichEmails` | boolean | No | `false` | When `true`, rows with a domain but no email are enriched via Website Contact Scraper, then Email Pattern Finder. All domains are sent in a single batch call. |
| `verifyEmails` | boolean | No | `false` | When `true`, all emails (from CSV and enrichment) are verified via Bulk Email Verifier using DNS MX + SMTP probing. All unique emails are sent in a single batch. |
| `deduplicateByDomain` | boolean | No | `true` | Keep only the first row per company domain. Requires `website` or `domain` to be mapped. |
| `outputCsv` | boolean | No | `true` | Write an enriched CSV (UTF-8 with BOM) to the Key-Value Store and return the download URL in the summary record. |
| `outputColumns` | array | No | `[]` (all) | Canonical field names to include in the output CSV, in the order listed. Enrichment columns are always appended. Leave empty to include all 17 canonical fields. |
| `csvDelimiter` | string | No | `,` | Field delimiter for both input and output. Options: `,` (comma), `;` (semicolon for European locales), `\t` (tab / TSV). |
| `maxRows` | integer | No | `0` (unlimited) | Stop after processing this many rows. Set to a small number (e.g. 10) to test a large file before a full run. |
Canonical field names (valid values for `columnMapping`): `companyName`, `website`, `domain`, `firstName`, `lastName`, `fullName`, `email`, `phone`, `title`, `linkedinUrl`, `industry`, `employeeCount`, `city`, `state`, `country`, `description`, `tags`.
Input examples
Apollo export with enrichment and verification:
```json
{
  "csvUrl": "https://storage.googleapis.com/my-bucket/apollo-export-2024-q1.csv",
  "columnMapping": {
    "Company": "companyName",
    "Website": "website",
    "First Name": "firstName",
    "Last Name": "lastName",
    "Email": "email",
    "Title": "title",
    "Industry": "industry",
    "Employees": "employeeCount",
    "City": "city",
    "Country": "country"
  },
  "enrichEmails": true,
  "verifyEmails": true,
  "deduplicateByDomain": true,
  "outputCsv": true
}
```
LinkedIn Sales Navigator export (company list, no emails):
```json
{
  "csvUrl": "https://storage.googleapis.com/my-bucket/linkedin-companies.csv",
  "columnMapping": {
    "Company Name": "companyName",
    "Website": "website",
    "Industry": "industry",
    "Company Size": "employeeCount",
    "Headquarters": "city",
    "Country": "country"
  },
  "enrichEmails": true,
  "verifyEmails": false,
  "deduplicateByDomain": true
}
```
Quick test on first 10 rows only:
```json
{
  "csvUrl": "https://storage.googleapis.com/my-bucket/large-list-5000-rows.csv",
  "columnMapping": {
    "Name": "companyName",
    "Domain": "domain",
    "Contact Email": "email"
  },
  "maxRows": 10,
  "enrichEmails": false,
  "verifyEmails": true,
  "outputCsv": true
}
```
Input tips
- Test with `maxRows: 10` first — on a large file, run 10 rows with your mapping to confirm the column names are matching before spending credits on the full file.
- Map `domain` or `website`, not just `email` — enrichment only runs on rows where a domain is known. If your CSV has neither, the actor cannot look up missing emails.
- Use `deduplicateByDomain: false` for contact-level lists — if your CSV intentionally has multiple contacts per company (e.g. 5 decision-makers at one account), disable deduplication so all rows are kept.
- Batch in one run — processing 500 rows in one run is faster than 5 runs of 100, because enrichment sub-actors benefit from batch calls across all domains at once.
- European CSV files — if your file uses semicolons as delimiters (common in French, German, and Spanish Excel exports), set `csvDelimiter` to `;`.
Output example
```json
{
  "companyName": "Meridian Software Group",
  "website": "https://meridiansoftware.io",
  "domain": "meridiansoftware.io",
  "firstName": "Danielle",
  "lastName": "Okafor",
  "fullName": null,
  "email": "danielle.okafor@meridiansoftware.io",
  "phone": "+1 (415) 882-3301",
  "title": "VP of Engineering",
  "linkedinUrl": "https://linkedin.com/company/meridian-software-group",
  "industry": "Enterprise Software",
  "employeeCount": "51-200",
  "city": "Austin",
  "state": "TX",
  "country": "United States",
  "description": "B2B workflow automation platform for mid-market teams.",
  "tags": "saas, enterprise, workflow",
  "enrichedEmails": ["info@meridiansoftware.io", "hello@meridiansoftware.io"],
  "emailPattern": "{first}.{last}@meridiansoftware.io",
  "emailPatternConfidence": 87,
  "generatedEmails": ["danielle.okafor@meridiansoftware.io", "d.okafor@meridiansoftware.io"],
  "emailVerified": true,
  "emailStatus": "valid",
  "emailConfidence": 94,
  "sourceRowIndex": 3,
  "enrichmentApplied": true,
  "verificationApplied": true,
  "processedAt": "2026-03-22T09:14:37.821Z"
}
```
In addition to individual lead records, the actor pushes a summary record (identifiable by `"type": "summary"`) at the end of every run:
```json
{
  "type": "summary",
  "totalRowsRead": 487,
  "rowsAfterDedup": 431,
  "rowsWithEmail": 358,
  "rowsWithoutEmail": 73,
  "enrichmentAttempted": 73,
  "emailsFoundByEnrichment": 51,
  "emailsVerified": 358,
  "emailsValid": 312,
  "emailsInvalid": 46,
  "leadsPushed": 431,
  "csvKey": "output-leads.csv",
  "csvDownloadUrl": "https://api.apify.com/v2/key-value-stores/STORE_ID/records/output-leads.csv",
  "enrichmentEnabled": true,
  "verificationEnabled": true,
  "deduplicatedByDomain": true,
  "spendingLimitReached": false,
  "processedAt": "2026-03-22T09:18:02.114Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `companyName` | string \| null | Company name from CSV |
| `website` | string \| null | Company website URL (derived from `domain` if not mapped) |
| `domain` | string \| null | Normalized domain, e.g. `acmecorp.com` (derived from `website` if not mapped) |
| `firstName` | string \| null | Contact first name |
| `lastName` | string \| null | Contact last name |
| `fullName` | string \| null | Contact full name (used for pattern-based email generation when first/last are not available) |
| `email` | string \| null | Primary email. Source priority: CSV → contact scraper → pattern generator |
| `phone` | string \| null | Contact phone number |
| `title` | string \| null | Contact job title |
| `linkedinUrl` | string \| null | LinkedIn URL, normalized to `https://linkedin.com/company/...` |
| `industry` | string \| null | Company industry |
| `employeeCount` | string \| null | Company size string, e.g. `51-200` |
| `city` | string \| null | Company city |
| `state` | string \| null | Company state or region |
| `country` | string \| null | Company country |
| `description` | string \| null | Company or contact description |
| `tags` | string \| null | Comma-separated tags from the source CSV |
| `enrichedEmails` | string[] | Additional emails found by the website contact scraper |
| `emailPattern` | string \| null | Detected email format pattern, e.g. `{first}.{last}@domain.com` |
| `emailPatternConfidence` | number \| null | Confidence score (0-100) for the detected pattern |
| `generatedEmails` | string[] | Emails generated by applying the detected pattern to the contact name |
| `emailVerified` | boolean \| null | `true` if the primary email passed DNS MX + SMTP verification |
| `emailStatus` | string \| null | Verification result: `valid`, `risky`, `invalid`, `unknown`, or `disposable` |
| `emailConfidence` | number \| null | Verification confidence score (0-100) |
| `sourceRowIndex` | number | 1-based row number in the source CSV for traceability |
| `enrichmentApplied` | boolean | Whether this lead went through the enrichment sub-actors |
| `verificationApplied` | boolean | Whether this lead's email was submitted to Bulk Email Verifier |
| `processedAt` | string | ISO 8601 timestamp when this row was processed |
How much does it cost to process a CSV lead list?
CSV Lead Processor uses pay-per-event pricing — you pay $0.05 per lead successfully parsed and pushed to the dataset. Platform compute costs are included.
| Scenario | Leads | Cost per lead | Total cost |
|---|---|---|---|
| Quick test | 10 | $0.05 | $0.50 |
| Small campaign | 100 | $0.05 | $5.00 |
| Medium list | 500 | $0.05 | $25.00 |
| Large export | 2,000 | $0.05 | $100.00 |
| Agency monthly | 5,000 | $0.05 | $250.00 |
Note that enabling `enrichEmails` or `verifyEmails` triggers sub-actor runs (Website Contact Scraper at $0.15/site, Email Pattern Finder at $0.10/domain, Bulk Email Verifier at $0.005/email), which are charged separately against your Apify account. A fully enriched and verified run on 500 leads, where 150 need email enrichment, 100 of those fall through to the pattern finder, and all 500 are verified, costs approximately $25 (lead processing) + $22.50 (contact scraper) + $10 (pattern finder) + $2.50 (verification) = around $60 total. Compare that to Hunter.io at $99/month for 1,000 searches, or Clay at $149-499/month.
You can set a maximum spending limit per run in the actor's settings to control total cost. The actor stops cleanly when your budget cap is reached, so you are never charged more than you authorize.
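To sanity-check a budget before starting a run, the per-event prices above can be combined into a small estimator. This is an illustrative sketch only; the function and field names are hypothetical, and the rates are the ones quoted in this section:

```javascript
// Hypothetical run-cost estimator using the per-event prices listed above.
const RATES = {
  leadProcessed: 0.05,   // per lead pushed to the dataset
  contactScraper: 0.15,  // per site scraped during enrichment
  patternFinder: 0.10,   // per domain sent to the pattern-finder fallback
  verifier: 0.005,       // per email verified
};

function estimateRunCost({ leads, domainsScraped = 0, domainsPatterned = 0, emailsVerified = 0 }) {
  return (
    leads * RATES.leadProcessed +
    domainsScraped * RATES.contactScraper +
    domainsPatterned * RATES.patternFinder +
    emailsVerified * RATES.verifier
  );
}

// The 500-lead scenario from the text: 150 domains scraped,
// 100 pattern-finder fallbacks, all 500 emails verified.
console.log(estimateRunCost({ leads: 500, domainsScraped: 150, domainsPatterned: 100, emailsVerified: 500 }));
// ≈ 60 (dollars), matching the worked example above
```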
CSV lead processing using the API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/csv-lead-processor").call(run_input={
    "csvUrl": "https://storage.googleapis.com/my-bucket/leads.csv",
    "columnMapping": {
        "Company": "companyName",
        "Website": "website",
        "First Name": "firstName",
        "Last Name": "lastName",
        "Email": "email",
        "Title": "title",
        "Industry": "industry",
        "Country": "country"
    },
    "enrichEmails": True,
    "verifyEmails": True,
    "deduplicateByDomain": True,
    "outputCsv": True
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") == "summary":
        print(f"Processed: {item['leadsPushed']} leads | Valid emails: {item['emailsValid']}")
        print(f"CSV download: {item.get('csvDownloadUrl')}")
    else:
        status = item.get("emailStatus", "unverified")
        print(f"{item.get('companyName')} | {item.get('email')} | {status}")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/csv-lead-processor").call({
    csvUrl: "https://storage.googleapis.com/my-bucket/leads.csv",
    columnMapping: {
        "Company": "companyName",
        "Website": "website",
        "First Name": "firstName",
        "Last Name": "lastName",
        "Email": "email",
        "Title": "title",
        "Industry": "industry",
        "Country": "country"
    },
    enrichEmails: true,
    verifyEmails: true,
    deduplicateByDomain: true,
    outputCsv: true
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    if (item.type === "summary") {
        console.log(`Leads processed: ${item.leadsPushed} | Emails valid: ${item.emailsValid}`);
        console.log(`CSV: ${item.csvDownloadUrl}`);
    } else {
        console.log(`${item.companyName} | ${item.email} | ${item.emailStatus}`);
    }
}
```
cURL
```shell
# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~csv-lead-processor/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "csvUrl": "https://storage.googleapis.com/my-bucket/leads.csv",
    "columnMapping": {
      "Company": "companyName",
      "Website": "website",
      "First Name": "firstName",
      "Last Name": "lastName",
      "Email": "email",
      "Title": "title",
      "Industry": "industry",
      "Country": "country"
    },
    "enrichEmails": true,
    "verifyEmails": true,
    "deduplicateByDomain": true,
    "outputCsv": true
  }'

# Fetch results once the run completes (replace DATASET_ID from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

# Download the enriched CSV directly (replace STORE_ID from the summary record)
curl "https://api.apify.com/v2/key-value-stores/STORE_ID/records/output-leads.csv?token=YOUR_API_TOKEN" \
  -o enriched-leads.csv
```
How CSV Lead Processor works
Phase 1: CSV loading and streaming parse
The actor accepts a CSV via URL (fetched with up to 3 retries using exponential backoff) or as a base64-encoded string. The raw bytes are scanned for a UTF-8 BOM (`EF BB BF`), which is stripped before parsing — this is critical because Excel's "Save as CSV" produces BOM-prefixed files where the first header would otherwise appear as `\uFEFFCompany Name` rather than `Company Name`, breaking column mapping.
The CSV is then streamed through `csv-parse` with `columns: true` (header row auto-detection), `relax_column_count: true` (ragged row tolerance), and your chosen delimiter. Rows are yielded one at a time via an async generator, so even large files are never loaded as a single in-memory object. For each row, `mapRow()` performs a case-insensitive header lookup, applies field normalization (email lowercasing and format validation, LinkedIn URL expansion, domain extraction from website URLs), and assembles the canonical lead record.
Domain deduplication uses a `Set<string>` keyed on the normalized lowercase domain. The first row per domain is kept; subsequent rows are skipped with a debug log entry.
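The Phase 1 pipeline can be sketched in miniature. This is a simplified, self-contained illustration, not the actor's source: a naive line split stands in for the `csv-parse` stream, and all helper names are hypothetical:

```javascript
// Simplified sketch of Phase 1: BOM strip, case-insensitive header
// mapping, and domain-level deduplication.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

function mapRow(headers, cells, columnMapping) {
  // Lowercase the mapping keys once so "Company Name" and "COMPANY NAME" both match.
  const lookup = Object.fromEntries(
    Object.entries(columnMapping).map(([k, v]) => [k.trim().toLowerCase(), v])
  );
  const lead = {};
  headers.forEach((h, i) => {
    const field = lookup[h.trim().toLowerCase()];
    if (field) lead[field] = cells[i] || null;
  });
  return lead;
}

const csv = "\uFEFFCompany Name,Website\nMeridian Software Group,meridiansoftware.io\nMeridian Software Group,meridiansoftware.io";
const [headerLine, ...dataRows] = stripBom(csv).split("\n");
const headers = headerLine.split(",");
const mapping = { "company name": "companyName", "Website": "website" };

const seen = new Set(); // domain-level dedup: first row per domain wins
const leads = [];
for (const row of dataRows) {
  const lead = mapRow(headers, row.split(","), mapping);
  const domain = (lead.website || "").toLowerCase();
  if (domain && seen.has(domain)) continue;
  seen.add(domain);
  leads.push(lead);
}
console.log(leads.length); // 1 — the duplicate-domain row was skipped
```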
Phase 2: Two-stage email enrichment (optional)
All leads that have a domain but no email are collected into a single batch. The batch is sent to Website Contact Scraper as a list of URLs with `maxPagesPerDomain: 3`. The scraper visits each company's homepage, contact page, and about page, returning any email addresses found. These are applied to the corresponding lead records.
Leads that still have no email after the contact scraper step go to a second batch call to Email Pattern Finder. The pattern finder analyzes domain DNS records and public web presence to detect the email naming convention (e.g. `{first}.{last}@domain.com`, confidence 87). When a contact name is available (from `fullName` or combined `firstName` + `lastName`), it generates candidate emails by applying the pattern. The top candidate is set as the primary `email` field, and all generated candidates are stored in `generatedEmails`.
Both enrichment sub-actors are called once per run regardless of how many leads need enrichment — batching is key to keeping run time and cost predictable.
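The pattern-application step can be illustrated with a short sketch. The `{first}`/`{last}` token names follow the pattern format shown in this document; the helper itself and the initial-only tokens are assumptions:

```javascript
// Hedged sketch of pattern-based email generation: fill a detected
// format string with the contact's name. {f}/{l} (single initials)
// are assumed variants, not confirmed by the actor's docs.
function applyPattern(pattern, firstName, lastName) {
  const first = firstName.trim().toLowerCase();
  const last = lastName.trim().toLowerCase();
  return pattern
    .replaceAll("{first}", first)
    .replaceAll("{last}", last)
    .replaceAll("{f}", first[0])
    .replaceAll("{l}", last[0]);
}

console.log(applyPattern("{first}.{last}@meridiansoftware.io", "Danielle", "Okafor"));
// danielle.okafor@meridiansoftware.io
```

Generated candidates like this one are only candidates; the verification phase below is what decides whether they are safe to send to.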
Phase 3: Email verification (optional)
All unique email addresses across all leads (both from the original CSV and from enrichment) are deduplicated into a single list and sent to Bulk Email Verifier with `verificationLevel: standard`. The verifier performs a DNS MX record lookup (confirming the mail server exists) and SMTP probing (confirming the mailbox exists without sending an actual message). Results are indexed by email address and applied back to each lead: `emailVerified` (boolean), `emailStatus` (`valid` / `risky` / `invalid` / `unknown` / `disposable`), and `emailConfidence` (0-100).
Again, a single batch call is made — not one call per lead.
Phase 4: Output and CSV generation
Enriched leads are pushed to the Apify dataset one by one. In pay-per-event mode, the actor charges one lead-processed event ($0.05) after each successful push and checks whether the spending limit has been reached. If the limit is hit, the actor stops gracefully and records `spendingLimitReached: true` in the summary.
If `outputCsv` is enabled, the actor serializes all pushed leads using `csv-stringify` with the canonical column set (or your `outputColumns` selection) plus the enrichment columns, which are always appended to the right. The output buffer is prepended with a UTF-8 BOM for Excel compatibility and written to the actor's Key-Value Store as `output-leads.csv`. The public download URL is recorded in the summary dataset record.
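The output step (column selection, quoting, BOM prefix) can be sketched as follows. The real actor uses `csv-stringify`; this hand-rolled serializer is illustrative only:

```javascript
// Illustrative sketch of CSV output with a UTF-8 BOM prefix so Excel
// opens the file with correct encoding. Not the actor's actual code.
function toCsv(leads, columns, delimiter = ",") {
  const escape = (v) => {
    const s = v == null ? "" : String(v);
    // Quote any value containing a delimiter, quote, or newline.
    return /[",\n;\t]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const header = columns.map(escape).join(delimiter);
  const body = leads.map((l) => columns.map((c) => escape(l[c])).join(delimiter));
  return "\uFEFF" + [header, ...body].join("\n"); // BOM + header + rows
}

const outCsv = toCsv(
  [{ companyName: "Meridian Software Group", domain: "meridiansoftware.io" }],
  ["companyName", "domain"]
);
console.log(outCsv.charCodeAt(0) === 0xfeff); // true — BOM present for Excel
```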
Tips for best results
- Run a 10-row test before the full list. Set `maxRows: 10` and check that your column mapping is picking up the right fields. Look at the output to confirm `domain` is being extracted correctly — enrichment depends on it.
- Map `domain` or `website` even if you already have emails. If you later want to re-run with `enrichEmails: true` to fill gaps, the actor needs the domain. Mapping it costs nothing in the initial run.
- Use `deduplicateByDomain: false` for multi-contact lists. If you have 5 decision-makers listed per company and want to enrich and verify each contact independently, disable deduplication. The trade-off is that enrichment sub-actors will be called once per domain, not once per contact — so you may get the same scraped emails applied to multiple contacts at the same company.
- Filter output by `emailStatus: "valid"` before importing to your CRM. The verification step identifies risky and invalid addresses. Importing only `valid` status leads keeps your sender reputation clean and avoids bounces that affect deliverability.
- Set a `maxRows` budget cap on unknown-size files. If you receive a CSV from a third party and are unsure of its row count, set `maxRows: 500` to cap your first run cost at $25 while you validate quality.
- Combine with HubSpot Lead Pusher — run this actor first to normalize and enrich, then feed the output dataset into HubSpot Lead Pusher to create or update CRM contacts automatically.
- Schedule weekly on refreshed exports. If your sales team exports from Apollo or LinkedIn weekly, schedule this actor on the same cadence. New rows in each export are enriched and verified automatically, keeping your working list current without manual intervention.
- For Google Sheets input, use a CSV export link. In Google Sheets, go to File > Share > Publish to web, publish the sheet as CSV, and paste that URL into `csvUrl`. The actor fetches the current data on every run.
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Website Contact Scraper | Called automatically during enrichment; run standalone first to preview what emails are available on your target company sites |
| Email Pattern Finder | Called automatically as enrichment fallback; run standalone to detect patterns for a domain list before processing a full CSV |
| Bulk Email Verifier | Called automatically during verification; run standalone on any email list to get deliverability scores before sending |
| HubSpot Lead Pusher | Push this actor's output dataset directly into HubSpot as new contacts or company records |
| B2B Lead Gen Suite | Use when you don't have a list at all — this suite generates leads from scratch from website URLs; CSV Lead Processor handles the "bring your own list" path |
| Google Maps Email Extractor | Export local business results from Google Maps as a CSV, then run it through this actor to standardize and verify the contacts |
| Waterfall Contact Enrichment | For higher-value accounts where you want a 10-step enrichment cascade beyond what the two built-in sub-actors provide |
Limitations
- No JavaScript-rendered website support during enrichment. The Website Contact Scraper sub-actor uses HTTP-based parsing. Company sites that load contact details exclusively via JavaScript (e.g. single-page apps with lazy-loaded content) may return no emails. For those sites, use Website Contact Scraper Pro as a separate step before running this actor.
- Enrichment requires a domain. Rows with a company name but no website or domain cannot be enriched. Map the `website` or `domain` column to enable enrichment for those rows.
- Pattern finder accuracy varies by domain. The pattern confidence score indicates reliability. Patterns below 70 confidence should be treated as candidates, not confirmed addresses — always pair pattern-based emails with verification.
- Email verification cannot guarantee delivery. SMTP probing confirms a mailbox exists but does not guarantee the message will not be filtered by spam rules, greylisting, or catch-all configurations. `emailStatus: "risky"` means the address exists but has characteristics associated with higher bounce rates.
- Catch-all domains report as `unknown` or `risky`. Some mail servers accept any recipient address to prevent address enumeration. Addresses at these domains cannot be confirmed as real without sending an actual email.
- Sub-actor call limits at scale. Each sub-actor call (contact scraper, pattern finder, verifier) is limited to 1,000 results per batch. Lists exceeding 1,000 domains needing enrichment in a single run may see incomplete enrichment for the tail. Split very large files into batches of 800 domains or fewer when using enrichment.
- No real-time progress for enrichment steps. While enrichment sub-actors run, the actor status shows "Enriching emails for N domains..." but does not stream per-domain progress. This is a platform-level constraint on sub-actor calls.
- Source CSV must be UTF-8 or UTF-8 BOM encoded. Files in Latin-1, Windows-1252, or other encodings may produce garbled text in non-ASCII characters (accented names, etc.). Convert to UTF-8 before uploading.
- The summary record is pushed at the end of the run. When iterating the dataset programmatically, filter for `item.type === "summary"` to find the statistics record rather than assuming it is the first or last item positionally.
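For the batch-size limitation above, splitting a large list into enrichment-safe chunks might look like the following sketch. The helper is hypothetical and assumes each row already carries a normalized `domain` field:

```javascript
// Hypothetical helper: collect unique domains and split them into
// chunks under the per-batch cap, to be run as separate actor runs.
function chunkDomains(rows, chunkSize = 800) {
  const domains = [...new Set(rows.map((r) => r.domain).filter(Boolean))];
  const chunks = [];
  for (let i = 0; i < domains.length; i += chunkSize) {
    chunks.push(domains.slice(i, i + chunkSize));
  }
  return chunks;
}

// 2,000 unique domains split into batches of at most 800:
const sampleRows = Array.from({ length: 2000 }, (_, i) => ({ domain: `company${i}.com` }));
console.log(chunkDomains(sampleRows).map((c) => c.length)); // [ 800, 800, 400 ]
```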
Integrations
- Zapier — trigger this actor when a new CSV is added to a Google Drive folder, then push enriched leads to HubSpot, Salesforce, or a Slack notification
- Make — build a scenario that polls the actor's dataset on completion and routes verified leads into your outreach sequence tool
- Google Sheets — publish your lead sheet as a CSV URL and pass it to this actor on a schedule; push enriched results back to a separate output sheet
- Apify API — call the actor programmatically from your own CRM workflow, pass CSVs as base64, and retrieve structured JSON output without needing the UI
- Webhooks — configure a webhook to fire on run completion and POST the summary record (including the CSV download URL) to your internal systems
- LangChain / LlamaIndex — use enriched lead records as structured context for AI-powered sales research agents or account scoring workflows
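On the receiving end of the webhook integration, a consumer might pull the CSV download URL out of the POSTed summary. This is a hedged sketch — the `summary` and `csvDownloadUrl` field names are assumptions, so inspect your actual webhook payload before relying on them:

```javascript
// Hypothetical sketch: extract the output CSV URL from a run-completion
// webhook payload. Field names (`summary`, `csvDownloadUrl`) are assumptions —
// verify them against a real payload from your own run.
function extractCsvUrl(payload) {
  const summary = payload?.summary;
  return summary && typeof summary.csvDownloadUrl === "string"
    ? summary.csvDownloadUrl
    : null;
}
```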
Troubleshooting
Empty results despite a valid CSV file. The most common cause is a column mapping mismatch. Open the actor log and look for lines starting with `Row 1:` — if the canonical fields are all null, your CSV headers do not match your mapping keys. Check for trailing spaces in your CSV headers, BOM characters (the actor strips the file-level BOM but not per-cell BOM), or tab-delimited files passed with `csvDelimiter` still set to comma.
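The header pitfalls listed above (trailing spaces, per-cell BOM) can be checked with a small normalizer before you build your mapping — a sketch, not the actor's own matching logic:

```javascript
// Normalize a CSV header the way the troubleshooting tip suggests checking:
// strip a leading BOM, trim surrounding whitespace, lowercase for matching.
function normalizeHeader(header) {
  return header.replace(/^\uFEFF/, "").trim().toLowerCase();
}

console.log(normalizeHeader("\uFEFFCompany Name ")); // "company name"
```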
Enrichment ran but found no emails. The website contact scraper visits up to 3 pages per domain. If a company's contact details are behind a login, on a JavaScript-rendered SPA, or only in a PDF or image, the scraper will not find them. Check `enrichmentApplied: true` vs `emailsFoundByEnrichment` in the summary record to see the success rate. For SPA-heavy company sites, run Website Contact Scraper Pro separately first.
Run is taking much longer than expected. Enrichment and verification involve external sub-actor runs and live network calls. A 500-domain enrichment run can take 5-15 minutes depending on website response times and mail server latency. For very large lists, consider splitting into batches of 300-400 rows and running in parallel. Set `maxRows` to limit the first run.
`spendingLimitReached: true` in the summary. You have set a per-run spending cap that was hit before all rows were processed. Increase the cap in the actor's "Max total charge per run" setting and re-run on the remaining rows. You can identify which rows were not processed using `sourceRowIndex` — the last pushed `sourceRowIndex` value shows where the run stopped.
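Resuming after a capped run can be sketched as below. This assumes `sourceRowIndex` is a 0-based index into your original row array — confirm that against your own output before using it:

```javascript
// Sketch: given the original rows and the last sourceRowIndex that was
// pushed before the spending cap hit, return the rows still to process.
// Assumes sourceRowIndex is 0-based and matches array position.
function remainingRows(rows, lastProcessedIndex) {
  return rows.filter((_, i) => i > lastProcessedIndex);
}

console.log(remainingRows(["a", "b", "c", "d"], 1)); // ["c", "d"]
```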
Emails showing as `unknown` across many leads. These companies likely use catch-all mail servers. The verifier cannot confirm individual mailbox existence at these domains without sending an actual email. Treat `unknown` addresses as lower-confidence leads and consider lower send volumes to those domains.
Responsible use
- This actor processes only data you provide — it does not scrape personal data from external sources independently.
- When enrichment is enabled, the actor visits company websites to extract publicly listed contact information. Respect each website's terms of service and `robots.txt` directives.
- Comply with GDPR, CAN-SPAM, CASL, and other applicable data protection and anti-spam laws when using processed contact data for outreach.
- Do not upload CSVs containing sensitive personal data beyond what is necessary for your legitimate business purpose.
- Do not use output data for spam, harassment, or unauthorized marketing.
- For guidance on web scraping legality, see Apify's guide.
FAQ
How do I process a CSV lead list that has inconsistent column names?
The column mapping uses case-insensitive matching, so "Company Name", "company name", and "COMPANY NAME" all resolve to the same header. If your file has genuinely different column names (e.g. "Org" instead of "Company"), just set the key in `columnMapping` to match your actual header. You do not need to rename columns in your CSV before uploading.
Can I process a CSV file from Google Drive or Dropbox?
Yes. Google Drive shareable links and Dropbox direct download links work when the file is publicly accessible. In Google Drive, use File > Share > Publish to web > CSV to get a direct download URL. In Dropbox, change ?dl=0 to ?dl=1 at the end of the share link to force a download. Google Sheets can also be published as a CSV via File > Share > Publish to web.
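The Dropbox link change described above is a one-character swap that is easy to automate — a minimal sketch:

```javascript
// Rewrite a Dropbox share link (?dl=0) into a direct download link (?dl=1),
// as described in the FAQ answer above.
function toDirectDownload(url) {
  return url.replace(/([?&])dl=0\b/, "$1dl=1");
}

console.log(toDirectDownload("https://www.dropbox.com/s/abc/leads.csv?dl=0"));
// https://www.dropbox.com/s/abc/leads.csv?dl=1
```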
How many leads can I process in one CSV lead processing run?
The actor imposes no hard limit — `maxRows` defaults to 0 (unlimited). In practice, the memory allocation (512 MB) comfortably handles files of 50,000+ rows for parse-only runs. Enrichment adds sub-actor call overhead; for lists over 1,000 rows requiring enrichment, consider splitting into batches of 800 to stay within sub-actor result limits.
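The batching suggested above is a plain array split — a minimal sketch:

```javascript
// Split a large lead list into batches of at most `size` rows (e.g. 800)
// to stay within the sub-actor result limits noted above.
function chunk(rows, size) {
  const batches = [];
  for (let i = 0; i < rows.length; i += size) {
    batches.push(rows.slice(i, i + size));
  }
  return batches;
}

console.log(chunk(Array.from({ length: 2000 }), 800).map((b) => b.length));
// [800, 800, 400]
```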
Does CSV lead processing work with tab-separated (TSV) files?
Yes. Set `csvDelimiter` to `\t` in the input. TSV files from data warehouses, Airtable exports, and some CRM exports are fully supported.
How accurate is the email pattern detection?
The Email Pattern Finder returns a confidence score (0-100) with each pattern. Patterns with confidence above 85 are generally reliable for name-based email generation. Below 70, treat the generated address as a candidate to be verified rather than a confirmed address. Always pair pattern-based enrichment with `verifyEmails: true` for outreach use.
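The confidence gating described above can be sketched as follows. The pattern tokens used here (`{first}`, `{last}`, `{f}`) are illustrative placeholders, not the Email Pattern Finder's documented syntax:

```javascript
// Sketch: generate a candidate email from a detected pattern only when the
// confidence score clears a threshold. Pattern token syntax is hypothetical.
function candidateEmail(pattern, confidence, first, last, domain, threshold = 70) {
  if (confidence < threshold) return null; // too weak — verify-only territory
  const local = pattern
    .replace("{first}", first.toLowerCase())
    .replace("{last}", last.toLowerCase())
    .replace("{f}", first[0].toLowerCase());
  return `${local}@${domain}`;
}

console.log(candidateEmail("{f}{last}", 92, "Danielle", "Okafor", "meridiansoftware.io"));
// dokafor@meridiansoftware.io
```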
Is it legal to scrape company websites for contact information during enrichment?
The enrichment step visits publicly accessible company websites — the same pages you could open in a browser — to extract email addresses that companies have chosen to make public. This is generally considered legal in most jurisdictions. However, you are responsible for complying with applicable laws in your market and the terms of service of the websites visited. See Apify's legal guide for more detail.
How is CSV Lead Processor different from Hunter.io?
Hunter.io charges $49-149/month for domain search and email finding with monthly credit limits. This actor charges $0.05 per lead with no subscription — a 500-lead list costs $25, plus sub-actor costs for enrichment. It also accepts any CSV format, handles deduplication, normalizes field values, and writes back a clean output CSV in one run. Hunter.io requires a separate export step after finding emails.
Can I use this actor to verify emails I already have, without importing a full CSV?
For a pure email verification run, use Bulk Email Verifier directly — it accepts a list of email addresses with no CSV involved and costs $0.005 per email. This actor's verification feature is designed for use as part of a full lead processing pipeline.
What happens if my CSV has rows with no domain or website?
Rows without a domain are parsed and included in the output with all other mapped fields intact. Enrichment simply skips those rows (it cannot look up emails without a domain). If you map the companyName field, you can use those records for manual research or pass them to Company Deep Research to find the website first.
Can I schedule this actor to run automatically on a recurring basis?
Yes. In the Apify console, open the actor and go to the Schedules tab. Set a daily, weekly, or custom cron interval. Combine this with a Google Sheets CSV publish URL as the input source, and your enrichment pipeline runs automatically whenever you update the source sheet.
How do I push the output to my CRM after processing?
Use HubSpot Lead Pusher to import this actor's output dataset into HubSpot contacts or companies. For Salesforce, Pipedrive, or other CRMs, use the Zapier or Make integrations to map dataset fields to your CRM's API on run completion. Alternatively, download the output CSV from the Key-Value Store URL in the summary record and import it manually.
What does `emailStatus: "risky"` mean?
A risky status means the email address exists and the mailbox responded, but it has characteristics associated with higher bounce rates — typically a role address (info@, sales@, contact@), a recently created domain, or a mail server with unusual configuration. Risky addresses can be sent to but should be used in smaller volumes until you see how they perform.
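One of the risky signals above — role addresses — is simple to pre-screen on your side. The prefix list here is illustrative, not the verifier's actual rule set:

```javascript
// Sketch: flag role addresses (info@, sales@, contact@, ...), one of the
// "risky" signals described above. Prefix list is illustrative only.
const ROLE_PREFIXES = new Set(["info", "sales", "contact", "support", "admin", "hello"]);

function isRoleAddress(email) {
  const local = email.split("@")[0].toLowerCase();
  return ROLE_PREFIXES.has(local);
}

console.log(isRoleAddress("info@meridiansoftware.io")); // true
```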
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom column mapping help, bulk processing requirements, or CRM integration questions, reach out through the Apify platform.
How it works
Configure
Set your parameters in the Apify Console or pass them via API.
Run
Click Start, trigger via API, webhook, or set up a schedule.
Get results
Download as JSON, CSV, or Excel. Integrate with 1,000+ apps.
Use cases
Sales Teams
Build targeted lead lists with verified contact data.
Marketing
Research competitors and identify outreach opportunities.
Data Teams
Automate data collection pipelines with scheduled runs.
Developers
Integrate via REST API or use as an MCP tool in AI workflows.
Related actors
GitHub Repository Search
Search GitHub repositories by keyword, language, topic, stars, forks. Sort by stars, forks, or recently updated. Returns metadata, topics, license, owner info, URLs. Free API, optional token for higher limits.
Weather Forecast Search
Get weather forecasts for any location worldwide using the free Open-Meteo API. Returns current conditions, daily and hourly forecasts with temperature, precipitation, wind, UV index, and more. No API key needed.
EUIPO EU Trademark Search
Search EU trademarks via official EUIPO database. Find registered and pending trademarks by name, Nice class, applicant, or status. Returns full trademark details and filing history.
Nominatim Address Geocoder
Geocode addresses to GPS coordinates and reverse geocode coordinates to addresses using OpenStreetMap Nominatim. Batch geocoding with rate limiting. Free, no API key needed.
Ready to try Csv Lead Processor?
Start for free on Apify. No credit card required.
Open on Apify Store