Agency Lead Gen Pipeline
Agency lead generation, fully automated from first search to sent email. This actor chains 6 specialized sub-actors into a single pipeline: it discovers agencies from four directories, analyzes their websites, scrapes and pattern-generates email addresses, verifies every address for deliverability, then sends personalized cold email sequences — all from one input form. Built for SaaS founders, B2B service providers, and sales teams who sell to marketing and digital agencies.
The pipeline handles the entire workflow that typically takes a sales rep 8-12 hours per 50 contacts: directory research, website qualification, contact finding, email enrichment, and first-touch outreach. Set your service category, location, and email credentials, click Start, and the pipeline hands you a fully enriched lead dataset with outreach status on every record.
What data can you extract?
| Data Point | Source | Example |
|---|---|---|
| 📋 Agency name | Directory listing | Pinnacle Digital Group |
| 🌐 Website | Directory listing | https://pinnacledigital.co |
| 🏷️ Services listed | Directory profile | ["seo", "ppc", "content-marketing"] |
| 📍 Location | Directory profile | Austin, Texas |
| 👥 Employee count | Directory profile | 10-49 |
| ⭐ Rating / review count | Directory profile | 4.8 / 37 reviews |
| 🏢 Directory sources | Merge step | ["clutch", "designrush"] |
| 🔍 Services offered (website) | Website analysis | ["paid-search", "google-ads", "facebook-ads"] |
| 🏭 Industries served | Website analysis | ["ecommerce", "saas", "healthcare"] |
| 🧑💼 Named contacts | Contact scraping | [{"name": "Mark Haines", "title": "Founder"}] |
| 💬 Summary snippet | Website analysis | "Full-service performance agency specializing in DTC brands..." |
| 📧 Scraped emails | Contact scraping | ["[email protected]"] |
| 🔑 Email pattern | Pattern detection | {first}.{last}@pinnacledigital.co |
| 📬 Pattern emails | Pattern generation | ["[email protected]"] |
| ✅ Best verified email | MX + SMTP verification | [email protected] |
| 📤 Outreach status | Email send | sent |
| 🔁 Follow-up scheduled | Outreach sequencer | true |
| 📊 Run summary | Pipeline merge | agenciesDiscovered, emailsSent, stepsCompleted |
Why use Agency Lead Gen Pipeline?
Building an agency prospect list by hand means hours of tab-switching between Clutch, DesignRush, and Sortlist, manually copying contacts into a spreadsheet, running each email through a verifier one by one, and writing and sending individual outreach emails. For 50 agencies that process easily takes a full working day, and the data is stale as soon as you finish.
This pipeline automates the entire workflow in a single run. Provide a service category and location, configure your sending credentials, and the actor handles discovery, enrichment, verification, and outreach while you focus on replies.
Running on Apify also gives you:
- Scheduling — run weekly on a cron schedule to continuously discover newly listed agencies and add fresh leads to your pipeline
- API access — trigger pipeline runs from your CRM, Make, or Zapier workflows via a single HTTP call
- Proxy rotation — all four agency directories block datacenter IPs; Apify's residential proxy network keeps you unblocked at scale
- Monitoring — receive Slack or email alerts when a run fails or returns zero results
- Integrations — push enriched leads directly to HubSpot, Google Sheets, or any webhook endpoint after each run
Features
- 6-sub-actor orchestration — coordinates agency-directory-scraper, website-content-analyzer, website-contact-scraper, email-pattern-finder, bulk-email-verifier, and outreach-sequencer in the correct order, passing outputs between steps automatically
- 4-directory coverage — scrapes Clutch, Sortlist, AgencySpotter, and DesignRush in parallel, then deduplicates agencies by normalized domain, keeping the richest record per agency
- Smart domain deduplication — normalizes all domains (strips www., protocol, trailing slash, lowercases) and merges agency records that appear in multiple directories into one enriched lead, accumulating all source names in directorySources
- Email fallback chain — for each agency, attempts direct contact scraping (up to 5 pages per domain), then falls back to email pattern detection for domains with no scraped emails, maximizing coverage across both methods
- Best-email selection algorithm — the pickBestEmail function prefers non-catch-all verified addresses over catch-all valid addresses, reducing bounce risk before any email leaves your server
- Personalized template variables — outreach templates support {{firstName}}, {{companyName}}, {{topService}}, {{summarySnippet}}, and {{unsubscribeLink}}; the summary snippet is drawn directly from the agency's live website content
- 3-provider email delivery — supports SMTP (Brevo, Gmail, AWS SES, any SMTP relay), SendGrid HTTP API v3, and Mailgun Messages API — no provider lock-in
- Automated follow-up sequencing — optional day-3 and day-7 follow-up templates; the pipeline schedules these via the outreach-sequencer and tracks sequence IDs in the output dataset
- Dry-run mode — renders and logs all email templates without dispatching a single message, so you can verify personalization before going live
- Skip-outreach mode — runs all discovery and enrichment steps and stops before sending; build your verified lead list first, send later
- CAN-SPAM compliance enforcement — unsubscribeUrl is a required field; the actor renders and injects a working unsubscribe link into every outgoing email
- Per-run email rate limiting — a hard cap prevents your provider's daily limits from being exceeded; configurable from 1 to 10,000 per run
- Non-fatal step handling — steps 2-6 are non-fatal; if website analysis or the verifier fails for a subset of domains, the pipeline continues and outputs what it has rather than crashing the entire run
- PPE pricing model — charged once per completed pipeline run at $0.25; orchestrator and sub-actor compute costs are both covered by the single event charge
Use cases for agency lead generation pipeline
Selling SaaS tools to marketing agencies
Software founders whose product serves marketing agencies — white-label reporting tools, SEO platforms, client management software, ad automation — need a repeatable way to find and contact agency decision-makers without a large sales team. This pipeline discovers agencies by service category and location, identifies the founder or head of growth from website content, verifies their email, and sends a personalized intro sequence referencing the agency's own service language from their website.
B2B service provider prospecting
Consultants, freelancers, and boutique service firms that subcontract to agencies — copywriters, designers, media buyers, developers — can use this pipeline to build a targeted outreach list of agencies that fit their ideal client profile. The sizeSignal field from website analysis (solo, small, mid, large) and the industry tags let you filter results in the output dataset before committing to a sending campaign.
Recruiting and partnership development
Agency owners looking to build referral partnerships with complementary agencies in specific cities or verticals can run this pipeline to identify candidates, review their client lists and tone keywords from the content analysis step, and send tailored partnership inquiry emails to the most relevant contacts.
Outreach agency white-labeling
Agencies that run outbound prospecting campaigns on behalf of clients can use this pipeline as a white-label engine. Configure the sender credentials and templates for each client, set skipOutreach: true for the data delivery step, export to CSV or push to HubSpot, and hand the enriched lead list to the client's sales team.
Competitive intelligence and market mapping
Analysts and strategists conducting market research on the agency landscape for a specific service category and geography get a structured dataset covering agency count, size distribution, rating trends, and service offering depth — without any manual directory browsing. Filter and group the output by sizeSignal, location, and industriesServed to build a market map.
Sales team cold outreach at scale
SDR and BDR teams at B2B companies targeting agencies as a vertical can run this pipeline weekly on a schedule. Each run discovers newly listed agencies, the merge step deduplicates by domain, and fresh leads enter the outreach sequence automatically. A team of two reps can maintain a perpetually refreshed pipeline of 200+ agencies per week without manual research.
How to run the agency lead generation pipeline
- Enter your target criteria — type your service category (e.g., seo, ppc, social-media, content-marketing) and a location such as New York or United Kingdom. Leave location blank for global results. Set maxAgencies to the number of agencies you want in the final dataset (1-500, default 50).
- Configure your email credentials — choose your email provider (smtp, sendgrid, or mailgun). For SMTP, enter your host, port (587 for STARTTLS is recommended), username, and password. For SendGrid or Mailgun, paste your API key. Fill in your sender name, sender email, and a working unsubscribe URL. Set rateLimitPerRun to your provider's daily sending limit.
- Write your templates — fill in the subject line and email body. Use {{firstName}}, {{companyName}}, {{topService}}, and {{summarySnippet}} to personalize. Optionally add day-3 and day-7 follow-up bodies. To test without sending, check "Dry run emails". To run enrichment only without any outreach, check "Skip email sending".
- Run and download results — click Start. The pipeline runs all 6 steps sequentially, which takes 20-60 minutes depending on agency count. When the run completes, download your enriched leads from the Dataset tab as JSON, CSV, or Excel. Each record shows discovery data, verified email, and outreach status.
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| serviceCategory | string | Yes | digital-marketing | Agency service category to search across directories. Examples: seo, ppc, social-media, content-marketing |
| location | string | No | United States | City, state, or country filter. Leave blank for global results |
| maxAgencies | integer | No | 50 | Total unique agencies to run through the pipeline. Range: 1-500 |
| directorySources | array | No | ["clutch","sortlist","agencyspotter","designrush"] | Which directories to scrape. Valid values: clutch, sortlist, agencyspotter, designrush |
| senderName | string | Yes | — | Display name in the From field. Example: Jane at Acme Corp |
| senderEmail | string | Yes | — | Authenticated sending address (SPF/DKIM/DMARC required on your provider) |
| emailProvider | string | Yes | — | smtp, sendgrid, or mailgun |
| smtpHost | string | Conditional | — | Required when emailProvider is smtp. Example: smtp-relay.brevo.com |
| smtpPort | integer | Conditional | 587 | SMTP port. Use 587 (STARTTLS) or 465 (SSL) |
| smtpUser | string | Conditional | — | SMTP username/login |
| smtpPass | string | Conditional | — | SMTP password or provider API key (stored encrypted) |
| apiKey | string | Conditional | — | SendGrid or Mailgun private API key (stored encrypted) |
| mailgunDomain | string | Conditional | — | Mailgun sending domain, e.g. mail.yourdomain.com |
| unsubscribeUrl | string | Yes | — | Unsubscribe endpoint. Supports {{sequenceId}} and {{email}} placeholders |
| rateLimitPerRun | integer | Yes | 100 | Hard cap on emails sent per run. Range: 1-10,000 |
| emailTemplate | string | Yes | (default template) | HTML or plain-text email body. Variables: {{firstName}}, {{companyName}}, {{topService}}, {{summarySnippet}}, {{unsubscribeLink}} |
| emailSubject | string | Yes | Quick question for {{companyName}} | Subject line. Supports {{companyName}} and {{firstName}} |
| followUpTemplate3 | string | No | "" | Day-3 follow-up body. Leave blank to disable |
| followUpTemplate7 | string | No | "" | Day-7 follow-up body. Leave blank to disable |
| skipOutreach | boolean | No | true | Run discovery and enrichment only; skip all email sending |
| dryRunEmails | boolean | No | false | Render and log emails without dispatching them |
| proxyConfiguration | object | No | {"useApifyProxy":true} | Proxy settings. Residential proxies strongly recommended |
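The Conditional rows can be checked locally before a run is started. The following is a minimal sketch of such a pre-flight check, not the actor's own validation. The function name `missing_fields` is hypothetical, and the provider-to-field mapping is an assumption inferred from the table (for example, that smtpUser and smtpPass are needed alongside smtpHost, and that Mailgun needs both apiKey and mailgunDomain). Only required fields without a default are treated as mandatory.

```python
# Hypothetical pre-flight check; the actor's real validation may differ.

# Required fields that have no default in the parameter table.
ALWAYS_REQUIRED = ["senderName", "senderEmail", "emailProvider", "unsubscribeUrl"]

# Assumed mapping of provider to the extra fields it needs.
PROVIDER_FIELDS = {
    "smtp": ["smtpHost", "smtpUser", "smtpPass"],
    "sendgrid": ["apiKey"],
    "mailgun": ["apiKey", "mailgunDomain"],
}


def missing_fields(run_input):
    """Return the names of required fields that are absent or empty."""
    provider = run_input.get("emailProvider")
    if provider is not None and provider not in PROVIDER_FIELDS:
        raise ValueError(f"unknown emailProvider: {provider!r}")
    needed = ALWAYS_REQUIRED + PROVIDER_FIELDS.get(provider, [])
    return [name for name in needed if run_input.get(name) in (None, "")]
```

Calling `missing_fields(run_input)` before triggering a run gives an empty list when nothing is missing under these assumptions.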
Input examples
Discovery and enrichment only — no emails sent:
{
"serviceCategory": "seo",
"location": "United Kingdom",
"maxAgencies": 30,
"directorySources": ["clutch", "designrush"],
"senderName": "James at Rankify",
"senderEmail": "[email protected]",
"emailProvider": "smtp",
"smtpHost": "smtp-relay.brevo.com",
"smtpPort": 587,
"smtpUser": "[email protected]",
"smtpPass": "your-brevo-smtp-key",
"unsubscribeUrl": "https://rankify.io/unsubscribe?sid={{sequenceId}}&e={{email}}",
"rateLimitPerRun": 100,
"emailTemplate": "Hi {{firstName}},\n\nI came across {{companyName}} — {{summarySnippet}}\n\nWould you be open to a quick call?\n\nBest, James\n\n---\n{{unsubscribeLink}}",
"emailSubject": "Quick question for {{companyName}}",
"skipOutreach": true,
"dryRunEmails": false,
"proxyConfiguration": { "useApifyProxy": true }
}
Full pipeline with day-3 and day-7 follow-ups via SendGrid:
{
"serviceCategory": "ppc",
"location": "United States",
"maxAgencies": 100,
"directorySources": ["clutch", "sortlist", "agencyspotter", "designrush"],
"senderName": "Alex at AdStack",
"senderEmail": "[email protected]",
"emailProvider": "sendgrid",
"apiKey": "SG.your-sendgrid-api-key",
"unsubscribeUrl": "https://adstack.io/unsubscribe?sid={{sequenceId}}&e={{email}}",
"rateLimitPerRun": 200,
"emailTemplate": "Hi {{firstName}},\n\nNoticed {{companyName}} runs paid search for {{topService}} clients — we built a bid optimization layer that typically cuts CPA by 15-20% without changing spend.\n\nWorth a 15-minute look?\n\nAlex\n\n---\n{{unsubscribeLink}}",
"emailSubject": "Cutting CPA for {{companyName}} clients",
"followUpTemplate3": "Hi {{firstName}},\n\nJust circling back — happy to send a one-pager if it helps.\n\nAlex\n\n---\n{{unsubscribeLink}}",
"followUpTemplate7": "Hi {{firstName}},\n\nLast one from me — if the timing isn't right, no worries. I'll check back in Q2.\n\nAlex\n\n---\n{{unsubscribeLink}}",
"skipOutreach": false,
"dryRunEmails": false,
"proxyConfiguration": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}
Dry run to verify personalization before going live:
{
"serviceCategory": "social-media",
"location": "New York",
"maxAgencies": 10,
"directorySources": ["clutch"],
"senderName": "Maya at ContentLab",
"senderEmail": "[email protected]",
"emailProvider": "smtp",
"smtpHost": "smtp.gmail.com",
"smtpPort": 587,
"smtpUser": "[email protected]",
"smtpPass": "your-gmail-app-password",
"unsubscribeUrl": "https://contentlab.co/unsubscribe?sid={{sequenceId}}&e={{email}}",
"rateLimitPerRun": 50,
"emailTemplate": "Hi {{firstName}},\n\nLove what {{companyName}} is doing in {{topService}}.\n\n{{summarySnippet}}\n\nMaya\n\n---\n{{unsubscribeLink}}",
"emailSubject": "Partnership idea for {{companyName}}",
"skipOutreach": false,
"dryRunEmails": true,
"proxyConfiguration": { "useApifyProxy": true }
}
Input tips
- Start with skipOutreach: true — run discovery and enrichment first, review the lead dataset to confirm email quality and targeting, then re-run with outreach enabled. This prevents sending to a bad list.
- Use residential proxies — all four agency directories actively block datacenter IPs. Set useApifyProxy: true with apifyProxyGroups: ["RESIDENTIAL"] for reliable directory access.
- Match rateLimitPerRun to your provider's daily limit — Brevo free tier allows 300/day, SendGrid free 100/day, Google Workspace 2,000/day (free Gmail accounts 500/day), AWS SES sandbox 200/day. Exceeding this causes deferred status on affected emails.
- Use dryRunEmails: true on small batches first — check rendered templates in the dataset before sending at scale. Personalization errors are visible in the output before a single email leaves your server.
- Keep maxAgencies under 100 for first runs — larger batches take 45-90 minutes. Start at 25-50 to validate your template and targeting before scaling.
Output example
{
"type": "lead",
"domain": "pinnacledigital.co",
"agencyName": "Pinnacle Digital Group",
"website": "https://pinnacledigital.co",
"services": ["seo", "ppc", "content-marketing"],
"location": "Austin, Texas",
"employeeCount": "10-49",
"reviewCount": 37,
"rating": 4.8,
"directorySources": ["clutch", "designrush"],
"servicesOffered": ["paid-search", "google-ads", "facebook-ads", "seo"],
"industriesServed": ["ecommerce", "saas", "healthcare"],
"clientNames": ["BlueWave Inc", "Acme Corp", "Meridian Health"],
"sizeSignal": "small",
"toneKeywords": ["data-driven", "transparent", "performance-focused"],
"summarySnippet": "Full-service performance agency specializing in DTC and SaaS brands, known for transparent reporting and measurable ROI.",
"scrapedEmails": ["[email protected]", "[email protected]"],
"contacts": [
{ "name": "Mark Haines", "title": "Founder & CEO", "email": "[email protected]" }
],
"patternEmails": [],
"emailPattern": null,
"patternConfidence": null,
"bestEmail": "[email protected]",
"verifiedEmails": ["[email protected]"],
"allCandidateEmails": ["[email protected]", "[email protected]"],
"outreachStatus": "sent",
"sequenceId": "seq_a8f3d291",
"followUpScheduled": true,
"enrichedAt": "2026-03-22T14:05:31.842Z"
}
The pipeline also appends one summary record per run:
{
"type": "summary",
"agenciesDiscovered": 47,
"agenciesWithWebsites": 43,
"agenciesWithContacts": 38,
"agenciesWithVerifiedEmail": 31,
"emailsSent": 31,
"stepsCompleted": [
"agency-directory-scraper",
"website-content-analyzer",
"website-contact-scraper",
"email-pattern-finder",
"bulk-email-verifier",
"outreach-sequencer"
],
"skipOutreach": false,
"dryRunEmails": false,
"processedAt": "2026-03-22T14:48:12.003Z"
}
Output fields
| Field | Type | Description |
|---|---|---|
| type | string | lead for per-agency records, summary for the final run summary |
| domain | string | Normalized domain (no www., lowercase) — primary deduplication key |
| agencyName | string \| null | Agency name from directory listing |
| website | string \| null | Full homepage URL with protocol |
| services | string[] | Service categories from directory profile |
| location | string \| null | City/country from directory |
| employeeCount | string \| null | Employee count range from directory |
| reviewCount | number \| null | Review count on directory listing |
| rating | number \| null | Average directory rating |
| directorySources | string[] | Which directories this agency appeared in |
| servicesOffered | string[] | Services detected from website content analysis |
| industriesServed | string[] | Industries detected from website content |
| clientNames | string[] | Client names mentioned on the agency website |
| sizeSignal | string | Estimated size: solo, small, mid, large, or unknown |
| toneKeywords | string[] | Top descriptive keywords from the agency homepage |
| summarySnippet | string \| null | Short website excerpt used for email personalization |
| scrapedEmails | string[] | Emails found directly on the agency website |
| contacts | object[] | Named contacts: name, title, email per entry |
| patternEmails | string[] | Emails generated from detected domain naming pattern |
| emailPattern | string \| null | Detected format, e.g. {first}.{last}@domain.com |
| patternConfidence | number \| null | Pattern confidence score (0-1) |
| bestEmail | string \| null | Highest-confidence verified email for this agency |
| verifiedEmails | string[] | All emails that passed MX/SMTP verification |
| allCandidateEmails | string[] | All emails considered (scraped + pattern-generated), deduplicated |
| outreachStatus | string \| null | sent, failed, deferred, dry-run, or null if skipped |
| sequenceId | string \| null | Outreach sequence ID for follow-up tracking |
| followUpScheduled | boolean \| null | Whether a follow-up email is scheduled for this lead |
| enrichedAt | string | ISO 8601 timestamp of record creation |
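Because lead and summary records share one dataset, a consumer usually separates them by the type field first. The sketch below shows one way to do that and to turn the summary counters into funnel rates. The helper names `split_records` and `funnel_rates` are hypothetical, and the sample counters are copied from the summary example in this README.

```python
def split_records(items):
    """Separate per-agency lead records from the single run summary."""
    leads = [item for item in items if item.get("type") == "lead"]
    summary = next((item for item in items if item.get("type") == "summary"), None)
    return leads, summary


def funnel_rates(summary):
    """Express each funnel stage as a share of agencies discovered."""
    discovered = summary["agenciesDiscovered"]
    if discovered == 0:
        return {}
    stages = ["agenciesWithWebsites", "agenciesWithContacts",
              "agenciesWithVerifiedEmail", "emailsSent"]
    return {stage: round(summary[stage] / discovered, 3) for stage in stages}


# Counters copied from the summary example in this README.
example_summary = {
    "type": "summary", "agenciesDiscovered": 47, "agenciesWithWebsites": 43,
    "agenciesWithContacts": 38, "agenciesWithVerifiedEmail": 31, "emailsSent": 31,
}
rates = funnel_rates(example_summary)
print(rates)
```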
How much does it cost to run the agency lead generation pipeline?
Agency Lead Gen Pipeline uses pay-per-event pricing — you pay $0.25 per completed pipeline run. That flat rate covers the orchestrator and all sub-actor compute costs regardless of how many agencies are processed within the run.
| Scenario | Pipeline runs | Cost per run | Total cost |
|---|---|---|---|
| Quick test (10 agencies) | 1 | $0.25 | $0.25 |
| Weekly prospecting (50 agencies) | 4/month | $0.25 | $1.00/month |
| Daily pipeline (50 agencies/day) | 30/month | $0.25 | $7.50/month |
| Multi-city campaign (4 cities, weekly) | 16/month | $0.25 | $4.00/month |
| Enterprise volume (10 runs/day) | 300/month | $0.25 | $75.00/month |
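Each total in the table is the number of completed runs multiplied by the flat $0.25 charge. A small sketch of that arithmetic, with a hypothetical helper name:

```python
PRICE_PER_RUN_USD = 0.25  # flat pay-per-event charge stated in this section


def monthly_cost(runs_per_month):
    """Cost depends only on completed runs, not on agencies per run."""
    return round(runs_per_month * PRICE_PER_RUN_USD, 2)


# Run counts taken from the scenarios in the table.
for runs in (1, 4, 30, 16, 300):
    print(runs, monthly_cost(runs))
```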
You can set a maximum spending limit per run in the Apify console to control costs. The actor stops cleanly when your budget is reached.
Compare this to Apollo.io at $49-$99/month or ZoomInfo at $15,000+/year — with this pipeline, most teams spend $1-$10/month with no subscription commitment, and every lead is enriched with live website data and a verified email, not a stale database record.
Run the agency lead generation pipeline using the API
Python
from apify_client import ApifyClient
client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("ryanclinton/agency-lead-gen-pipeline").call(run_input={
"serviceCategory": "seo",
"location": "United States",
"maxAgencies": 50,
"directorySources": ["clutch", "sortlist", "agencyspotter", "designrush"],
"senderName": "Jane at Rankify",
"senderEmail": "[email protected]",
"emailProvider": "smtp",
"smtpHost": "smtp-relay.brevo.com",
"smtpPort": 587,
"smtpUser": "[email protected]",
"smtpPass": "YOUR_SMTP_PASSWORD",
"unsubscribeUrl": "https://rankify.io/unsubscribe?sid={{sequenceId}}&e={{email}}",
"rateLimitPerRun": 100,
"emailTemplate": "Hi {{firstName}},\n\nI came across {{companyName}} — {{summarySnippet}}\n\nWould you have 15 minutes this week?\n\nJane\n\n---\n{{unsubscribeLink}}",
"emailSubject": "Quick question for {{companyName}}",
"skipOutreach": False,
"dryRunEmails": False,
"proxyConfiguration": {"useApifyProxy": True},
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") == "lead" and item.get("bestEmail"):
        print(f"{item['agencyName']} | {item['bestEmail']} | {item['outreachStatus']} | {item['sizeSignal']}")
    elif item.get("type") == "summary":
        print(f"Run complete: {item['agenciesDiscovered']} discovered, "
              f"{item['agenciesWithVerifiedEmail']} with verified email, "
              f"{item['emailsSent']} emailed")
JavaScript
import { ApifyClient } from "apify-client";
const client = new ApifyClient({ token: "YOUR_API_TOKEN" });
const run = await client.actor("ryanclinton/agency-lead-gen-pipeline").call({
serviceCategory: "ppc",
location: "United Kingdom",
maxAgencies: 50,
directorySources: ["clutch", "designrush"],
senderName: "Alex at AdStack",
senderEmail: "[email protected]",
emailProvider: "sendgrid",
apiKey: "SG.YOUR_SENDGRID_API_KEY",
unsubscribeUrl: "https://adstack.io/unsubscribe?sid={{sequenceId}}&e={{email}}",
rateLimitPerRun: 100,
emailTemplate: "Hi {{firstName}},\n\nLove what {{companyName}} is doing in {{topService}}.\n\n{{summarySnippet}}\n\nWorth a quick call?\n\nAlex\n\n---\n{{unsubscribeLink}}",
emailSubject: "Quick question for {{companyName}}",
skipOutreach: false,
dryRunEmails: false,
proxyConfiguration: { useApifyProxy: true },
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
if (item.type === "lead" && item.bestEmail) {
console.log(`${item.agencyName} | ${item.bestEmail} | status: ${item.outreachStatus} | size: ${item.sizeSignal}`);
} else if (item.type === "summary") {
console.log(`${item.agenciesDiscovered} agencies → ${item.agenciesWithVerifiedEmail} verified → ${item.emailsSent} emailed`);
}
}
cURL
# Start the pipeline run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~agency-lead-gen-pipeline/runs?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"serviceCategory": "content-marketing",
"location": "United States",
"maxAgencies": 25,
"directorySources": ["clutch", "sortlist"],
"senderName": "Jordan at ContentLab",
"senderEmail": "[email protected]",
"emailProvider": "smtp",
"smtpHost": "smtp-relay.brevo.com",
"smtpPort": 587,
"smtpUser": "[email protected]",
"smtpPass": "YOUR_SMTP_PASSWORD",
"unsubscribeUrl": "https://contentlab.co/unsubscribe?sid={{sequenceId}}&e={{email}}",
"rateLimitPerRun": 50,
"emailTemplate": "Hi {{firstName}},\n\nI came across {{companyName}} and noticed you specialize in {{topService}}.\n\n{{summarySnippet}}\n\nJordan\n\n---\n{{unsubscribeLink}}",
"emailSubject": "Quick question for {{companyName}}",
"skipOutreach": true,
"dryRunEmails": false,
"proxyConfiguration": {"useApifyProxy": true}
}'
# Fetch results once the run completes (replace DATASET_ID with data.defaultDatasetId from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
How Agency Lead Gen Pipeline works
Step 1: Agency directory scraping
The pipeline calls the agency-directory-scraper sub-actor with your serviceCategory, location, and a per-source agency cap calculated as ceil(maxAgencies / directorySources.length). The scraper hits Clutch, Sortlist, AgencySpotter, and DesignRush in parallel and returns structured records for each agency. This step is the only fatal step — if it fails or returns zero records, the pipeline exits with an error record rather than proceeding with empty data. After fetching results, agencies are capped at maxAgencies and deduplicated by normalized domain: when the same agency appears on multiple directories, the record with the highest review count is retained, and all source names are accumulated into directorySources.
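The cap and merge logic described in Step 1 can be sketched as follows. This is an illustration under stated assumptions, not the actor's source: the function names are hypothetical, each raw record is assumed to carry a single `source` field naming its directory, and the sketch deduplicates before applying the maxAgencies cap.

```python
import math
from urllib.parse import urlparse


def per_source_cap(max_agencies, sources):
    """ceil(maxAgencies / directorySources.length), as described in Step 1."""
    return math.ceil(max_agencies / len(sources))


def normalize_domain(url):
    """Lowercase, then strip protocol, leading www. and trailing slash."""
    url = url.strip().lower()
    if "://" not in url:
        url = "//" + url  # lets urlparse treat a bare host as the netloc
    host = urlparse(url).netloc
    return host[4:] if host.startswith("www.") else host


def merge_agencies(records, max_agencies):
    """Keep one record per domain: the one with the most reviews, all sources."""
    by_domain = {}
    for record in records:
        website = record.get("website")
        if not website:
            continue  # nothing to key on
        domain = normalize_domain(website)
        sources = {record["source"]}  # 'source' is an assumed input field
        kept = by_domain.get(domain)
        if kept is not None:
            sources.update(kept["directorySources"])
            # Retain whichever record has the higher review count.
            if (kept.get("reviewCount") or 0) >= (record.get("reviewCount") or 0):
                record = kept
        merged = {k: v for k, v in record.items() if k != "source"}
        merged.update(domain=domain, directorySources=sorted(sources))
        by_domain[domain] = merged
    return list(by_domain.values())[:max_agencies]
```

With the default four directories and maxAgencies of 50, `per_source_cap` asks each directory for 13 agencies, so the merged list can exceed 50 before the final cap is applied.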
Steps 2 and 3: Website content analysis and contact scraping
Steps 2 and 3 run sequentially on the unique list of agency website URLs extracted from Step 1. The website-content-analyzer fetches up to 4 pages per domain and extracts service tags, industry tags, named clients, size signals (solo/small/mid/large based on employee language detected on the site), tone keywords, and a summary snippet. This data populates the {{summarySnippet}} and {{topService}} template variables. The website-contact-scraper then crawls up to 5 pages per domain looking for email addresses and named contacts with titles. Both steps are non-fatal — if a subset of websites time out or return no content, the pipeline continues, and those agencies receive null for content fields.
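Non-fatal handling of this kind is commonly implemented as a thin wrapper around each step. The sketch below is one such wrapper and is not taken from the actor's code. The name `run_non_fatal` and the idea of appending to a `steps_completed` list are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("pipeline")


def run_non_fatal(step_name, step_fn, fallback, steps_completed):
    """Run one enrichment step; on failure, log it and return the fallback."""
    try:
        result = step_fn()
    except Exception as exc:  # deliberately broad: any step error is tolerated
        logger.warning("step %s failed, continuing: %s", step_name, exc)
        return fallback
    steps_completed.append(step_name)
    return result


# Usage with stand-in step functions (hypothetical).
def failing_analysis():
    raise TimeoutError("analysis timed out")


done = []
content = run_non_fatal("website-content-analyzer", failing_analysis, {}, done)
contacts = run_non_fatal("website-contact-scraper", lambda: {"example.com": []}, {}, done)
```

In this sketch a failed step contributes an empty result, and only the steps that finished are recorded.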
Step 4: Email pattern detection and generation
For any domain that returned zero scraped emails in Step 3, the pipeline calls the email-pattern-finder sub-actor. This actor detects the naming convention used by the domain — such as {first}.{last}@domain.com or {first}@domain.com — by analyzing publicly visible email addresses associated with the domain, then generates candidate addresses by applying the detected pattern to contact names found in Step 3. Pattern confidence (0-1) is stored per domain. Domains that already have scraped emails skip this step entirely, keeping Step 4 targeted at coverage gaps only.
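Applying a detected pattern to a contact name can be sketched as below. This is a simplified illustration, not the email-pattern-finder's implementation: the function name is hypothetical, only the {first} and {last} placeholders shown in this README are handled, and names are reduced to ASCII letters.

```python
import re


def apply_pattern(pattern, full_name, domain):
    """Fill a pattern such as '{first}.{last}@domain.com' for one contact."""
    parts = re.findall(r"[a-z]+", full_name.lower())
    if not parts:
        return None
    local_template = pattern.split("@", 1)[0]
    if "{last}" in local_template and len(parts) < 2:
        return None  # pattern needs a surname the contact does not have
    try:
        local = local_template.format(first=parts[0], last=parts[-1])
    except (KeyError, IndexError, ValueError):
        return None  # placeholder this sketch does not know about
    return f"{local}@{domain}"


# Hypothetical contact and domain, used only to exercise the function.
candidate = apply_pattern("{first}.{last}@domain.com", "Ada Lovelace", "example.com")
print(candidate)
```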
Step 5: Bulk email verification and best-email selection
All candidate emails — both scraped and pattern-generated — are deduplicated across all domains and passed to the bulk-email-verifier, which performs MX record lookups and SMTP handshakes at verificationLevel: standard. Results are indexed in a map keyed on lowercase email address. For each agency domain, the pickBestEmail function selects the highest-confidence deliverable address: it prefers non-catch-all valid emails over catch-all valid emails, and returns null if no verified address exists. This selection step minimizes bounce rates before any outreach begins.
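The selection rule can be sketched in a few lines. The shape of a verification result is an assumption here (an `email`, a `status` of "valid", and a boolean `catchAll`), as is breaking ties by candidate order. The real pickBestEmail function may weigh other signals.

```python
def index_results(results):
    """Key verification results on the lowercased address."""
    return {result["email"].lower(): result for result in results}


def pick_best_email(candidates, verified):
    """Prefer a valid non-catch-all address, then a valid catch-all one."""
    best_catch_all = None
    for email in candidates:
        result = verified.get(email.lower())
        if result is None or result.get("status") != "valid":
            continue
        if not result.get("catchAll", False):
            return email  # first valid non-catch-all address wins
        if best_catch_all is None:
            best_catch_all = email
    return best_catch_all  # None when nothing verified
```

Returning None for an agency with no verified address matches the bestEmail: null case in the output.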
Step 6: Personalized outreach sequencing
If skipOutreach is false and at least one lead has a verified email, the pipeline calls the outreach-sequencer with a structured leads payload. Each entry includes the best email, the contact's first name (split from contacts[0].name, falling back to the agency name or "there"), the company name, the top service, and the website summary snippet. The sequencer renders templates by substituting all {{variable}} placeholders — including the unsubscribe URL rendered with the sequence ID and email — then dispatches via your configured provider. Follow-up templates for day 3 and day 7 are included when provided and non-empty. The outreach status (sent, failed, deferred, dry-run) and sequence ID are merged back into each lead's output record in a final rebuild pass.
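Template rendering with the first-name fallback can be sketched as follows. The helper names are hypothetical, and two details are assumptions not stated in this README: that topService is the first entry of servicesOffered (falling back to services), and that the sequence ID and email are URL-encoded inside the unsubscribe link.

```python
import re
from urllib.parse import quote

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, variables):
    """Replace {{name}} placeholders; unknown names are left untouched."""
    return PLACEHOLDER.sub(
        lambda m: str(variables.get(m.group(1), m.group(0))), template)


def first_name_for(lead):
    """First word of contacts[0].name, else the agency name, else 'there'."""
    contacts = lead.get("contacts") or []
    parts = (contacts[0].get("name") or "").split() if contacts else []
    if parts:
        return parts[0]
    return lead.get("agencyName") or "there"


def build_variables(lead, unsubscribe_url, sequence_id):
    """Assemble the template variables for one lead."""
    link = render(unsubscribe_url, {
        "sequenceId": quote(sequence_id, safe=""),
        "email": quote(lead["bestEmail"], safe=""),
    })
    services = lead.get("servicesOffered") or lead.get("services") or []
    return {
        "firstName": first_name_for(lead),
        "companyName": lead.get("agencyName") or "",
        "topService": services[0] if services else "",  # assumed derivation
        "summarySnippet": lead.get("summarySnippet") or "",
        "unsubscribeLink": link,
    }
```

A body is then produced with `render(email_template, build_variables(lead, unsubscribe_url, sequence_id))`. Leaving unknown placeholders in place makes template typos easy to spot in a dry run.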
Tips for best results
- Run with skipOutreach: true first. Inspect the dataset for email quality, sizeSignal distribution, and summarySnippet content before sending any outreach. Filter out agencies that are too large or too small for your product before going live.
- Use SMTP over REST APIs for higher cold email deliverability. Providers like Brevo offer dedicated SMTP relay with strong sending reputation. Shared SendGrid free-tier IPs can land in spam for cold outreach. If using SendGrid, request a dedicated IP after warming up volume gradually.
- Write subject lines that look handwritten. The best-performing cold email subjects read like they came from a human: "Quick question for Pinnacle Digital" outperforms "Partnership opportunity" or "Introducing AdStack". Avoid title case and promotional language.
- Keep maxAgencies at 50 or under for weekly scheduled runs. Running the full pipeline on 50 agencies takes 20-40 minutes and produces a tight, high-quality list. Running on 500 agencies produces a longer list but with more variance in email quality and content analysis depth.
- Set day-3 and day-7 follow-ups for meaningful lift. Cold email sequences with 2-3 touches consistently outperform single-touch sends. Keep follow-ups short (2-3 sentences) and reference the previous message without restating your entire pitch.
- Filter by sizeSignal before routing to your CRM. If your product suits small to mid-size agencies, filter the output to sizeSignal === "small" or "mid" before pushing to HubSpot. Pair this with HubSpot Lead Pusher to automate filtered routing after each run.
- Use the summary record to track pipeline performance. The type: "summary" record appended to every dataset contains agenciesDiscovered, agenciesWithVerifiedEmail, and emailsSent. Pull this record via the API and insert it into your own reporting database to measure pipeline health over time.
- Combine with B2B Lead Qualifier for prioritization. After enrichment, pass the dataset through B2B Lead Qualifier to score leads 0-100 from 30+ signals. Route high-scoring leads to personal outreach and lower-scoring leads to the automated sequence.
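The sizeSignal filter from the tips above could look like this in Python when post-processing dataset items. The function name and the default set of allowed sizes are illustrative choices, not part of the actor.

```python
def leads_for_crm(items, allowed_sizes=("small", "mid")):
    """Keep lead records with a verified email and an allowed sizeSignal."""
    return [
        item for item in items
        if item.get("type") == "lead"
        and item.get("bestEmail")
        and item.get("sizeSignal") in allowed_sizes
    ]


# Hypothetical dataset items, trimmed to the fields the filter reads.
sample = [
    {"type": "lead", "domain": "a.example", "bestEmail": "x@a.example", "sizeSignal": "small"},
    {"type": "lead", "domain": "b.example", "bestEmail": None, "sizeSignal": "mid"},
    {"type": "lead", "domain": "c.example", "bestEmail": "y@c.example", "sizeSignal": "large"},
    {"type": "summary", "agenciesDiscovered": 3},
]
selected = leads_for_crm(sample)
```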
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| HubSpot Lead Pusher | After running with skipOutreach: true, push enriched leads from the dataset directly into HubSpot as contacts with company fields populated from enrichment data |
| Bulk Email Verifier | Run standalone verification on your existing prospect list before running outreach — combine with pipeline output to produce a merged, fully verified master list |
| Email Pattern Finder | Run on custom agency domain lists not covered by the four directories to generate and verify contact emails at $0.10/domain |
| Website Contact Scraper | Re-run contact scraping on agencies that returned no emails in the pipeline, with deeper crawl settings, to improve coverage before outreach |
| B2B Lead Qualifier | Score the enriched lead dataset from this pipeline (0-100 from 30+ signals) to prioritize which verified leads are worth personal outreach vs. automated sequence |
| Website Tech Stack Detector | Run on agency websites from the output dataset to detect which marketing and CRM tools each agency uses — useful for personalizing pitches around their existing stack |
| Waterfall Contact Enrichment | Run a 10-step enrichment cascade on leads where the pipeline returned bestEmail: null to maximize contact coverage through additional data sources |
Limitations
- Directory coverage is limited to four sources. Clutch, Sortlist, AgencySpotter, and DesignRush cover the largest English-language agency markets, but niche directories (Expertise.com, UpCity, Bark) are not included. For those, provide a pre-built URL list to Website Contact Scraper directly.
- JavaScript-heavy agency websites may not yield full content analysis. The website-content-analyzer uses HTTP-based parsing. Single-page applications that render entirely in JavaScript may return partial or no content data, leaving servicesOffered, clientNames, and summarySnippet null for those agencies. For JS-heavy sites, try Website Contact Scraper Pro, which uses a full browser.
- Email pattern detection requires at least one visible email on the domain. If an agency has no publicly visible email addresses anywhere on the web, the pattern finder cannot infer a naming convention. These agencies appear in the output with emailPattern: null and bestEmail: null.
- Email verification cannot guarantee deliverability. MX and SMTP verification confirms address structure and mail server existence but cannot detect role-based accounts that will bounce, or inboxes behind catch-all configurations. The catchAll flag indicates higher uncertainty and these addresses are ranked lower in bestEmail selection.
- Outreach step is rate-limited per run. If rateLimitPerRun is lower than the number of leads with verified emails, the remaining leads are processed in subsequent runs. There is no automatic batching across runs; control run timing via the Apify scheduler.
- Follow-up delivery depends on the outreach-sequencer sub-actor. Day-3 and day-7 follow-ups are scheduled by the sequencer; their delivery is tracked via sequenceId in the Key-Value store. Review the sequencer's dataset for follow-up status.
- Pipeline runtime scales with agency count. A 50-agency run takes 20-40 minutes. A 500-agency run can take 3-6 hours. Set timeoutSecs in your API call accordingly, or use smaller batches on a schedule.
- The pipeline does not track previously emailed agencies across runs. Use the output dataset as your source of truth and manage contact history in your CRM to avoid re-contacting agencies from prior runs.
- Agency directory data reflects listing date, not real-time state. Rating and employee count figures are as current as the directory's last update, which may be weeks old.
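Because the pipeline keeps no contact history across runs, de-duplication has to happen on your side. The sketch below is one possible approach, assuming you can export the addresses you have already contacted from your CRM. The function names are hypothetical, and matching on a lower-cased bestEmail is an assumption, not the pipeline's behaviour.

```python
from __future__ import annotations

from typing import Any, Iterable


def normalize_email(email: str) -> str:
    """Lower-case and trim so the same address always compares equal."""
    return email.strip().lower()


def drop_already_contacted(
    leads: Iterable[dict[str, Any]],
    contacted: Iterable[str],
) -> list[dict[str, Any]]:
    """Return leads whose bestEmail has not been contacted in earlier runs."""
    seen = {normalize_email(address) for address in contacted}
    fresh: list[dict[str, Any]] = []
    for lead in leads:
        email = lead.get("bestEmail")
        if not email:
            continue  # bestEmail: null means there is nothing to send to
        key = normalize_email(email)
        if key in seen:
            continue  # contacted in a prior run, or duplicated in this one
        seen.add(key)
        fresh.append(lead)
    return fresh


# Made-up addresses for illustration.
previous = ["hello@agency-a.example"]
current = [
    {"bestEmail": "Hello@Agency-A.example"},
    {"bestEmail": "team@agency-b.example"},
    {"bestEmail": None},
]
to_contact = drop_already_contacted(current, previous)
```

Adding each accepted address to the seen set also removes duplicates inside a single run, which can occur when one agency is listed in more than one directory.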
Integrations
- Zapier: trigger a pipeline run when a new prospect is added to a Zapier table, or push completed lead records to Slack, Notion, or Airtable when the run finishes
- Make: build a scenario that runs the pipeline on a schedule and routes leads with outreachStatus: sent into a CRM deal stage automatically
- Google Sheets: export the lead dataset to a Google Sheet after each run for manual review and filtering before routing to outreach
- Apify API: trigger runs programmatically from your sales tooling, CRM, or internal dashboard and stream results into your own database
- Webhooks: receive a POST notification when the pipeline completes, including the dataset ID for immediate result fetching
- LangChain / LlamaIndex: feed the enriched lead dataset into an LLM agent that drafts custom first-line openers for each agency based on summarySnippet, clientNames, and toneKeywords
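As an illustration of the Apify API route, the sketch below builds an enrichment-only input and starts a run with the apify-client Python package. The input keys are the parameter names used on this page; which of them are mandatory, the function names, and the actor ID you pass in are assumptions to check against the actor's input schema.

```python
from __future__ import annotations

from typing import Any


def build_run_input(
    service_category: str,
    location: str,
    unsubscribe_url: str,
    max_agencies: int = 50,
) -> dict[str, Any]:
    """Assemble an enrichment-only input (skipOutreach stays true)."""
    if not 1 <= max_agencies <= 500:
        raise ValueError("maxAgencies must be between 1 and 500")
    return {
        "serviceCategory": service_category,
        "location": location,
        "maxAgencies": max_agencies,
        "skipOutreach": True,  # no emails are sent while testing
        "unsubscribeUrl": unsubscribe_url,
    }


def run_pipeline(
    token: str,
    actor_id: str,  # placeholder: copy the real ID from the actor's page
    run_input: dict[str, Any],
    timeout_secs: int = 14_400,  # the 4-hour default mentioned in the FAQ
) -> list[dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input, timeout_secs=timeout_secs)
    if run is None:
        raise RuntimeError("run did not finish")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


example_input = build_run_input("digital-marketing", "Austin, Texas", "https://example.com/unsubscribe")
```

Calling run_pipeline(token, actor_id, example_input) blocks until the run ends. For long batches, starting the run and collecting results from a webhook may be a better fit.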
Troubleshooting
The pipeline exits at Step 1 with "No agencies found."
This typically means the serviceCategory value does not match the taxonomy used by the selected directories, or the location filter is too narrow. Try a broader category (e.g., digital-marketing instead of programmatic-advertising) or remove the location filter entirely. Check the Clutch and DesignRush category slugs to confirm the exact taxonomy match.
Agencies have bestEmail: null despite having a website.
Two causes are common. First, the contact page may load via JavaScript, which the HTTP-based contact scraper cannot execute — try Website Contact Scraper Pro for those domains. Second, the domain may use a catch-all mail server with no uniquely addressable named mailboxes, so the verifier returns valid: false for all candidates and bestEmail remains null.
Emails are showing as deferred in outreach results.
deferred means your email provider accepted the send request but queued it for later delivery, typically because the recipient's mail server issued a temporary refusal (4xx SMTP code). This is normal for 5-10% of addresses on any cold campaign. Check your provider's dashboard for detailed deferral reasons — retrying the same addresses is handled by your email provider's retry logic, not by re-running this pipeline.
The run is slow or timing out with large agency counts.
Batches over 100 agencies with all four directory sources and full content analysis can run for 2-4 hours. Increase the run timeout in your Apify console run settings, or reduce maxAgencies and run multiple batches on a schedule with different location filters.
Directory scraping returns far fewer agencies than maxAgencies.
Some service categories have sparse listings in certain geographies. The per-source cap is ceil(maxAgencies / directorySources.length), so a category with only 8 agencies on Sortlist for a given city returns 8 from that source regardless of your cap. Add more directory sources or broaden the location to increase yield.
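The per-source cap arithmetic can be worked through with a short sketch. The cap formula is the one stated above. The availability figures are made up, and treating maxAgencies as an overall ceiling on the merged total is an assumption.

```python
import math


def per_source_cap(max_agencies: int, source_count: int) -> int:
    """ceil(maxAgencies / directorySources.length), as described above."""
    if source_count < 1:
        raise ValueError("at least one directory source is required")
    return math.ceil(max_agencies / source_count)


def expected_yield(max_agencies: int, available_per_source: dict[str, int]) -> int:
    """Upper bound on agencies returned, before cross-directory de-duplication."""
    cap = per_source_cap(max_agencies, len(available_per_source))
    total = sum(min(cap, available) for available in available_per_source.values())
    return min(max_agencies, total)  # assumption: maxAgencies also caps the total


# Hypothetical listing counts for one category and city.
available = {"clutch": 40, "sortlist": 8, "agencyspotter": 20, "designrush": 30}
cap = per_source_cap(50, len(available))
yield_estimate = expected_yield(50, available)
```

With maxAgencies set to 50 and four sources, the cap is 13 per source. A source with only 8 listings contributes 8, so the run tops out at 47 agencies instead of 50.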
Responsible use
- This actor only accesses publicly available agency directory listings and business websites.
- Respect each directory's terms of service and do not run at volumes that constitute denial-of-service.
- Comply with CAN-SPAM, CASL, GDPR, and other applicable laws when using extracted contact data for outreach. The unsubscribeUrl field is required precisely because CAN-SPAM mandates a working opt-out mechanism in every commercial email.
- Do not use extracted contact data for spam, harassment, or any unauthorized purpose.
- For guidance on web scraping legality, see Apify's guide.
FAQ
How many agencies can the agency lead gen pipeline process in one run?
The maxAgencies parameter accepts values from 1 to 500. For first runs, 25-50 agencies is recommended to validate your targeting and template personalization before scaling. Larger batches of 200-500 agencies work well for broad market sweeps but take proportionally longer to complete and produce more variance in email coverage.
Does the agency lead gen pipeline send emails automatically?
By default, skipOutreach is set to true, meaning the pipeline runs all discovery and enrichment steps but does not send any emails. You must explicitly set skipOutreach: false and provide valid sending credentials to activate outreach. This prevents accidental sending during testing.
What agency directories does this pipeline cover?
The pipeline scrapes Clutch, Sortlist, AgencySpotter, and DesignRush. You can include any subset of these four using the directorySources parameter. All four are enabled by default, giving the broadest possible agency coverage per run.
How accurate is the agency lead gen pipeline email verification?
MX and SMTP verification confirms that the recipient domain's mail server exists and accepts mail for that address format. It cannot definitively confirm the mailbox is active without sending a message. Catch-all domains — where the server accepts any address — are flagged with catchAll: true and ranked lower in the bestEmail selection algorithm. Expect 5-15% bounce rates on cold outreach to verified business addresses, which is typical for the category.
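The pipeline's actual bestEmail selection logic is not documented here. As a rough mental model only, ranking catch-all addresses lower could look like the sketch below. The candidate shape (email, valid, catchAll keys) and the tie-breaking order are assumptions.

```python
from __future__ import annotations

from typing import Any


def pick_best_email(candidates: list[dict[str, Any]]) -> str | None:
    """Prefer verified addresses, and non-catch-all ones among those."""
    usable = [c for c in candidates if c.get("valid")]
    if not usable:
        return None  # corresponds to bestEmail: null in the output
    # sorted() is stable: False sorts before True, so non-catch-all
    # candidates come first and the original order breaks ties.
    ranked = sorted(usable, key=lambda c: bool(c.get("catchAll")))
    return ranked[0]["email"]


# Made-up verification results for illustration.
candidates = [
    {"email": "info@agency.example", "valid": True, "catchAll": True},
    {"email": "dana@agency.example", "valid": True, "catchAll": False},
    {"email": "d.smith@agency.example", "valid": False, "catchAll": False},
]
best = pick_best_email(candidates)
```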
Can I use the agency lead gen pipeline for enrichment only, without any outreach?
Yes. Set skipOutreach: true to run the discovery and enrichment steps (Steps 1-5) and output a fully enriched lead dataset without sending a single email. Export to CSV, push to HubSpot via HubSpot Lead Pusher, or process the data in your own pipeline.
How is this pipeline different from Apollo.io or ZoomInfo for finding agency contacts?
Apollo and ZoomInfo pull from static databases that may be months or years out of date. This pipeline scrapes agency directories and agency websites live at run time, so data reflects the agency's current services, team, and published contacts. It costs $0.25 per pipeline run versus $49-$99/month for Apollo or $15,000+/year for ZoomInfo, and it produces personalization data — summary snippets, tone keywords, client names — that static database tools do not offer.
Can I schedule the agency lead gen pipeline to run weekly?
Yes. Use Apify's built-in scheduler to trigger runs on a cron schedule. Weekly runs with the same serviceCategory and location discover newly listed agencies each cycle. Use skipOutreach: false with a low rateLimitPerRun to drip outreach automatically, or combine with HubSpot Lead Pusher to route new leads to a review queue before sending.
Is it legal to scrape agency directories and send cold emails to businesses?
Scraping publicly listed business information is generally permitted under US law, consistent with established precedent for public web data. Sending cold B2B email is legal under CAN-SPAM provided each email includes a working unsubscribe mechanism, an accurate From address, and a physical postal address in your email footer. This pipeline enforces the unsubscribe URL as a required field. GDPR applies to recipients in the EU — document your legal basis for processing if targeting European agencies. For full guidance, see Apify's web scraping legal guide.
What happens if one pipeline step fails mid-run?
Step 1 (agency directory scraping) is fatal — if it fails, the pipeline exits and pushes an error record. You are not charged the $0.25 PPE fee for runs that fail at Step 1. Steps 2-6 are non-fatal: if website analysis, contact scraping, pattern finding, verification, or outreach fail for some domains, the pipeline continues and outputs leads with null for fields from the failed step.
How do I personalize emails using agency website content?
The {{summarySnippet}} variable in your email template is populated from the website content analysis step — it is a short excerpt drawn from the agency's own website. Combined with {{topService}} (the first detected service from their site) and {{firstName}} (split from the contact name), each email references the specific agency without manual customization. Use dryRunEmails: true on a small batch to verify the personalization looks natural before sending at scale.
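To preview how a template reads before a dry run, the placeholder substitution can be approximated locally. This is an illustrative sketch, not the pipeline's own renderer: the regular expression, the function names, and the choice to raise on a missing value are assumptions.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def first_name(contact_name: str) -> str:
    """Take the first whitespace-separated token of the contact name."""
    parts = contact_name.split()
    return parts[0] if parts else ""


def render(template: str, values: dict) -> str:
    """Fill {{placeholders}}; fail loudly if a value is missing or empty."""

    def replace(match: re.Match) -> str:
        key = match.group(1)
        value = values.get(key)
        if value is None or value == "":
            raise KeyError("missing value for placeholder " + key)
        return str(value)

    return PLACEHOLDER.sub(replace, template)


# Made-up lead values for illustration.
template = "Hi {{firstName}}, I noticed your {{topService}} work. {{summarySnippet}}"
values = {
    "firstName": first_name("Dana Smith"),
    "topService": "paid-search",
    "summarySnippet": "We help B2B brands grow pipeline.",
}
preview = render(template, values)
```

Raising on a missing value is a deliberate choice here: website analysis is non-fatal, so summarySnippet can be null for some leads, and a hard failure in a preview catches emails that would otherwise go out with an empty sentence.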
Can I run this pipeline against a custom list of agency domains rather than directory searches?
Not directly — Step 1 always performs directory scraping. For custom domain lists, run Website Contact Scraper and Email Pattern Finder individually, then feed the results into Bulk Email Verifier. This gives you the same enrichment and verification capability for an arbitrary domain list without the directory discovery step.
How long does a typical agency lead gen pipeline run take?
A run processing 25-50 agencies across all 4 directories with full website analysis, contact scraping, verification, and outreach typically completes in 20-45 minutes. Runs with 100-200 agencies take 60-120 minutes. The default actor timeout is 4 hours (14,400 seconds), which accommodates runs up to approximately 400 agencies.
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.