How to Scrape Podcast Directories for B2B Leads

Every week, thousands of founders, VPs, and department heads go on podcasts and voluntarily share their company name, job title, and website URL. It's public, it's structured, and almost nobody in B2B sales is using it.

Podcast directories like Apple Podcasts hold structured metadata for over 4.2 million shows globally, according to Listen Notes' 2025 podcast statistics. That metadata includes host names, owner email addresses, website links, RSS feed URLs, episode frequency, and — critically — guest information embedded in show notes. If you're doing B2B lead generation, this is an overlooked goldmine sitting right next to the overfarmed fields of LinkedIn and Apollo.

I built ApifyForge's Podcast Directory Scraper specifically because I kept seeing this gap. People search Apple Podcasts for keywords in their industry, find podcast hosts and guests who match their ICP, and then... manually copy-paste everything into a spreadsheet. That's insane for any list over 20 shows.

Here's how the automated version works, and why it feeds into a broader enrichment pipeline that turns raw podcast data into qualified B2B contacts.

What data can you extract from podcast directories?

Podcast directories store structured metadata about every listed show, including the host's name, owner email address, company website URL, RSS feed link, episode count, publishing frequency, genre categories, and active/inactive status. Apple Podcasts alone indexes over 4.2 million shows with this data publicly accessible.

That's the short version. The longer version is more interesting.

When someone submits a podcast to Apple Podcasts, they fill in a bunch of fields that most people don't think of as "lead data." But it is. The owner email is usually a real person's address — not a noreply@. The website URL points to the company or personal brand behind the show. The show description often names the host's title and company explicitly. And if the podcast has guests (most B2B podcasts do), the episode descriptions frequently include the guest's full name, company, title, and sometimes even a direct link to their LinkedIn profile.
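To make the guest-data point concrete, here is a minimal sketch of pulling LinkedIn profile links out of episode show notes. The function name and the regular expression are my own illustration, not part of any scraper's actual code, and real show notes will have messier formatting than this handles.

```python
import re

# Matches personal profile links such as https://www.linkedin.com/in/jane-doe
# (company pages under /company/ are intentionally not matched).
LINKEDIN_PROFILE_RE = re.compile(
    r"https?://(?:[a-z]{2,3}\.)?linkedin\.com/in/[A-Za-z0-9\-_%]+",
    re.IGNORECASE,
)


def find_linkedin_profiles(show_notes):
    """Return unique LinkedIn profile URLs in the order they first appear."""
    seen = set()
    profiles = []
    for url in LINKEDIN_PROFILE_RE.findall(show_notes):
        key = url.lower()
        if key not in seen:
            seen.add(key)
            profiles.append(url)
    return profiles
```

Given notes like "Guest: Jane Doe, VP Ops at Acme. Connect: https://www.linkedin.com/in/jane-doe/", this returns the profile URL without the trailing slash.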

What ApifyForge's Podcast Directory Scraper pulls from Apple Podcasts for each show:

Host/owner name and email address
Company website URL
RSS feed URL (which often contains even more metadata)
Total episode count and publishing frequency
Genre and category tags
Active vs. inactive status
Last episode publish date

The RSS feed alone is worth the scrape. According to Podcast Index, RSS feeds for business podcasts contain structured <itunes:owner> and <itunes:author> tags with contact details that don't appear in the Apple Podcasts UI. That's data most people browsing the directory never even see.
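If you want to see those tags for yourself, a rough sketch with Python's standard library looks like this. The sample feed is made up for illustration, and `parse_feed_contacts` is a hypothetical helper, not the scraper's implementation. Real feeds vary a lot, and many omit the owner block entirely.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the iTunes podcast RSS extension.
ITUNES_NS = {"itunes": "http://www.itunes.com/dtds/podcast-1.0.dtd"}

# Made-up feed used only to demonstrate the parsing.
SAMPLE_FEED = """<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Supply Chain Show</title>
    <link>https://example.com</link>
    <itunes:author>Jane Doe</itunes:author>
    <itunes:owner>
      <itunes:name>Jane Doe</itunes:name>
      <itunes:email>jane@example.com</itunes:email>
    </itunes:owner>
  </channel>
</rss>"""


def parse_feed_contacts(rss_xml):
    """Extract show-level contact fields from an RSS document string."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        return {}

    def text(path):
        # Return stripped element text, or None when the tag is missing or empty.
        node = channel.find(path, ITUNES_NS)
        return node.text.strip() if node is not None and node.text else None

    return {
        "title": text("title"),
        "website": text("link"),
        "author": text("itunes:author"),
        "owner_name": text("itunes:owner/itunes:name"),
        "owner_email": text("itunes:owner/itunes:email"),
    }
```

Calling `parse_feed_contacts(SAMPLE_FEED)` yields a dictionary with the owner name and email alongside the show title and website. Missing tags come back as `None` rather than raising.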

Why are podcast guests good B2B leads?

Podcast guests are strong B2B leads because they've publicly demonstrated expertise, willingness to engage with new audiences, and active involvement in their industry. A 2024 study by Edison Research found that 47% of monthly podcast listeners in the US are in the 18-44 age bracket with above-median household income — the exact demographic making B2B purchasing decisions.

But the real reason podcast guests convert well isn't demographic. It's behavioral.

Someone who goes on a podcast is already doing three things that matter for outbound sales: they're building their personal brand, they're open to conversations with strangers, and they've implicitly said "yes, I want more visibility." That's a fundamentally different prospect than someone you cold-scraped off LinkedIn who hasn't posted in 8 months.

I've talked to SDR teams who run podcast-based outbound and they consistently report 2-3x higher reply rates compared to traditional cold email lists. The opener writes itself: "Hey, I caught your episode on [Podcast Name] about [Topic]. You mentioned [specific thing]..." That's not a template. It's a real conversation starter with a specific hook.

There's a second angle here too. Podcast hosts are leads themselves. If someone runs a podcast about, say, supply chain management, they're either (a) a consultant selling supply chain services, (b) an executive at a supply chain company, or (c) a media person who can introduce you to both. In all three cases, they're worth talking to.

How to extract company websites from podcast episodes

The process for extracting company website URLs from podcast directories involves three steps: searching for shows by industry keyword, scraping the show metadata including website fields, and following RSS feeds to extract additional contact details from episode-level data.

Here's the practical workflow I use and what I recommend to anyone building this pipeline.

Step 1: Keyword search across Apple Podcasts. Use industry-specific search terms, not generic ones. "Supply chain logistics" beats "business." "Fintech founders" beats "startups." The more specific your keyword, the higher the ICP match rate in the results. ApifyForge's Podcast Directory Scraper accepts multiple search terms and deduplicates results across them, so you can run 5-10 keyword variations in a single scrape.
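Deduplicating across keywords can be sketched roughly as below. This is an illustration of the idea, not the scraper's actual logic; the `feed_url` and `title` field names are assumptions about what a result record might look like.

```python
def merge_keyword_results(results_by_keyword):
    """Merge per-keyword result lists into one list with one entry per show.

    Shows are keyed on their feed URL (falling back to the title), and each
    merged entry records every keyword that surfaced it.
    """
    merged = {}
    for keyword, shows in results_by_keyword.items():
        for show in shows:
            key = (show.get("feed_url") or show.get("title") or "").strip().lower()
            if not key:
                continue  # nothing stable to deduplicate on
            entry = merged.setdefault(key, {**show, "matched_keywords": []})
            if keyword not in entry["matched_keywords"]:
                entry["matched_keywords"].append(keyword)
    return list(merged.values())
```

Keeping the list of matched keywords is a deliberate choice: a show that turns up under several of your search terms is probably a stronger ICP match than one that turns up under a single term.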

Step 2: Filter for active shows. A podcast that hasn't published in 6+ months is probably dead. The people behind it may have moved on, changed companies, or shut down the business. The scraper returns last-publish dates and episode frequency, so you can filter for shows that published within the last 90 days. According to Podnews, only about 440,000 podcasts published an episode in the last 7 days out of the 4.2 million total — so active status is a real signal.
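The 90-day filter is simple to apply once you have the results in hand. A minimal sketch, assuming each record carries an ISO-formatted `last_episode_date` string (a hypothetical field name; check the actual output schema):

```python
from datetime import datetime, timedelta, timezone


def filter_active(shows, today=None, max_age_days=90):
    """Keep shows whose last episode was published within max_age_days."""
    today = today or datetime.now(timezone.utc)
    cutoff = today - timedelta(days=max_age_days)
    active = []
    for show in shows:
        raw = show.get("last_episode_date")
        if not raw:
            continue  # no publish date: treat as inactive
        published = datetime.fromisoformat(raw)
        if published.tzinfo is None:
            # Assume UTC when the date carries no timezone information.
            published = published.replace(tzinfo=timezone.utc)
        if published >= cutoff:
            active.append(show)
    return active
```

Passing `today` explicitly makes the filter reproducible, which helps when you rerun the same export a week later and want to compare lists.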

Step 3: Follow the website URL. This is where it gets good. Once you have the company website from the podcast metadata, you feed those URLs into a contact scraper to get emails, phone numbers, and team pages. I built Website Contact Scraper for exactly this — give it a list of domains and it returns structured contact data. We go deeper into how that works in the contact scraper comparison page.
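Before handing URLs to a contact scraper, it usually helps to collapse them into a clean list of unique domains. A small sketch using only the standard library; `to_domains` is a hypothetical helper of mine, and the input format any given scraper expects may differ:

```python
from urllib.parse import urlparse


def to_domains(urls):
    """Normalize website URLs to unique bare domains, preserving order."""
    domains = []
    seen = set()
    for url in urls:
        url = (url or "").strip()
        if not url:
            continue
        if "://" not in url:
            url = "https://" + url  # urlparse needs a scheme to find the host
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host not in seen:
            seen.add(host)
            domains.append(host)
    return domains
```

For example, `to_domains(["https://www.Acme.com/podcast", "acme.com"])` collapses both entries into a single `acme.com`.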

What's the difference between podcast scraping and LinkedIn scraping for leads?

Podcast directory scraping extracts data from publicly listed show metadata on platforms like Apple Podcasts, which is published voluntarily by show owners. LinkedIn scraping violates LinkedIn's Terms of Service and risks account bans. Podcast data carries lower legal risk, and outreach built on it tends to get higher response rates.

That's the featured-snippet answer. The more honest answer has some texture.

LinkedIn is where everyone goes for B2B leads. Which is exactly the problem. Every VP of Sales at a mid-market SaaS company gets 40+ cold LinkedIn messages per week, according to Lavender's 2024 Cold Email Benchmark Report. The channel is saturated. Response rates on LinkedIn InMail dropped to 1.6% average in 2024, down from 2.3% in 2022.

Podcast directories are the opposite. Almost nobody is mining them for leads at scale. The data quality is higher because it's self-reported by show owners (not scraped from a stale database), and the outreach angle is warmer because you can reference specific episodes.

There are tradeoffs though. LinkedIn gives you a bigger universe — 1 billion profiles vs. 4.2 million podcasts. And LinkedIn data includes job title and company directly in structured fields, while podcast data sometimes requires a second enrichment step to get from "website URL" to "VP of Engineering at Acme Corp." That's where tools like Email Pattern Finder and Waterfall Contact Enrichment come in — they bridge the gap between "I have a domain" and "I have a named contact with a verified email."

| Factor | Podcast Directories | LinkedIn |
|--------|-------------------|----------|
| Data freshness | Real-time (scraped live) | Days to months stale |
| Legal risk | Low (public RSS/API data) | High (ToS violations) |
| Contact availability | Website + owner email | Direct profile data |
| Outreach angle | Episode-specific hooks | Generic connection request |
| Response rate | 5-8% (reported by SDR teams) | 1.6% average InMail |
| Universe size | ~4.2M shows | ~1B profiles |
| Enrichment needed | Yes (domain to contact) | Minimal |

How to build a podcast-to-lead enrichment pipeline

The real power isn't in the podcast scrape alone. It's in what you do with that data next. Here's the pipeline I've built and that ApifyForge users run every week.

Podcast Directory Scraper pulls show metadata by keyword from Apple Podcasts. You get owner emails, website URLs, and show details. Cost is about $0.10-0.15 per podcast scraped on Apify's pay-per-event pricing — so scraping 500 shows runs you $50-75.
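The cost arithmetic is easy to rerun for your own volumes. A throwaway sketch that uses the per-show range quoted above as defaults (actual pricing may change, so treat the numbers as assumptions):

```python
def estimate_scrape_cost(show_count, low_per_show=0.10, high_per_show=0.15):
    """Return a (low, high) cost estimate in dollars for scraping show_count shows."""
    low = round(show_count * low_per_show, 2)
    high = round(show_count * high_per_show, 2)
    return low, high
```

`estimate_scrape_cost(500)` gives the $50-75 range mentioned above.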

Feed the website URLs into Website Contact Scraper. It crawls each domain and extracts emails, phone numbers, team member names, and job titles. I covered how that extraction engine works in my 11,000 runs deep dive. It uses CheerioCrawler with 19 contact-page path patterns, three extraction strategies for emails/phones/names, and 13 junk-email filters. Success rate sits at 99.8%.

For contacts where Website Contact Scraper doesn't find a direct email (smaller companies without a team page, for example), run the domains through Email Pattern Finder. It determines the email pattern for a company (first.last@, firstlast@, first@, etc.) and generates likely email addresses for known contacts.
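The underlying idea can be sketched in a few lines: infer the pattern from one known address, then apply it to other names at the same company. This is my simplified illustration, not Email Pattern Finder's implementation. It covers only the three patterns named above, and generated addresses still need verification before you send anything.

```python
# Each pattern maps a lowercased (first, last) pair to the local part of an address.
PATTERNS = {
    "first.last": lambda first, last: f"{first}.{last}",
    "firstlast": lambda first, last: f"{first}{last}",
    "first": lambda first, last: first,
}


def detect_pattern(known_email, first, last):
    """Return the pattern name that reproduces known_email, or None."""
    local = known_email.split("@", 1)[0].lower()
    first, last = first.strip().lower(), last.strip().lower()
    for name, build in PATTERNS.items():
        if build(first, last) == local:
            return name
    return None


def apply_pattern(name, first, last, domain):
    """Generate a candidate address for another person at the same domain."""
    local = PATTERNS[name](first.strip().lower(), last.strip().lower())
    return f"{local}@{domain}"
```

If `jane.doe@acme.com` is a known address for Jane Doe, `detect_pattern` returns `"first.last"`, and `apply_pattern("first.last", "Sam", "Lee", "acme.com")` produces `sam.lee@acme.com` as a candidate.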

For maximum coverage, Waterfall Contact Enrichment runs multiple enrichment strategies in sequence — it tries each data source in order and stops when it gets a hit. Think of it as a fallback chain: if source A doesn't have the email, try source B, then source C.
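A fallback chain like that is only a few lines of code. The sketch below shows the general pattern, not Waterfall Contact Enrichment's internals; each source is a hypothetical callable that takes a domain and returns an email or `None`.

```python
def waterfall_lookup(domain, sources):
    """Try each (name, lookup) source in order and stop at the first hit.

    Returns the email plus the name of the source that supplied it, so you
    can audit where each contact came from.
    """
    for name, lookup in sources:
        email = lookup(domain)
        if email:
            return {"domain": domain, "email": email, "source": name}
    return {"domain": domain, "email": None, "source": None}
```

Ordering matters here: put the cheapest or most reliable source first, because later sources are only called when the earlier ones come back empty.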

The end result: you started with a keyword like "ecommerce logistics" and ended up with a list of podcast hosts and guests, their company names, verified email addresses, phone numbers, and job titles. All from public data, no LinkedIn required.

Does podcast scraping actually work at scale?

I'll be direct: it works well for targeted B2B outreach, not for building massive cold email lists. If you need 100,000 generic leads, this isn't the right channel. If you need 200-500 highly qualified prospects in a specific niche, podcast directories are one of the best sources I've found.

The math works like this. A keyword search for "B2B SaaS" on Apple Podcasts returns roughly 300-500 active shows. Each show has at least one host, most have a website, and many have identifiable guests across their episode archive. Run that through the enrichment pipeline above and you're looking at 400-1,500 named contacts with company websites and emails. Per keyword.

Stack 5-10 keywords and you've got a pipeline of 2,000-15,000 contacts that are genuinely relevant to your market. That's not spray-and-pray volume. That's a targeted list where every contact has publicly demonstrated interest in your industry by either hosting or appearing on a podcast about it.

ApifyForge tracks over 300 web scraping actors and 93 MCP intelligence servers, so I've seen a lot of lead gen approaches. The podcast angle is underused relative to its conversion potential. The data is fresh, it's public, and the outreach hooks are built into the content itself.

For teams doing account-based marketing (ABM), there's another angle worth mentioning. If you're targeting specific companies, search for podcasts where their executives have appeared as guests. Our B2B Lead Qualifier can score these leads against your ICP criteria automatically — company size, industry, geography, tech stack.

What about legal and compliance considerations?

Podcast metadata on Apple Podcasts and in RSS feeds is published publicly by show owners for the explicit purpose of being discovered. Scraping this data for business use falls under publicly available information, which is generally permissible under US law following hiQ Labs v. LinkedIn, in which the Ninth Circuit held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act.

That said, I'm not a lawyer, and you should think about a few things.

GDPR applies if you're collecting data on EU residents — even from public sources. Article 6(1)(f) allows processing based on legitimate interest, but you still need to be able to justify the processing and honor data subject access requests. The UK ICO published guidance on legitimate interest assessments that's worth reading if you're doing this at scale.

CAN-SPAM in the US is more straightforward: you can email anyone for commercial purposes as long as you include an opt-out mechanism, honor unsubscribe requests within 10 business days, include a valid physical postal address, and don't use deceptive headers or subject lines. The podcast data itself isn't the compliance concern. What matters is what you do with it afterward.

For compliance-heavy industries, ApifyForge offers compliance screening tools and MCP servers like the Counterparty Due Diligence MCP that can screen your lead lists against sanctions databases and adverse media before outreach.

Getting started

The fastest path from "I want podcast leads" to "I have a qualified list":

Pick 3-5 industry keywords that match your ICP
Run Podcast Directory Scraper with those keywords
Filter results for active shows (published in last 90 days)
Feed the website URLs into Website Contact Scraper for emails and team data
Run gaps through Email Pattern Finder or Waterfall Contact Enrichment
Personalize outreach with episode-specific references

You need an Apify account to run the actors. ApifyForge's cost calculator can estimate what a full pipeline run costs based on your keyword count and expected result volume. All actors use pay-per-event pricing, so you only pay when you get results — no monthly subscriptions or unused credits.

Podcast directories won't replace LinkedIn for raw volume. But for targeted, high-reply-rate outbound to people who are actively building their public presence? They're one of the better channels I've seen. And almost nobody's running them at scale yet.