SourceForge & TrustRadius — Software Vendor Leads is an Apify actor on ApifyForge. Scrapes SourceForge and TrustRadius for software company leads by category. Returns vendor name, website, rating, review count, pricing tier, and category tags. Filter by rating or review count. It costs $0.05 per company-found. Best for sales teams and marketers who need targeted lists of software vendors for lead generation or prospect enrichment at scale. Not ideal for real-time monitoring or historical data analysis. Maintenance pulse: 90/100. Last verified March 27, 2026. Built by Ryan Clinton (ryanclinton on Apify).
SourceForge & TrustRadius — Software Vendor Leads
SourceForge & TrustRadius — Software Vendor Leads is an Apify actor available on ApifyForge at $0.05 per company-found. Scrapes SourceForge and TrustRadius for software company leads by category. Returns vendor name, website, rating, review count, pricing tier, and category tags. Filter by rating or review count.
Best for sales teams and marketers who need targeted lists of software vendors for lead generation or prospect enrichment at scale.
Not ideal for real-time monitoring or historical data analysis.
What to know
- Results depend on publicly available data; private or gated contacts may not be found.
- Email verification accuracy varies by domain and provider policies.
- Requires an Apify account — free tier available with limited monthly usage.
Maintenance Pulse: 90/100
Pricing
Pay Per Event model. You only pay for what you use.
| Event | Description | Price |
|---|---|---|
| company-found | Charged for each unique software company extracted from SourceForge or TrustRadius that passes all quality filters. | $0.05 |
Example: 100 events = $5.00 · 1,000 events = $50.00
Documentation
Software directory scraper that extracts software company leads from SourceForge and TrustRadius by category. Point it at any category slug — crm, project-management, email-marketing — and it returns structured records with vendor name, website, star rating, review count, pricing tier, and category tags. Built for sales teams, marketing agencies, and SaaS founders who need targeted lists of software companies without manual browsing.
The actor runs on CheerioCrawler, which means it is fast, lightweight, and requires no proxies — SourceForge and TrustRadius serve their listing pages to datacenter IPs without blocking. Both sources are scraped simultaneously, results are deduplicated by domain, and quality filters let you exclude low-review or low-rated products before they reach your dataset.
What data can you extract?
| Data Point | Source | Example |
|---|---|---|
| 🏢 Company Name | SourceForge / TrustRadius | Pinnacle CRM Technologies |
| 📦 Product Name | SourceForge / TrustRadius | PinnacleCRM Pro |
| 🌐 Website URL | SourceForge / TrustRadius | https://pinnaclecrm.io |
| 🔗 Domain | Extracted from website | pinnaclecrm.io |
| 📋 Directory Profile URL | SourceForge / TrustRadius | https://sourceforge.net/software/product/pinnaclecrm/ |
| ⭐ Rating | SourceForge / TrustRadius | 4.3 (unified 1–5 scale) |
| 💬 Review Count | SourceForge / TrustRadius | 1,842 |
| 💰 Pricing Tier | SourceForge / TrustRadius | $29/month |
| 🏷️ Categories | SourceForge / TrustRadius | ["crm", "Sales Force Automation"] |
| 🏆 Badges | SourceForge only | ["Leader", "Top Performer"] |
| 📝 Description | SourceForge / TrustRadius | Cloud-based CRM for SMB sales teams... |
| 🗂️ Source | Actor metadata | sourceforge |
Why use Software Directory Scraper?
Building a list of software companies in a niche by hand means clicking through dozens of directory pages, copying names into a spreadsheet, and Googling websites one by one. For a single category on SourceForge you might spend two hours to collect 50 companies — with no rating data, no pricing context, and no structured output.
This actor automates the entire process. Provide a list of category slugs and it crawls SourceForge category pagination (page by page using ?page=N) and TrustRadius product sitemaps (product-reviews-sitemap-1.xml through product-reviews-sitemap-5.xml, giving 2,500+ product URLs per crawl). Every record is cleaned, normalized, and written to the Apify dataset in minutes.
- Scheduling — run weekly to refresh your list of active software vendors as new products are added to directories
- API access — trigger runs from Python, JavaScript, or any HTTP client and pipe results directly into your CRM or enrichment pipeline
- Proxy rotation — proxies are not required for these sources, but the actor accepts an optional proxy configuration for custom setups
- Monitoring — get Slack or email alerts when runs fail or produce fewer results than expected via Apify's built-in monitoring
- Integrations — connect to Zapier, Make, Google Sheets, HubSpot, or webhooks without writing a line of code
Features
- Dual-source scraping: crawls both SourceForge (100k+ products, paginated category listings) and TrustRadius (B2B-focused, Next.js SSR pages) in a single run, configurable per source
- Automatic pagination on SourceForge: follows `?page=N` links until the per-category limit is reached or no more listing cards are found
- TrustRadius sitemap discovery: parses `product-reviews-sitemap-{1..5}.xml` files to collect product URLs, then fetches each product page individually for structured data
- Dual-extraction strategy for TrustRadius: first attempts to parse the `__NEXT_DATA__` JSON embedded in the server-rendered page across three property paths (`pageProps.product`, `pageProps.data.product`, `pageProps.productReviews.product`), then falls back to eight named `data-testid` HTML selectors
- Unified rating scale: TrustRadius uses a 10-point scoring system; the actor converts TrustRadius scores to the 5-point scale using `Math.round((val / 2) * 10) / 10`, so ratings from both sources are directly comparable
- Domain deduplication: strips `www.` prefixes, normalizes to registrable domain, and tracks seen domains in a shared `Set<string>` across all categories and sources to prevent duplicate vendor rows
- Per-category per-source limits: the `maxCompaniesPerCategory` limit applies independently to each `{source}:{category}` pair, so a limit of 50 means up to 50 from SourceForge CRM and 50 from TrustRadius CRM
- Quality filters: `minReviews` and `minRating` filters are applied after extraction and before charging; products that fail are logged and skipped at no cost
- Pricing normalization: raw pricing strings are mapped to a standard tier (`Free`, `Freemium`, `Open Source`, `Contact Vendor`) or to a cleaned price string like `$29/month`
- Badge extraction from SourceForge: scrapes award badges from `.badge-container .badge` and `[class*="award"]` elements, useful for identifying "Leader" and "Top Performer" products
- Resilient SourceForge selectors: uses four CSS selector strategies (`[class*="project-cell"]`, `.sf-project-listing-item`, `ul.projects-listing > li`, `.inner-cell`) with a filter for elements containing an `h3 a` title link, to keep coverage when the markup changes
- Pay-per-event billing: charged $0.05 per company that passes quality filters; the actor stops automatically when your spending limit is reached, and data is always pushed before the charge fires
- Run summary record: every run ends with a `type: "summary"` record showing totals by category and source, useful for monitoring and pipeline auditing
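The per-pair limit and the run-wide domain set can be pictured with a small Python sketch. It is an illustration under assumptions, not the actor's source (the actor is built on Crawlee's CheerioCrawler in JavaScript/TypeScript): the `RunState` class is hypothetical, a limit of 0 means "no limit" as in the input table, and records without a domain are assumed to bypass deduplication.

```python
from __future__ import annotations

from collections import defaultdict


class RunState:
    """Hypothetical per-run bookkeeping: counters per {source}:{category} plus one shared domain set."""

    def __init__(self, max_per_category: int = 50, deduplicate: bool = True) -> None:
        self.max_per_category = max_per_category  # 0 means no limit
        self.deduplicate = deduplicate
        self.seen_domains: set[str] = set()  # shared across every category and source
        self.counts: defaultdict[str, int] = defaultdict(int)

    def limit_reached(self, source: str, category: str) -> bool:
        key = f"{source}:{category}"
        return self.max_per_category > 0 and self.counts[key] >= self.max_per_category

    def accept(self, source: str, category: str, domain: str | None) -> bool:
        """Return True if the record should be kept; kept records count against their pair."""
        if self.limit_reached(source, category):
            return False
        if self.deduplicate and domain:
            if domain in self.seen_domains:
                return False  # vendor already returned earlier in this run
            self.seen_domains.add(domain)
        self.counts[f"{source}:{category}"] += 1
        return True


state = RunState(max_per_category=50)
print(state.accept("sourceforge", "crm", "pinnaclecrm.io"))  # True
print(state.accept("trustradius", "crm", "pinnaclecrm.io"))  # False (duplicate domain)
```

Keying the counter on `f"{source}:{category}"` is what lets a limit of 50 yield up to 50 SourceForge rows and 50 TrustRadius rows for the same category, while the single `seen_domains` set is shared by both.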
Use cases for software directory scraping
Sales prospecting for SaaS tools
Sales development reps building outbound lists can use this actor to find every CRM, help-desk, or marketing-automation vendor in a category. With website and domain data in the output, results feed directly into Website Contact Scraper to find decision-maker emails, or into Waterfall Contact Enrichment for a full contact cascade. A list of 200 CRM vendors takes under 10 minutes and costs $10.
Marketing agency lead generation
Agencies that serve software companies — design studios, content agencies, SEO firms — can scrape target categories to find prospect companies with their websites pre-extracted. Filter by minReviews: 10 to exclude unestablished products and focus on vendors that are already investing in their market presence. Rating data helps prioritize outreach toward well-reviewed products that likely have marketing budgets.
Competitive intelligence and market mapping
Founders and product managers can scrape their own category to map the competitive landscape. The output includes category tags, pricing tiers, and badge data, giving a structured view of which products lead the category. Combine with Website Tech Stack Detector to identify which technology platforms your competitors are built on.
Data enrichment for existing company lists
If you already have a list of software company domains, run this actor to add rating, review count, pricing tier, and category context from SourceForge and TrustRadius. The domain field enables joining with your existing data. Set deduplicateByDomain: false when you want complete coverage across multiple categories for the same company.
Recruiting and talent sourcing
Recruiters targeting software companies in specific verticals can use category data to find employers. A search for project-management or hr-software returns companies with their websites, which feed into contact extraction to find hiring manager contacts. The badge data (Leader, Top Performer) helps identify fast-growing companies likely to be actively hiring.
B2B lead qualification and scoring
The combination of rating, review count, and pricing tier gives enough signal to score leads before enrichment. High-rating, high-review-count companies with paid pricing tiers are indicators of an established, revenue-generating business. Pipe the output into B2B Lead Qualifier to apply a formal 0–100 score before committing to enrichment cost.
How to scrape software company leads from SourceForge and TrustRadius
- Enter your target categories: type the category slugs you want to scrape, using lowercase, hyphenated slugs that match the directory URL, such as `crm`, `project-management`, `email-marketing`, `accounting`, or `help-desk`. You can enter multiple categories in one run.
- Configure quality filters: set `minReviews` to 5 or 10 to exclude newly listed products with no track record, and set `minRating` to 3.5 to focus on well-reviewed vendors. Leave both at 0 to collect everything.
- Run the actor: click "Start" and wait. A single category with the default limit of 50 companies per source typically completes in 3–5 minutes.
- Download results: open the Dataset tab, then export to JSON, CSV, or Excel. The dataset includes one row per company plus a summary record at the end showing totals by category and source.
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `categories` | array | Yes | `["crm"]` | Category slugs to scrape. Use lowercase hyphenated slugs matching directory URLs (e.g. `crm`, `project-management`, `email-marketing`). |
| `sources` | array | No | `["sourceforge", "trustradius"]` | Which directories to scrape. Options: `sourceforge`, `trustradius`. Omit to scrape both. |
| `maxCompaniesPerCategory` | integer | No | 50 | Max companies per category per source. 0 = no limit. Range: 0–1000. |
| `minReviews` | integer | No | 0 | Minimum number of reviews a product must have to be included. |
| `minRating` | number | No | 0 | Minimum average rating (1.0–5.0 scale) a product must have. |
| `deduplicateByDomain` | boolean | No | `true` | Remove duplicate companies when the same domain appears across categories or sources. |
| `proxyConfiguration` | object | No | none | Optional Apify proxy configuration. SourceForge and TrustRadius work without proxies. |
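When building input programmatically, a small client-side helper can apply the defaults and ranges from the table before a run starts. This is a hypothetical sketch (`build_input` is not part of any Apify library), and the actor's own validation may be stricter or looser than these checks.

```python
DEFAULTS = {
    "categories": ["crm"],
    "sources": ["sourceforge", "trustradius"],
    "maxCompaniesPerCategory": 50,
    "minReviews": 0,
    "minRating": 0,
    "deduplicateByDomain": True,
}
OPTIONAL_KEYS = {"proxyConfiguration"}  # no default: omitted unless provided
VALID_SOURCES = {"sourceforge", "trustradius"}


def build_input(**overrides):
    """Merge overrides into the documented defaults and check the documented ranges."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    run_input = {**DEFAULTS, **overrides}
    # Lowercase, trimmed, hyphenated slugs, matching the directory URLs
    run_input["categories"] = [
        "-".join(c.strip().lower().split()) for c in run_input["categories"]
    ]
    if not run_input["categories"]:
        raise ValueError("categories must contain at least one slug")
    if not set(run_input["sources"]) <= VALID_SOURCES:
        raise ValueError(f"sources must be a subset of {sorted(VALID_SOURCES)}")
    if not 0 <= run_input["maxCompaniesPerCategory"] <= 1000:
        raise ValueError("maxCompaniesPerCategory must be between 0 and 1000")
    if run_input["minReviews"] < 0:
        raise ValueError("minReviews cannot be negative")
    if run_input["minRating"] and not 1.0 <= run_input["minRating"] <= 5.0:
        raise ValueError("minRating must be 0 (no filter) or between 1.0 and 5.0")
    return run_input


print(build_input(categories=["CRM", "Project Management"], minRating=3.5)["categories"])
# ['crm', 'project-management']
```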
Input examples
Standard scrape — two categories, both sources:
```json
{
  "categories": ["crm", "project-management"],
  "sources": ["sourceforge", "trustradius"],
  "maxCompaniesPerCategory": 50,
  "minReviews": 5,
  "minRating": 3.5,
  "deduplicateByDomain": true
}
```
Large batch — five categories, SourceForge only, higher limit:
```json
{
  "categories": ["crm", "project-management", "email-marketing", "accounting", "help-desk"],
  "sources": ["sourceforge"],
  "maxCompaniesPerCategory": 200,
  "minReviews": 0,
  "minRating": 0,
  "deduplicateByDomain": true
}
```
Quick test — one category, minimal filters:
```json
{
  "categories": ["crm"],
  "sources": ["sourceforge"],
  "maxCompaniesPerCategory": 10,
  "deduplicateByDomain": false
}
```
Input tips
- Start with a small limit: set `maxCompaniesPerCategory: 10` for a first run to verify results match your expectations before scaling up.
- Use both sources together: SourceForge skews toward SMB and open-source tools; TrustRadius skews toward enterprise B2B. Combined, you get broader coverage of a category.
- Category slugs must match the SourceForge URL: verify by visiting `https://sourceforge.net/software/{your-slug}/` in a browser before running.
- Batch multiple categories in one run: processing 5 categories in a single run avoids paying twice for the same vendor, because the deduplication set is shared across the entire run, whereas 5 separate runs each start with an empty set.
- Set a spending limit: use Apify's per-run budget control to cap costs before running against a large category list.
Output example
```json
{
  "companyName": "Pinnacle CRM Technologies",
  "productName": "PinnacleCRM Pro",
  "website": "https://pinnaclecrm.io",
  "domain": "pinnaclecrm.io",
  "profileUrl": "https://sourceforge.net/software/product/pinnaclecrm/",
  "rating": 4.3,
  "reviewCount": 1842,
  "pricingTier": "$29/month",
  "categories": ["crm", "Sales Force Automation", "Contact Management"],
  "badges": ["Leader", "Top Performer Q1 2025"],
  "description": "Cloud-based CRM for SMB sales teams. Includes pipeline management, email sequences, and native Slack integration. Free 14-day trial.",
  "source": "sourceforge",
  "sourceCategory": "crm",
  "scrapedAt": "2026-03-22T09:14:32.451Z"
}
```
The final record in every dataset is a summary record:
```json
{
  "type": "summary",
  "categoriesScraped": ["crm", "project-management"],
  "sourcesUsed": ["sourceforge", "trustradius"],
  "totalCompaniesFound": 187,
  "totalDeduplicated": 14,
  "companiesByCategory": {
    "crm": 98,
    "project-management": 89
  },
  "companiesBySource": {
    "sourceforge": 94,
    "trustradius": 93
  },
  "scrapedAt": "2026-03-22T09:21:08.772Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `companyName` | string or null | Vendor or company name. Falls back to product name when the directory does not list the vendor separately. |
| `productName` | string or null | Software product name as listed in the directory. |
| `website` | string or null | Vendor website URL as listed in the directory profile. |
| `domain` | string or null | Registrable domain extracted from the website URL (e.g. `pinnaclecrm.io`). Used for deduplication and CRM join keys. |
| `profileUrl` | string or null | Direct link to the product's SourceForge or TrustRadius profile page. |
| `rating` | number or null | Average rating on a unified 1.0–5.0 scale. TrustRadius 10-point scores are divided by 2 and rounded to one decimal. |
| `reviewCount` | number or null | Total number of user reviews or ratings in the directory. |
| `pricingTier` | string or null | Normalized pricing: `Free`, `Freemium`, `Open Source`, `Contact Vendor`, or a price string like `$29/month`. |
| `categories` | string[] | Category tags from the listing. Always includes the source category slug used to discover the product. |
| `badges` | string[] | SourceForge award badges (e.g. Leader, Top Performer). Empty array for TrustRadius results. |
| `description` | string or null | Short product description from the listing page. |
| `source` | string | Which directory this record came from: `sourceforge` or `trustradius`. |
| `sourceCategory` | string | The category slug used to discover this company (e.g. `crm`). |
| `scrapedAt` | string | ISO 8601 timestamp when this record was extracted. |
How much does it cost to scrape software company leads?
Software Directory Scraper uses pay-per-event pricing — you pay $0.05 per company extracted. Platform compute costs are included. Companies filtered out by minReviews or minRating are not charged.
| Scenario | Companies | Cost per company | Total cost |
|---|---|---|---|
| Quick test | 10 | $0.05 | $0.50 |
| Single category | 50 | $0.05 | $2.50 |
| Two categories, both sources | 200 | $0.05 | $10.00 |
| Five categories | 500 | $0.05 | $25.00 |
| Full market map | 1,000 | $0.05 | $50.00 |
You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached, so a $5 limit will collect up to 100 companies.
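The cost figures above follow from a flat per-event price. A minimal sketch, assuming $0.05 per `company-found` event and nothing else billed (compute is included); `Decimal` keeps the currency arithmetic exact, and the function names are illustrative.

```python
from decimal import Decimal, ROUND_FLOOR

PRICE_PER_COMPANY = Decimal("0.05")  # company-found event price


def estimated_cost(companies: int) -> Decimal:
    """Total charge for a given number of companies that pass the filters."""
    return PRICE_PER_COMPANY * companies


def max_companies_for_budget(budget_usd: str) -> int:
    """How many company-found events fit inside a per-run spending limit."""
    events = Decimal(budget_usd) / PRICE_PER_COMPANY
    return int(events.to_integral_value(rounding=ROUND_FLOOR))


print(estimated_cost(200))            # 10.00
print(max_companies_for_budget("5"))  # 100
```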
Compare this to manually browsing directories at roughly 30 seconds per company — 200 companies would take 100 minutes of manual work. At $10 for the same output, you get clean structured data with no subscription commitment. Tools like ZoomInfo or Apollo charge $100–500/month and still require manual filtering to narrow to a specific software category.
Scrape software company leads using the API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("ryanclinton/g2-company-scraper").call(run_input={
    "categories": ["crm", "project-management"],
    "sources": ["sourceforge", "trustradius"],
    "maxCompaniesPerCategory": 50,
    "minReviews": 5,
    "minRating": 3.5,
    "deduplicateByDomain": True,
})

# Company records come first; the last record is the run summary
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") == "summary":
        print(f"Total companies found: {item['totalCompaniesFound']}")
    else:
        print(f"{item['productName']} ({item['companyName']}) - {item['domain']} - "
              f"{item['rating']} stars, {item['reviewCount']} reviews")
```
JavaScript
```javascript
// Run as an ES module (uses top-level await)
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Start the actor and wait for the run to finish
const run = await client.actor("ryanclinton/g2-company-scraper").call({
  categories: ["crm", "project-management"],
  sources: ["sourceforge", "trustradius"],
  maxCompaniesPerCategory: 50,
  minReviews: 5,
  minRating: 3.5,
  deduplicateByDomain: true,
});

// Fetch the dataset; the last item is the run summary
const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
  if (item.type === "summary") {
    console.log(`Total: ${item.totalCompaniesFound} companies`);
  } else {
    console.log(`${item.productName} - ${item.domain} - ${item.rating} stars, ${item.pricingTier}`);
  }
}
```
cURL
```bash
# Start the actor run. The request returns immediately; the run continues in the background.
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~g2-company-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "categories": ["crm", "project-management"],
    "sources": ["sourceforge", "trustradius"],
    "maxCompaniesPerCategory": 50,
    "minReviews": 5,
    "minRating": 3.5,
    "deduplicateByDomain": true
  }'

# After the run finishes, fetch the results.
# Replace DATASET_ID with data.defaultDatasetId from the run response above.
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```
How Software Directory Scraper works
Phase 1 — Request generation
On startup the actor reads your categories and sources inputs, normalizes category slugs (lowercase, hyphens, trim), then builds start requests. For SourceForge, it constructs one URL per category at https://sourceforge.net/software/{category}/ (page 1). For TrustRadius, it generates five sitemap URLs per category — https://www.trustradius.com/sitemaps/product-reviews-sitemap-{1..5}.xml — to maximize product URL discovery. All requests are handed to a CheerioCrawler instance running at maxConcurrency: 5 with session pooling, persistent cookies per session, a 30-second navigation timeout, and 3 retries per request.
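Phase 1 can be approximated in Python as below. This is a sketch, not the actor's code: the request dict shape (`url`, `label`, `userData`) is modeled loosely on Crawlee requests, the slug normalization is an assumed reading of "lowercase, hyphens, trim", and only the URL templates and the `SF_CATEGORY` / `TR_SITEMAP` labels come from this section.

```python
SF_CATEGORY_URL = "https://sourceforge.net/software/{category}/"
TR_SITEMAP_URL = "https://www.trustradius.com/sitemaps/product-reviews-sitemap-{n}.xml"
TR_SITEMAP_SHARDS = range(1, 6)  # shards 1..5


def normalize_slug(raw: str) -> str:
    """Trim, lowercase, and join whitespace-separated words with hyphens."""
    return "-".join(raw.strip().lower().split())


def build_start_requests(categories, sources):
    """One SourceForge listing page and five TrustRadius sitemaps per category."""
    requests = []
    for category in (normalize_slug(c) for c in categories):
        if not category:
            continue  # skip empty slugs
        if "sourceforge" in sources:
            requests.append({
                "url": SF_CATEGORY_URL.format(category=category),
                "label": "SF_CATEGORY",
                "userData": {"category": category, "page": 1},
            })
        if "trustradius" in sources:
            for n in TR_SITEMAP_SHARDS:
                requests.append({
                    "url": TR_SITEMAP_URL.format(n=n),
                    "label": "TR_SITEMAP",
                    "userData": {"category": category, "shard": n},
                })
    return requests


reqs = build_start_requests(["CRM", " project management "], ["sourceforge", "trustradius"])
print(len(reqs))       # 12
print(reqs[0]["url"])  # https://sourceforge.net/software/crm/
```

With two categories and both sources this yields 12 start requests: one SourceForge listing page plus five sitemap shards per category.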
Phase 2 — SourceForge category crawling
The SourceForge route handler (SF_CATEGORY) parses category listing pages using four CSS selector strategies for resilience against markup changes. It extracts product name from h3 a title links, vendor name from .project-company or [class*="company"] elements, star rating from [class*="rating-avg"] or [itemprop="ratingValue"] attributes, review count from [class*="rating-count"], pricing from [class*="price"], category tags from [class*="tag"] a, award badges via the extractSFBadges helper, and the vendor website link identified by data-ga-label="website" or rel="nofollow" links pointing outside sourceforge.net. After processing each page it automatically enqueues the next page (?page=N+1) until the per-category limit is reached or no listing cards are found.
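Two decisions in this handler are easy to isolate: whether an anchor is the vendor's external website, and whether to enqueue the next `?page=N+1` page. The sketch assumes anchor attributes are already parsed into a dict, as an HTML parser would provide; both function names are hypothetical, and reading the rule as "`data-ga-label="website"`, or a `rel="nofollow"` link leaving sourceforge.net" is an interpretation of the description above.

```python
from __future__ import annotations

from urllib.parse import urlparse

SF_PAGE_URL = "https://sourceforge.net/software/{category}/?page={page}"


def is_vendor_website(attrs: dict) -> bool:
    """True for a data-ga-label="website" link, or a rel="nofollow" link pointing off sourceforge.net."""
    href = attrs.get("href") or ""
    if not href:
        return False
    if attrs.get("data-ga-label") == "website":
        return True
    host = (urlparse(href).hostname or "").lower()
    leaves_sourceforge = bool(host) and host != "sourceforge.net" and not host.endswith(".sourceforge.net")
    return leaves_sourceforge and "nofollow" in (attrs.get("rel") or "").split()


def next_page_url(category: str, page: int, cards_on_page: int,
                  collected: int, limit: int) -> str | None:
    """URL of page N+1, or None when the page had no cards or the limit (0 = none) is reached."""
    if cards_on_page == 0:
        return None  # no listing cards: past the last page
    if limit > 0 and collected >= limit:
        return None  # per-category limit reached
    return SF_PAGE_URL.format(category=category, page=page + 1)


print(next_page_url("crm", 1, 25, 25, 50))  # https://sourceforge.net/software/crm/?page=2
```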
Phase 3 — TrustRadius sitemap and product crawling
The TrustRadius route has two stages. The sitemap handler (TR_SITEMAP) parses XML sitemap files using Cheerio's XML support (enabled via additionalMimeTypes: ['application/xml', 'text/xml']), filters for /products/ URLs that are not comparison, pricing, video, or competitor pages, then enqueues up to targetCount * 2 product URLs to account for items that will be filtered. The product handler (TR_PRODUCT) first attempts to parse structured data from the __NEXT_DATA__ JSON block embedded in the server-rendered page, checking three property paths. If the JSON path yields no product name, it falls back to eight named data-testid HTML selectors — product-name, vendor-name, overall-score, reviews-count, product-description, pricing-summary, category, and vendor-website — plus a meta[name="description"] fallback for descriptions.
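Both TrustRadius stages can be sketched with the Python standard library: `xml.etree.ElementTree` for the sitemap and a regular expression plus `json` for the `__NEXT_DATA__` script. Only the three property paths, the kinds of excluded pages, and the `targetCount * 2` cap come from the description above; the exclusion keywords, the path-segment matching, and the `name` key used to decide that a product was found are assumptions.

```python
from __future__ import annotations

import json
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumed keywords for comparison, pricing, video, and competitor sub-pages
EXCLUDED_KEYWORDS = ("compare", "pricing", "video", "competitor")
NEXT_DATA_RE = re.compile(r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.S)
# The three property paths, relative to the top-level "props" object
PRODUCT_PATHS = (
    ("pageProps", "product"),
    ("pageProps", "data", "product"),
    ("pageProps", "productReviews", "product"),
)


def product_urls_from_sitemap(xml_text: str, target_count: int) -> list[str]:
    """Stage 1: collect product URLs, capped at target_count * 2 (0 = no cap)."""
    root = ET.fromstring(xml_text)
    urls: list[str] = []
    for loc in root.iterfind("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        parts = [p for p in urlparse(url).path.split("/") if p]
        if len(parts) < 2 or parts[0] != "products":
            continue  # not a /products/<slug>/... URL
        if any(word in part for part in parts[2:] for word in EXCLUDED_KEYWORDS):
            continue  # sub-page such as /products/<slug>/pricing
        urls.append(url)
        if target_count > 0 and len(urls) >= target_count * 2:
            break  # over-fetch 2x to leave room for records that fail the filters
    return urls


def product_from_next_data(html: str) -> dict | None:
    """Stage 2: first product object with a name, or None to trigger the HTML fallback."""
    match = NEXT_DATA_RE.search(html)
    if not match:
        return None
    try:
        props = json.loads(match.group(1)).get("props", {})
    except (json.JSONDecodeError, AttributeError):
        return None
    for path in PRODUCT_PATHS:
        node = props
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
        if isinstance(node, dict) and node.get("name"):
            return node
    return None
```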
Phase 4 — Normalization, filtering, and pay-per-event charging
Every extracted record passes through transformRawToClean(), which applies domain extraction (stripping www. prefixes via URL parsing), rating scale conversion, pricing tier normalization, and whitespace collapsing via regex. The record is then checked against minReviews and minRating in passesFilters(). If it passes, the actor calls Actor.pushData() first, then Actor.charge({ eventName: 'company-found', count: 1 }) — following Apify's data-before-charge rule. The eventChargeLimitReached flag is checked after each charge; if set, all active route handlers stop and the run completes cleanly with a summary record.
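Phase 4 maps onto a handful of small functions. This Python sketch approximates `transformRawToClean()` and `passesFilters()` and shows the push-then-charge order; the pricing keyword rules and the `push` / `charge` callables are hypothetical stand-ins (the actor itself calls `Actor.pushData()` and `Actor.charge()` from the Apify JavaScript SDK). The rating rule reproduces the `Math.round((val / 2) * 10) / 10` formula quoted under Features.

```python
from __future__ import annotations

import math
import re
from urllib.parse import urlparse


def extract_domain(website: str | None) -> str | None:
    """Lowercased hostname with a leading 'www.' removed."""
    if not website:
        return None
    url = website if "://" in website else f"https://{website}"
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host or None


def to_five_point(score: float | None, source: str) -> float | None:
    """Halve TrustRadius 10-point scores, rounding to one decimal like JS Math.round."""
    if score is None or source != "trustradius":
        return score  # SourceForge ratings are already on a 5-point scale
    # floor(x + 0.5) mirrors Math.round; Python's round() rounds halves to even
    return math.floor((score / 2) * 10 + 0.5) / 10


def normalize_pricing(raw: str | None) -> str | None:
    """Collapse whitespace and map to the documented tiers (keyword rules are assumptions)."""
    if not raw:
        return None
    text = re.sub(r"\s+", " ", raw).strip()
    lower = text.lower()
    if "open source" in lower:
        return "Open Source"
    if "freemium" in lower or "free version" in lower:
        return "Freemium"
    if lower == "free":
        return "Free"
    if "contact" in lower or "quote" in lower:
        return "Contact Vendor"
    return text or None  # a price string such as "$29/month"


def passes_filters(record: dict, min_reviews: int = 0, min_rating: float = 0) -> bool:
    """A threshold of 0 disables that filter; missing values count as 0 (an assumption)."""
    if min_reviews and (record.get("reviewCount") or 0) < min_reviews:
        return False
    if min_rating and (record.get("rating") or 0) < min_rating:
        return False
    return True


def emit(record: dict, push, charge) -> bool:
    """Push the record first, then charge; return False once the charge limit is reached."""
    push(record)  # stand-in for Actor.pushData()
    result = charge("company-found", 1)  # stand-in for Actor.charge(); assumed to return a dict
    return not result.get("eventChargeLimitReached", False)
```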
Tips for best results
- Check category slugs against the SourceForge URL before running. Visit `https://sourceforge.net/software/your-slug/` in a browser. If the page returns results, the slug is valid. Invalid slugs return empty pages and produce zero results for the SourceForge source.
- Use `minReviews: 5` as a baseline filter. Products with fewer than 5 reviews are often newly listed or inactive. Filtering them reduces noise without significantly reducing volume in established categories.
- Combine categories strategically to avoid redundancy. Categories on SourceForge overlap significantly; `crm` and `sales-force-automation` share many products. Running them together with `deduplicateByDomain: true` catches products in both without doubling your cost.
- Run TrustRadius-only for enterprise B2B focus. TrustRadius skews heavily toward enterprise software with large review counts and detailed scoring. If your target market is enterprise buyers, set `sources: ["trustradius"]` and `minReviews: 20` for a focused list.
- Pipe directly into contact enrichment. The `domain` field is a ready-made key for Website Contact Scraper. Extract the domains from your dataset and run them in a batch to get email addresses and phone numbers for each vendor.
- Schedule weekly refreshes for fast-moving categories. Categories like `ai-tools` or `marketing-automation` add new products frequently. A weekly scheduled run keeps your lead list current as new vendors appear in the directories.
- Use the summary record for run monitoring. Every run ends with a `type: "summary"` record. If `totalCompaniesFound` drops significantly week-over-week, that may signal a markup change or category rename worth investigating before the next run.
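The week-over-week check in the last tip can be automated by comparing two summary records. A minimal sketch: the 30% threshold and the `find_drops` name are arbitrary choices, and the `this_week` numbers are made up for illustration; only `totalCompaniesFound` and `companiesByCategory` come from the summary record format.

```python
def find_drops(previous: dict, current: dict, threshold: float = 0.3) -> list[str]:
    """Warnings for totals that fell more than `threshold` below the previous summary."""
    warnings = []
    prev_total = previous.get("totalCompaniesFound", 0)
    cur_total = current.get("totalCompaniesFound", 0)
    if prev_total and cur_total < prev_total * (1 - threshold):
        warnings.append(f"total: {prev_total} -> {cur_total}")
    cur_by_cat = current.get("companiesByCategory", {})
    for category, prev_count in previous.get("companiesByCategory", {}).items():
        cur_count = cur_by_cat.get(category, 0)  # a missing category counts as 0
        if prev_count and cur_count < prev_count * (1 - threshold):
            warnings.append(f"{category}: {prev_count} -> {cur_count}")
    return warnings


last_week = {"totalCompaniesFound": 187, "companiesByCategory": {"crm": 98, "project-management": 89}}
this_week = {"totalCompaniesFound": 120, "companiesByCategory": {"crm": 95, "project-management": 25}}
print(find_drops(last_week, this_week))
# ['total: 187 -> 120', 'project-management: 89 -> 25']
```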
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Website Contact Scraper | Feed the domain or website field from each company record to extract emails, phone numbers, and contact pages from vendor websites |
| Waterfall Contact Enrichment | Run a 10-step enrichment cascade on the vendor domain to find decision-maker emails and LinkedIn profiles |
| Email Pattern Finder | Detect the email naming convention used by each vendor before building outbound sequences |
| B2B Lead Qualifier | Score each company 0–100 using rating, review count, pricing tier, and other signals to prioritize enrichment spend |
| Website Tech Stack Detector | Detect 100+ web technologies on each vendor's website to qualify leads by tech profile or integration fit |
| HubSpot Lead Pusher | Push the structured company records directly into HubSpot as contacts or companies after enrichment |
| Bulk Email Verifier | Verify email addresses found via contact scraping before importing into your sending tool |
| Lead Enrichment Pipeline | All-in-one Clay alternative: email discovery, verification, company research, and scoring in one run ($0.12/lead) |
| AI Outreach Personalizer | Generate personalized cold emails using your own OpenAI/Anthropic key — zero AI markup ($0.01/lead) |
| Intent Signal Tracker | Track buying signals: hiring, tech changes, funding, content updates. Prioritize outreach by intent score ($0.05/company) |
| Lead Data Quality Auditor | Audit lead data quality before outreach — email verification, phone validation, domain freshness ($0.005/record) |
Limitations
- TrustRadius category filtering is approximate. The actor discovers TrustRadius products via sitemaps that list all products, not filtered by category. The `sourceCategory` field reflects the category you searched for, not a TrustRadius taxonomy match. Products from adjacent segments may appear in results.
- SourceForge badge extraction depends on CSS class naming patterns. Badges are extracted using selectors like `[class*="badge"]` and `[class*="award"]`. If SourceForge changes its CSS class naming, badge data may be incomplete while other fields remain accurate.
- Vendor websites are not always present. Some directory listings do not include a vendor website link. In those cases `website` and `domain` will be `null`, and the record cannot be used for downstream website-based enrichment.
- No JavaScript rendering. The actor uses CheerioCrawler (HTTP-based), not a browser. Pages that require client-side JavaScript to render their content will return incomplete data. Both SourceForge and TrustRadius use server-rendered HTML, so this does not currently affect results, but any future sources added to this actor that require browser execution would need a separate implementation.
- TrustRadius sitemap coverage is 5 shards out of approximately 25. The actor fetches sitemaps 1–5, covering thousands of product URLs. Products in higher-numbered shards are not discovered in a standard run. For complete TrustRadius coverage across all shards, contact us about a custom configuration.
- Deduplication is run-scoped. The `seenDomains` set is created fresh each run. If you run the actor twice against the same categories, the same companies can appear in both datasets. Use the `domain` field as a unique key in your downstream storage to handle cross-run deduplication.
- No employee count, funding, or HQ data. Neither SourceForge nor TrustRadius consistently exposes firmographic data in their listing HTML. Use Company Deep Research or a downstream enrichment actor to add firmographic context.
- Rating scale conversion is a linear approximation. TrustRadius 10-point scores are divided by 2. This does not account for distribution differences between the two rating systems; a TrustRadius 8.6 becomes 4.3, but the populations rated by each platform differ.
Integrations
- Zapier — trigger a Zap when a run completes to push new software companies into a Google Sheet or CRM automatically
- Make — build a multi-step scenario that scrapes companies, enriches contacts, and adds leads to your outbound sequence tool
- Google Sheets — export the dataset directly to a sheet for manual review and prioritization before enrichment
- Apify API — trigger runs programmatically from your sales or marketing automation platform and receive results via webhook
- Webhooks — post the completed dataset URL to a Slack channel or internal dashboard when a run finishes
- LangChain / LlamaIndex — use scraped software company descriptions and category data as a knowledge base for AI-powered market research agents
Troubleshooting
Zero results despite providing a valid category. The most common cause is a category slug that does not match the SourceForge URL structure. Verify by visiting https://sourceforge.net/software/your-slug/ directly. If the page shows no products, try a more general slug (e.g. crm instead of crm-software). For TrustRadius, results depend on sitemap coverage — if the category has few matching products in the first five sitemaps, output will be low.
All results have null website and domain fields. Some SourceForge categories list products without a vendor website link in the listing card. This is more common in open-source or niche categories. The profileUrl still links to the directory listing and can be used as a secondary identifier for manual lookup.
TrustRadius results are empty or very few. TrustRadius products are discovered via sitemap, not via category-filtered listings. Lowering minReviews to 0 and minRating to 0 confirms whether any records can be found. The actor enqueues targetCount * 2 product URLs to account for filtering, but the absolute maximum is bounded by what appears in the first five sitemap shards.
Run completes faster than expected with fewer results than the limit. This means the actor exhausted all available listing pages before reaching your maxCompaniesPerCategory limit. SourceForge categories vary in size — smaller niches may have fewer than 50 products total. Check the summary record's companiesByCategory field to see how many were found per category.
Duplicate companies appearing across multiple runs. Deduplication only operates within a single run. Across multiple runs, the same company can appear again. Use the domain field as a unique key in your downstream storage — a Google Sheets VLOOKUP, CRM deduplication rule, or a database unique constraint on domain will handle this cleanly.
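If you handle this in code rather than a spreadsheet, a cross-run merge keyed on `domain` takes only a few lines. This sketch assumes domains from previous runs are held in a set and the new run's items are already loaded as dicts (for example via `iterate_items()`); it skips the summary record, keeps the first occurrence, and drops rows without a domain, which you may prefer to keep for manual review.

```python
from __future__ import annotations


def merge_new_companies(existing_domains: set[str], items: list[dict]) -> list[dict]:
    """Return company records whose domain was not stored by an earlier run."""
    new_records = []
    for item in items:
        if item.get("type") == "summary":
            continue  # skip the per-run summary row
        domain = (item.get("domain") or "").lower()
        if not domain or domain in existing_domains:
            continue  # no join key, or already stored from a previous run
        existing_domains.add(domain)
        new_records.append(item)
    return new_records
```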
Responsible use
- This actor only accesses publicly available software directory listings on SourceForge and TrustRadius.
- Respect each platform's terms of service and `robots.txt` directives.
- Comply with GDPR, CAN-SPAM, and other applicable data protection laws when using scraped company data for outreach.
- Do not use extracted data to send unsolicited bulk email or for spam campaigns.
- For guidance on web scraping legality, see Apify's guide.
FAQ
How many software companies can I scrape in one run?
There is no hard cap from the actor. The maxCompaniesPerCategory parameter (default 50, max 1000) controls per-category volume, and you can run as many categories as you like in a single run. Your practical limit is your Apify spending budget — at $0.05 per company, a $50 budget yields up to 1,000 companies.
Does Software Directory Scraper work for any software category?
It works for any category that has a valid slug on SourceForge (https://sourceforge.net/software/{slug}/). Common slugs include crm, project-management, email-marketing, accounting, help-desk, marketing-automation, hr-software, erp, business-intelligence, and video-conferencing. TrustRadius coverage depends on sitemap inclusion and is not category-filtered.
How accurate is the rating data from this scraper?
Ratings are taken directly from the directory listings and reflect each platform's own aggregated scores. TrustRadius 10-point scores are converted to a 5-point scale by dividing by 2. The accuracy of the underlying ratings is determined by each directory's own review processes; the actor extracts them without modification beyond scale normalization.
What is the difference between SourceForge and TrustRadius results?
SourceForge has 100k+ products including many open-source and SMB-focused tools, with paginated category listings, explicit pricing, and badge data. TrustRadius focuses on enterprise B2B software with in-depth review scoring. Using both sources together gives broader category coverage across company sizes and market segments.
How is this different from scraping G2, Capterra, or GetApp?
G2, Capterra, and GetApp aggressively block HTTP scrapers with Cloudflare's JS challenge, so extracting data from them requires a full browser with anti-detection measures, which is slower and more expensive. SourceForge and TrustRadius serve their listing pages to datacenter IPs without blocking, making this actor fast, reliable, and proxy-free.
Can I scrape software company leads from multiple categories at once?
Yes. Pass multiple slugs in the categories array, for example ["crm", "project-management", "email-marketing"]. The actor processes all categories in parallel using a shared crawler queue. Deduplication operates across the entire run. When deduplicateByDomain is true, a company that appears in two categories is returned only once.
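A multi-category run input might look like the sketch below. It uses only the three fields named in this FAQ: categories, maxCompaniesPerCategory, and deduplicateByDomain. Any other input fields the actor accepts are left out. The `validate_input` helper is a hypothetical pre-flight check, not part of the actor. Its lower bound of 1 is an assumption, since the FAQ states only the default (50) and the maximum (1000).

```python
import json

run_input = {
    "categories": ["crm", "project-management", "email-marketing"],
    "maxCompaniesPerCategory": 50,  # default 50, max 1000
    "deduplicateByDomain": True,    # one record per domain across the whole run
}


def validate_input(cfg: dict) -> None:
    """Hypothetical pre-flight check before starting a run."""
    cats = cfg.get("categories")
    if not isinstance(cats, list) or not cats:
        raise ValueError("categories must be a non-empty list of slugs")
    if not all(isinstance(c, str) and c.strip() for c in cats):
        raise ValueError("every category slug must be a non-empty string")
    limit = cfg.get("maxCompaniesPerCategory", 50)
    # Lower bound of 1 is an assumption; 1000 is the documented maximum.
    if not isinstance(limit, int) or not 1 <= limit <= 1000:
        raise ValueError("maxCompaniesPerCategory must be an integer from 1 to 1000")


validate_input(run_input)
# JSON form of the run input
print(json.dumps(run_input, indent=2))
```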
How long does a typical software directory scraping run take?
A single category at maxCompaniesPerCategory: 50 from both sources typically completes in 3–6 minutes. Five categories at the same limit take 10–20 minutes. TrustRadius runs slightly longer because each product requires an individual page fetch after sitemap parsing.
Can I filter out free and open-source software from the results?
There is no dedicated filter for this, but you can filter the output dataset by the pricingTier field. Records where pricingTier is "Free" or "Open Source" can be excluded in post-processing in Excel, Google Sheets, or your pipeline code.
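In pipeline code, this post-processing step is a one-line filter. This is a minimal Python sketch that assumes the dataset has been loaded as a list of dicts, for example from a JSON export. The "Paid" tier value and the sample domains are hypothetical. Tier strings are matched exactly, so check the spelling in your own export.

```python
EXCLUDED_TIERS = {"Free", "Open Source"}


def drop_free_and_open_source(records: list[dict]) -> list[dict]:
    """Keep records whose pricingTier is not Free or Open Source.

    Records with a missing pricingTier are kept.
    """
    return [r for r in records if r.get("pricingTier") not in EXCLUDED_TIERS]


# Hypothetical sample records; pricingTier and domain are field names from this FAQ
sample = [
    {"domain": "a.example", "pricingTier": "Free"},
    {"domain": "b.example", "pricingTier": "Paid"},
    {"domain": "c.example", "pricingTier": "Open Source"},
    {"domain": "d.example"},
]
print([r["domain"] for r in drop_free_and_open_source(sample)])
# ['b.example', 'd.example']
```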
Is it legal to scrape SourceForge and TrustRadius?
Scraping publicly available data from software directories is generally considered lawful in most jurisdictions. Both SourceForge and TrustRadius publish their listings publicly without requiring authentication. Always respect the platforms' terms of service and use the data responsibly. See Apify's web scraping legality guide for a detailed overview.
Can I schedule this actor to run automatically every week?
Yes. Use Apify's built-in scheduler to run on any cron schedule — daily, weekly, or monthly. Weekly runs against fast-moving categories like ai-tools or marketing-automation keep your lead list current as new products are added to the directories.
What happens if the same company appears on both SourceForge and TrustRadius?
With deduplicateByDomain: true (the default), the first occurrence is kept and the duplicate is skipped. The source field on the kept record shows which directory found it first. With deduplicateByDomain: false, both records are returned so you can compare ratings and review counts across sources.
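The keep-first rule can be reproduced client-side. This helps if you run with deduplicateByDomain: false to compare sources and later want a deduplicated view. This is a hypothetical sketch, not the actor's code. The "sourceforge"/"trustradius" source values and the rating field name are assumptions, and lowercasing domains is a guessed normalization. Which record counts as "first" depends on dataset order.

```python
def dedupe_by_domain(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each domain, approximating deduplicateByDomain: true.

    Lowercasing the domain is an assumption about normalization. Records
    without a domain are always kept because they cannot be matched.
    """
    seen: set[str] = set()
    kept = []
    for rec in records:
        domain = (rec.get("domain") or "").lower()
        if domain:
            if domain in seen:
                continue  # later duplicate: skipped
            seen.add(domain)
        kept.append(rec)
    return kept


# Hypothetical records; "source" values and "rating" are assumed field contents
records = [
    {"domain": "acme.example", "source": "sourceforge", "rating": 4.2},
    {"domain": "ACME.example", "source": "trustradius", "rating": 4.0},
    {"domain": "other.example", "source": "trustradius", "rating": 4.5},
]
print([(r["domain"], r["source"]) for r in dedupe_by_domain(records)])
# [('acme.example', 'sourceforge'), ('other.example', 'trustradius')]
```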
Can I connect the output directly to HubSpot or Salesforce?
Yes. Use HubSpot Lead Pusher to push company records into HubSpot, or use Apify's Zapier or Make integrations to route data to Salesforce, Pipedrive, or any other CRM. The domain field is a reliable unique key for CRM deduplication.
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.
Related actors
AI Cold Email Writer — $0.01/Email, Zero LLM Markup
Generates personalized cold emails from enriched lead data using your own OpenAI or Anthropic key. Subject line, body, CTA, and optional follow-up sequence — $0.01/email, zero LLM markup.
AI Outreach Personalizer — Emails with Your LLM Key
Generate personalized cold emails using your own OpenAI or Anthropic API key. Subject lines, opening lines, full bodies — tailored to each lead's role, company, and signals. $0.01/lead compute + your LLM costs. Zero AI markup.
B2B Lead Generation Suite - Find Emails, Score & Qualify Leads
All-in-one B2B lead pipeline. Enter company URLs, get enriched leads with emails, phone numbers, contacts, email patterns, quality scores (0-100), grades, and business signals from a 3-step automated pipeline.
B2B Lead Qualifier - Score & Rank Company Leads
B2B lead scoring tool and API that scores companies 0-100 from 30+ website signals. 5 scoring categories, 4 profiles (sales, marketing, recruiting, default). Plain-English explanations, hiring detection, industry classification, score change tracking. $0.15/lead, no subscription.
Ready to try SourceForge & TrustRadius — Software Vendor Leads?
Start for free on Apify. No credit card required.