Startup Due Diligence
Pricing
Pay Per Event model. You only pay for what you use.
| Event | Description | Price |
|---|---|---|
| analysis-run | Full intelligence analysis run | $0.40 |
Example: 100 events = $40.00 · 1,000 events = $400.00
Documentation
Startup due diligence that would take a junior analyst two days runs in under three minutes. This actor queries 8 intelligence sources simultaneously — corporate registrations, USPTO patents, EPO patents, GitHub repositories, tech stack, job listings, ArXiv research papers, and SaaS competitive data — then synthesizes everything into a VC-grade deal memo with a composite score, deal rating, investment thesis points, and red flags.
Built for investors, accelerator managers, and corporate development teams who need structured, repeatable startup analysis without spreadsheets or manual research. One company name in, one decision-ready report out.
What data does startup due diligence produce?
| Data Point | Source | Example |
|---|---|---|
| 📊 Composite score | All 8 sources | 74 / 100 |
| 🏷️ Deal rating | Composite model | DILIGENCE |
| ⚡ Innovation velocity score | USPTO + EPO + GitHub + ArXiv | 81 / 100 — FAST |
| 🛡️ Competitive moat score | Patents + tech stack + competitor count | 68 / 100 — STRONG |
| 👥 Hiring signal score | Job listings | 55 / 100 — SCALING |
| 🏢 Corporate health score | OpenCorporates | 90 / 100 — STRONG |
| 🔬 Strategy inference | Hiring pattern analysis | BUILDING |
| 📋 Investment thesis points | Scoring model outputs | ["High innovation velocity (81/100)..."] |
| 🚩 Red flags | Scoring model outputs | ["2 dissolved entities — investigate"] |
| 🔗 Patent count (USPTO) | US Patent search | 9 |
| 🌍 Patent count (EPO) | European Patent Office | 3 |
| 💻 GitHub repos + stars | GitHub search | 22 repos, 3,800 stars |
| 📰 ArXiv paper count | ArXiv research search | 4 |
| 🏭 Tech stack depth | Website tech detector | 11 components |
| 📄 Corporate registrations | OpenCorporates | 2 active entities (us_de, gb) |
| 🏢 Open job listings | Job market intelligence | 17 listings |
| 🤝 Competitor count | SaaS competitive intel | 6 identified |
| 🕐 Generated at | Run metadata | 2026-03-20T09:14:33.000Z |
Why use startup due diligence screening?
Manual startup due diligence means opening eight browser tabs, searching each database individually, copying numbers into a spreadsheet, and hoping you haven't missed something. For a single company that process takes two to four hours. Scale it to a deal pipeline of twenty companies and you have a week of analyst time before a single investment decision gets made.
This actor automates the entire data collection and scoring process. Enter a company name, get a structured report with a 0-100 composite score and one of four deal ratings — STRONG_BUY, DILIGENCE, WATCH, or PASS — in under three minutes. Every run is identical and auditable, so your whole team can compare notes on the same structured output rather than arguing over whose spreadsheet is current.
- Scheduling — run the same company monthly to track hiring velocity, patent filings, and competitive positioning over time
- API access — trigger due diligence runs from your deal flow CRM, Notion database, or internal tooling via Python, JavaScript, or HTTP
- Proxy rotation — all sub-actors use Apify's built-in infrastructure, no IP management required
- Monitoring — set up Slack or email alerts when a run fails or a deal rating changes on a monitored company
- Integrations — push deal memos directly to Google Sheets, HubSpot, Zapier, or Make for pipeline tracking
Features
- 8 data sources queried in parallel — OpenCorporates, USPTO, EPO, GitHub, website tech stack, job market, ArXiv, and SaaS competitive intel run simultaneously via `Promise.allSettled()`, keeping total runtime under three minutes even if one sub-source fails
- Resilient sub-actor orchestration — if any single data source is unavailable, the actor logs a warning and continues scoring with the remaining sources rather than failing the entire run
- Innovation Velocity model (30% weight) — USPTO patent portfolio scores up to 25 points; GitHub activity combines repo count (max 15) and star count on a logarithmic scale (max 15); ArXiv publications score up to 25 points; recency bonus adds up to 20 points for activity in the last 6 months
- Competitive Moat model (25% weight) — tech stack depth scores up to 25 points; patent protection scores up to 30 points; competitor count is inverted (fewer competitors = higher score, max 25); GitHub star community adds up to 20 points via log-scale network effect calculation
- Hiring Signal Decoder (25% weight) — classifies every job title into engineering, sales, marketing, executive, operations, or other using 30+ keyword patterns; infers company strategy as BUILDING, SCALING, PIVOTING, or MAINTAINING from role distribution ratios
- Corporate Health Check (20% weight) — validates entity existence via OpenCorporates; penalizes dissolved or inactive entities; scores multi-jurisdiction structure from 1-3 jurisdictions (clean) to 6+ jurisdictions (complex/concerning)
- Composite scoring formula — Innovation 30% + Moat 25% + Hiring 25% + Corporate 20% — produces a 0-100 score with four deal ratings: STRONG_BUY (75+), DILIGENCE (50-74), WATCH (25-49), PASS (0-24)
- Automatic investment thesis generation — bullish signals (HYPERGROWTH or FAST velocity, FORTRESS or STRONG moat, BUILDING or SCALING hiring) are extracted and written as plain-English thesis statements
- Automatic red flag extraction — POOR corporate health, DORMANT innovation, NONE/WEAK moat, and 3+ dissolved entities generate named red flags for immediate review
- Sector-aware search — providing a sector refines GitHub and ArXiv queries to reduce false positives for common company names
- Domain-based tech stack detection — provide the company's website URL to get precise tech stack depth analysis instead of a generic name search
- Full signal log — every scored data point that crosses a threshold is recorded in `allSignals` for a complete audit trail
Use cases for startup due diligence
Venture capital deal screening
VC analysts reviewing 50+ inbound deals per month have no time for manual research at the top of the funnel. Run this actor on every inbound company. Use the composite score to sort your pipeline: STRONG_BUY and DILIGENCE deals get partner meetings, WATCH deals go into a tracking list, PASS deals get a polite decline. The hiring signal decoder tells you whether a company is still building product (BUILDING) or already at go-to-market (SCALING), which directly informs whether the deal fits your stage thesis.
Angel investor pre-meeting research
Before a 30-minute founder call, run a due diligence report. Arrive knowing the patent count, GitHub star trajectory, number of open engineering roles, and whether the corporate structure looks clean. The investment thesis points give you conversation starters; the red flags give you pointed questions. Research that used to take two hours happens in the time it takes to get coffee.
Accelerator and incubator cohort selection
Program managers evaluating 200+ applications need a repeatable scoring method that doesn't require reading every deck. Batch-run due diligence reports via the API on every applicant company. Sort by composite score, filter by sector, and focus detailed human review on the top 20% of applicants. The innovation velocity score is particularly useful for identifying research-backed founders who may not write polished decks.
Corporate M&A and strategic partnership screening
Corporate development teams assessing acquisition targets or partnership candidates need structured, comparable data. The competitive moat analysis identifies whether a target company's advantage comes from patents, technical complexity, community network effects, or market positioning — which directly informs deal structuring and valuation discussions.
PE growth equity pre-LOI diligence
Private equity teams evaluating growth-stage companies can use the hiring signal decoder to validate management's claimed growth trajectory. A company claiming aggressive expansion should show SCALING-phase hiring (sales and marketing dominant). A company claiming product-led growth should show BUILDING-phase hiring (engineering dominant). Mismatches between narrative and hiring data are a meaningful diligence signal.
Portfolio monitoring and competitive intelligence
Run monthly reports on your existing portfolio companies and their named competitors. Track how composite scores change over time — a company's innovation velocity dropping from FAST to SLOW, or a competitor filing 10 new patents, is an early warning signal worth acting on.
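The month-over-month comparison described here can be sketched as a small diff over two saved reports. This is an illustrative helper, not part of the actor: `compare_reports` and the two ordering lists are assumptions, while the field names (`compositeScore`, `dealRating`, `innovationVelocity.velocityLevel`) come from the output fields documented in this README.

```python
from typing import Any

# Ordered worst-to-best, using the level names from the output fields table.
VELOCITY_ORDER = ["DORMANT", "SLOW", "MODERATE", "FAST", "HYPERGROWTH"]
RATING_ORDER = ["PASS", "WATCH", "DILIGENCE", "STRONG_BUY"]


def compare_reports(previous: dict[str, Any], current: dict[str, Any]) -> list[str]:
    """Return human-readable alerts for changes between two reports (hypothetical helper)."""
    alerts: list[str] = []

    # Composite score movement, signed so drops are obvious.
    delta = current["compositeScore"] - previous["compositeScore"]
    if delta != 0:
        alerts.append(f"Composite score changed by {delta:+g}")

    # Deal rating upgrades and downgrades.
    old_rating, new_rating = previous["dealRating"], current["dealRating"]
    if old_rating != new_rating:
        improved = RATING_ORDER.index(new_rating) > RATING_ORDER.index(old_rating)
        direction = "upgraded" if improved else "downgraded"
        alerts.append(f"Deal rating {direction}: {old_rating} -> {new_rating}")

    # Only flag velocity when it gets worse, e.g. FAST -> SLOW.
    old_level = previous["innovationVelocity"]["velocityLevel"]
    new_level = current["innovationVelocity"]["velocityLevel"]
    if VELOCITY_ORDER.index(new_level) < VELOCITY_ORDER.index(old_level):
        alerts.append(f"Innovation velocity dropped: {old_level} -> {new_level}")

    return alerts
```

Feeding it last month's and this month's dataset records for the same company gives a short list of alerts that could be posted to Slack or email.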
How to run startup due diligence on a company
- Enter the company name — type the startup's name exactly as it appears in corporate registrations (e.g., "Figma", "Notion", "Rippling"). This is the only required field.
- Configure options — optionally add the company's website URL (e.g., `https://figma.com`) for precise tech stack analysis, and a sector keyword (e.g., "design software", "HR tech") to reduce false positives in research and patent searches.
- Run the actor — click "Start" and wait approximately 2-3 minutes while all 8 data sources are queried in parallel.
- Download results — your deal memo appears in the Dataset tab. Export as JSON for API workflows, CSV for spreadsheet analysis, or Excel for investor reporting.
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `companyName` | string | Yes | — | The startup or company name to analyze (e.g., "Stripe", "Figma", "Notion") |
| `domain` | string | No | — | Company website URL for tech stack detection (e.g., "https://stripe.com"). Omit for name-based detection. |
| `sector` | string | No | — | Sector or industry to refine GitHub and ArXiv search results (e.g., "fintech", "AI", "healthcare") |
Input examples
Standard due diligence run — company name only:
```json
{
  "companyName": "Notion"
}
```
Full diligence with domain and sector:
```json
{
  "companyName": "Rippling",
  "domain": "https://rippling.com",
  "sector": "HR tech"
}
```
Minimal fastest run — name only, no optional fields:
```json
{
  "companyName": "Figma"
}
```
Input tips
- Always use the legal entity name — for corporate registration matching, use the registered company name rather than a product name or brand nickname. For example, "Figma Inc" may return better corporate health results than "Figma".
- Add the domain for accurate tech stack analysis — without a domain URL, the actor attempts tech detection by company name query, which may return imprecise results. For any company with a public website, provide the URL.
- Use sector to disambiguate common names — companies named "Atlas", "Beacon", or "Mercury" exist in many industries. Adding "fintech" or "logistics" narrows GitHub and ArXiv searches to relevant results.
- Batch via API for pipeline screening — processing 20 deal flow companies as 20 sequential API calls is more efficient than running them one by one in the UI.
- Treat WATCH-rated companies as async monitoring candidates — schedule monthly re-runs on WATCH-rated companies to catch when a hiring surge or patent filing pushes them into DILIGENCE territory.
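A batch screen along the lines of the tips above might look like the sketch below. It assumes the `apify-client` Python package and the actor ID used in the API examples of this README; `run_via_apify` and `screen_pipeline` are hypothetical helper names, and the runner is passed in as a parameter so the sorting logic can be exercised without network access.

```python
from typing import Any, Callable

ACTOR_ID = "ryanclinton/startup-due-diligence"

Report = dict[str, Any]
Runner = Callable[[dict[str, Any]], Report]


def run_via_apify(run_input: dict[str, Any], token: str = "YOUR_API_TOKEN") -> Report:
    """Run the actor once and return its report record (one record per run)."""
    # Imported lazily so the rest of the module works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    return items[0]


def screen_pipeline(companies: list[dict[str, Any]], run_one: Runner = run_via_apify) -> list[Report]:
    """Run each company sequentially and return reports sorted by composite score, best first."""
    reports = [run_one(company) for company in companies]
    return sorted(reports, key=lambda report: report["compositeScore"], reverse=True)


if __name__ == "__main__":
    pipeline = [
        {"companyName": "Notion"},
        {"companyName": "Rippling", "domain": "https://rippling.com", "sector": "HR tech"},
    ]
    for report in screen_pipeline(pipeline):
        print(report["query"], report["compositeScore"], report["dealRating"])
```

Sequential calls keep the example simple; each call blocks until its run finishes, so a 20-company pipeline takes roughly 20 times the single-run duration.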
Output example
```json
{
  "query": "Rippling",
  "compositeScore": 74,
  "dealRating": "DILIGENCE",
  "allSignals": [
    "12 patents filed (9 USPTO, 3 EPO)",
    "3800 GitHub stars across 22 repos — strong developer community",
    "22 public repositories — active open source presence",
    "4 ArXiv publications — research-driven innovation",
    "58% engineering hires — product building phase",
    "17 open positions — significant growth",
    "11 technology components detected — complex tech stack",
    "12 patents providing IP protection",
    "Only 5 direct competitors identified — limited competition",
    "3800 GitHub stars — developer community network effect",
    "2 corporate registration(s) found"
  ],
  "recommendations": [
    "High innovation velocity (81/100) — strong R&D output",
    "STRONG competitive moat: Technical complexity, Patent portfolio, Market positioning",
    "Engineering-heavy hiring — product building phase (ideal for early-stage)"
  ],
  "dataSources": {
    "corporateRegistrations": 2,
    "usptoPatents": 9,
    "epoPatents": 3,
    "githubRepos": 22,
    "techStackComponents": 11,
    "jobListings": 17,
    "arxivPapers": 4,
    "saasCompetitors": 5
  },
  "generatedAt": "2026-03-20T09:14:33.821Z",
  "inputDomain": "https://rippling.com",
  "inputSector": "HR tech",
  "innovationVelocity": {
    "score": 81,
    "patentCount": 9,
    "epoPatentCount": 3,
    "githubRepos": 22,
    "githubStars": 3800,
    "arxivPapers": 4,
    "velocityLevel": "FAST",
    "signals": [
      "12 patents filed (9 USPTO, 3 EPO)",
      "3800 GitHub stars across 22 repos — strong developer community",
      "22 public repositories — active open source presence",
      "4 ArXiv publications — research-driven innovation"
    ]
  },
  "hiringSignals": {
    "score": 55,
    "totalJobs": 17,
    "engineeringJobs": 10,
    "salesJobs": 3,
    "executiveJobs": 1,
    "strategyInference": "BUILDING",
    "roleDistribution": {
      "engineering": 10,
      "sales": 3,
      "marketing": 2,
      "executive": 1,
      "operations": 1,
      "other": 0
    },
    "signals": [
      "58% engineering hires — product building phase",
      "17 open positions — significant growth"
    ]
  },
  "competitiveMoat": {
    "score": 68,
    "techStackDepth": 11,
    "competitorCount": 5,
    "patentProtection": 12,
    "moatType": "STRONG",
    "moatFactors": [
      "Technical complexity",
      "Patent portfolio",
      "Market positioning",
      "Community/network effects"
    ],
    "signals": [
      "11 technology components detected — complex tech stack",
      "12 patents providing IP protection",
      "Only 5 direct competitors identified — limited competition",
      "3800 GitHub stars — developer community network effect"
    ]
  },
  "corporateHealth": {
    "score": 90,
    "entityCount": 2,
    "activeEntities": 2,
    "inactiveEntities": 0,
    "jurisdictions": ["us_de", "gb"],
    "healthLevel": "STRONG",
    "signals": [
      "2 corporate registration(s) found"
    ]
  },
  "investmentThesis": [
    "High innovation velocity (81/100) — strong R&D output",
    "STRONG competitive moat: Technical complexity, Patent portfolio, Market positioning",
    "Engineering-heavy hiring — product building phase (ideal for early-stage)"
  ],
  "redFlags": []
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `query` | string | The company name used for the report |
| `compositeScore` | number | Weighted composite score from 0-100 |
| `dealRating` | string | STRONG_BUY, DILIGENCE, WATCH, or PASS |
| `allSignals` | string[] | All scored signals that crossed detection thresholds |
| `recommendations` | string[] | Combined investment thesis points and red flags |
| `dataSources` | object | Record count from each of the 8 data sources |
| `dataSources.corporateRegistrations` | number | OpenCorporates entity count |
| `dataSources.usptoPatents` | number | US patent filings found |
| `dataSources.epoPatents` | number | European patent filings found |
| `dataSources.githubRepos` | number | GitHub repositories found |
| `dataSources.techStackComponents` | number | Website technology components detected |
| `dataSources.jobListings` | number | Open job postings found |
| `dataSources.arxivPapers` | number | ArXiv research papers found |
| `dataSources.saasCompetitors` | number | SaaS competitors identified |
| `generatedAt` | string | ISO 8601 timestamp of report generation |
| `inputDomain` | string | Domain URL provided (null if omitted) |
| `inputSector` | string | Sector provided (null if omitted) |
| `innovationVelocity.score` | number | Innovation Velocity sub-score (0-100) |
| `innovationVelocity.patentCount` | number | USPTO patents found |
| `innovationVelocity.epoPatentCount` | number | EPO patents found |
| `innovationVelocity.githubRepos` | number | GitHub repos found |
| `innovationVelocity.githubStars` | number | Total GitHub stars across all repos |
| `innovationVelocity.arxivPapers` | number | ArXiv papers found |
| `innovationVelocity.velocityLevel` | string | HYPERGROWTH, FAST, MODERATE, SLOW, or DORMANT |
| `innovationVelocity.signals` | string[] | Threshold-crossing innovation signals |
| `hiringSignals.score` | number | Hiring Signal sub-score (0-100) |
| `hiringSignals.totalJobs` | number | Total open positions found |
| `hiringSignals.engineeringJobs` | number | Engineering/technical roles |
| `hiringSignals.salesJobs` | number | Sales/revenue roles |
| `hiringSignals.executiveJobs` | number | Executive/leadership roles |
| `hiringSignals.strategyInference` | string | BUILDING, SCALING, PIVOTING, or MAINTAINING |
| `hiringSignals.roleDistribution` | object | Count breakdown by role category |
| `hiringSignals.signals` | string[] | Hiring pattern signals |
| `competitiveMoat.score` | number | Competitive Moat sub-score (0-100) |
| `competitiveMoat.techStackDepth` | number | Number of technology components detected |
| `competitiveMoat.competitorCount` | number | Direct SaaS competitors identified |
| `competitiveMoat.patentProtection` | number | Total USPTO + EPO patents |
| `competitiveMoat.moatType` | string | FORTRESS, STRONG, MODERATE, WEAK, or NONE |
| `competitiveMoat.moatFactors` | string[] | Named moat factors contributing to score |
| `competitiveMoat.signals` | string[] | Moat-specific scored signals |
| `corporateHealth.score` | number | Corporate Health sub-score (0-100) |
| `corporateHealth.entityCount` | number | Total corporate entities found |
| `corporateHealth.activeEntities` | number | Active/good-standing entities |
| `corporateHealth.inactiveEntities` | number | Dissolved or inactive entities |
| `corporateHealth.jurisdictions` | string[] | Jurisdiction codes (e.g., us_de, gb, ie) |
| `corporateHealth.healthLevel` | string | STRONG, GOOD, ACCEPTABLE, CONCERNING, or POOR |
| `corporateHealth.signals` | string[] | Corporate structure signals |
| `investmentThesis` | string[] | Bullish thesis points auto-generated from scoring |
| `redFlags` | string[] | Bearish warning points auto-generated from scoring |
How much does it cost to run startup due diligence?
Startup Due Diligence uses pay-per-run pricing — each run calls 8 sub-actors in parallel and costs approximately $0.25-$0.60 in platform credits depending on how much data each source returns. Compute costs are included.
| Scenario | Companies | Approx. cost per run | Total cost |
|---|---|---|---|
| Quick test | 1 | $0.30 | $0.30 |
| Small batch | 5 | $0.35 | $1.75 |
| Deal flow screen | 20 | $0.40 | $8.00 |
| Portfolio sweep | 50 | $0.40 | $20.00 |
| Enterprise pipeline | 200 | $0.45 | $90.00 |
You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached.
Manual analyst time for equivalent research runs $75-$150/hour. A typical 8-source manual diligence takes 2-4 hours per company — $150-$600 per company. This actor produces comparable structured output for under $0.50. Most teams spend $20-$50/month covering their full active pipeline.
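The arithmetic behind these figures is simple enough to script when budgeting a pipeline. The sketch below only restates the numbers quoted in this section (per-run cost, 2-4 analyst hours, $75-$150/hour); the function names are illustrative.

```python
def pipeline_cost(companies: int, cost_per_run: float) -> float:
    """Total actor cost for a batch, rounded to cents."""
    return round(companies * cost_per_run, 2)


def analyst_cost_range(
    companies: int,
    hours: tuple[int, int] = (2, 4),
    hourly_rate: tuple[int, int] = (75, 150),
) -> tuple[int, int]:
    """Low and high estimate of manual research cost for the same batch."""
    low = companies * hours[0] * hourly_rate[0]
    high = companies * hours[1] * hourly_rate[1]
    return low, high


if __name__ == "__main__":
    n = 20  # the "Deal flow screen" row from the table above
    low, high = analyst_cost_range(n)
    print(f"Actor: ${pipeline_cost(n, 0.40):.2f}")
    print(f"Manual research: ${low:,} to ${high:,}")
```

For the 20-company row this gives $8.00 for the actor against $3,000 to $12,000 of analyst time at the quoted rates.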
Startup due diligence using the API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish
run = client.actor("ryanclinton/startup-due-diligence").call(run_input={
    "companyName": "Rippling",
    "domain": "https://rippling.com",
    "sector": "HR tech"
})

# Read the report from the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"Company: {item['query']}")
    print(f"Score: {item['compositeScore']}/100 — Rating: {item['dealRating']}")
    print(f"Innovation: {item['innovationVelocity']['velocityLevel']} ({item['innovationVelocity']['score']}/100)")
    print(f"Moat: {item['competitiveMoat']['moatType']} ({item['competitiveMoat']['score']}/100)")
    print(f"Strategy: {item['hiringSignals']['strategyInference']}")
    print(f"Thesis: {item['investmentThesis']}")
    print(f"Red Flags: {item['redFlags']}")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/startup-due-diligence").call({
    companyName: "Rippling",
    domain: "https://rippling.com",
    sector: "HR tech"
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`${item.query}: ${item.compositeScore}/100 — ${item.dealRating}`);
    console.log(`Innovation: ${item.innovationVelocity.velocityLevel}`);
    console.log(`Moat: ${item.competitiveMoat.moatType}`);
    console.log(`Strategy: ${item.hiringSignals.strategyInference}`);
    console.log(`Thesis:`, item.investmentThesis);
    console.log(`Red Flags:`, item.redFlags);
}
```
cURL
```bash
# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~startup-due-diligence/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "companyName": "Rippling",
    "domain": "https://rippling.com",
    "sector": "HR tech"
  }'

# Fetch results (replace DATASET_ID from the run response above)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```
How Startup Due Diligence works
Phase 1 — Parallel data collection
The actor calls 8 sub-actors simultaneously using Promise.allSettled(), which means all 8 sources are queried at the same time rather than sequentially. Each sub-actor is allocated 256MB of memory and a 120-second timeout. The runActorsParallel() function handles partial failure gracefully: if any sub-actor returns an error, the result for that source is set to an empty array, and scoring continues with the remaining data. Total collection time is bounded by the slowest single source, not the sum of all sources.
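The collection pattern described above can be illustrated in Python. The actor itself is described as using JavaScript's `Promise.allSettled()`; the sketch below swaps in `asyncio.gather(..., return_exceptions=True)` as the closest Python analogue. `collect_all` and the source callables are hypothetical names, not the actor's `runActorsParallel()` implementation.

```python
import asyncio
from typing import Any, Awaitable, Callable

# A source is any zero-argument async callable that returns a list of records.
SourceFn = Callable[[], Awaitable[list[Any]]]


async def collect_all(sources: dict[str, SourceFn], timeout_s: float = 120.0) -> dict[str, list[Any]]:
    """Query every source concurrently; a failed or timed-out source yields []."""
    names = list(sources)
    outcomes = await asyncio.gather(
        *(asyncio.wait_for(sources[name](), timeout=timeout_s) for name in names),
        return_exceptions=True,  # settle every task instead of raising on the first error
    )

    results: dict[str, list[Any]] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            # Partial failure: log a warning and keep scoring with the other sources.
            print(f"warning: {name} failed ({outcome!r}); continuing without it")
            results[name] = []
        else:
            results[name] = outcome
    return results


if __name__ == "__main__":
    async def github() -> list[Any]:
        return [{"repo": "example"}]

    async def uspto() -> list[Any]:
        raise RuntimeError("source unavailable")

    print(asyncio.run(collect_all({"github": github, "uspto": uspto})))
```

Because all tasks are awaited together, total wall time tracks the slowest source (or the timeout), not the sum of all sources.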
Phase 2 — Four independent scoring models
Each scoring model operates on the raw data arrays returned by its relevant sub-actors:
Innovation Velocity reads from USPTO and EPO patents (EPO filings are added to the USPTO count and scored as one patent total, max 25 pts), GitHub repos (max 15 pts for count), GitHub stars on a log2 scale (max 15 pts), and ArXiv papers (max 25 pts). A recency bonus of up to 20 points is added for GitHub repos and ArXiv papers updated within the last 6 months. Score thresholds: HYPERGROWTH (80+), FAST (60-79), MODERATE (40-59), SLOW (20-39), DORMANT (0-19).
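The score-to-level mapping can be written as a small lookup. This sketch encodes only the thresholds listed in the paragraph above; `velocity_level` is an illustrative name, and how the actor derives the underlying score from raw counts is not reproduced here.

```python
# (minimum score, level), checked from highest to lowest.
VELOCITY_THRESHOLDS = [
    (80, "HYPERGROWTH"),
    (60, "FAST"),
    (40, "MODERATE"),
    (20, "SLOW"),
    (0, "DORMANT"),
]


def velocity_level(score: float) -> str:
    """Map a 0-100 Innovation Velocity score to its documented level."""
    for minimum, level in VELOCITY_THRESHOLDS:
        if score >= minimum:
            return level
    return "DORMANT"  # assumption: negative or malformed scores fall into the lowest level


if __name__ == "__main__":
    for sample in (85, 65, 45, 25, 5):
        print(sample, velocity_level(sample))
```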
Hiring Signal Decoder classifies job titles using 30+ keyword patterns spread across 6 categories: engineering (engineer, developer, devops, ml, sre, platform, etc.), sales (account executive, bdr, sdr, revenue, etc.), marketing (growth, demand gen, product marketing, etc.), executive (cto, vp, director, head of, chief, etc.), operations, and other. Role distribution ratios determine strategy inference: engineering ratio ≥ 50% = BUILDING, sales ratio ≥ 40% = SCALING, executive ratio ≥ 30% = PIVOTING. If no ratio crosses its threshold, the inference is MAINTAINING.
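The ratio rules can be sketched as follows, using the `roleDistribution` shape from the output example. The order in which the rules are checked and the handling of an empty distribution are assumptions; the thresholds and the MAINTAINING fallback are the ones documented in this README.

```python
def infer_strategy(role_distribution: dict[str, int]) -> str:
    """Infer BUILDING / SCALING / PIVOTING / MAINTAINING from role counts."""
    total = sum(role_distribution.values())
    if total == 0:
        # Assumption: with no listings there is no dominant pattern.
        return "MAINTAINING"

    def ratio(category: str) -> float:
        return role_distribution.get(category, 0) / total

    # Assumed precedence: engineering, then sales, then executive.
    if ratio("engineering") >= 0.50:
        return "BUILDING"
    if ratio("sales") >= 0.40:
        return "SCALING"
    if ratio("executive") >= 0.30:
        return "PIVOTING"
    return "MAINTAINING"


if __name__ == "__main__":
    sample = {"engineering": 10, "sales": 3, "marketing": 2,
              "executive": 1, "operations": 1, "other": 0}
    print(infer_strategy(sample))  # 10 of 17 roles are engineering
```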
Competitive Moat uses an inverted competitor count score — fewer identified SaaS competitors increases the score (max 25 pts), while 8+ competitors apply a full penalty. Tech stack complexity scores up to 25 pts. Patent protection scores up to 30 pts. GitHub stars use a log2(stars) * 3 network effect formula capped at 20 pts.
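The star-based network effect term is the one part of the moat model given as an explicit formula, so it is easy to check by hand. A minimal sketch, assuming zero stars contribute zero points (log2 is undefined there); `network_effect_points` is an illustrative name.

```python
import math


def network_effect_points(stars: int) -> float:
    """log2(stars) * 3, capped at 20 points."""
    if stars < 1:
        return 0.0  # assumption: no stars means no network effect points
    return min(20.0, math.log2(stars) * 3)


if __name__ == "__main__":
    for stars in (1, 16, 100, 3800):
        print(stars, round(network_effect_points(stars), 2))
```

Because of the log scale, the cap is reached at just over 100 stars (2^(20/3) ≈ 101.6), so the 3,800 stars in the output example earn the full 20 points.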
Corporate Health checks OpenCorporates entity status strings for keywords: "active", "good standing", "live" count as active; "dissolv", "inactive", "struck", "revoked" count as inactive. Clean structures of 1-3 entities in 1-3 jurisdictions score highest. Complex structures with 5+ entities or 6+ jurisdictions are penalized.
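The keyword matching can be sketched as below. One detail matters for any substring-based check: "inactive" contains "active", so this sketch tests the inactive keywords first. That ordering, the `unknown` bucket, and the function name are assumptions; the keyword lists are taken from the paragraph above.

```python
from typing import Optional

ACTIVE_KEYWORDS = ("active", "good standing", "live")
INACTIVE_KEYWORDS = ("dissolv", "inactive", "struck", "revoked")


def classify_status(status: Optional[str]) -> str:
    """Classify an entity status string as 'active', 'inactive', or 'unknown'."""
    text = (status or "").lower()
    # Check inactive first: the substring "active" also appears inside "inactive".
    if any(keyword in text for keyword in INACTIVE_KEYWORDS):
        return "inactive"
    if any(keyword in text for keyword in ACTIVE_KEYWORDS):
        return "active"
    return "unknown"  # assumption: unrecognized or missing statuses are not counted either way


if __name__ == "__main__":
    for raw in ("Active", "In Good Standing", "Dissolved", "Inactive", "Struck off", None):
        print(repr(raw), "->", classify_status(raw))
```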
Phase 3 — Composite scoring and deal memo generation
The composite score is a weighted sum of the four sub-scores: Innovation × 0.30 + Moat × 0.25 + Hiring × 0.25 + Corporate × 0.20. Threshold-crossing signals from each model are merged into a single `allSignals` array. Bullish conditions (HYPERGROWTH or FAST velocity, FORTRESS or STRONG moat, BUILDING or SCALING hiring) generate investment thesis statements. Bearish conditions (POOR/CONCERNING corporate health, DORMANT innovation, NONE/WEAK moat, 3+ dissolved entities) generate named red flags. The final report is pushed to the Apify dataset as a single structured JSON record.
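The weighting and rating thresholds can be reproduced in a few lines. This is a sketch of the documented formula only: the sub-scores in the usage example are made-up values, and whether the actor rounds before applying the rating thresholds is not documented, so the unrounded value is compared here.

```python
WEIGHTS = {"innovation": 0.30, "moat": 0.25, "hiring": 0.25, "corporate": 0.20}


def composite_score(sub_scores: dict[str, float]) -> float:
    """Weighted sum of the four 0-100 sub-scores."""
    return sum(weight * sub_scores[name] for name, weight in WEIGHTS.items())


def deal_rating(score: float) -> str:
    """Map a composite score to STRONG_BUY (75+), DILIGENCE (50-74), WATCH (25-49), or PASS (0-24)."""
    if score >= 75:
        return "STRONG_BUY"
    if score >= 50:
        return "DILIGENCE"
    if score >= 25:
        return "WATCH"
    return "PASS"


if __name__ == "__main__":
    # Hypothetical sub-scores, chosen only to make the arithmetic easy to follow:
    # 80*0.30 + 60*0.25 + 40*0.25 + 100*0.20 = 24 + 15 + 10 + 20 = 69
    example = {"innovation": 80, "moat": 60, "hiring": 40, "corporate": 100}
    score = composite_score(example)
    print(round(score, 2), deal_rating(score))
```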
Tips for best results
- Provide the domain URL whenever possible. Tech stack analysis without a domain falls back to a name-based query, which can match unrelated companies. The domain URL gives the Website Tech Stack Detector an exact target, producing accurate component counts that feed into the competitive moat score.
- Use sector to reduce false positives. The company name "Mercury" appears in fintech, healthcare, aerospace, and retail. Adding `"sector": "fintech"` narrows GitHub and ArXiv queries to relevant technical output, producing a more accurate innovation velocity score.
- Schedule monthly monitoring runs on WATCH-rated companies. A company rated WATCH at Series A may hit DILIGENCE or STRONG_BUY six months later after a product launch or fundraise. One run per month costs well under $1 per company at the per-run prices quoted in the pricing section, and surfaces these transitions automatically.
- Cross-reference hiring strategy with the company's stated narrative. If a founder claims "we're scaling revenue aggressively" but the hiring decoder shows BUILDING (70%+ engineering), that's a meaningful discrepancy worth exploring in diligence calls.
- Run competitor names through the same actor. Use the `saasCompetitors` field from one run to feed the next — running due diligence on named competitors gives you a comparable scoring matrix for relative positioning analysis.
- Export to Google Sheets for cohort comparisons. The structured JSON output maps cleanly to spreadsheet columns. A 20-company accelerator cohort produces a sortable, comparable scoring table in minutes.
- Combine with Company Deep Research for narrative depth. This actor produces quantitative scores; Company Deep Research adds qualitative narrative intelligence. Run both for investment committee materials.
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Company Deep Research | Use this actor for quantitative scoring, then run Company Deep Research on DILIGENCE and STRONG_BUY rated companies for narrative intelligence reports |
| Website Tech Stack Detector | Run independently on the company domain for a more detailed technology breakdown than the summary count included in the composite score |
| B2B Lead Qualifier | Score the startup's own customers as leads — useful for portfolio companies building outbound pipelines |
| Trustpilot Review Analyzer | Pull customer sentiment data on the target company and its competitors as a qualitative complement to the quantitative moat score |
| Job Market Intelligence | Pull a deeper breakdown of the hiring trends beyond the high-level role categories scored here |
| WHOIS Domain Lookup | Verify domain registration age and ownership details as an additional corporate legitimacy signal |
| Website Contact Scraper | Extract leadership team contact details from the company's website after identifying a STRONG_BUY target |
Limitations
- Data availability varies by company maturity. Early pre-launch startups with no GitHub repos, no job postings, and no patents will score low across all models. A low score for a very early company may reflect data absence rather than company quality. Use the `dataSources` counts to assess coverage.
- Corporate registration coverage is strongest for US and UK entities. OpenCorporates has broad global coverage, but some jurisdictions (private Asian markets, certain European countries) may return no results even for legitimate companies. A zero corporate health entity count does not always indicate a shell company.
- Patent searches match by keyword, not by exact assignee. Companies with common names may return patents from unrelated organizations. The sector input parameter narrows searches but does not guarantee perfect patent attribution. Verify high patent counts manually for ambiguous company names.
- GitHub repo matching is keyword-based. The actor finds repos where the company name appears in the query — it does not access a verified mapping of GitHub organizations to legal entities. Repos from similarly named projects may inflate star counts.
- Hiring data depends on job board coverage. The job market intelligence source aggregates from public job boards, but some companies post exclusively on their own career pages or internal systems. A low hiring signal score may mean light job board usage, not low hiring.
- Competitor identification is bounded by SaaS directory coverage. The competitive moat score uses the count of identified competitors, which depends on competitor data availability. Niche B2B markets may show fewer competitors than actually exist, artificially inflating the moat score.
- All data is point-in-time. The report reflects data available at the moment of the run. Patent filings, job postings, and GitHub activity change daily. Schedule recurring runs for monitoring rather than relying on a single historical report.
- No financial data. This actor does not access revenue, EBITDA, burn rate, cap table, or investor data. It is not a substitute for financial due diligence. Combine with SEC EDGAR data (for public companies) or direct data room review for financial analysis.
Integrations
- Zapier — trigger a due diligence run automatically when a new company is added to your deal flow CRM or Airtable, then post the deal rating to a Slack channel
- Make — build multi-step automations that run diligence, filter by deal rating, and route STRONG_BUY results to partner review workflows
- Google Sheets — export deal memos to a shared spreadsheet for cohort comparison; composite scores map cleanly to sortable columns
- Apify API — integrate directly into deal flow platforms, Notion databases, or internal tools using the Python or JavaScript client
- Webhooks — receive a POST notification with the full report JSON when a run completes, enabling real-time pipeline updates
- LangChain / LlamaIndex — feed deal memos into an LLM pipeline for natural language Q&A over your startup portfolio data
Troubleshooting
- Low composite score despite a well-known company — Common company names (Mercury, Atlas, Beacon) may match unrelated patents, GitHub repos, or corporate entities. Add the `sector` parameter to focus searches. Also check the `dataSources` counts: if `usptoPatents` is 0 and `githubRepos` is 0, the company name may not be matching correctly in those systems.
- Corporate health score is 0 or very low — OpenCorporates may not have data for companies incorporated in certain jurisdictions or for very recently formed entities. A low corporate health score with 0 entity count does not necessarily indicate fraud — verify manually via the relevant national business registry.
- Run takes longer than 3 minutes — One or more sub-actors are hitting the 120-second timeout. This typically happens when job market intelligence or SaaS competitive intel searches return large result sets. The actor will still complete with partial data. Check the `dataSources` counts in the output to see which sources returned zero results.
- `investmentThesis` is empty despite a high composite score — Thesis points are only generated when specific thresholds are crossed: HYPERGROWTH or FAST velocity, FORTRESS or STRONG moat, BUILDING or SCALING strategy inference. A high composite score assembled from many moderate sub-scores may not trigger any individual thesis statement. Check each sub-score directly.
- Red flags appear for a company you know to be legitimate — Dissolved entities in the corporate health check sometimes represent normal restructuring events (e.g., renaming the entity, jurisdiction change). Use the `corporateHealth.jurisdictions` and `inactiveEntities` fields to investigate specific registrations via OpenCorporates directly.
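Several of the checks above start with the `dataSources` counts. A small helper like the following (an illustrative sketch, not part of the actor) lists which sources came back empty so you can tell a genuinely weak company from a name-matching or timeout problem.

```python
from typing import Any


def empty_sources(report: dict[str, Any]) -> list[str]:
    """Return the names of data sources that returned zero records, sorted alphabetically."""
    counts = report.get("dataSources", {})
    return sorted(name for name, count in counts.items() if count == 0)


if __name__ == "__main__":
    report = {
        "query": "Mercury",
        "dataSources": {
            "corporateRegistrations": 3,
            "usptoPatents": 0,
            "epoPatents": 0,
            "githubRepos": 0,
            "techStackComponents": 9,
            "jobListings": 12,
            "arxivPapers": 0,
            "saasCompetitors": 4,
        },
    }
    missing = empty_sources(report)
    if missing:
        print(f"{report['query']}: no data from {', '.join(missing)}; consider adding sector or domain")
```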
Responsible use
- This actor only accesses publicly available data from open registries, patent databases, GitHub, job boards, and research archives.
- Respect the terms of service of all underlying data sources.
- Due diligence data should be used for legitimate investment research and business evaluation purposes only.
- Do not use company intelligence data to harass founders, manipulate markets, or engage in unauthorized corporate surveillance.
- For guidance on web scraping legality, see Apify's guide.
FAQ
How many companies can I screen with startup due diligence in one month?
On Apify's free tier ($5/month in credits), you can run approximately 10-15 due diligence reports depending on how much data each company returns. On a $49/month plan, you can screen 80-150 companies. Enterprise plans support unlimited pipeline volume.
How accurate is the composite score for startup due diligence?
The score reflects the volume and recency of publicly observable signals — patents, GitHub activity, hiring, and corporate registrations. It is accurate for signal detection but not a substitute for financial diligence. A company with DORMANT innovation velocity may still be a strong business if it operates in a non-technical sector where patent and GitHub signals are structurally absent (e.g., services businesses).
Does startup due diligence work for non-US companies?
Yes. EPO patent data covers European companies, OpenCorporates covers 140+ jurisdictions, and GitHub and ArXiv are global. However, corporate registration coverage varies by country, and job board coverage skews toward English-language postings. Results may be less complete for companies operating primarily in non-English-speaking markets.
How is startup due diligence different from hiring a research analyst?
An analyst produces richer qualitative insight but costs $75-$150/hour and takes 2-4 hours per company. This actor produces structured, quantitative, comparable data in under 3 minutes for under $0.50. It is best used to pre-screen a large pipeline and prioritize which companies receive full analyst attention, not to replace analysts entirely.
What does the BUILDING vs SCALING strategy inference mean?
BUILDING means 50%+ of open roles are engineering or technical positions — the company is investing primarily in product development. SCALING means 40%+ of roles are in sales, business development, or revenue — the company has product-market fit and is investing in go-to-market. PIVOTING means 30%+ of roles are executive or leadership hires, which often signals a leadership restructuring or strategic pivot. MAINTAINING means no dominant hiring pattern.
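The thresholds above can be read as a simple classifier. This is a rough Python sketch rather than the actor's actual code: the function name, the argument names, and the order in which the rules are checked are assumptions, as is returning MAINTAINING when there are no open roles.

```python
def infer_strategy(engineering: int, sales: int, executive: int, total: int) -> str:
    """Classify a hiring pattern from role counts using the documented thresholds."""
    if total <= 0:
        # Assumption: no open roles means no dominant pattern.
        return "MAINTAINING"
    if engineering / total >= 0.50:
        return "BUILDING"
    if sales / total >= 0.40:
        return "SCALING"
    if executive / total >= 0.30:
        return "PIVOTING"
    return "MAINTAINING"


print(infer_strategy(engineering=6, sales=2, executive=1, total=10))
```

Because a company could satisfy more than one threshold at once (for example 50% engineering and 40% sales), the precedence shown here is only one plausible choice.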
Can I run startup due diligence on a company that has no website?
Yes. The domain field is optional. Without it, the actor will attempt tech stack detection via a name-based query, though results may be less precise. The other 7 data sources (USPTO patents, EPO patents, GitHub, jobs, ArXiv, corporate registrations, and competitors) do not require a domain.
Is it legal to scrape startup data for due diligence purposes?
Yes. This actor exclusively accesses publicly available data from open registries (OpenCorporates, USPTO, EPO), public platforms (GitHub, ArXiv), and public job boards. No private, authenticated, or paywalled data is accessed. Investment research using public data is standard practice in the industry. For guidance, see Apify's web scraping legality guide.
How does the competitive moat score handle a startup with no GitHub presence?
GitHub stars contribute a maximum of 20 points to the moat score via a log2(stars) * 3 formula. A company with zero GitHub repos scores 0 on that dimension but can still achieve MODERATE or STRONG moat ratings through patent protection (max 30 pts) and market positioning (max 25 pts). The moat model does not require GitHub activity.
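As a worked illustration of the GitHub component, here is a small Python sketch of the log2(stars) * 3 formula with the 20-point cap. The function name is hypothetical, and treating zero stars as 0 points (log2 is undefined at zero) is an assumption consistent with the description above.

```python
import math


def github_moat_points(stars: int) -> float:
    """GitHub contribution to the moat score: log2(stars) * 3, capped at 20."""
    if stars < 1:
        # log2(0) is undefined; a company with no stars scores 0 here.
        return 0.0
    return min(20.0, math.log2(stars) * 3)


for stars in (0, 16, 1024):
    print(stars, github_moat_points(stars))
```

Under this reading the cap is reached at roughly 100 stars, so additional stars beyond that point no longer raise the moat score.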
What triggers a STRONG_BUY deal rating?
A composite score of 75 or above. This requires strong performance across multiple dimensions — for example, FAST innovation velocity (score ~75), STRONG competitive moat (score ~70), active hiring in a clear strategic direction, and clean corporate structure. In practice, STRONG_BUY ratings go to well-established companies with strong public IP and developer communities, not typical seed-stage startups.
Can I schedule this actor to monitor a startup over time?
Yes. Use the Apify scheduler to run the actor on the same company monthly or quarterly. Each run produces a new dataset record with a generatedAt timestamp, so you can track how composite scores, hiring strategies, and patent counts evolve over time. To be alerted when a deal rating changes, attach an Apify webhook to the scheduled runs and compare each new record with the previous one in the receiving service.
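One way to spot rating changes across scheduled runs is to sort the collected records by generatedAt and compare neighbours. The sketch below is a hedged illustration: generatedAt comes from this page, while the dealRating key name, the ISO 8601 timestamp format, and the sample values are assumptions.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def rating_changes(records: list[dict]) -> list[tuple[str, str, str]]:
    """Return (generatedAt, old_rating, new_rating) for each change between runs."""
    ordered = sorted(records, key=lambda r: parse_timestamp(r["generatedAt"]))
    changes = []
    for prev, curr in zip(ordered, ordered[1:]):
        if prev["dealRating"] != curr["dealRating"]:
            changes.append((curr["generatedAt"], prev["dealRating"], curr["dealRating"]))
    return changes


# Illustrative history only; "dealRating" is an assumed key name.
history = [
    {"generatedAt": "2025-04-01T00:00:00Z", "dealRating": "STRONG_BUY"},
    {"generatedAt": "2025-01-01T00:00:00Z", "dealRating": "DILIGENCE"},
]
print(rating_changes(history))
```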
How does startup due diligence compare to PitchBook or Crunchbase?
PitchBook and Crunchbase provide funding history, investor networks, and revenue estimates — data this actor does not have. This actor provides patent analysis, GitHub activity, real-time hiring signals, tech stack complexity, and corporate structure health — data those platforms largely do not provide. They are complementary tools: use PitchBook for funding context, use this actor for operational and innovation signal analysis.
What happens if one of the 8 data sources fails during a run?
The actor uses Promise.allSettled() internally, which means a failure in one sub-actor does not abort the run. The failing source returns an empty array, the actor logs a warning, and scoring continues with the remaining 7 sources. The dataSources counts in the output will show 0 for any source that failed, making it easy to identify coverage gaps.
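The same failure-tolerant pattern can be sketched in Python, where asyncio.gather with return_exceptions=True plays a role similar to Promise.allSettled(). This is an analogy and not the actor's code: the two fetch functions and the source names are hypothetical stand-ins for sub-actor calls.

```python
import asyncio


async def fetch_patents() -> list[str]:
    # Hypothetical sub-actor call that succeeds.
    return ["patent-1", "patent-2"]


async def fetch_jobs() -> list[str]:
    # Hypothetical sub-actor call that fails.
    raise TimeoutError("sub-actor timed out")


async def gather_sources(sources: dict) -> dict:
    """Run all sources concurrently; a failed source yields an empty list."""
    names = list(sources)
    results = await asyncio.gather(
        *(sources[name]() for name in names), return_exceptions=True
    )
    collected = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            print(f"warning: {name} failed: {result}")
            collected[name] = []
        else:
            collected[name] = result
    return collected


data = asyncio.run(gather_sources({"patents": fetch_patents, "jobs": fetch_jobs}))
counts = {name: len(items) for name, items in data.items()}
print(counts)
```

The resulting counts mirror the dataSources behaviour described above: the failed source reports 0 while the others keep their results.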
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom scoring models, sector-specific configurations, or enterprise integrations with deal flow platforms, reach out through the Apify platform.