Startup Due Diligence

Startup due diligence that would take a junior analyst two days runs in under three minutes. This actor queries 8 intelligence sources simultaneously — corporate registrations, USPTO patents, EPO patents, GitHub repositories, tech stack, job listings, ArXiv research papers, and SaaS competitive data — then synthesizes everything into a VC-grade deal memo with a composite score, deal rating, investment thesis points, and red flags.

Pricing

Pay Per Event model. You only pay for what you use.

| Event | Description | Price |
| --- | --- | --- |
| analysis-run | Full intelligence analysis run | $0.40 |

Example: 100 events = $40.00 · 1,000 events = $400.00

Documentation

Built for investors, accelerator managers, and corporate development teams who need structured, repeatable startup analysis without spreadsheets or manual research. One company name in, one decision-ready report out.

What data does startup due diligence produce?

| Data Point | Source | Example |
| --- | --- | --- |
| 📊 Composite score | All 8 sources | 74 / 100 |
| 🏷️ Deal rating | Composite model | DILIGENCE |
| Innovation velocity score | USPTO + EPO + GitHub + ArXiv | 81 / 100 (FAST) |
| 🛡️ Competitive moat score | Patents + tech stack + competitor count | 68 / 100 (STRONG) |
| 👥 Hiring signal score | Job listings | 55 / 100 (BUILDING) |
| 🏢 Corporate health score | OpenCorporates | 90 / 100 (STRONG) |
| 🔬 Strategy inference | Hiring pattern analysis | BUILDING |
| 📋 Investment thesis points | Scoring model outputs | ["High innovation velocity (81/100)..."] |
| 🚩 Red flags | Scoring model outputs | ["2 dissolved entities — investigate"] |
| 🔗 Patent count (USPTO) | US Patent search | 9 |
| 🌍 Patent count (EPO) | European Patent Office | 3 |
| 💻 GitHub repos + stars | GitHub search | 22 repos, 3,800 stars |
| 📰 ArXiv paper count | ArXiv research search | 4 |
| 🏭 Tech stack depth | Website tech detector | 11 components |
| 📄 Corporate registrations | OpenCorporates | 2 active entities (us_de, gb) |
| 🏢 Open job listings | Job market intelligence | 17 listings |
| 🤝 Competitor count | SaaS competitive intel | 5 identified |
| 🕐 Generated at | Run metadata | 2026-03-20T09:14:33.000Z |

Why use startup due diligence screening?

Manual startup due diligence means opening eight browser tabs, searching each database individually, copying numbers into a spreadsheet, and hoping you haven't missed something. For a single company that process takes two to four hours. Scale it to a deal pipeline of twenty companies and you have a week of analyst time before a single investment decision gets made.

This actor automates the entire data collection and scoring process. Enter a company name, get a structured report with a 0-100 composite score and one of four deal ratings — STRONG_BUY, DILIGENCE, WATCH, or PASS — in under three minutes. Every run is identical and auditable, so your whole team can compare notes on the same structured output rather than arguing over whose spreadsheet is current.

  • Scheduling — run the same company monthly to track hiring velocity, patent filings, and competitive positioning over time
  • API access — trigger due diligence runs from your deal flow CRM, Notion database, or internal tooling via Python, JavaScript, or HTTP
  • Proxy rotation — all sub-actors use Apify's built-in infrastructure, no IP management required
  • Monitoring — set up Slack or email alerts when a run fails or a deal rating changes on a monitored company
  • Integrations — push deal memos directly to Google Sheets, HubSpot, Zapier, or Make for pipeline tracking

Features

  • 8 data sources queried in parallel — OpenCorporates, USPTO, EPO, GitHub, website tech stack, job market, ArXiv, and SaaS competitive intel run simultaneously via Promise.allSettled(), keeping total runtime under three minutes even if one sub-source fails
  • Resilient sub-actor orchestration — if any single data source is unavailable, the actor logs a warning and continues scoring with the remaining sources rather than failing the entire run
  • Innovation Velocity model (30% weight) — USPTO patent portfolio scores up to 25 points; GitHub activity combines repo count (max 15) and star count on a logarithmic scale (max 15); ArXiv publications score up to 25 points; recency bonus adds up to 20 points for activity in the last 6 months
  • Competitive Moat model (25% weight) — tech stack depth scores up to 25 points; patent protection scores up to 30 points; competitor count is inverted (fewer competitors = higher score, max 25); GitHub star community adds up to 20 points via log-scale network effect calculation
  • Hiring Signal Decoder (25% weight) — classifies every job title into engineering, sales, marketing, executive, operations, or other using 30+ keyword patterns; infers company strategy as BUILDING, SCALING, PIVOTING, or MAINTAINING from role distribution ratios
  • Corporate Health Check (20% weight) — validates entity existence via OpenCorporates; penalizes dissolved or inactive entities; scores multi-jurisdiction structure from 1-3 jurisdictions (clean) to 6+ jurisdictions (complex/concerning)
  • Composite scoring formula — Innovation 30% + Moat 25% + Hiring 25% + Corporate 20% — produces a 0-100 score with four deal ratings: STRONG_BUY (75+), DILIGENCE (50-74), WATCH (25-49), PASS (0-24)
  • Automatic investment thesis generation — bullish signals (HYPERGROWTH or FAST velocity, FORTRESS or STRONG moat, BUILDING or SCALING hiring) are extracted and written as plain-English thesis statements
  • Automatic red flag extraction — POOR corporate health, DORMANT innovation, NONE/WEAK moat, and 3+ dissolved entities generate named red flags for immediate review
  • Sector-aware search — providing a sector refines GitHub and ArXiv queries to reduce false positives for common company names
  • Domain-based tech stack detection — provide the company's website URL to get precise tech stack depth analysis instead of a generic name search
  • Full signal log — every scored data point that crosses a threshold is recorded in allSignals for a complete audit trail

Use cases for startup due diligence

Venture capital deal screening

VC analysts reviewing 50+ inbound deals per month have no time for manual research at the top of the funnel. Run this actor on every inbound company. Use the composite score to sort your pipeline: STRONG_BUY and DILIGENCE deals get partner meetings, WATCH deals go into a tracking list, PASS deals get a polite decline. The hiring signal decoder tells you whether a company is still building product (BUILDING) or already at go-to-market (SCALING), which directly informs whether the deal fits your stage thesis.

Angel investor pre-meeting research

Before a 30-minute founder call, run a due diligence report. Arrive knowing the patent count, GitHub star trajectory, number of open engineering roles, and whether the corporate structure looks clean. The investment thesis points give you conversation starters; the red flags give you pointed questions. Research that used to take two hours happens in the time it takes to get coffee.

Accelerator and incubator cohort selection

Program managers evaluating 200+ applications need a repeatable scoring method that doesn't require reading every deck. Batch-run due diligence reports via the API on every applicant company. Sort by composite score, filter by sector, and focus detailed human review on the top 20% of applicants. The innovation velocity score is particularly useful for identifying research-backed founders who may not write polished decks.

Corporate M&A and strategic partnership screening

Corporate development teams assessing acquisition targets or partnership candidates need structured, comparable data. The competitive moat analysis identifies whether a target company's advantage comes from patents, technical complexity, community network effects, or market positioning — which directly informs deal structuring and valuation discussions.

PE growth equity pre-LOI diligence

Private equity teams evaluating growth-stage companies can use the hiring signal decoder to validate management's claimed growth trajectory. A company claiming aggressive expansion should show SCALING-phase hiring (sales and marketing dominant). A company claiming product-led growth should show BUILDING-phase hiring (engineering dominant). Mismatches between narrative and hiring data are a meaningful diligence signal.

Portfolio monitoring and competitive intelligence

Run monthly reports on your existing portfolio companies and their named competitors. Track how composite scores change over time — a company's innovation velocity dropping from FAST to SLOW, or a competitor filing 10 new patents, is an early warning signal worth acting on.
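
One way to surface those changes is to compare the stored report from the previous run with the latest one. The sketch below is illustrative only: `diff_reports` and `TRACKED` are hypothetical names, and it assumes you keep each month's report as a dict shaped like the actor's output.

```python
# Fields to watch besides the composite score: (label, path into the report).
TRACKED = [
    ("dealRating", ("dealRating",)),
    ("velocityLevel", ("innovationVelocity", "velocityLevel")),
    ("moatType", ("competitiveMoat", "moatType")),
]


def _get(report: dict, path: tuple):
    """Follow a sequence of keys into a nested report dict."""
    value = report
    for key in path:
        value = value[key]
    return value


def diff_reports(previous: dict, current: dict) -> list:
    """Describe what changed between two reports for the same company."""
    changes = []
    delta = current["compositeScore"] - previous["compositeScore"]
    if delta:
        changes.append(
            f"compositeScore: {previous['compositeScore']} -> "
            f"{current['compositeScore']} ({delta:+})"
        )
    for label, path in TRACKED:
        before, after = _get(previous, path), _get(current, path)
        if before != after:
            changes.append(f"{label}: {before} -> {after}")
    return changes
```

An empty list means nothing tracked has moved; anything else is a candidate for a Slack or email alert.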

How to run startup due diligence on a company

  1. Enter the company name — type the startup's name exactly as it appears in corporate registrations (e.g., "Figma", "Notion", "Rippling"). This is the only required field.
  2. Configure options — optionally add the company's website URL (e.g., https://figma.com) for precise tech stack analysis, and a sector keyword (e.g., "design software", "HR tech") to reduce false positives in research and patent searches.
  3. Run the actor — click "Start" and wait approximately 2-3 minutes while all 8 data sources are queried in parallel.
  4. Download results — your deal memo appears in the Dataset tab. Export as JSON for API workflows, CSV for spreadsheet analysis, or Excel for investor reporting.

Input parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| companyName | string | Yes | | The startup or company name to analyze (e.g., "Stripe", "Figma", "Notion") |
| domain | string | No | | Company website URL for tech stack detection (e.g., "https://stripe.com"). Omit for name-based detection. |
| sector | string | No | | Sector or industry to refine GitHub and ArXiv search results (e.g., "fintech", "AI", "healthcare") |
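
If you build inputs programmatically, a small helper can enforce these rules before a run starts. This is a sketch rather than part of the actor: `build_run_input` is a hypothetical name, and the URL check assumes the actor wants a full http(s) URL as in the examples.

```python
from typing import Optional
from urllib.parse import urlparse


def build_run_input(company_name: str,
                    domain: Optional[str] = None,
                    sector: Optional[str] = None) -> dict:
    """Build the actor input, leaving out optional fields that were not given."""
    name = company_name.strip()
    if not name:
        raise ValueError("companyName is required")
    run_input = {"companyName": name}
    if domain:
        parsed = urlparse(domain)
        # Assumption: the actor expects a full URL such as https://stripe.com.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"domain should be a full URL, got {domain!r}")
        run_input["domain"] = domain
    if sector and sector.strip():
        run_input["sector"] = sector.strip()
    return run_input


print(build_run_input("Rippling", "https://rippling.com", "HR tech"))
```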

Input examples

Standard due diligence run — company name only:

{
  "companyName": "Notion"
}

Full diligence with domain and sector:

{
  "companyName": "Rippling",
  "domain": "https://rippling.com",
  "sector": "HR tech"
}

Minimal fastest run — name only, no optional fields:

{
  "companyName": "Figma"
}

Input tips

  • Always use the legal entity name — for corporate registration matching, use the registered company name rather than a product name or brand nickname. For example, "Figma Inc" may return better corporate health results than "Figma".
  • Add the domain for accurate tech stack analysis — without a domain URL, the actor attempts tech detection by company name query, which may return imprecise results. For any company with a public website, provide the URL.
  • Use sector to disambiguate common names — companies named "Atlas", "Beacon", or "Mercury" exist in many industries. Adding "fintech" or "logistics" narrows GitHub and ArXiv searches to relevant results.
  • Batch via API for pipeline screening — processing 20 deal flow companies as 20 sequential API calls is more efficient than running them one by one in the UI.
  • Treat WATCH-rated companies as async monitoring candidates — schedule monthly re-runs on WATCH-rated companies to catch when a hiring surge or patent filing pushes them into DILIGENCE territory.
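
The batch tip above can be sketched as a simple loop. The example assumes an `ApifyClient`-style object is passed in, as in the API section of this page; `screen_companies` is a hypothetical helper name, and sorting by composite score is one possible way to order a pipeline.

```python
from typing import Iterable


def screen_companies(client, companies: Iterable[dict]) -> list:
    """Run the actor once per company and collect one report per run."""
    reports = []
    for run_input in companies:
        # .call() starts a run and waits for it to finish.
        run = client.actor("ryanclinton/startup-due-diligence").call(run_input=run_input)
        reports.extend(client.dataset(run["defaultDatasetId"]).iterate_items())
    # Highest composite score first, so the strongest deals surface at the top.
    return sorted(reports, key=lambda r: r["compositeScore"], reverse=True)


# Usage (requires the apify-client package and a valid token):
# from apify_client import ApifyClient
# pipeline = [{"companyName": "Notion"}, {"companyName": "Figma"}]
# for report in screen_companies(ApifyClient("YOUR_API_TOKEN"), pipeline):
#     print(report["query"], report["compositeScore"], report["dealRating"])
```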

Output example

{
  "query": "Rippling",
  "compositeScore": 74,
  "dealRating": "DILIGENCE",
  "allSignals": [
    "12 patents filed (9 USPTO, 3 EPO)",
    "3800 GitHub stars across 22 repos — strong developer community",
    "22 public repositories — active open source presence",
    "4 ArXiv publications — research-driven innovation",
    "58% engineering hires — product building phase",
    "17 open positions — significant growth",
    "11 technology components detected — complex tech stack",
    "12 patents providing IP protection",
    "Only 5 direct competitors identified — limited competition",
    "3800 GitHub stars — developer community network effect",
    "2 corporate registration(s) found"
  ],
  "recommendations": [
    "High innovation velocity (81/100) — strong R&D output",
    "STRONG competitive moat: Technical complexity, Patent portfolio, Market positioning",
    "Engineering-heavy hiring — product building phase (ideal for early-stage)"
  ],
  "dataSources": {
    "corporateRegistrations": 2,
    "usptoPatents": 9,
    "epoPatents": 3,
    "githubRepos": 22,
    "techStackComponents": 11,
    "jobListings": 17,
    "arxivPapers": 4,
    "saasCompetitors": 5
  },
  "generatedAt": "2026-03-20T09:14:33.821Z",
  "inputDomain": "https://rippling.com",
  "inputSector": "HR tech",
  "innovationVelocity": {
    "score": 81,
    "patentCount": 9,
    "epoPatentCount": 3,
    "githubRepos": 22,
    "githubStars": 3800,
    "arxivPapers": 4,
    "velocityLevel": "FAST",
    "signals": [
      "12 patents filed (9 USPTO, 3 EPO)",
      "3800 GitHub stars across 22 repos — strong developer community",
      "22 public repositories — active open source presence",
      "4 ArXiv publications — research-driven innovation"
    ]
  },
  "hiringSignals": {
    "score": 55,
    "totalJobs": 17,
    "engineeringJobs": 10,
    "salesJobs": 3,
    "executiveJobs": 1,
    "strategyInference": "BUILDING",
    "roleDistribution": {
      "engineering": 10,
      "sales": 3,
      "marketing": 2,
      "executive": 1,
      "operations": 1,
      "other": 0
    },
    "signals": [
      "58% engineering hires — product building phase",
      "17 open positions — significant growth"
    ]
  },
  "competitiveMoat": {
    "score": 68,
    "techStackDepth": 11,
    "competitorCount": 5,
    "patentProtection": 12,
    "moatType": "STRONG",
    "moatFactors": [
      "Technical complexity",
      "Patent portfolio",
      "Market positioning",
      "Community/network effects"
    ],
    "signals": [
      "11 technology components detected — complex tech stack",
      "12 patents providing IP protection",
      "Only 5 direct competitors identified — limited competition",
      "3800 GitHub stars — developer community network effect"
    ]
  },
  "corporateHealth": {
    "score": 90,
    "entityCount": 2,
    "activeEntities": 2,
    "inactiveEntities": 0,
    "jurisdictions": ["us_de", "gb"],
    "healthLevel": "STRONG",
    "signals": [
      "2 corporate registration(s) found"
    ]
  },
  "investmentThesis": [
    "High innovation velocity (81/100) — strong R&D output",
    "STRONG competitive moat: Technical complexity, Patent portfolio, Market positioning",
    "Engineering-heavy hiring — product building phase (ideal for early-stage)"
  ],
  "redFlags": []
}

Output fields

| Field | Type | Description |
| --- | --- | --- |
| query | string | The company name used for the report |
| compositeScore | number | Weighted composite score from 0-100 |
| dealRating | string | STRONG_BUY, DILIGENCE, WATCH, or PASS |
| allSignals | string[] | All scored signals that crossed detection thresholds |
| recommendations | string[] | Combined investment thesis points and red flags |
| dataSources | object | Record count from each of the 8 data sources |
| dataSources.corporateRegistrations | number | OpenCorporates entity count |
| dataSources.usptoPatents | number | US patent filings found |
| dataSources.epoPatents | number | European patent filings found |
| dataSources.githubRepos | number | GitHub repositories found |
| dataSources.techStackComponents | number | Website technology components detected |
| dataSources.jobListings | number | Open job postings found |
| dataSources.arxivPapers | number | ArXiv research papers found |
| dataSources.saasCompetitors | number | SaaS competitors identified |
| generatedAt | string | ISO 8601 timestamp of report generation |
| inputDomain | string | Domain URL provided (null if omitted) |
| inputSector | string | Sector provided (null if omitted) |
| innovationVelocity.score | number | Innovation Velocity sub-score (0-100) |
| innovationVelocity.patentCount | number | USPTO patents found |
| innovationVelocity.epoPatentCount | number | EPO patents found |
| innovationVelocity.githubRepos | number | GitHub repos found |
| innovationVelocity.githubStars | number | Total GitHub stars across all repos |
| innovationVelocity.arxivPapers | number | ArXiv papers found |
| innovationVelocity.velocityLevel | string | HYPERGROWTH, FAST, MODERATE, SLOW, or DORMANT |
| innovationVelocity.signals | string[] | Threshold-crossing innovation signals |
| hiringSignals.score | number | Hiring Signal sub-score (0-100) |
| hiringSignals.totalJobs | number | Total open positions found |
| hiringSignals.engineeringJobs | number | Engineering/technical roles |
| hiringSignals.salesJobs | number | Sales/revenue roles |
| hiringSignals.executiveJobs | number | Executive/leadership roles |
| hiringSignals.strategyInference | string | BUILDING, SCALING, PIVOTING, or MAINTAINING |
| hiringSignals.roleDistribution | object | Count breakdown by role category |
| hiringSignals.signals | string[] | Hiring pattern signals |
| competitiveMoat.score | number | Competitive Moat sub-score (0-100) |
| competitiveMoat.techStackDepth | number | Number of technology components detected |
| competitiveMoat.competitorCount | number | Direct SaaS competitors identified |
| competitiveMoat.patentProtection | number | Total USPTO + EPO patents |
| competitiveMoat.moatType | string | FORTRESS, STRONG, MODERATE, WEAK, or NONE |
| competitiveMoat.moatFactors | string[] | Named moat factors contributing to score |
| competitiveMoat.signals | string[] | Moat-specific scored signals |
| corporateHealth.score | number | Corporate Health sub-score (0-100) |
| corporateHealth.entityCount | number | Total corporate entities found |
| corporateHealth.activeEntities | number | Active/good-standing entities |
| corporateHealth.inactiveEntities | number | Dissolved or inactive entities |
| corporateHealth.jurisdictions | string[] | Jurisdiction codes (e.g., us_de, gb, ie) |
| corporateHealth.healthLevel | string | STRONG, GOOD, ACCEPTABLE, CONCERNING, or POOR |
| corporateHealth.signals | string[] | Corporate structure signals |
| investmentThesis | string[] | Bullish thesis points auto-generated from scoring |
| redFlags | string[] | Bearish warning points auto-generated from scoring |

How much does it cost to run startup due diligence?

Startup Due Diligence uses pay-per-run pricing — each run calls 8 sub-actors in parallel and costs approximately $0.25-$0.60 in platform credits depending on how much data each source returns. Compute costs are included.

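| Scenario | Companies | Approx. cost per run | Total cost |
| --- | --- | --- | --- |
| Quick test | 1 | $0.30 | $0.30 |
| Small batch | 5 | $0.35 | $1.75 |
| Deal flow screen | 20 | $0.40 | $8.00 |
| Portfolio sweep | 50 | $0.40 | $20.00 |
| Enterprise pipeline | 200 | $0.45 | $90.00 |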

You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached.

Manual analyst time for equivalent research runs $75-$150/hour. A typical 8-source manual diligence takes 2-4 hours per company — $150-$600 per company. This actor produces comparable structured output for under $0.50. Most teams spend $20-$50/month covering their full active pipeline.
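
The comparison above is simple arithmetic, and a short sketch makes it easy to rerun with your own numbers. The function names are hypothetical; the defaults reuse the figures quoted on this page ($0.40 per run, 2-4 analyst hours at $75-$150/hour).

```python
def screening_cost(companies: int, price_per_run: float = 0.40) -> float:
    """Actor cost in dollars for screening a pipeline once."""
    return round(companies * price_per_run, 2)


def analyst_cost_range(companies: int,
                       hours: tuple = (2, 4),
                       hourly_rate: tuple = (75, 150)) -> tuple:
    """Low and high manual research cost in dollars for the same pipeline."""
    low = companies * hours[0] * hourly_rate[0]
    high = companies * hours[1] * hourly_rate[1]
    return low, high


pipeline_size = 20
print(f"Actor: ${screening_cost(pipeline_size):.2f}")
low, high = analyst_cost_range(pipeline_size)
print(f"Analyst: ${low:,} to ${high:,}")
```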

Startup due diligence using the API

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/startup-due-diligence").call(run_input={
    "companyName": "Rippling",
    "domain": "https://rippling.com",
    "sector": "HR tech"
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"Company: {item['query']}")
    print(f"Score: {item['compositeScore']}/100 — Rating: {item['dealRating']}")
    print(f"Innovation: {item['innovationVelocity']['velocityLevel']} ({item['innovationVelocity']['score']}/100)")
    print(f"Moat: {item['competitiveMoat']['moatType']} ({item['competitiveMoat']['score']}/100)")
    print(f"Strategy: {item['hiringSignals']['strategyInference']}")
    print(f"Thesis: {item['investmentThesis']}")
    print(f"Red Flags: {item['redFlags']}")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/startup-due-diligence").call({
    companyName: "Rippling",
    domain: "https://rippling.com",
    sector: "HR tech"
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`${item.query}: ${item.compositeScore}/100 — ${item.dealRating}`);
    console.log(`Innovation: ${item.innovationVelocity.velocityLevel}`);
    console.log(`Moat: ${item.competitiveMoat.moatType}`);
    console.log(`Strategy: ${item.hiringSignals.strategyInference}`);
    console.log(`Thesis:`, item.investmentThesis);
    console.log(`Red Flags:`, item.redFlags);
}

cURL

# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~startup-due-diligence/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "companyName": "Rippling",
    "domain": "https://rippling.com",
    "sector": "HR tech"
  }'

# Fetch results once the run has finished (DATASET_ID is data.defaultDatasetId in the run response above)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

How Startup Due Diligence works

Phase 1 — Parallel data collection

The actor calls 8 sub-actors simultaneously using Promise.allSettled(), which means all 8 sources are queried at the same time rather than sequentially. Each sub-actor is allocated 256MB of memory and a 120-second timeout. The runActorsParallel() function handles partial failure gracefully: if any sub-actor returns an error, the result for that source is set to an empty array, and scoring continues with the remaining data. Total collection time is bounded by the slowest single source, not the sum of all sources.
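
The actor is described as using Promise.allSettled() in JavaScript. For readers working in Python, the same settle-everything pattern can be sketched with `asyncio.gather(..., return_exceptions=True)`. Everything here is a stand-in: `fetch_source`, `SOURCES`, and the simulated EPO failure are hypothetical and are not the actor's own code.

```python
import asyncio

SOURCES = ["corporate", "uspto", "epo", "github", "techstack", "jobs", "arxiv", "saas"]


async def fetch_source(name: str) -> list:
    """Hypothetical stand-in for one sub-actor call."""
    await asyncio.sleep(0.01)
    if name == "epo":
        raise RuntimeError("EPO source unavailable")  # simulated failure
    return [{"source": name}]


async def collect_all(timeout_s: float = 120.0) -> dict:
    """Query every source at once; a failed source yields an empty list."""
    tasks = [asyncio.wait_for(fetch_source(s), timeout=timeout_s) for s in SOURCES]
    # return_exceptions=True keeps one failure from aborting the others,
    # which is the closest asyncio analogue to Promise.allSettled().
    settled = await asyncio.gather(*tasks, return_exceptions=True)
    results = {}
    for name, outcome in zip(SOURCES, settled):
        if isinstance(outcome, BaseException):
            print(f"warning: {name} failed ({outcome!r}); continuing without it")
            results[name] = []
        else:
            results[name] = outcome
    return results


results = asyncio.run(collect_all())
print({name: len(rows) for name, rows in results.items()})
```

Because the tasks run concurrently, the total wait is roughly that of the slowest source, which matches the behavior described above.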

Phase 2 — Four independent scoring models

Each scoring model operates on the raw data arrays returned by its relevant sub-actors:

Innovation Velocity reads from USPTO patents (max 25 pts), EPO patents (included in USPTO total), GitHub repos (max 15 pts for count) and GitHub stars on a log2 scale (max 15 pts), and ArXiv papers (max 25 pts). A recency bonus of up to 20 points is added for GitHub repos and ArXiv papers updated within the last 6 months. Score thresholds: HYPERGROWTH (80+), FAST (60-79), MODERATE (40-59), SLOW (20-39), DORMANT (0-19).

Hiring Signal Decoder classifies job titles using 30+ keyword patterns spread across 6 categories: engineering (engineer, developer, devops, ml, sre, platform, etc.), sales (account executive, bdr, sdr, revenue, etc.), marketing (growth, demand gen, product marketing, etc.), executive (cto, vp, director, head of, chief, etc.), operations, and other. Role distribution ratios determine strategy inference: engineering ratio ≥ 50% = BUILDING, sales ratio ≥ 40% = SCALING, executive ratio ≥ 30% = PIVOTING. If none of these thresholds is met, the inference is MAINTAINING.
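
A rough sketch of this classification follows. It is not the actor's implementation: the keyword lists contain only the examples named above (the full set of 30+ patterns is not published here), the single "operations" keyword is a guess, and matching at word starts in the listed category order is an assumption.

```python
import re

# Subset of keywords taken from the description; checked in this order.
KEYWORDS = {
    "engineering": ["engineer", "developer", "devops", "ml", "sre", "platform"],
    "sales": ["account executive", "bdr", "sdr", "revenue"],
    "marketing": ["growth", "demand gen", "product marketing"],
    "executive": ["cto", "vp", "director", "head of", "chief"],
    "operations": ["operations"],  # assumption: no keywords are listed for this one
}


def classify_title(title: str) -> str:
    """Return the first category whose keyword starts a word in the title."""
    lowered = title.lower()
    for category, words in KEYWORDS.items():
        # \b avoids false hits such as "ml" in "html" or "cto" in "director".
        if any(re.search(r"\b" + re.escape(word), lowered) for word in words):
            return category
    return "other"


def infer_strategy(distribution: dict) -> str:
    """Apply the ratio thresholds from the text to a role distribution."""
    total = sum(distribution.values())
    if total == 0:
        return "MAINTAINING"
    if distribution.get("engineering", 0) / total >= 0.50:
        return "BUILDING"
    if distribution.get("sales", 0) / total >= 0.40:
        return "SCALING"
    if distribution.get("executive", 0) / total >= 0.30:
        return "PIVOTING"
    return "MAINTAINING"


# Role distribution from the output example: 10 of 17 roles are engineering.
example = {"engineering": 10, "sales": 3, "marketing": 2,
           "executive": 1, "operations": 1, "other": 0}
print(infer_strategy(example))
```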

Competitive Moat uses an inverted competitor count score — fewer identified SaaS competitors increases the score (max 25 pts), while 8+ competitors apply a full penalty. Tech stack complexity scores up to 25 pts. Patent protection scores up to 30 pts. GitHub stars use a log2(stars) * 3 network effect formula capped at 20 pts.
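
Two of these components are easy to sketch. The star formula below follows the text directly (log2(stars) * 3, capped at 20). The competitor score is only a guess at the shape: the text gives the 25-point maximum and the full penalty at 8+ competitors, and the linear falloff between them is an assumption. Both function names are hypothetical.

```python
import math


def star_network_score(stars: int) -> float:
    """Network effect points: log2(stars) * 3, capped at 20."""
    if stars < 1:
        return 0.0
    return min(20.0, math.log2(stars) * 3)


def competitor_score(competitors: int) -> float:
    """Inverted competitor score: fewer competitors score higher.

    Assumption: linear falloff from 25 points at zero competitors
    down to 0 points at 8 or more.
    """
    return max(0.0, 25.0 * (1 - competitors / 8))


print(star_network_score(16))    # log2(16) = 4, so 4 * 3 = 12.0
print(star_network_score(3800))  # raw value exceeds the cap, so 20.0
print(competitor_score(4))       # halfway to the full penalty: 12.5
```

With a log scale, the cap is reached at just over 100 stars (log2(stars) * 3 >= 20), so this component mostly separates projects with no community from projects with some.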

Corporate Health checks OpenCorporates entity status strings for keywords: "active", "good standing", "live" count as active; "dissolv", "inactive", "struck", "revoked" count as inactive. Clean structures of 1-3 entities in 1-3 jurisdictions score highest. Complex structures with 5+ entities or 6+ jurisdictions are penalized.
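
The keyword check can be sketched as follows. One detail matters when matching substrings: "inactive" contains "active", so the inactive keywords have to be tested first. The function names and the "unknown" fallback are assumptions for illustration; the keyword lists come from the description above.

```python
from typing import Optional

ACTIVE_KEYWORDS = ("active", "good standing", "live")
INACTIVE_KEYWORDS = ("dissolv", "inactive", "struck", "revoked")


def classify_status(status: Optional[str]) -> str:
    """Map an entity status string to 'active', 'inactive', or 'unknown'."""
    lowered = (status or "").lower()
    # Inactive first: "inactive" would otherwise match the "active" keyword.
    if any(keyword in lowered for keyword in INACTIVE_KEYWORDS):
        return "inactive"
    if any(keyword in lowered for keyword in ACTIVE_KEYWORDS):
        return "active"
    return "unknown"


def count_entities(statuses: list) -> dict:
    """Tally statuses the way activeEntities / inactiveEntities are reported."""
    counts = {"active": 0, "inactive": 0, "unknown": 0}
    for status in statuses:
        counts[classify_status(status)] += 1
    return counts


print(count_entities(["Active", "In Good Standing", "Dissolved", "Inactive", None]))
```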

Phase 3 — Composite scoring and deal memo generation

The composite score applies weighted averages: Innovation × 0.30 + Moat × 0.25 + Hiring × 0.25 + Corporate × 0.20. Threshold-crossing signals from each model are merged into a single allSignals array. Bullish conditions (HYPERGROWTH/FAST velocity, FORTRESS/STRONG moat, BUILDING/SCALING hiring) generate investment thesis statements. Bearish conditions (POOR/CONCERNING corporate health, DORMANT innovation, NONE/WEAK moat, 3+ dissolved entities) generate named red flags. The final report is pushed to the Apify dataset as a single structured JSON record.
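
The weighting and rating bands can be written out in a few lines. This sketch uses the published weights and the rating thresholds listed under Features; the function names are hypothetical, and the actor's exact rounding is not documented here, so a result computed from rounded sub-scores may differ from the reported composite by a point.

```python
WEIGHTS = {"innovation": 0.30, "moat": 0.25, "hiring": 0.25, "corporate": 0.20}


def composite_score(sub_scores: dict) -> float:
    """Weighted average of the four sub-scores (each 0-100)."""
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


def deal_rating(score: float) -> str:
    """Map a 0-100 composite score to one of the four deal ratings."""
    if score >= 75:
        return "STRONG_BUY"
    if score >= 50:
        return "DILIGENCE"
    if score >= 25:
        return "WATCH"
    return "PASS"


# Sub-scores taken from the output example on this page.
score = composite_score({"innovation": 81, "moat": 68, "hiring": 55, "corporate": 90})
print(f"{score:.2f} -> {deal_rating(score)}")
```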

Tips for best results

  1. Provide the domain URL whenever possible. Tech stack analysis without a domain falls back to a name-based query which can match unrelated companies. The domain URL gives the Website Tech Stack Detector an exact target, producing accurate component counts that feed into the competitive moat score.

  2. Use sector to reduce false positives. The company name "Mercury" appears in fintech, healthcare, aerospace, and retail. Adding "sector": "fintech" narrows GitHub and ArXiv queries to relevant technical output, producing a more accurate innovation velocity score.

  3. Schedule monthly monitoring runs on WATCH-rated companies. A company rated WATCH at Series A may hit DILIGENCE or STRONG_BUY six months later after a product launch or fundraise. Scheduling a monthly run costs under $5/month per company and surfaces these transitions automatically.

  4. Cross-reference hiring strategy with the company's stated narrative. If a founder claims "we're scaling revenue aggressively" but the hiring decoder shows BUILDING (70%+ engineering), that's a meaningful discrepancy worth exploring in diligence calls.

  5. Run competitor names through the same actor. The dataSources.saasCompetitors field reports only how many competitors were found, not their names, so identify the competitors by name separately. Running due diligence on each of them gives you a comparable scoring matrix for relative positioning analysis.

  6. Export to Google Sheets for cohort comparisons. The structured JSON output maps cleanly to spreadsheet columns. A 20-company accelerator cohort produces a sortable, comparable scoring table in minutes.

  7. Combine with Company Deep Research for narrative depth. This actor produces quantitative scores; Company Deep Research adds qualitative narrative intelligence. Run both for investment committee materials.
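
For the spreadsheet tip above, one possible flattening is a CSV with one row per company. The column names and the `to_row` / `to_csv` helpers are hypothetical choices for this sketch; the field paths come from the output fields table.

```python
import csv
import io

COLUMNS = ["company", "compositeScore", "dealRating", "innovationScore",
           "moatScore", "hiringScore", "corporateScore", "strategy", "redFlags"]


def to_row(report: dict) -> dict:
    """Pick the headline fields out of one nested report."""
    return {
        "company": report["query"],
        "compositeScore": report["compositeScore"],
        "dealRating": report["dealRating"],
        "innovationScore": report["innovationVelocity"]["score"],
        "moatScore": report["competitiveMoat"]["score"],
        "hiringScore": report["hiringSignals"]["score"],
        "corporateScore": report["corporateHealth"]["score"],
        "strategy": report["hiringSignals"]["strategyInference"],
        "redFlags": "; ".join(report["redFlags"]),
    }


def to_csv(reports: list) -> str:
    """Render reports as CSV text, highest composite score first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for report in sorted(reports, key=lambda r: r["compositeScore"], reverse=True):
        writer.writerow(to_row(report))
    return buffer.getvalue()
```

The resulting text can be saved as a .csv file and imported into Google Sheets or Excel.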

Combine with other Apify actors

| Actor | How to combine |
| --- | --- |
| Company Deep Research | Use this actor for quantitative scoring, then run Company Deep Research on DILIGENCE and STRONG_BUY rated companies for narrative intelligence reports |
| Website Tech Stack Detector | Run independently on the company domain for a more detailed technology breakdown than the summary count included in the composite score |
| B2B Lead Qualifier | Score the startup's own customers as leads, which is useful for portfolio companies building outbound pipelines |
| Trustpilot Review Analyzer | Pull customer sentiment data on the target company and its competitors as a qualitative complement to the quantitative moat score |
| Job Market Intelligence | Pull a deeper breakdown of the hiring trends beyond the high-level role categories scored here |
| WHOIS Domain Lookup | Verify domain registration age and ownership details as an additional corporate legitimacy signal |
| Website Contact Scraper | Extract leadership team contact details from the company's website after identifying a STRONG_BUY target |

Limitations

  • Data availability varies by company maturity. Early pre-launch startups with no GitHub repos, no job postings, and no patents will score low across all models. A low score for a very early company may reflect data absence rather than company quality. Use the dataSources counts to assess coverage.
  • Corporate registration coverage is strongest for US and UK entities. OpenCorporates has broad global coverage, but some jurisdictions (private Asian markets, certain European countries) may return no results even for legitimate companies. A zero corporate health entity count does not always indicate a shell company.
  • Patent searches match by keyword, not by exact assignee. Companies with common names may return patents from unrelated organizations. The sector input parameter narrows searches but does not guarantee perfect patent attribution. Verify high patent counts manually for ambiguous company names.
  • GitHub repo matching is keyword-based. The actor finds repos where the company name appears in the query — it does not access a verified mapping of GitHub organizations to legal entities. Repos from similarly named projects may inflate star counts.
  • Hiring data depends on job board coverage. The job market intelligence source aggregates from public job boards, but some companies post exclusively on their own career pages or internal systems. A low hiring signal score may mean light job board usage, not low hiring.
  • Competitor identification is bounded by SaaS directory coverage. The competitive moat score uses the count of identified competitors, which depends on competitor data availability. Niche B2B markets may show fewer competitors than actually exist, artificially inflating the moat score.
  • All data is point-in-time. The report reflects data available at the moment of the run. Patent filings, job postings, and GitHub activity change daily. Schedule recurring runs for monitoring rather than relying on a single historical report.
  • No financial data. This actor does not access revenue, EBITDA, burn rate, cap table, or investor data. It is not a substitute for financial due diligence. Combine with SEC EDGAR data (for public companies) or direct data room review for financial analysis.

Integrations

  • Zapier — trigger a due diligence run automatically when a new company is added to your deal flow CRM or Airtable, then post the deal rating to a Slack channel
  • Make — build multi-step automations that run diligence, filter by deal rating, and route STRONG_BUY results to partner review workflows
  • Google Sheets — export deal memos to a shared spreadsheet for cohort comparison; composite scores map cleanly to sortable columns
  • Apify API — integrate directly into deal flow platforms, Notion databases, or internal tools using the Python or JavaScript client
  • Webhooks — receive a POST notification with the full report JSON when a run completes, enabling real-time pipeline updates
  • LangChain / LlamaIndex — feed deal memos into an LLM pipeline for natural language Q&A over your startup portfolio data

Troubleshooting

  • Low composite score despite a well-known company — Common company names (Mercury, Atlas, Beacon) may match unrelated patents, GitHub repos, or corporate entities. Add the sector parameter to focus searches. Also check the dataSources counts: if usptoPatents is 0 and githubRepos is 0, the company name may not be matching correctly in those systems.

  • Corporate health score is 0 or very low — OpenCorporates may not have data for companies incorporated in certain jurisdictions or very recently formed entities. A low corporate health score with 0 entity count does not necessarily indicate fraud — verify manually via the relevant national business registry.

  • Run takes longer than 3 minutes — One or more sub-actors is hitting its 120-second timeout. This typically happens when job market intelligence or SaaS competitive intel searches return large result sets. The actor will still complete with partial data. Check the dataSources counts in the output to see which sources returned zero results.

  • investmentThesis is empty despite a high composite score — Thesis points are only generated when specific thresholds are crossed: HYPERGROWTH or FAST velocity, FORTRESS or STRONG moat, BUILDING or SCALING strategy inference. A high composite score assembled from many moderate sub-scores may not trigger any individual thesis statement. Check each sub-score directly.

  • Red flags appear for a company you know to be legitimate — Dissolved entities in the corporate health check sometimes represent normal restructuring events (e.g., renaming the entity, jurisdiction change). Use the corporateHealth.jurisdictions and inactiveEntities fields to investigate specific registrations via OpenCorporates directly.
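
Several of the checks above start with the dataSources counts. A small helper can list the sources that returned nothing, which is usually the first thing to look at when a score seems off. `empty_sources` is a hypothetical name; the keys are the ones shown in the output example.

```python
def empty_sources(report: dict) -> list:
    """Return the names of data sources that contributed zero records."""
    counts = report.get("dataSources", {})
    return sorted(name for name, count in counts.items() if count == 0)


report = {
    "query": "Mercury",
    "dataSources": {
        "corporateRegistrations": 3, "usptoPatents": 0, "epoPatents": 0,
        "githubRepos": 0, "techStackComponents": 9, "jobListings": 12,
        "arxivPapers": 1, "saasCompetitors": 4,
    },
}
print(empty_sources(report))
```

A zero can mean either a timeout or a genuine absence of data, so treat the list as a prompt for a re-run or a manual check rather than as a verdict.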

Responsible use

  • This actor only accesses publicly available data from open registries, patent databases, GitHub, job boards, and research archives.
  • Respect the terms of service of all underlying data sources.
  • Due diligence data should be used for legitimate investment research and business evaluation purposes only.
  • Do not use company intelligence data to harass founders, manipulate markets, or engage in unauthorized corporate surveillance.
  • For guidance on web scraping legality, see Apify's guide.

FAQ

How many companies can I screen with startup due diligence in one month?

On Apify's free tier ($5/month in credits), you can run approximately 10-15 due diligence reports depending on how much data each company returns. On a $49/month plan, you can screen 80-150 companies. Enterprise plans support unlimited pipeline volume.

How accurate is the composite score for startup due diligence?

The score reflects the volume and recency of publicly observable signals — patents, GitHub activity, hiring, and corporate registrations. It is accurate for signal detection but not a substitute for financial diligence. A company with DORMANT innovation velocity may still be a strong business if it operates in a non-technical sector where patent and GitHub signals are structurally absent (e.g., services businesses).

Does startup due diligence work for non-US companies?

Yes. EPO patent data covers European companies, OpenCorporates covers 140+ jurisdictions, and GitHub and ArXiv are global. However, corporate registration coverage varies by country, and job board coverage skews toward English-language postings. Results may be less complete for companies operating primarily in non-English-speaking markets.

How is startup due diligence different from hiring a research analyst?

An analyst produces richer qualitative insight but costs $75-$150/hour and takes 2-4 hours per company. This actor produces structured, quantitative, comparable data in under 3 minutes for under $0.50. It is best used to pre-screen a large pipeline and prioritize which companies receive full analyst attention, not to replace analysts entirely.

What does the BUILDING vs SCALING strategy inference mean?

BUILDING means 50%+ of open roles are engineering or technical positions — the company is investing primarily in product development. SCALING means 40%+ of roles are in sales, business development, or revenue — the company has product-market fit and is investing in go-to-market. PIVOTING means 30%+ of roles are executive or leadership hires, which often signals a leadership restructuring or strategic pivot. MAINTAINING means no dominant hiring pattern.
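The thresholds above can be sketched as a small classifier. This is an illustration under stated assumptions, not the actor's implementation: the category keys (engineering, sales, executive), the order in which the rules are checked, and the handling of a company with no open roles are all assumptions. Order matters because a company can meet more than one threshold at once (for example, 50% engineering and 40% sales).

```python
def infer_strategy(role_counts: dict[str, int]) -> str:
    """Classify hiring strategy from counts of open roles per category.

    Category names and rule order are assumptions made for illustration.
    """
    total = sum(role_counts.values())
    if total == 0:
        return "MAINTAINING"  # assumption: no open roles means no dominant pattern

    def share(category: str) -> float:
        return role_counts.get(category, 0) / total

    if share("engineering") >= 0.50:
        return "BUILDING"
    if share("sales") >= 0.40:
        return "SCALING"
    if share("executive") >= 0.30:
        return "PIVOTING"
    return "MAINTAINING"


print(infer_strategy({"engineering": 6, "sales": 2, "other": 2}))
```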

Can I run startup due diligence on a company that has no website?

Yes. The domain field is optional. Without it, the actor attempts tech stack detection via a name-based query, though results may be less precise. The other 7 data sources (USPTO patents, EPO patents, GitHub, job listings, ArXiv, corporate registrations, and SaaS competitive data) do not require a domain.

Is it legal to scrape startup data for due diligence purposes?

Yes. This actor exclusively accesses publicly available data from open registries (OpenCorporates, USPTO, EPO), public platforms (GitHub, ArXiv), and public job boards. No private, authenticated, or paywalled data is accessed. Investment research using public data is standard practice in the industry. For guidance, see Apify's web scraping legality guide.

How does the competitive moat score handle a startup with no GitHub presence?

GitHub stars contribute a maximum of 20 points to the moat score via a log2(stars) * 3 formula. A company with zero GitHub repos scores 0 on that dimension but can still achieve MODERATE or STRONG moat ratings through patent protection (max 30 pts) and market positioning (max 25 pts). The moat model does not require GitHub activity.
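The GitHub component described above works out as follows. A minimal sketch, assuming the star count is a single total for the company and that zero stars is treated as 0 points (log2 is undefined at zero); the function name is hypothetical.

```python
import math


def github_moat_points(stars: int) -> float:
    """GitHub contribution to the moat score: log2(stars) * 3, capped at 20."""
    if stars <= 0:
        return 0.0  # assumption: no stars or no repos contributes nothing
    return min(20.0, math.log2(stars) * 3)


for stars in (0, 64, 100, 1024):
    print(stars, round(github_moat_points(stars), 2))
```

Under this formula the cap is reached at roughly 102 stars, so star counts beyond that do not raise the moat score further.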

What triggers a STRONG_BUY deal rating?

A composite score of 75 or above. This requires strong performance across multiple dimensions — for example, FAST innovation velocity (score ~75), STRONG competitive moat (score ~70), active hiring in a clear strategic direction, and clean corporate structure. In practice, STRONG_BUY ratings go to well-established companies with strong public IP and developer communities, not typical seed-stage startups.

Can I schedule this actor to monitor a startup over time?

Yes. Use the Apify scheduler to run the actor on the same company monthly or quarterly. Each run produces a new dataset record with a generatedAt timestamp, so you can track how composite scores, hiring strategies, and patent counts evolve over time. To get alerts when the deal rating changes, attach an Apify webhook to the scheduled runs and compare each new record with the previous one.
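One way to track changes across scheduled runs is to sort the collected records by generatedAt and compare neighbours. A minimal sketch, assuming generatedAt is an ISO 8601 string and that the deal rating is stored under a field called dealRating (a hypothetical name). The timestamps and ratings in the sample are illustrative only.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def rating_changes(records: list[dict]) -> list[tuple[str, str, str]]:
    """Return (timestamp, old_rating, new_rating) for each change between runs."""
    ordered = sorted(records, key=lambda r: parse_ts(r["generatedAt"]))
    changes = []
    for prev, curr in zip(ordered, ordered[1:]):
        if prev["dealRating"] != curr["dealRating"]:
            changes.append((curr["generatedAt"], prev["dealRating"], curr["dealRating"]))
    return changes


# Illustrative history; 'dealRating' is a hypothetical field name.
history = [
    {"generatedAt": "2025-02-01T00:00:00Z", "dealRating": "DILIGENCE"},
    {"generatedAt": "2025-01-01T00:00:00Z", "dealRating": "DILIGENCE"},
    {"generatedAt": "2025-03-01T00:00:00Z", "dealRating": "STRONG_BUY"},
]

print(rating_changes(history))
```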

How does startup due diligence compare to PitchBook or Crunchbase?

PitchBook and Crunchbase provide funding history, investor networks, and revenue estimates — data this actor does not have. This actor provides patent analysis, GitHub activity, real-time hiring signals, tech stack complexity, and corporate structure health — data those platforms largely do not provide. They are complementary tools: use PitchBook for funding context, use this actor for operational and innovation signal analysis.

What happens if one of the 8 data sources fails during a run?

The actor uses Promise.allSettled() internally, which means a failure in one sub-actor does not abort the run. The failing source returns an empty array, the actor logs a warning, and scoring continues with the remaining 7 sources. The dataSources counts in the output will show 0 for any source that failed, making it easy to identify coverage gaps.
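The same failure-tolerant pattern can be shown in Python. This is an analog, not the actor's code: the actor uses JavaScript's Promise.allSettled(), while the sketch below uses asyncio.gather with return_exceptions=True, which likewise waits for every task and reports failures as values instead of aborting. The source names and stub coroutines are hypothetical.

```python
import asyncio


async def fetch_all(sources: dict) -> dict:
    """Run every source concurrently; a failed source yields an empty list."""
    names = list(sources)
    results = await asyncio.gather(
        *(sources[name]() for name in names), return_exceptions=True
    )
    merged = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            print(f"warning: {name} failed: {result}")
            merged[name] = []  # scoring continues with the remaining sources
        else:
            merged[name] = result
    return merged


# Hypothetical stand-ins for two sub-actor calls.
async def ok_source():
    return [{"id": 1}]


async def failing_source():
    raise RuntimeError("timeout")


data = asyncio.run(fetch_all({"githubRepos": ok_source, "usptoPatents": failing_source}))
counts = {name: len(items) for name, items in data.items()}
print(counts)
```

The resulting counts mirror the dataSources behaviour described above: the failed source shows 0 while the others keep their results.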

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom scoring models, sector-specific configurations, or enterprise integrations with deal flow platforms, reach out through the Apify platform.
