
Knowledge Graph Causal Discovery MCP

Knowledge graph causal discovery over multi-domain research data, delivered through a single Model Context Protocol interface. This MCP server is built for researchers, data scientists, and AI agents that need to go beyond correlation — discovering directed causal structure, estimating treatment effects, and reasoning about counterfactuals from the published literature and public datasets.


Pricing

Pay Per Event model. You only pay for what you use.

| Event | Description | Price |
|-------|-------------|-------|
| discover-causal-structure | FCI constraint-based causal skeleton discovery | $0.08 |
| compute-interventional-effects | Do-calculus with ID algorithm | $0.10 |
| simulate-counterfactuals | Twin network structural method | $0.08 |
| extract-causal-claims-literature | NLP causal extraction from literature | $0.06 |
| embed-causal-knowledge-graph | RotatE complex-space embedding | $0.06 |
| estimate-causal-effect-tmle | Targeted maximum likelihood estimation | $0.08 |
| check-graph-consistency | Sheaf cohomology H1 obstruction detection | $0.08 |
| attribute-source-contribution | Shapley cooperative game attribution | $0.06 |

Example: 100 events = $8.00 · 1,000 events = $80.00

Connect to your AI agent

Add this MCP server to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.

MCP Endpoint
https://ryanclinton--knowledge-graph-causal-discovery-mcp.apify.actor/mcp
Claude Desktop Config
{
  "mcpServers": {
    "knowledge-graph-causal-discovery-mcp": {
      "url": "https://ryanclinton--knowledge-graph-causal-discovery-mcp.apify.actor/mcp"
    }
  }
}

Documentation


The server orchestrates 17 Apify actors in parallel across five source domains — academic, biomedical, regulatory, economic, and safety — assembling the results into a unified causal knowledge graph. Eight specialized tools then apply rigorous causal inference algorithms: FCI skeleton learning, GES with BIC scoring, Pearl's do-calculus with the ID algorithm, twin network counterfactuals, TMLE estimation, RotatE knowledge graph embeddings, sheaf cohomology consistency checking, and Shapley source attribution. Every tool call returns structured JSON with mathematical scores and supporting evidence.

⬇️ What data can you access?

| Data Point | Source | Coverage |
|------------|--------|----------|
| 📄 Academic papers and citations | OpenAlex, Semantic Scholar, Crossref | 250M+ scholarly works with citation graphs |
| 📑 Preprints and open access | arXiv, CORE | Physics, CS, quantitative biology, math |
| 🧬 Biomedical literature | PubMed | 36M+ citations with MeSH indexing |
| 🏥 Clinical trials | ClinicalTrials.gov | 450K+ registered studies with protocol data |
| 💊 Drug adverse event reports | OpenFDA | FDA FAERS pharmacovigilance database |
| 🔬 NIH research grants | NIH Reporter | Active and historical funded projects |
| 📜 Federal regulations | Federal Register | US regulatory actions and proposed rules |
| 🏛️ Congressional legislation | Congress.gov | Bills, resolutions, and amendments |
| 🗂️ Government datasets | Data.gov | 300K+ federal open data assets |
| 📈 Economic time series | FRED | Federal Reserve GDP, inflation, employment |
| 🌍 World development indicators | World Bank | 200+ country development metrics |
| ⚠️ Product recall notices | CPSC | Consumer product safety recall database |
| 💬 Consumer complaints | CFPB | Financial protection complaint records |
| 📖 Encyclopedia context | Wikipedia | Background knowledge and concept disambiguation |

Why use Knowledge Graph Causal Discovery MCP?

Assembling a causal inference pipeline from scratch requires integrating a dozen data sources, implementing graph construction logic, and coding algorithms that span three decades of academic literature. A typical research team spending a week on this still ends up with a pipeline that covers two or three data domains at best.

This MCP server covers 17 data sources, applies 10 peer-reviewed causal algorithms, and returns structured results in seconds — directly inside Claude, Cursor, Windsurf, or any MCP-compatible AI client.

  • Always-live data — every tool call fetches fresh results from source APIs; no stale snapshots or cached indexes
  • Parallel execution — up to 17 actors run simultaneously per query, not sequentially, so response time scales with the slowest source rather than the sum
  • Standby mode — the server stays warm between calls, eliminating cold-start latency for interactive research sessions
  • Pay-per-call — no monthly subscription; each tool costs between $0.035 and $0.050, so a full 8-tool pipeline costs under $0.35
  • MCP-native — works in Claude Desktop, Cursor, Windsurf, Cline, and any client that speaks the Model Context Protocol

⬆️ MCP tools

| Tool | Price | Algorithm | Best for |
|------|-------|-----------|----------|
| discover_causal_structure | $0.045 | FCI + GES + additive noise model | Initial causal graph structure from observational data |
| compute_interventional_effects | $0.050 | Pearl's do-calculus + ID algorithm + Balke-Pearl LP | Policy evaluation, treatment planning, intervention design |
| simulate_counterfactuals | $0.045 | Twin network method + Tian-Pearl bounds | "What if" analysis, legal causation, necessity/sufficiency |
| extract_causal_claims_literature | $0.035 | NLP pattern matching + evidence classification | Systematic reviews, evidence synthesis, claim auditing |
| embed_causal_knowledge_graph | $0.040 | RotatE complex-valued embeddings | Link prediction, entity similarity, pathway discovery |
| estimate_causal_effect_tmle | $0.050 | TMLE + Super Learner ensemble + influence function CI | Semiparametric ATE estimation with doubly-robust CI |
| check_graph_consistency | $0.035 | Sheaf cohomology H¹(G,F) | Validating causal assumptions, identifiability checks |
| attribute_source_contribution | $0.040 | Shapley values + nucleolus + core stability | Data source prioritization, budget allocation |

Use cases for knowledge graph causal discovery

Drug safety signal detection

Pharmacovigilance teams combine PubMed biomedical literature, ClinicalTrials.gov outcome data, and FDA adverse event reports into a single causal graph. The discover_causal_structure tool identifies directed edges between compounds and adverse outcomes. The compute_interventional_effects tool estimates P(adverse event | do(prescribe drug)) using back-door adjustment on confounders sourced from NIH grant data and OpenAlex citations.

Policy impact assessment

Policy analysts estimate causal effects of regulatory interventions on economic outcomes by combining Federal Register rules, FRED economic time series, and World Bank development indicators. The estimate_causal_effect_tmle tool applies TMLE with Super Learner to produce doubly-robust average treatment effect estimates with 95% confidence intervals from the influence function — going beyond naive before/after comparison.

Systematic review and evidence synthesis

Literature reviewers use extract_causal_claims_literature to scan thousands of academic papers across OpenAlex, Semantic Scholar, Crossref, arXiv, and CORE simultaneously. Claims are classified by strength (strong/moderate/weak/correlational) and evidence level (RCT/observational/case study/review). Conflicting claims across sources are flagged automatically, replacing weeks of manual screening.

Counterfactual reasoning for legal and regulatory causation

Legal teams and regulators assessing causation in product liability or pharmaceutical harm cases use simulate_counterfactuals to compute the Probability of Necessity (PN = P(Y_x'=0 | X=x, Y=y)) and Probability of Sufficiency (PS = P(Y_x=1 | X=x', Y=0)) via the twin network method. Tian-Pearl monotonicity bounds are validated to constrain the counterfactual probabilities.

Knowledge graph completion in biomedical AI

AI research teams use embed_causal_knowledge_graph to generate RotatE complex-valued entity embeddings where relations are unit-modulus rotations in complex space (t = h · r, |r_i| = 1). MRR and Hits@10 link prediction metrics identify missing drug-disease or gene-pathway edges. Self-adversarial negative sampling with margin gamma ensures high-quality embeddings even in sparse graph regions.

Data acquisition prioritization

Research operations teams with limited budgets use attribute_source_contribution to calculate Shapley values for each data domain (academic, biomedical, regulatory, economic, safety). The Shapley allocation phi_i quantifies each source's marginal contribution to causal graph quality across all subsets. Nucleolus computation and core non-emptiness check confirm allocation stability before committing to data subscriptions.

How to connect this MCP server

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "knowledge-graph-causal-discovery": {
      "url": "https://knowledge-graph-causal-discovery-mcp.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Cursor / Windsurf / Cline

Add the MCP endpoint in your editor's MCP settings panel:

  • Endpoint URL: https://knowledge-graph-causal-discovery-mcp.apify.actor/mcp
  • Authentication: Bearer token with your Apify API token

Python (MCP client)

import anthropic

client = anthropic.Anthropic()

# The MCP server exposes 8 tools — Claude discovers and calls them
# through the MCP connector (requires the mcp-client beta flag)
response = client.beta.messages.create(
    model="claude-opus-4-5",
    max_tokens=4096,
    betas=["mcp-client-2025-04-04"],
    mcp_servers=[{
        "type": "url",
        "url": "https://knowledge-graph-causal-discovery-mcp.apify.actor/mcp",
        "name": "knowledge-graph-causal-discovery",
        "authorization_token": "YOUR_APIFY_TOKEN"
    }],
    messages=[{
        "role": "user",
        "content": "Discover the causal structure linking smoking exposure to lung cancer outcomes using academic and biomedical sources."
    }]
)
print(response.content)

Direct cURL

# Discover causal structure
curl -X POST "https://knowledge-graph-causal-discovery-mcp.apify.actor/mcp" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "discover_causal_structure",
      "arguments": {
        "query": "smoking lung cancer mortality",
        "sources": ["academic", "biomedical"]
      }
    },
    "id": 1
  }'

# Estimate treatment effect via TMLE
curl -X POST "https://knowledge-graph-causal-discovery-mcp.apify.actor/mcp" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "estimate_causal_effect_tmle",
      "arguments": {
        "query": "statin therapy cardiovascular mortality reduction",
        "sources": ["academic", "biomedical", "regulatory"]
      }
    },
    "id": 2
  }'

Tool reference

discover_causal_structure

Discovers causal graph structure from observational data using three combined algorithms:

  1. FCI (Fast Causal Inference) — constraint-based skeleton discovery via Kernel Conditional Independence (KCI) tests, tolerant of latent confounders. Builds a PAG (partial ancestral graph), including bidirected edges for hidden common causes.
  2. GES (Greedy Equivalence Search) — score-based refinement using BIC (Bayesian Information Criterion) to navigate Markov equivalence classes. BIC = log(likelihood) − (k/2) · log(N) where k is the number of free parameters.
  3. Additive noise model — edge orientation via HSIC (Hilbert-Schmidt Independence Criterion) between residuals and cause. If HSIC(e, X) < HSIC(e, Y), the model orients X → Y.

Returns: directed and bidirectional edges, Markov equivalence class size, BIC score, p-values per edge.

Price: $0.045 per call. Calls up to 10 actors for academic + biomedical sources.


compute_interventional_effects

Computes P(Y | do(X)) — the distribution of Y under intervention on X — via Pearl's do-calculus:

  • Rule 1 — insertion/deletion of observations
  • Rule 2 — action/observation exchange
  • Rule 3 — insertion/deletion of actions
  • ID algorithm — systematic identifiability test for interventional queries in semi-Markovian models
  • Back-door criterion — adjustment for observed confounders
  • Front-door criterion — adjustment via mediating variables when confounders are unobserved
  • Balke-Pearl LP bounds — linear programming bounds for effects not identifiable by do-calculus, constraining via observable distributions

Returns: do-effects with adjustment sets, identifiability flags, LP bound intervals.

Price: $0.050 per call.
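
The back-door adjustment at the heart of this tool can be sketched on a toy discrete distribution (all numbers hypothetical):

```python
from collections import defaultdict

def backdoor_effect(joint, x, y):
    # P(y | do(x)) = sum_z P(y | x, z) * P(z), with Z a valid back-door set
    pz, pxz, pxzy = defaultdict(float), defaultdict(float), defaultdict(float)
    for (xi, zi, yi), p in joint.items():
        pz[zi] += p
        pxz[(xi, zi)] += p
        if yi == y:
            pxzy[(xi, zi)] += p
    return sum(pz[zi] * pxzy[(x, zi)] / pxz[(x, zi)]
               for zi in pz if pxz[(x, zi)] > 0)

# hypothetical confounded joint over (X, Z, Y): Z -> X, Z -> Y, X -> Y
joint = {}
for z in (0, 1):
    p_x1 = 0.8 if z == 1 else 0.2            # confounding: Z drives treatment
    for x_ in (0, 1):
        for y_ in (0, 1):
            p_y1 = 0.3 + 0.4 * x_ + 0.2 * z  # Y depends on both X and Z
            joint[(x_, z, y_)] = (0.5 * (p_x1 if x_ else 1 - p_x1)
                                      * (p_y1 if y_ else 1 - p_y1))
```

On this toy joint the naive conditional P(Y=1 | X=1) is 0.86, while the adjusted do-effect P(Y=1 | do(X=1)) is 0.80; the gap is exactly the confounding flowing through Z.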


simulate_counterfactuals

Simulates counterfactual outcomes via the structural twin network method:

  • Constructs a factual world (X=x, Y=y observed) and a counterfactual world (X=x' intervened)
  • Both worlds share the same exogenous variables U (the twin network's key property)
  • Computes Probability of Necessity (PN): P(Y_{x'}=0 | X=x, Y=y)
  • Computes Probability of Sufficiency (PS): P(Y_x=1 | X=x', Y=0)
  • Validates Tian-Pearl monotonicity bounds: PN ≤ P(Y=y|X=x), PS ≤ P(Y=0|X=x')

Returns: PN and PS per outcome pair, twin network size, monotonicity check result.

Price: $0.045 per call.
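
A minimal sketch of the abduction-action-prediction loop behind PN, using a hypothetical two-variable exogenous model. The server's twin networks are far larger, but the shared-U principle is the same:

```python
import itertools

# hypothetical toy SCM: U2 blocks the treatment effect, U3 is an
# alternative cause; factual and counterfactual worlds share the same U
pU = {"U2": 0.2, "U3": 0.1}          # P(U = 1) for each exogenous variable

def f_y(x, u):
    # structural equation: Y := (X and not U2) or U3
    return int((x and not u["U2"]) or u["U3"])

def prob_necessity(x_obs=1, y_obs=1, x_cf=0):
    # PN = P(Y_{x'} = 0 | X = x, Y = y), by enumerating exogenous worlds
    num = den = 0.0
    for bits in itertools.product((0, 1), repeat=len(pU)):
        u = dict(zip(pU, bits))
        w = 1.0
        for name, b in u.items():
            w *= pU[name] if b else 1 - pU[name]
        if f_y(x_obs, u) != y_obs:   # abduction: keep U consistent with evidence
            continue
        den += w
        if f_y(x_cf, u) == 0:        # action + prediction in the twin copy
            num += w
    return num / den
```

Here PN comes out to 0.72/0.82 ≈ 0.878: given that treatment occurred and the outcome followed, the outcome would have been absent without treatment in about 88% of exogenous worlds consistent with the evidence.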


extract_causal_claims_literature

Extracts and classifies causal claims from academic literature via NLP pattern matching:

  • Claim strength classification: strong / moderate / weak / correlational based on verb and hedge patterns
  • Evidence level classification: RCT / observational / case_study / review based on study design signals in titles and abstracts
  • Conflict detection: flags pairs of sources making opposing claims about the same cause-effect pair

Draws from OpenAlex, Semantic Scholar, Crossref, arXiv, CORE (academic), and PubMed, ClinicalTrials.gov, NIH Grants, OpenFDA (biomedical) depending on selected sources.

Returns: classified claim list with citations, counts by strength and evidence level, conflicting claim pairs.

Price: $0.035 per call.
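
A toy version of the strength classifier; the patterns below are illustrative only, not the server's actual rule set:

```python
import re

# illustrative verb and hedge patterns (hypothetical, heavily simplified)
CAUSAL = re.compile(r"\b(causes?|caused|leads? to|results? in|induces?)\b", re.I)
ASSOC  = re.compile(r"\b(associated with|correlated with|linked to)\b", re.I)
HEDGE  = re.compile(r"\b(may|might|could|suggests?|appears?)\b", re.I)

def classify_claim(sentence):
    causal, hedged = CAUSAL.search(sentence), HEDGE.search(sentence)
    if causal:
        return "moderate" if hedged else "strong"   # hedged causal verb downgrades
    if ASSOC.search(sentence):
        return "weak" if hedged else "correlational"
    return "correlational"
```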


embed_causal_knowledge_graph

Embeds the causal knowledge graph using RotatE, a complex-valued knowledge graph embedding model:

  • Relations are rotations in complex space: t = h · r where each component satisfies |r_i| = 1 (unit modulus constraint)
  • Scoring function: f(h, r, t) = −||h · r − t|| (L1 norm of the complex residual)
  • Self-adversarial negative sampling — samples negative triples with probability proportional to their current score, weighted by softmax temperature
  • Margin-based loss with margin gamma separating positive and negative triple scores
  • Cluster assignment via k-means over entity embedding norms

Returns: entity embeddings with norms and nearest neighbours, MRR (Mean Reciprocal Rank), Hits@10, cluster labels, phase range.

Price: $0.040 per call.
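
The RotatE scoring function is compact enough to sketch directly (scoring only; training with self-adversarial sampling is omitted):

```python
import numpy as np

def rotate_score(h, r_phase, t):
    # relation as an elementwise rotation: each r_i = exp(i * theta_i), |r_i| = 1
    r = np.exp(1j * r_phase)
    # score is minus the L1 norm of the complex residual h * r - t;
    # a true triple scores near 0, an implausible one strongly negative
    return -np.sum(np.abs(h * r - t))
```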


estimate_causal_effect_tmle

Estimates average treatment effects via TMLE (Targeted Maximum Likelihood Estimation) following the semiparametric efficiency pipeline:

  1. Initial estimate Q⁰(A, W) via Super Learner ensemble (weighted cross-validated learner combination)
  2. Propensity score g(A | W) with positivity truncation at [0.01, 0.99] to prevent near-deterministic treatment
  3. Clever covariate H(A, W) = A/g(1|W) − (1−A)/g(0|W)
  4. Targeting step — fit epsilon via MLE of logistic model indexed by H, updating Q⁰
  5. Updated estimate Q*(A, W) = expit(logit(Q⁰) + epsilon · H)
  6. ATE = E[Q*(1, W)] − E[Q*(0, W)] (plug-in estimator from targeted fit)
  7. Influence function IC(O) for 95% Wald confidence interval: ATE ± 1.96 · SE(IC)

Returns: ATE per treatment-outcome pair, standard error, 95% CI, influence function norm, cross-validated risk, Super Learner weights.

Price: $0.050 per call.
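
Steps 3 to 7 can be sketched as follows, assuming the initial fit Q⁰ and propensity g are already in hand; a grid-search MLE for epsilon stands in for the offset logistic fit:

```python
import numpy as np

def expit(z): return 1.0 / (1.0 + np.exp(-z))
def logit(p): return np.log(p / (1 - p))

def tmle_targeting(Q0_A, Q0_1, Q0_0, g1, A, Y):
    # clever covariate H(A, W) = A/g(1|W) - (1-A)/g(0|W)
    H = A / g1 - (1 - A) / (1 - g1)
    # fit epsilon by maximum likelihood over a grid (offset-logistic MLE)
    grid = np.linspace(-1, 1, 401)
    offset = logit(Q0_A)
    ll = [np.sum(Y * np.log(expit(offset + e * H)) +
                 (1 - Y) * np.log(1 - expit(offset + e * H))) for e in grid]
    eps = grid[int(np.argmax(ll))]
    # targeted updates Q*(a, W) = expit(logit(Q0) + eps * H(a, W))
    Qstar1 = expit(logit(Q0_1) + eps / g1)
    Qstar0 = expit(logit(Q0_0) - eps / (1 - g1))
    return np.mean(Qstar1) - np.mean(Qstar0), eps    # plug-in ATE
```

With a correctly specified Q⁰, epsilon stays near zero and the plug-in ATE recovers the true effect; the targeting step matters most when the initial fit is biased.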


check_graph_consistency

Checks causal graph consistency using sheaf cohomology over the graph structure:

  • Sheaf F on graph G assigns vector spaces F(v) to vertices and linear maps F(e) to edges
  • Coboundary operator: (δ₀s)(e) = F(e)(s(v)) − s(w) measures local section disagreement
  • H¹(G, F) = ker(δ₁) / im(δ₀) — first cohomology group dimension measures obstructions to global consistency
  • Separate checks: acyclicity (no directed cycles), faithfulness (no spurious independencies), causal sufficiency (no hidden common causes), instrument validity (exclusion restriction), positivity (treatment overlap), Markov compatibility (observed independencies match graph)

Returns: pass/fail per check with violation counts, H¹ cohomology dimension, global section existence flag.

Price: $0.035 per call.
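
For the constant sheaf on a graph with no higher cells, the H¹ computation reduces to linear algebra on the incidence matrix. A minimal sketch (the server's sheaves carry richer stalks, but the obstruction count works the same way):

```python
import numpy as np

def h1_dimension(n_vertices, edges):
    # constant sheaf R: delta0 maps a vertex section s to s(head) - s(tail)
    d0 = np.zeros((len(edges), n_vertices))
    for i, (u, v) in enumerate(edges):
        d0[i, u], d0[i, v] = -1.0, 1.0
    # with no 2-cells delta1 = 0, so ker(delta1) = R^E and
    # dim H1 = |E| - rank(delta0): the graph's cycle rank
    return len(edges) - np.linalg.matrix_rank(d0)
```

A nonzero H¹ means local sections cannot be glued into a global one: some cycle carries an irreconcilable constraint.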


attribute_source_contribution

Attributes each data source's contribution to causal graph quality via cooperative game theory:

  • Each data domain (academic, biomedical, regulatory, economic, safety) is a player in the coalition game
  • Value function v(S) = quality of causal graph (node count · edge density · mean edge weight) using only sources in coalition S
  • Shapley value phi_i = Σ_S [|S|!(n−|S|−1)!/n!] · [v(S ∪ {i}) − v(S)] — fair marginal contribution
  • Nucleolus — lexicographically minimises the maximum excess, finding the most stable payoff allocation
  • Core non-emptiness check — tests whether the Shapley allocation is stable against all coalitional deviations

Best used with all five source categories to get meaningful attribution across the full coalition space.

Returns: Shapley values per source, marginal contributions, nucleolus allocation, core stability flag.

Price: $0.040 per call.
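
The Shapley formula above can be implemented by direct enumeration over coalitions, which is tractable here because n = 5. A sketch with a hypothetical toy value function in place of the server's graph-quality measure:

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    # phi_i = sum over S not containing i of |S|!(n-|S|-1)!/n! * marginal(i, S)
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        phi[i] = sum(
            factorial(k) * factorial(n - k - 1) / factorial(n)
            * (v(set(S) | {i}) - v(set(S)))
            for k in range(n) for S in combinations(others, k))
    return phi

# hypothetical per-domain quality, plus an academic+biomedical synergy bonus
base = {"academic": 3.0, "biomedical": 2.0, "regulatory": 1.0}

def quality(S):
    bonus = 1.0 if {"academic", "biomedical"} <= S else 0.0
    return sum(base[s] for s in S) + bonus
```

In this toy game the synergy bonus splits evenly between the two synergistic players, and the allocations sum to the grand-coalition value (the efficiency property).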

Output examples

discover_causal_structure — smoking and lung cancer

{
  "nodeCount": 94,
  "edgeCount": 187,
  "relations": [
    {
      "cause": "Cigarette smoking and lung adenocarcinoma risk: a pooled analysis",
      "effect": "Lung cancer incidence in never-smokers vs. ever-smokers cohort",
      "edgeType": "causes",
      "strength": 0.74,
      "pValue": 0.003,
      "method": "FCI-KCI"
    },
    {
      "cause": "KRAS mutation frequency in tobacco-exposed lung tissue",
      "effect": "Non-small-cell lung carcinoma progression",
      "edgeType": "causes",
      "strength": 0.61,
      "pValue": 0.011,
      "method": "GES-BIC"
    },
    {
      "cause": "Secondhand smoke exposure biomarker cotinine",
      "effect": "Lung cancer incidence in never-smokers vs. ever-smokers cohort",
      "edgeType": "bidirectional",
      "strength": 0.43,
      "pValue": 0.048,
      "method": "additive-noise-HSIC"
    }
  ],
  "totalEdges": 187,
  "directedEdges": 141,
  "bidirectionalEdges": 46,
  "markovEquivalenceSize": 12,
  "bicScore": -4823.7
}

estimate_causal_effect_tmle — statin therapy and cardiovascular mortality

{
  "nodeCount": 112,
  "estimates": [
    {
      "treatment": "High-intensity statin therapy (atorvastatin 40-80mg)",
      "outcome": "Major adverse cardiovascular events at 5 years",
      "ate": -0.082,
      "standardError": 0.019,
      "confidenceInterval": [-0.119, -0.045],
      "influenceFunctionNorm": 0.041
    },
    {
      "treatment": "High-intensity statin therapy (atorvastatin 40-80mg)",
      "outcome": "All-cause mortality",
      "ate": -0.031,
      "standardError": 0.014,
      "confidenceInterval": [-0.058, -0.004],
      "influenceFunctionNorm": 0.028
    }
  ],
  "significantCount": 2,
  "averageATE": -0.056,
  "crossValidatedRisk": 0.113,
  "superLearnerWeights": {
    "logistic": 0.34,
    "randomForest": 0.41,
    "xgboost": 0.25
  }
}

simulate_counterfactuals — treatment necessity and sufficiency

{
  "nodeCount": 87,
  "outcomes": [
    {
      "factual": "Patient received antihypertensive therapy (X=1), experienced stroke (Y=1)",
      "counterfactual": "Patient did not receive antihypertensive therapy (X=0)",
      "factualValue": 1.0,
      "counterfactualValue": 0.0,
      "probabilityOfNecessity": 0.71,
      "probabilityOfSufficiency": 0.38
    }
  ],
  "twinNetworkSize": 174,
  "averagePN": 0.71,
  "averagePS": 0.38,
  "monotonicityHolds": true
}

check_graph_consistency — causal assumption validation

{
  "nodeCount": 94,
  "edgeCount": 187,
  "checks": [
    { "check": "acyclicity", "passed": true, "violationCount": 0, "details": "No directed cycles detected" },
    { "check": "faithfulness", "passed": true, "violationCount": 2, "details": "2 near-cancelling paths detected" },
    { "check": "causal_sufficiency", "passed": false, "violationCount": 7, "details": "7 bidirectional edges suggest latent confounders" },
    { "check": "instrument_validity", "passed": true, "violationCount": 0, "details": "NIH grant instruments satisfy exclusion restriction" },
    { "check": "positivity", "passed": true, "violationCount": 0, "details": "Propensity scores in [0.04, 0.96]" },
    { "check": "markov_compatibility", "passed": true, "violationCount": 1, "details": "1 d-separation violation" }
  ],
  "totalChecks": 6,
  "passedChecks": 5,
  "sheafCohomologyDim": 3,
  "globalSectionExists": false
}

How much does it cost to use the Knowledge Graph Causal Discovery MCP?

This MCP uses pay-per-event pricing — you are charged only when a tool is called. Platform compute costs are included.

| Tool | Price per call | 10 calls | 50 calls |
|------|----------------|----------|----------|
| discover_causal_structure | $0.045 | $0.45 | $2.25 |
| compute_interventional_effects | $0.050 | $0.50 | $2.50 |
| simulate_counterfactuals | $0.045 | $0.45 | $2.25 |
| extract_causal_claims_literature | $0.035 | $0.35 | $1.75 |
| embed_causal_knowledge_graph | $0.040 | $0.40 | $2.00 |
| estimate_causal_effect_tmle | $0.050 | $0.50 | $2.50 |
| check_graph_consistency | $0.035 | $0.35 | $1.75 |
| attribute_source_contribution | $0.040 | $0.40 | $2.00 |

Full 8-tool pipeline per query: $0.34. Running the complete causal discovery pipeline daily for a month costs approximately $10.

Apify's free plan includes $5 of monthly platform credits, which covers roughly 14 full-pipeline runs at no cost.

You can set a maximum spending limit per session in your Apify account to prevent unexpected charges. The MCP server stops charging and returns an error message if your event limit is reached.

How the Knowledge Graph Causal Discovery MCP works

Phase 1 — parallel data ingestion

When a tool is called, the server identifies which source categories are requested (academic, biomedical, regulatory, economic, safety) and constructs a call list of up to 17 actor invocations:

  • Academic (6 actors): OpenAlex (30 results), Semantic Scholar (30), Crossref (20), arXiv (20), CORE (20), Wikipedia (15)
  • Biomedical (4 actors): PubMed (30), ClinicalTrials.gov (20), NIH Grants (15), OpenFDA (20)
  • Regulatory (3 actors): Federal Register (20), Congress Bills (15), Data.gov (15)
  • Economic (2 actors): FRED (20), World Bank (15)
  • Safety (2 actors): CPSC Recalls (15), CFPB Complaints (15)

All actors run via Promise.all — parallel, not sequential. Each actor has a 180-second timeout. A failed actor returns an empty array rather than failing the entire request, ensuring partial results are always returned.
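
In Python terms, the pattern is equivalent to asyncio.gather with a per-task timeout wrapper; a behavioural sketch, not the server's actual JavaScript code, with a hypothetical call_actor standing in for an Apify actor invocation:

```python
import asyncio

async def call_actor(name, payload):
    # stand-in for one upstream actor call (hypothetical)
    if payload.get("fail"):
        raise RuntimeError(f"{name} unavailable")
    await asyncio.sleep(payload.get("delay", 0))
    return [f"{name}:result"]

async def safe_call(name, payload):
    # mirrors the server's behaviour: per-actor timeout, any failure -> []
    try:
        return await asyncio.wait_for(call_actor(name, payload), timeout=180)
    except Exception:
        return []

async def ingest(actor_specs):
    # all actors fire concurrently; total latency tracks the slowest survivor
    return await asyncio.gather(*(safe_call(n, p) for n, p in actor_specs))
```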

Phase 2 — causal graph construction

Results from all actors are merged into a typed causal graph (CausalGraph). Nodes are classified by domain signals:

  • Biomedical results containing "trial", "treatment", "therapy", or "drug" → intervention nodes
  • Other biomedical results → outcome nodes
  • Wikipedia articles → confounder nodes (background knowledge)
  • NIH grants → instrument nodes (funding as instrumental variable)
  • Clinical trial records → intervention nodes
  • Regulatory and economic results → confounder and variable nodes respectively

Edges are built from domain heuristics: interventions connect to outcomes with causal weights; confounders connect to both interventions and outcomes; instruments connect to their intervention targets. Variable-to-variable edges are oriented by the additive noise model: HSIC(residual, X) vs HSIC(residual, Y) determines direction.

Phase 3 — algorithm application

The requested algorithm is applied to the constructed graph:

  • FCI builds a skeleton from KCI tests, then runs orientation rules for v-structures and Meek's propagation rules, producing the CPDAG. GES refines via BIC-scored forward/backward/turning phases. The additive noise model resolves remaining unoriented edges via HSIC.
  • Do-calculus applies Rules 1-3 iteratively, testing back-door and front-door criteria against the graph topology. The ID algorithm determines identifiability. Balke-Pearl LP bounds are computed for non-identifiable effects.
  • Twin network duplicates the graph, wires shared exogenous nodes, then propagates structural equations through both copies to compute PN and PS.
  • TMLE initialises Q⁰ via Super Learner, estimates propensity scores with truncation, constructs the clever covariate H, fits epsilon via logistic regression, and computes ATE from the targeted Q*.
  • RotatE initialises entity embeddings, applies unit-modulus rotational updates via self-adversarial negative sampling, and reports MRR and Hits@10.
  • Sheaf cohomology constructs coboundary matrices δ₀ and δ₁ from the graph's incidence structure, computes ker(δ₁)/im(δ₀), and maps violations to specific causal assumption failures.
  • Shapley enumerates all 2^n subsets of the source coalition, computes graph quality for each, and applies the Shapley formula. Nucleolus is found via lexicographic minimax excess optimisation.

Phase 4 — structured response

Results are serialised to JSON and returned via the MCP protocol. Every response includes nodeCount and edgeCount from the constructed graph, plus the algorithm-specific metrics.

Tips for best results

  1. Start with discover_causal_structure before interventional tools. The FCI/GES structure output tells you which adjustment sets are valid for do-calculus. Running compute_interventional_effects without knowing the graph structure risks incorrect confounder adjustment.

  2. Use academic + biomedical sources as your baseline. These two categories trigger 10 actors and cover the densest evidence base. Add regulatory for policy questions, economic for macroeconomic analyses, and safety for product harm or financial misconduct queries.

  3. For counterfactual and legal causation work, use check_graph_consistency first. The sheaf cohomology check confirms whether the graph satisfies causal sufficiency and instrument validity — two assumptions that simulate_counterfactuals relies on for valid PN/PS estimates.

  4. Run attribute_source_contribution with all five sources to get meaningful Shapley values. With fewer than three sources, the coalition game has too few subsets to produce stable marginal contributions. The nucleolus calculation requires at least three active players.

  5. For systematic reviews, extract_causal_claims_literature with academic + biomedical is the most cost-effective entry point at $0.035 per call. Use the returned conflicting claim pairs to identify which relationships need deeper structure discovery or TMLE estimation.

  6. Phrase queries as domain-variable pairs for best graph construction: "smoking lung cancer" rather than "does smoking cause cancer?" The graph builder identifies causal nodes from result titles, and specific entity names produce cleaner node classification.

  7. For rare or niche topics, select the full academic category so arXiv and CORE are included — preprint servers often carry causal evidence earlier than indexed journals in fast-moving research areas.

  8. Combine tools in a pipeline for full causal analysis: discover_causal_structure → check_graph_consistency → compute_interventional_effects → estimate_causal_effect_tmle. Total pipeline cost: $0.18 per complete analysis.

Combine with other Apify MCP servers

| MCP Server | How to combine |
|------------|----------------|
| ryanclinton/market-microstructure-manipulation-mcp | Feed causal structure output into market microstructure analysis; Granger causality in that MCP complements Pearl-style do-calculus here |
| ryanclinton/litigation-intelligence-mcp | Use counterfactual PN/PS scores as inputs to pre-litigation risk scoring; necessary causation probability is a key legal standard |
| ryanclinton/open-source-supply-chain-risk-mcp | Use causal structure discovery to identify which OSS dependencies causally propagate vulnerabilities vs. correlate with them |
| ryanclinton/esg-risk-assessment-mcp | Combine regulatory causal graphs with ESG risk scoring to distinguish causal regulatory exposure from correlated industry effects |
| ryanclinton/drug-pipeline-intelligence-mcp | Feed TMLE treatment effect estimates into drug pipeline analysis to supplement trial data with observational causal evidence |

Limitations

  • No primary data access. This server analyses published literature, trial registries, and government databases. It does not access raw patient-level data, proprietary biobank records, or paywalled journal content.
  • Graph construction uses heuristic node classification, not ground-truth ontology mapping. Node types (intervention, outcome, confounder) are inferred from title text patterns, which can misclassify ambiguous entities.
  • Causal algorithms operate on the constructed proxy graph, not on the original numeric data. The FCI, GES, TMLE, and other algorithms produce relative estimates calibrated to the graph structure rather than estimates from primary observations.
  • TMLE requires sufficient node density to produce meaningful Super Learner estimates. Queries returning fewer than 20 nodes may produce wide confidence intervals.
  • RotatE embeddings are initialised fresh per call — there is no persistent knowledge graph that improves over time with repeated queries. Embedding quality scales with node count; sparse graphs produce lower MRR.
  • Sheaf cohomology results are sensitive to bidirectional edge prevalence. Graphs with many hidden-confounder edges (common in observational literature) will show positive H¹ dimension even for well-studied domains.
  • Source availability is not guaranteed. All 17 upstream actors call live public APIs. Outages, rate limiting, or temporary API changes at any source return empty arrays rather than errors, which reduces graph density but does not fail the request.
  • Regulatory and economic sources are US-centric. The Federal Register, Congress Bills, FRED, and CPSC cover US institutions. For international regulatory causal analysis, rely on academic and biomedical sources which have global coverage.

Integrations

  • Apify API — call the MCP server programmatically from Python, JavaScript, or any HTTP client using the Apify Actor API
  • Webhooks — trigger downstream workflows (Slack alerts, database writes, report generation) when a causal analysis completes
  • Zapier — connect causal discovery results to Google Sheets, HubSpot, Notion, or any of Zapier's 6,000+ apps without code
  • Make — build multi-step automation scenarios that chain causal discovery with data enrichment, notifications, and CRM updates
  • LangChain / LlamaIndex — use the MCP server as a causal reasoning tool within RAG pipelines and autonomous agent frameworks

❓ FAQ

How many data sources does a single causal discovery query touch? Up to 17 actors run in parallel depending on which source categories you select. The academic category triggers 6 actors (OpenAlex, Semantic Scholar, Crossref, arXiv, CORE, Wikipedia). biomedical triggers 4 (PubMed, ClinicalTrials.gov, NIH Grants, OpenFDA). regulatory triggers 3, economic 2, and safety 2. Selecting all five categories runs all 17 actors simultaneously.

How is this different from a standard literature review tool or RAG pipeline? Standard literature review tools return ranked documents. This server constructs a typed causal graph from those documents and applies formal causal inference algorithms — FCI, do-calculus, twin networks, TMLE — to extract directional causal relationships, not just associations. The output is mathematical causal structure, not retrieved text.

How fresh is the data returned? All data is fetched live at query time from each source API. There is no cached index. Results reflect the current state of OpenAlex, PubMed, FRED, and the other databases at the moment of the call.

Can I use only one or two source categories to reduce cost? Yes. Every tool accepts a sources array with any combination of academic, biomedical, regulatory, economic, and safety. Using only academic + biomedical is sufficient for most research questions and is the default for all tools except attribute_source_contribution.

What does a Shapley value of 0.4 for biomedical sources mean? It means biomedical data sources (PubMed, ClinicalTrials.gov, NIH Grants, OpenFDA) contribute 40% of the total causal graph quality, measured as the average marginal contribution of that data domain across all possible subsets of the five source categories.
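The attribution follows the standard Shapley formula: a category's value is its marginal contribution to graph quality, averaged over all subsets of the other categories. A self-contained sketch with a hypothetical additive quality function `v` (the real server scores graph quality from actual runs; the weights here are invented for illustration):

```python
from itertools import combinations
from math import factorial

CATEGORIES = ["academic", "biomedical", "regulatory", "economic", "safety"]

# Hypothetical per-category quality weights -- illustration only.
WEIGHTS = {"academic": 0.35, "biomedical": 0.40, "regulatory": 0.10,
           "economic": 0.08, "safety": 0.07}

def v(coalition: frozenset) -> float:
    """Toy graph-quality function: additive in the categories included."""
    return sum(WEIGHTS[c] for c in coalition)

def shapley(player: str) -> float:
    """Average marginal contribution of one category over all coalitions."""
    n = len(CATEGORIES)
    others = [c for c in CATEGORIES if c != player]
    total = 0.0
    for k in range(n):
        for subset in combinations(others, k):
            s = frozenset(subset)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (v(s | {player}) - v(s))
    return total

values = {c: shapley(c) for c in CATEGORIES}
# Efficiency property: Shapley values sum to the grand coalition's quality.
assert abs(sum(values.values()) - v(frozenset(CATEGORIES))) < 1e-9
```

With an additive quality function the Shapley value simply equals the category's own weight; the interesting cases are superadditive functions, where cross-domain edges make a coalition worth more than the sum of its parts.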

Is it legal to use the data from these sources? All 17 sources are publicly available APIs and open government databases. PubMed, ClinicalTrials.gov, FDA, FRED, World Bank, and the others are free public resources. See Apify's guide on web scraping legality.

Can this replace a randomised controlled trial? No. TMLE and do-calculus provide observational causal inference, which relies on assumptions (no unmeasured confounding, positivity, consistency) that are untestable from data alone. The tools identify causal hypotheses and estimate effect sizes from observational evidence — they do not generate experimental evidence. The check_graph_consistency tool explicitly flags violations of causal sufficiency and other key assumptions.

How long does a typical tool call take? Most tool calls complete in 20–60 seconds. Time depends on source category selection — academic + biomedical (10 actors) typically takes 25–45 seconds; all five categories (17 actors) may take 45–90 seconds. Actor timeouts are set to 180 seconds per source.

Can I use this with a custom MCP client or agent framework? Yes. The server implements the standard MCP protocol at /mcp. Any client that supports MCP — including Cursor, Windsurf, Cline, custom Python MCP clients, or LangChain agent frameworks — can connect to https://ryanclinton--knowledge-graph-causal-discovery-mcp.apify.actor/mcp.

What happens if a data source is temporarily unavailable? Individual actor failures return empty arrays rather than propagating errors. The graph is built from available sources, and the causal algorithm runs on the reduced graph. The response always includes nodeCount and edgeCount so you can verify graph density and re-run with different sources if needed.
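One way to act on those counts client-side: check `nodeCount` and `edgeCount` (field names from the response contract described above) against thresholds of your choosing before trusting the result — a minimal sketch:

```python
def graph_is_dense_enough(result: dict, min_nodes: int = 20, min_edges: int = 10) -> bool:
    """Decide whether a graph thinned by a source outage is still worth interpreting.

    Thresholds are arbitrary examples; tune them to your domain.
    """
    return (result.get("nodeCount", 0) >= min_nodes
            and result.get("edgeCount", 0) >= min_edges)

# Example: a response thinned by an upstream outage
result = {"nodeCount": 12, "edgeCount": 4}
if not graph_is_dense_enough(result):
    # Re-run with additional source categories to recover density.
    retry_sources = ["academic", "biomedical", "regulatory"]
```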

Can I run structure discovery and TMLE estimation on the same query to cross-validate results? Yes, and this is the recommended workflow for high-stakes analyses. discover_causal_structure identifies the graph topology and adjustment sets. estimate_causal_effect_tmle uses that topology to select valid confounders for the Super Learner and propensity model. Running both costs $0.16 ($0.08 per event for each tool).
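A sketch of that chaining, assuming the discovery response exposes candidate adjustment sets under an `adjustmentSets` field (an illustrative field name, not a documented contract, as are the `treatment`/`outcome`/`confounders` argument names):

```python
def build_tmle_arguments(discovery_result: dict, treatment: str, outcome: str) -> dict:
    """Feed a discovered adjustment set into a follow-up TMLE estimation call.

    Assumes the discovery response carries candidate adjustment sets under
    'adjustmentSets' -- a hypothetical field name used for illustration.
    """
    adjustment_sets = discovery_result.get("adjustmentSets", [])
    confounders = adjustment_sets[0] if adjustment_sets else []
    return {
        "treatment": treatment,
        "outcome": outcome,
        "confounders": confounders,
        "sources": ["academic", "biomedical"],
    }

# Mock discovery output, shaped per the assumptions above
discovery = {"adjustmentSets": [["age", "smoking_status"]],
             "nodeCount": 48, "edgeCount": 91}
args = build_tmle_arguments(discovery, "statin_use", "cardiovascular_event")
```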

Does the server support streaming responses for long-running queries? The server uses the Streamable HTTP transport from the MCP SDK, which supports streaming. MCP clients that implement streaming (including Claude Desktop) will receive incremental updates during long-running actor calls.

Help us improve

If you encounter unexpected results or errors, enable run sharing so we can diagnose issues faster:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong. Your data is visible only to the actor developer, not publicly.

Support

Found a bug or need a feature? Open an issue in the Issues tab on this actor's page. For custom causal inference configurations, domain-specific ontology integration, or enterprise deployments, reach out through the Apify platform.

How it works

01

Configure

Set your parameters in the Apify Console or pass them via API.

02

Run

Click Start, trigger a run via API or webhook, or set up a schedule.

03

Get results

Download as JSON, CSV, or Excel. Integrate with 1,000+ apps.

Use cases

Researchers

Discover directed causal structure and adjustment sets from the published literature.

Data Scientists

Estimate treatment effects with TMLE and cross-validate them against FCI structure discovery.

Policy Analysts

Trace causal claims through US regulatory, economic, and safety data (Federal Register, FRED, CPSC).

Developers

Integrate via REST API or use as an MCP tool in AI workflows.

Ready to try Knowledge Graph Causal Discovery MCP?

Start for free on Apify. No credit card required.

Open on Apify Store