
Autonomous Cyber Red Team MCP

Autonomous cyber red team intelligence for AI agents — this MCP server gives Claude, GPT-4o, and any MCP-compatible agent the ability to run quantitative attack simulations, synthesize exploit chains, forecast vulnerability emergence, and model adversary behavior using eight rigorously implemented mathematical frameworks. It is built for security engineers, threat analysts, and AI agents that need structured, reproducible cyber risk intelligence from a single tool call.


Pricing

Pay Per Event model. You only pay for what you use.

| Event | Description | Price |
| --- | --- | --- |
| simulate-attack-defense-posg | POSG belief-space HSVI2 planning | $0.12 |
| synthesize-exploit-chains | AND-OR A* with CVSS heuristic | $0.10 |
| predict-vulnerability-emergence | Hawkes power-law vulnerability clustering | $0.08 |
| optimize-defender-allocation | Colonel Blotto Nash equilibrium | $0.08 |
| model-adaptive-adversary | Exp3 adversarial multi-armed bandit | $0.08 |
| compute-lateral-movement-risk | Absorbing Markov chain fundamental matrix | $0.08 |
| assess-zero-day-tail-risk | GPD extreme value tail modeling | $0.08 |
| forecast-threat-landscape-evolution | Replicator dynamics evolutionary game | $0.10 |

Example: 100 events = $12.00 · 1,000 events = $120.00

Connect to your AI agent

Add this MCP server to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.

MCP Endpoint
https://ryanclinton--autonomous-cyber-red-team-mcp.apify.actor/mcp
Claude Desktop Config
{
  "mcpServers": {
    "autonomous-cyber-red-team-mcp": {
      "url": "https://ryanclinton--autonomous-cyber-red-team-mcp.apify.actor/mcp"
    }
  }
}

Documentation

The server aggregates live data from 15 cybersecurity sources in parallel — NVD, CISA KEV, Censys, DNS, SSL transparency logs, WHOIS, GitHub, StackExchange, Hacker News, OFAC, OpenSanctions, IP geolocation, tech stack detection, website monitoring, and FRED economic data — assembles a weighted attack graph, then applies algorithms from game theory, stochastic processes, and extreme value theory to produce quantitative outputs that go well beyond keyword searches or CVSS score lookups.

What data can you access?

| Data Point | Source | Example |
| --- | --- | --- |
| 📋 CVE records with CVSS base scores and CWE classifications | NVD CVE Search | CVE-2021-44228, CVSS 10.0 (Log4Shell) |
| 🚨 Known exploited vulnerabilities with CISA remediation deadlines | CISA KEV Catalog | 1,000+ actively exploited CVEs |
| 🌐 Exposed hosts, open ports, and service banners | Censys Search | 443/tcp nginx 1.18.0, cert CN=*.target.com |
| 🔍 DNS record enumeration (A, AAAA, MX, TXT, CNAME, NS) | DNS Lookup | mail.acmecorp.com → 203.0.113.42 |
| 🔒 SSL/TLS certificates and transparency log entries | SSL Certificate Search | *.acmecorp.com, SAN=admin.acmecorp.com |
| 📝 Domain registration data, registrar, and expiry | WHOIS Lookup | Registered 2008, expires 2026, GoDaddy |
| 📍 IP-to-ASN mapping with country and ISP | IP Geolocation | AS14618 Amazon, us-east-1 |
| 🛠️ Technology stack identified from HTTP headers and HTML | Tech Stack Detector | Apache 2.4.51, PHP 7.4, WordPress 5.8 |
| 💻 Proof-of-concept exploit repositories and security tools | GitHub Repo Search | CVE-2021-44228 POC, 3.2K stars |
| 💬 Security community Q&A and technique discussions | StackExchange (Security.SE) | "How does Pass-the-Hash work on AD?" |
| 📰 Vulnerability disclosures and security news | Hacker News Search | "New 0-day in OpenSSL—patch now" |
| 🏛️ US Treasury OFAC Specially Designated Nationals | OFAC Sanctions Search | Threat actor entity screening |
| 🌍 Multi-jurisdiction sanctions and watchlists | OpenSanctions Search | 100+ programs, EU, UN, Interpol |
| 📡 Security advisory and policy page changes | Website Change Monitor | CISA advisory page deltas |
| 💰 Cybersecurity market spend and cyber insurance indices | FRED Economic Data | US cybersecurity GDP component |

MCP tools

| Tool | Price | Algorithm | Best for |
| --- | --- | --- | --- |
| simulate_attack_defense_posg | $0.045 | POSG via HSVI2 point-based value iteration, alpha-vector pruning | Optimal attacker/defender strategy, game value computation |
| synthesize_exploit_chains | $0.040 | AND-OR graph A* with CVSS admissible heuristic h(n) = max CVSS on path | Multi-step attack path discovery, remediation prioritization |
| predict_vulnerability_emergence | $0.035 | Hawkes self-exciting process with power-law kernel, thinning simulation | 30/90-day CVE forecasting, patch cycle planning |
| optimize_defender_allocation | $0.040 | Colonel Blotto game, Nash equilibrium via fictitious play | Security budget allocation, dominated-strategy elimination |
| model_adaptive_adversary | $0.035 | Exp3 multi-armed bandit, importance-weighted reward updates | APT technique prediction, proactive defense planning |
| compute_lateral_movement_risk | $0.035 | Absorbing Markov chain, fundamental matrix N=(I-Q)^-1, epidemic threshold | Network propagation risk, segmentation validation |
| assess_zero_day_tail_risk | $0.035 | GPD extreme value theory, probability-weighted moments | VaR/CVaR quantification, cyber insurance pricing |
| forecast_threat_landscape_evolution | $0.030 | Replicator dynamics dx_i/dt = x_i(f_i − φ), ESS identification | Long-term threat strategy, EMERGING/DECLINING classification |

Why use the Autonomous Cyber Red Team MCP?

Security teams running manual threat assessments spend hours correlating CVE feeds, mapping attack paths on whiteboards, and guessing at budget allocation. Consultancies charge $15,000–50,000 for a red team engagement that covers a fraction of the attack surface and produces results that are stale within weeks.

This MCP server automates the quantitative layer of that work. An AI agent calls a single tool, the server fans out across 15 data sources in parallel, builds a live attack graph, and returns mathematically grounded outputs — game values, probability distributions, expected steps to compromise, and tail risk estimates — within 60–180 seconds.

Benefits of running on the Apify platform:

  • Scheduling — run weekly threat landscape forecasts on a cron schedule to track how the risk posture changes over time
  • API access — integrate tool outputs directly into SIEM, SOAR, or custom dashboards via the Apify API
  • Spending limits — set a maximum credit spend per session so AI agents cannot exceed a cost budget
  • Webhooks — trigger alerts to Slack or PagerDuty when vulnerability burst probability exceeds a threshold
  • No infrastructure — the MCP server runs in Apify standby mode with zero self-hosted infrastructure

Features

  • 8 distinct mathematical frameworks — POSG, AND-OR A*, Hawkes process, Colonel Blotto, Exp3 bandit, absorbing Markov chain, GPD extreme value theory, and replicator dynamics — each implemented from first principles in TypeScript
  • 15 live data sources queried in parallel — every tool call fans out to NVD, CISA KEV, Censys, DNS, SSL, WHOIS, IP geolocation, tech stack, GitHub, StackExchange, Hacker News, OFAC, OpenSanctions, website monitor, and FRED simultaneously
  • Attack graph construction from heterogeneous inputs — buildAttackGraph() normalizes all 15 source outputs into a unified weighted directed graph with VulnNode and AttackEdge types, CVSS-weighted edges, AND/OR prerequisite flags, and technique labels
  • HSVI2 belief-space planning — simulateAttackDefensePOSG() solves the NEXPTIME-complete POSG on the belief simplex using point-based value iteration with alpha-vector pruning, returning converged game values and belief-state strategies
  • CVSS-admissible A* heuristic — synthesizeExploitChains() guarantees optimal-cost attack paths via h(n) = max CVSS score on remaining path; AND nodes require all prerequisites satisfied, OR nodes need any single one
  • Power-law Hawkes kernel — predictVulnerabilityEmergence() uses λ(t) = μ + Σ_i α(1 + (t−t_i)/c)^(−(1+ω)) to capture long-memory CVE clustering; thinning simulation forecasts 30-day and 90-day counts
  • Fictitious play Nash equilibrium — optimizeDefenderAllocation() runs the Colonel Blotto game on security domain battlefields; each iteration best-responds to the empirical opponent distribution until convergence
  • Exp3 importance-weighted updates — modelAdaptiveAdversary() applies w_i(t+1) = w_i(t) × exp(η × r̂_i / K) with optimal η = √(2 ln K / T) to track which attack techniques an adversary will favor
  • Gauss-Jordan fundamental matrix — computeLateralMovementRisk() inverts (I−Q) using full pivoting to compute N = (I−Q)^-1, giving expected steps from every transient state to absorption; epidemic threshold β_c = ⟨k⟩/⟨k²⟩ determines supercritical spread
  • Probability-weighted moments GPD fitting — assessZeroDayTailRisk() fits a Generalized Pareto Distribution to CVSS exceedances above threshold, returning VaR(95%), VaR(99%), CVaR(95%), CVaR(99%), and return periods
  • Replicator dynamics with ESS detection — forecastThreatLandscapeEvolution() integrates dx_i/dt = x_i(f_i − φ̄) and classifies each technique as EMERGING, GROWING, MATURE, or DECLINING based on growth rate and fitness; evolutionary stable strategies are identified by invasion resistance
  • Seeded PRNG for reproducibility — mulberry32 PRNG with query-derived seed ensures deterministic outputs for the same input across runs
  • Spend limit enforcement — every tool calls Actor.charge() and checks eventChargeLimitReached before executing, respecting per-session budget caps set by the caller
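The reproducibility guarantee in the list above rests on the seeded PRNG. As a minimal sketch, here is the standard mulberry32 generator in TypeScript (the server's query-derived seeding logic is not shown; this only illustrates the determinism property):

```typescript
// mulberry32: a common 32-bit seeded PRNG; identical seeds yield identical streams.
function mulberry32(seed: number): () => number {
  let a = seed | 0;
  return () => {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    // Map the 32-bit result into [0, 1)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same seed, same sequence: the property deterministic graph construction relies on.
const rngA = mulberry32(42);
const rngB = mulberry32(42);
```

Running both generators side by side produces identical streams, so two tool calls with the same query build the same attack graph.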

Use cases for autonomous cyber red team intelligence

Penetration testing preparation

Security engineers preparing for a red team engagement use synthesize_exploit_chains to map the entire AND-OR attack graph for a target environment before the test begins. The A* search surfaces multi-step paths that human testers might miss — for example, a chain from an exposed legacy SSH service through a misconfigured jump host into a domain controller — with total CVSS cost and estimated execution time for each path.

Security budget allocation and CISO reporting

CISOs and security directors use optimize_defender_allocation to translate threat data into resource allocation recommendations. The Colonel Blotto output shows which security domains (perimeter, identity, endpoint, cloud, network) are under-defended relative to the Nash equilibrium, and identifies dominated strategies where current spending provides no marginal protection.

Vulnerability management and patch prioritization

Vulnerability management teams use predict_vulnerability_emergence to forecast CVE disclosure rates for specific technologies — Linux kernel, Apache, Windows Active Directory — over 30 and 90-day horizons. Hawkes process burst probability identifies when a clustering event is likely, enabling teams to pre-position remediation resources before a wave of disclosures.
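The burst signal these teams act on comes from the Hawkes intensity λ(t): each past disclosure at time t_i excites the arrival rate, which then decays by a power law. A minimal TypeScript sketch of the intensity formula (the server's calibration and thinning simulation are not shown):

```typescript
// λ(t) = μ + Σ_{t_i < t} α · (1 + (t − t_i)/c)^(−(1 + ω))
// mu: baseline intensity; alpha: excitation magnitude;
// c: power-law offset; omega: power-law decay exponent.
function hawkesIntensity(
  t: number,
  eventTimes: number[],
  mu: number,
  alpha: number,
  c: number,
  omega: number
): number {
  return eventTimes
    .filter((ti) => ti < t)
    .reduce(
      (sum, ti) => sum + alpha * Math.pow(1 + (t - ti) / c, -(1 + omega)),
      mu
    );
}

// Parameter values here mirror the hawkesParams fields in the example output.
const lambda = hawkesIntensity(10, [8, 9, 9.5], 0.032, 0.71, 0.45, 1.23);
```

A cluster of recent disclosures pushes λ(t) well above the baseline μ, which is what drives burstProbability upward.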

Cyber insurance underwriting and actuarial pricing

Underwriters and actuaries use assess_zero_day_tail_risk to quantify the severity distribution of vulnerabilities in a policyholder's technology stack. GPD-fitted VaR and CVaR metrics provide a statistically grounded basis for maximum loss estimates and premium calculations, replacing rule-of-thumb cyber risk scoring.

APT tracking and threat intelligence

Threat intelligence analysts use model_adaptive_adversary to model how known APT groups — APT29, Lazarus, Sandworm — adapt their technique selection against specific defensive postures. The Exp3 bandit output predicts which MITRE ATT&CK techniques will be prioritized in future campaigns based on historical adaptation speed and estimated reward.

Strategic security planning

Security architects and long-range planners use forecast_threat_landscape_evolution to understand which attack techniques are gaining evolutionary fitness and which are declining. Replicator dynamics identifies evolutionary stable strategies — techniques that, once dominant, resist displacement by alternatives — enabling investment in defenses against tomorrow's threat landscape, not last year's.
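The lifecycle classification falls out of the replicator equation itself: techniques with above-average fitness gain frequency share, those below lose it. A minimal Euler-step sketch in TypeScript (the server's fitness estimation and ESS test are not shown):

```typescript
// One Euler step of dx_i/dt = x_i (f_i − φ̄), where φ̄ = Σ_j x_j f_j is mean fitness.
function replicatorStep(freqs: number[], fitness: number[], dt: number): number[] {
  const phiBar = freqs.reduce((s, x, i) => s + x * fitness[i], 0);
  const next = freqs.map((x, i) => x + dt * x * (fitness[i] - phiBar));
  // Renormalize to absorb Euler discretization drift
  const total = next.reduce((a, b) => a + b, 0);
  return next.map((v) => v / total);
}

// Two techniques: the fitter one (index 0) gains share step by step.
let freqs = [0.5, 0.5];
for (let step = 0; step < 100; step++) {
  freqs = replicatorStep(freqs, [2.0, 1.0], 0.05);
}
```

Iterating this step is what produces the EMERGING/GROWING/MATURE/DECLINING trajectory for each technique.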

How to connect this MCP server

Step 1 — Get your Apify API token

Sign up at apify.com and copy your API token from Account Settings. The free plan includes $5 of monthly credits.

Step 2 — Add to your MCP client

Claude Desktop — add to claude_desktop_config.json:

{
  "mcpServers": {
    "autonomous-cyber-red-team": {
      "url": "https://autonomous-cyber-red-team-mcp.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Cursor / Windsurf / Cline — add the same URL and Authorization header in your MCP server settings.

Step 3 — Call a tool

Ask your agent: "Synthesize exploit chains for apache log4j corporate network" or "Model an APT29 adaptive adversary targeting financial services." The agent calls the tool, the server aggregates 15 data sources, and returns structured JSON with the quantitative analysis.
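Under the hood, MCP clients speak JSON-RPC 2.0: a tool invocation is a tools/call request posted to the server endpoint. As a sketch of what your client builds for you (the helper name is illustrative; transport headers and session handling are managed by the MCP client itself):

```typescript
// Shape of a JSON-RPC 2.0 tools/call request as defined by the MCP specification.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: assembles the request envelope an MCP client sends.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// The agent prompt from Step 3, expressed as the underlying JSON-RPC message.
const req = buildToolCall(1, "synthesize_exploit_chains", {
  query: "apache log4j corporate network",
  maxResults: 30,
});
```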

Step 4 — Interpret the results

Each tool returns a JSON object with algorithm-specific fields alongside a graphSummary showing how many nodes and edges were constructed from the live data. Higher node and edge counts indicate richer signal from the data sources.

Tool parameters

All 8 tools share the same two input parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | Yes | | Target domain, technology, threat actor, or scenario (e.g. "apache log4j corporate network", "APT29 nation-state attack", "ransomware healthcare sector") |
| maxResults | number | No | 30–50 (varies by tool) | Maximum results to fetch per actor. Range: 5–100. Higher values improve model quality but increase response time. Tools 3 and 7 default to 50 for better statistical estimation. |

Input examples

Exploit chain synthesis for a specific CVE:

{
  "query": "apache log4j CVE-2021-44228 corporate network",
  "maxResults": 30
}

APT adversary modeling with high data volume:

{
  "query": "APT29 nation-state attack financial sector",
  "maxResults": 50
}

Zero-day tail risk for a technology portfolio:

{
  "query": "linux kernel remote code execution",
  "maxResults": 50
}

Rapid threat landscape forecast:

{
  "query": "ransomware healthcare sector",
  "maxResults": 20
}

Input tips

  • Be specific in your query — "windows active directory pass-the-hash" produces a more targeted attack graph than "windows security." The query is passed to all 15 data sources, so domain-specific terminology extracts more relevant signal.
  • Increase maxResults for statistical tools — predict_vulnerability_emergence and assess_zero_day_tail_risk rely on sample size for distribution fitting. Use maxResults ≥ 50 for those two tools.
  • Use technology names with version context — "Apache HTTP Server 2.4" returns more precise CVE and Censys data than "web server."
  • Combine tools for full assessments — running all 8 tools on the same query builds a complete picture: exploit paths, defender allocation, adversary adaptation, and tail risk for a total of $0.295 per target.

Output examples

simulate_attack_defense_posg output

{
  "gameValue": 0.623,
  "attackerValue": 0.741,
  "defenderValue": 0.259,
  "converged": true,
  "iterations": 847,
  "alphaVectorCount": 23,
  "optimalAttackPath": [
    "CVE-2021-44228-entry",
    "corp-jumphost-pivot",
    "ad-domain-controller-target"
  ],
  "optimalDefenseAllocation": {
    "perimeter": 0.38,
    "identity": 0.31,
    "endpoint": 0.18,
    "cloud": 0.13
  },
  "beliefStates": [
    { "state": "undetected", "probability": 0.67 },
    { "state": "partial-detection", "probability": 0.24 },
    { "state": "full-detection", "probability": 0.09 }
  ],
  "graphSummary": { "nodes": 47, "edges": 83 }
}

synthesize_exploit_chains output

{
  "attackSurfaceScore": 8.14,
  "chainCount": 12,
  "criticalPath": {
    "path": ["log4j-rce-entry", "lateral-smb-pivot", "lsass-dump-target", "dc-sync-exfil"],
    "totalCvss": 34.7,
    "probability": 0.43,
    "andNodes": ["lsass-dump-target"],
    "orNodes": ["log4j-rce-entry", "lateral-smb-pivot"],
    "techniques": ["T1190", "T1021.002", "T1003.001", "T1003.006"],
    "estimatedTime": 4.2
  },
  "averageChainLength": 3.8,
  "maxCvssChain": 34.7,
  "chains": [
    {
      "path": ["log4j-rce-entry", "lateral-smb-pivot", "lsass-dump-target", "dc-sync-exfil"],
      "totalCvss": 34.7,
      "probability": 0.43,
      "techniques": ["T1190", "T1021.002", "T1003.001", "T1003.006"],
      "estimatedTime": 4.2
    }
  ],
  "graphSummary": { "nodes": 47, "edges": 83 }
}

predict_vulnerability_emergence output

{
  "baselineIntensity": 0.032,
  "currentIntensity": 0.187,
  "hawkesParams": {
    "alpha": 0.71,
    "omega": 1.23,
    "c": 0.45
  },
  "predictedEvents30d": 8,
  "predictedEvents90d": 21,
  "burstProbability": 0.68,
  "criticalVulnForecast": 3,
  "clusterSizes": [2, 4, 3, 5, 2],
  "intensityTimeline": [
    { "t": 0, "lambda": 0.187 },
    { "t": 7, "lambda": 0.143 },
    { "t": 14, "lambda": 0.112 },
    { "t": 30, "lambda": 0.089 }
  ],
  "graphSummary": { "nodes": 61, "edges": 104 }
}

assess_zero_day_tail_risk output

{
  "gpdParameters": {
    "shape": 0.14,
    "scale": 1.82,
    "threshold": 7.0
  },
  "var95": 8.6,
  "var99": 9.4,
  "cvar95": 9.1,
  "cvar99": 9.7,
  "maxObserved": 10.0,
  "tailIndex": 0.14,
  "portfolioRisk": 0.83,
  "exceedanceProbabilities": [
    { "severity": 8.0, "probability": 0.31 },
    { "severity": 9.0, "probability": 0.12 },
    { "severity": 9.5, "probability": 0.04 }
  ],
  "returnPeriods": [
    { "severity": 9.0, "returnPeriodDays": 28 },
    { "severity": 9.5, "returnPeriodDays": 84 }
  ],
  "graphSummary": { "nodes": 58, "edges": 96 }
}

Output fields by tool

simulate_attack_defense_posg

| Field | Type | Description |
| --- | --- | --- |
| gameValue | number | Nash equilibrium game value (0–1), higher = attacker advantage |
| attackerValue | number | Attacker's expected utility under optimal strategy |
| defenderValue | number | Defender's expected utility under optimal strategy |
| converged | boolean | Whether HSVI2 iteration converged within tolerance |
| iterations | number | Number of point-based value iteration cycles |
| alphaVectorCount | number | Alpha vectors maintained after pruning |
| optimalAttackPath | string[] | Node IDs on the optimal attack path |
| optimalDefenseAllocation | object | Normalized budget allocation per security domain |
| beliefStates | array | Detection state probabilities under optimal play |
| graphSummary.nodes | number | Total nodes in the constructed attack graph |
| graphSummary.edges | number | Total edges in the constructed attack graph |

synthesize_exploit_chains

| Field | Type | Description |
| --- | --- | --- |
| attackSurfaceScore | number | Aggregate attack surface severity score |
| chainCount | number | Total exploit chains discovered |
| criticalPath | object | Lowest-cost (highest-CVSS) exploit chain |
| criticalPath.path | string[] | Node IDs from entry point to target |
| criticalPath.totalCvss | number | Sum of CVSS scores along the chain |
| criticalPath.probability | number | Estimated success probability |
| criticalPath.andNodes | string[] | Nodes requiring all prerequisites (AND logic) |
| criticalPath.orNodes | string[] | Nodes requiring any prerequisite (OR logic) |
| criticalPath.techniques | string[] | MITRE ATT&CK technique IDs |
| criticalPath.estimatedTime | number | Estimated attack execution time (hours) |
| averageChainLength | number | Mean number of steps across all chains |
| maxCvssChain | number | Maximum cumulative CVSS across all chains |
| chains | array | Up to 10 exploit chains, ordered by CVSS cost |

predict_vulnerability_emergence

| Field | Type | Description |
| --- | --- | --- |
| baselineIntensity | number | Background CVE arrival rate μ |
| currentIntensity | number | Current intensity λ(t) including excitation |
| hawkesParams.alpha | number | Excitation magnitude parameter |
| hawkesParams.omega | number | Power-law decay exponent |
| hawkesParams.c | number | Power-law offset parameter |
| predictedEvents30d | number | Expected CVE count in next 30 days |
| predictedEvents90d | number | Expected CVE count in next 90 days |
| burstProbability | number | Probability of a clustering burst event |
| criticalVulnForecast | number | Forecast critical (CVSS ≥ 9.0) CVE count |
| clusterSizes | number[] | Observed historical cluster sizes |
| intensityTimeline | array | Intensity decay curve over time |

optimize_defender_allocation

| Field | Type | Description |
| --- | --- | --- |
| defenderBudget | number | Defender resource units |
| attackerBudget | number | Attacker resource units |
| battlefields | array | Per-domain allocation and win probabilities |
| battlefields[].name | string | Security domain name |
| battlefields[].defenderAlloc | number | Defender units allocated |
| battlefields[].attackerAlloc | number | Attacker units allocated |
| battlefields[].defenderWinProb | number | Defender win probability on this battlefield |
| nashEquilibrium | object | Nash equilibrium allocation by domain |
| defenderExpectedPayoff | number | Defender expected payoff at Nash |
| attackerExpectedPayoff | number | Attacker expected payoff at Nash |
| dominatedStrategies | string[] | Strategies eliminated by iterated dominance |
| iterations | number | Fictitious play convergence iterations |

model_adaptive_adversary

| Field | Type | Description |
| --- | --- | --- |
| optimalArm | string | Attack technique the adversary has converged on |
| predictedNextAction | string | Predicted next attack technique |
| adaptationSpeed | number | Rate of weight concentration across rounds |
| regret | number | Cumulative regret vs. best fixed strategy |
| explorationRate | number | Current exploration probability |
| rounds | number | Exp3 simulation rounds |
| arms | array | Per-technique weight, probability, reward, pull count |

compute_lateral_movement_risk

| Field | Type | Description |
| --- | --- | --- |
| epidemicThreshold | number | β_c = ⟨k⟩/⟨k²⟩ critical transmission threshold |
| currentBeta | number | Estimated current transmission rate |
| supercritical | boolean | True if currentBeta > epidemicThreshold (compromise spreads) |
| meanDegree | number | Mean network degree ⟨k⟩ |
| meanSquareDegree | number | Second moment ⟨k²⟩ |
| expectedSteps | object | Expected steps to compromise per starting node |
| absorptionProbabilities | object | Probability of reaching each target from each source |
| highRiskPaths | array | Top 10 paths by expected-steps-to-compromise |
| fundamentalMatrixSize | number | Dimension of N = (I-Q)^-1 |
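The expectedSteps and fundamentalMatrixSize fields derive from the absorbing-chain computation N = (I − Q)^-1. A compact TypeScript sketch using Gauss-Jordan elimination (partial pivoting here for brevity; the doc states the server uses full pivoting):

```typescript
// Gauss-Jordan inversion of a square matrix via an augmented [M | I] tableau.
function invert(m: number[][]): number[][] {
  const n = m.length;
  const a = m.map((row, i) => [...row, ...row.map((_, j) => (i === j ? 1 : 0))]);
  for (let col = 0; col < n; col++) {
    // Pick the largest pivot in this column (partial pivoting)
    let piv = col;
    for (let r = col + 1; r < n; r++)
      if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
    [a[col], a[piv]] = [a[piv], a[col]];
    const d = a[col][col];
    for (let j = 0; j < 2 * n; j++) a[col][j] /= d;
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = a[r][col];
      for (let j = 0; j < 2 * n; j++) a[r][j] -= f * a[col][j];
    }
  }
  return a.map((row) => row.slice(n));
}

// Expected steps to absorption from each transient state: row sums of
// N = (I − Q)^-1, where Q is the transient-to-transient transition sub-matrix.
function expectedSteps(Q: number[][]): number[] {
  const ImQ = Q.map((row, i) => row.map((q, j) => (i === j ? 1 : 0) - q));
  return invert(ImQ).map((row) => row.reduce((a, b) => a + b, 0));
}
```

For a single foothold that self-loops with probability 0.5 and is compromised (absorbed) with probability 0.5, expectedSteps([[0.5]]) gives 2 expected steps to absorption.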

assess_zero_day_tail_risk

| Field | Type | Description |
| --- | --- | --- |
| gpdParameters.shape | number | GPD shape parameter ξ (tail heaviness) |
| gpdParameters.scale | number | GPD scale parameter σ |
| gpdParameters.threshold | number | Exceedance threshold u |
| var95 | number | CVSS Value at Risk at 95th percentile |
| var99 | number | CVSS Value at Risk at 99th percentile |
| cvar95 | number | Conditional VaR (Expected Shortfall) at 95% |
| cvar99 | number | Conditional VaR (Expected Shortfall) at 99% |
| maxObserved | number | Maximum CVSS score observed in the dataset |
| tailIndex | number | Tail heaviness index (positive = heavy tail) |
| portfolioRisk | number | Aggregate portfolio cyber risk score 0–1 |
| exceedanceProbabilities | array | P(X > x) at severity thresholds |
| returnPeriods | array | Expected days between severity exceedances |

forecast_threat_landscape_evolution

| Field | Type | Description |
| --- | --- | --- |
| ess | string[] | Evolutionary stable strategies (invasion-resistant techniques) |
| emergingThreats | string[] | Techniques classified as EMERGING |
| decliningThreats | string[] | Techniques classified as DECLINING |
| replicatorConvergence | number | Convergence measure of replicator dynamics (0–1) |
| timeToEquilibrium | number | Estimated time steps to stable distribution |
| landscapeEntropy | number | Shannon entropy of technique frequency distribution |
| techniques | array | Per-technique name, frequency, fitness, phase, growthRate |

How much does autonomous red team intelligence cost?

Each tool uses pay-per-event pricing — you pay only when a tool call is successfully initiated. Prices vary by computational complexity.

| Tool | Price per call | 10 calls | 50 calls |
| --- | --- | --- | --- |
| simulate_attack_defense_posg | $0.045 | $0.45 | $2.25 |
| synthesize_exploit_chains | $0.040 | $0.40 | $2.00 |
| predict_vulnerability_emergence | $0.035 | $0.35 | $1.75 |
| optimize_defender_allocation | $0.040 | $0.40 | $2.00 |
| model_adaptive_adversary | $0.035 | $0.35 | $1.75 |
| compute_lateral_movement_risk | $0.035 | $0.35 | $1.75 |
| assess_zero_day_tail_risk | $0.035 | $0.35 | $1.75 |
| forecast_threat_landscape_evolution | $0.030 | $0.30 | $1.50 |
| Full 8-tool assessment (one target) | $0.295 | | |

You can set a maximum spending limit per session in your MCP client or via Apify's built-in spend controls. The server checks eventChargeLimitReached before executing each tool and returns a clean error if the budget is reached.

Apify's free plan includes $5 of monthly credits — enough for 16 full 8-tool assessments at no cost.

Compare this to commercial threat intelligence platforms like Recorded Future or Mandiant Advantage, which charge $30,000–150,000 per year for analyst-curated feeds without the quantitative modeling layer.

How the Autonomous Cyber Red Team MCP works

Phase 1 — Data aggregation (parallel, 60–150 seconds)

Every tool call dispatches up to 15 actor calls in parallel via runActorsParallel() in actor-client.ts. Each actor is called with the user's query and a per-actor maxResults cap. The 180-second actor timeout ensures the parallel fan-out completes before the attack graph is assembled. Failures in individual data sources are gracefully handled — that source returns an empty array and the graph is built from the remaining signals.
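The fan-out is an ordinary parallel-promise pattern with per-source failure isolation. A simplified sketch (the actual runActorsParallel() in actor-client.ts also handles timeouts and charging; only the graceful-degradation behavior is shown, and the helper body here is illustrative):

```typescript
// Fan out to many data sources in parallel; a failed source contributes an
// empty array instead of failing the whole aggregation.
async function fanOut<T>(sources: Array<() => Promise<T[]>>): Promise<T[][]> {
  const settled = await Promise.allSettled(sources.map((s) => s()));
  return settled.map((r) => (r.status === "fulfilled" ? r.value : []));
}

// Example: one healthy source and one that is rate-limited upstream.
const results = fanOut<string>([
  async () => ["CVE-2021-44228"],
  async () => { throw new Error("429 Too Many Requests"); },
]);
```

Promise.allSettled never rejects, so a single rate-limited actor reduces graph density without aborting the attack-graph build.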

The 15 sources are selected to cover the full kill chain: NVD and CISA KEV provide vulnerability nodes, Censys and tech stack detection provide asset nodes, DNS and SSL expose the network perimeter, GitHub surfaces active exploit code, StackExchange and Hacker News provide context on exploitation technique feasibility, OFAC and OpenSanctions handle threat actor attribution, and FRED provides economic context for risk quantification.

Phase 2 — Attack graph construction

buildAttackGraph() in scoring.ts normalizes all 15 source outputs into a unified AttackGraph with typed VulnNode and AttackEdge structures. Nodes carry CVSS score, exploitability, impact, KEV flag, tech stack label, category (entry/pivot/target/exfil), and detection difficulty. Edges carry technique name, success probability, CVSS weight, and AND/OR prerequisite type. A seeded mulberry32 PRNG (seed=42) ensures deterministic graph construction for identical inputs. The resulting weighted adjacency matrix drives all downstream algorithms.

Phase 3 — Algorithm application

Each tool applies a distinct algorithm to the same attack graph:

  • POSG (Tool 1) — HSVI2 iterates over belief-space points, computing upper and lower bounds on value until the gap is below tolerance. Alpha vectors are pruned at each iteration using domination checking. The converged values represent the Nash equilibrium of the infinite-horizon discounted game.
  • AND-OR A* (Tool 2) — A priority queue explores the AND-OR graph with f(n) = g(n) + h(n), where g is cumulative CVSS cost and h = max remaining CVSS serves as an admissible heuristic. AND nodes are expanded only when all prerequisites are satisfied; OR nodes expand on any single satisfied prerequisite.
  • Hawkes process (Tool 3) — Power-law kernel λ(t) = μ + Σ_i α(1 + (t−t_i)/c)^(−(1+ω)) is calibrated from CVE timestamp data. Thinning simulation generates sample paths for the 30-day and 90-day forecasts.
  • Colonel Blotto (Tool 4) — Fictitious play best-responds to the current empirical opponent distribution across security battlefields each iteration until the mixed strategies converge to Nash equilibrium.
  • Exp3 (Tool 5) — Exponential-weight update w_i(t+1) = w_i(t) × exp(η × r̂_i / K) with importance-weighted estimator r̂_i = r_i / p_i tracks adversary technique selection. Optimal learning rate η = √(2 ln K / T) is computed analytically.
  • Absorbing Markov (Tool 6) — Gauss-Jordan inversion computes N = (I−Q)^-1 where Q is the transient-to-transient sub-matrix. Absorption matrix B = NR gives compromise probabilities. Epidemic threshold β_c = ⟨k⟩/⟨k²⟩ from mean-field theory determines supercritical spread.
  • GPD (Tool 7) — Probability-weighted moments fit GPD parameters (shape ξ, scale σ) to CVSS exceedances above threshold u. P(X > x | X > u) = (1 + ξ(x−u)/σ)^(−1/ξ) gives exceedance probabilities for VaR and CVaR computation.
  • Replicator dynamics (Tool 8) — Euler integration of dx_i/dt = x_i(f_i − φ̄) where φ̄ = Σ_j x_j f_j is mean fitness. Techniques are classified by growth rate sign and fitness percentile. ESS candidates are identified by perturbation stability.
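As a concrete instance of one of these steps, the Exp3 update from Tool 5 fits in a few lines of TypeScript (reward scaling and the analytic η are as described above; arm indices and starting weights here are illustrative):

```typescript
// Exp3 sampling distribution: mix the weight-proportional distribution with
// uniform exploration at rate gamma.
function exp3Probs(weights: number[], gamma: number): number[] {
  const W = weights.reduce((a, b) => a + b, 0);
  const K = weights.length;
  return weights.map((w) => (1 - gamma) * (w / W) + gamma / K);
}

// Importance-weighted update: only the chosen arm's reward estimate is nonzero,
// scaled by 1/p so the estimator is unbiased for unobserved arms.
function exp3Update(
  weights: number[],
  probs: number[],
  chosen: number,
  reward: number,
  eta: number
): number[] {
  const K = weights.length;
  return weights.map((w, i) => {
    const rHat = i === chosen ? reward / probs[chosen] : 0;
    return w * Math.exp((eta * rHat) / K); // w_i(t+1) = w_i(t) · exp(η · r̂_i / K)
  });
}

// Three techniques, uniform start; rewarding arm 0 shifts weight toward it.
let weights = [1, 1, 1];
const probs = exp3Probs(weights, 0.1);
weights = exp3Update(weights, probs, 0, 1.0, 0.2);
```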

Phase 4 — Response serialization

Results are serialized to JSON via the json() helper and returned as MCP CallToolResult with content[0].type = 'text'. All numeric fields are rounded for readability. The graphSummary field is appended to every response so callers can assess data richness.

Tips for best results

  1. Query with technology plus context — "apache log4j 2.14 corporate internal network" builds a richer graph than "log4j." The query is passed verbatim to all 15 actors, so specificity multiplies across sources.

  2. Use predict_vulnerability_emergence proactively — run it weekly on your highest-risk technology components. When burstProbability exceeds 0.6, pre-position your patch team. A weekly run costs $0.035 and can prevent an emergency response.

  3. Sequence POSG before Blotto for budget discussions — simulate_attack_defense_posg gives you the game value (attacker advantage), then optimize_defender_allocation tells you how to redistribute budget to counter it. Present both outputs together for CISO-level conversations.

  4. Set maxResults ≥ 50 for tail risk and Hawkes tools — GPD fitting and Hawkes calibration improve substantially with more samples. For assess_zero_day_tail_risk and predict_vulnerability_emergence, always use maxResults: 50 or higher.

  5. Run forecast_threat_landscape_evolution quarterly — replicator dynamics show technique lifecycle stages. Techniques classified as EMERGING now will be GROWING in 6–12 months. Quarterly forecasts give you 2–3 quarters of lead time to build defenses against the next dominant technique.

  6. Combine with WHOIS Domain Lookup and Website Tech Stack Detector — these actors are embedded in the MCP's data aggregation, but running them standalone first lets you validate that the target is correctly identified before a full 8-tool assessment.

  7. Export results to a dataset for trending — use Apify's dataset storage to persist tool outputs over time. Comparing gameValue from simulate_attack_defense_posg month-over-month gives an objective measure of how your attack surface is changing.

Combine with other Apify actors

| Actor | How to combine |
| --- | --- |
| Website Tech Stack Detector | Detect technology stack first, then pass identified technologies as queries to synthesize_exploit_chains for targeted exploit chain analysis |
| WHOIS Domain Lookup | Enumerate subdomains and registration data, then feed domain list into simulate_attack_defense_posg for perimeter attack surface modeling |
| Company Deep Research | Generate a company intelligence report, extract technology mentions, then run predict_vulnerability_emergence on each identified technology |
| Website Change Monitor | Monitor CISA advisory and vendor security pages; trigger synthesize_exploit_chains via webhook when a new advisory is detected |
| Cyber Attack Surface Report | Run a full passive attack surface report, then feed the output into compute_lateral_movement_risk for network propagation analysis |
| B2B Lead Qualifier | Screen prospects by technology stack, then use assess_zero_day_tail_risk on their stack to identify high-risk leads for cybersecurity products |
| Company Due Diligence Report | Enrich a due diligence report with assess_zero_day_tail_risk and forecast_threat_landscape_evolution for acquisition cyber risk sections |

Limitations

  • Passive analysis only — this server never sends packets to target infrastructure. It accesses only publicly available data from government registries, certificate transparency logs, DNS, and community databases. It cannot discover vulnerabilities that are not publicly documented.
  • Attack graph quality depends on data source coverage — queries for niche or proprietary technologies may return sparse NVD and GitHub data, resulting in smaller attack graphs and less statistically reliable model outputs. Check graphSummary.nodes — graphs with fewer than 10 nodes should be interpreted cautiously.
  • Mathematical models are probabilistic, not deterministic — POSG game values, Hawkes forecasts, and GPD tail estimates are statistical outputs grounded in the available data, not ground-truth assessments. They quantify risk, they do not certify it.
  • No runtime exploit execution — this is not an automated exploitation framework. It identifies and prioritizes attack paths; executing those paths requires a human red team.
  • 15 data sources have individual rate limits — during high-traffic periods, individual actors (especially Censys and GitHub) may return fewer results than maxResults due to upstream rate limiting. This is handled gracefully but may reduce graph density.
  • Hawkes calibration requires historical CVE timestamps — if NVD returns few results for a niche query, the Hawkes model will have limited historical data to calibrate against and will fall back to baseline intensity estimates.
  • GPD fitting requires exceedances above threshold — if fewer than 5 CVSS scores exceed the threshold (default u = 7.0), the GPD shape and scale estimates will be unreliable. Setting maxResults to 50 mitigates this.
  • No authentication bypass or credential testing — the server does not attempt to log in to systems, test default credentials, or execute authenticated scans.

Integrations

  • Apify API — call individual tools directly from Python or Node.js applications, or integrate MCP responses into SIEM/SOAR pipelines via REST
  • Webhooks — trigger downstream actions (PagerDuty alert, Jira ticket, Slack notification) when burstProbability or gameValue exceeds a threshold
  • Zapier — schedule weekly threat landscape forecasts and push results to Google Sheets or a Notion database for tracking over time
  • Make — build multi-step automation: website change detected → run exploit chain synthesis → push findings to security ticketing system
  • Google Sheets — export weekly VaR, CVaR, and game value metrics to a spreadsheet for board-level cyber risk reporting
  • LangChain / LlamaIndex — use MCP tool outputs as structured context in RAG pipelines for AI-assisted threat report generation
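The webhook pattern listed above reduces to a simple threshold gate on tool outputs. A minimal sketch follows; the field names mirror the documented outputs, but the threshold values and the function itself are illustrative assumptions, not part of the server's contract.

```python
def should_alert(result: dict,
                 burst_threshold: float = 0.7,
                 game_threshold: float = 0.6) -> bool:
    """Return True when a tool result crosses an alerting threshold.
    Thresholds are illustrative defaults, not server-defined values."""
    if result.get("burstProbability", 0.0) >= burst_threshold:
        return True
    if result.get("gameValue", 0.0) >= game_threshold:
        return True
    return False

# A POSG result showing attacker advantage should fire the webhook
print(should_alert({"gameValue": 0.72}))  # True
```

Wire this gate in front of the PagerDuty, Jira, or Slack step so downstream systems only see results that cross your risk tolerance.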

Troubleshooting

Empty or very small attack graph (graphSummary.nodes < 5) — the query returned few results from NVD and related sources. Try a broader query: use the technology name without version number, or add the word "vulnerability." Also check that maxResults is at least 20. For very niche technologies, the attack graph will be sparse regardless of maxResults.

Tool returns { "error": true, "message": "Spending limit reached" } — the session's per-event charge limit was reached before the tool executed. Increase the spending limit in your Apify account settings or in your MCP client's session configuration. This is a safety feature, not a bug.

Response takes longer than 3 minutes — one or more of the 15 upstream actors hit the 180-second timeout. This typically happens with Censys or GitHub under high load. The result is still returned with data from the actors that completed successfully. Retry the call during off-peak hours or reduce maxResults to speed up actor calls.

converged: false in POSG output — HSVI2 did not converge within the iteration budget. This happens when the attack graph has many nodes and the belief simplex is large. The returned game value is a bound, not the exact Nash value. Use the result directionally rather than as a precise figure.

GPD shape parameter ξ is extreme (ξ > 1 or ξ < −0.5) — this indicates very few exceedances above the threshold, leading to unstable moment estimates. Increase maxResults to 50+ and ensure the query targets technologies with known high-severity CVEs. A shape parameter in [−0.5, 0.5] is statistically normal for CVSS data.
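To see why few exceedances destabilize the fit, here is a stdlib-only method-of-moments estimator for the GPD shape ξ and scale σ over CVSS exceedances above u = 7.0. This is a textbook estimator shown for illustration; it is not the server's actual fitting routine, and the sample scores are made up.

```python
from statistics import mean, variance

def gpd_mom(cvss_scores, u=7.0):
    """Method-of-moments GPD fit on exceedances above threshold u.
    Returns (xi, sigma, n_exceedances); unstable when n is small."""
    exc = [s - u for s in cvss_scores if s > u]
    if len(exc) < 5:                 # mirrors the documented minimum
        raise ValueError("fewer than 5 exceedances: estimates unreliable")
    m, v = mean(exc), variance(exc)  # sample mean and variance
    xi = 0.5 * (1 - m * m / v)       # shape estimate
    sigma = 0.5 * m * (m * m / v + 1)  # scale estimate
    return xi, sigma, len(exc)

scores = [7.1, 7.3, 7.5, 8.0, 8.8, 10.0, 7.2, 6.5]  # illustrative CVSS data
xi, sigma, n = gpd_mom(scores)
print(round(xi, 3), round(sigma, 3), n)
```

With only a handful of exceedances, both m and v swing widely from sample to sample, which is exactly why extreme ξ values signal an unreliable fit.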

Responsible use

  • This server accesses only publicly available vulnerability data, DNS records, certificate transparency logs, and community-generated content.
  • Do not use this tool to plan, facilitate, or execute unauthorized access to computer systems. Passive analysis of publicly available data is lawful; unauthorized penetration testing is not.
  • Comply with your organization's security policy and applicable computer fraud laws (CFAA, Computer Misuse Act, EU NIS2) when conducting assessments.
  • Results should be reviewed by qualified security professionals before being used to make operational decisions.
  • For guidance on the legality of security research using public data, see Apify's guide on web scraping legality.

FAQ

How many tool calls can I make before the free tier runs out? The Apify free plan includes $5 of monthly credits. At $0.08–$0.12 per call, that covers roughly 41–62 individual tool calls, or about 6 full 8-tool assessments ($0.72 each). Credits reset monthly.

Does autonomous red team analysis work on internal network targets? The tools work on any query string, but results are limited to publicly available data. For an internal network (192.168.x.x addresses or internal hostnames), NVD and Censys will return nothing. The tools are most effective for internet-facing infrastructure, technology stacks, known threat actors, and CVE categories.

How is this different from running a manual CVSS lookup? A CVSS lookup gives you a single severity score for a single CVE. This MCP builds a multi-node attack graph from 15 sources and applies game theory, stochastic processes, and extreme value statistics to that graph. The output is an attacker-defender game value, optimal allocation recommendations, exploit chains with AND-OR prerequisites, and tail risk estimates — not a list of scores.

Can I use this to fulfill a compliance or audit requirement? The outputs can inform and support compliance work — particularly risk quantification sections of SOC 2, ISO 27001, and NIST CSF assessments — but they do not constitute a formal penetration test or audit. Compliance frameworks typically require testing conducted by certified professionals with explicit written authorization.

How accurate is the Hawkes vulnerability forecast? The Hawkes process captures temporal clustering in CVE disclosures for a given technology category. Predictions are probabilistic — predictedEvents30d: 8 means the model expects approximately 8 CVEs in 30 days, with uncertainty that grows with the forecast horizon. Accuracy improves with higher maxResults (more historical timestamps for calibration) and for technologies with many documented CVEs.
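The expectation behind a figure like predictedEvents30d can be illustrated with the standard stationary Hawkes identity: with baseline rate μ and branching ratio n < 1, each background event triggers an average cascade of 1/(1 − n) events, so E[N(T)] ≈ μT/(1 − n). The sketch below uses that identity with made-up parameters; it is not the server's calibration code.

```python
def expected_events(baseline_rate: float, branching_ratio: float,
                    horizon_days: float = 30.0) -> float:
    """Stationary Hawkes expectation: E[N(T)] ~ mu * T / (1 - n).
    Parameters here are illustrative, not the server's calibration."""
    if not 0 <= branching_ratio < 1:
        raise ValueError("branching ratio must be in [0, 1) for stationarity")
    return baseline_rate * horizon_days / (1 - branching_ratio)

# e.g. 0.13 baseline CVEs/day with 50% self-excitation over 30 days
print(round(expected_events(0.13, 0.5), 1))  # 7.8, i.e. ~8 CVEs expected
```

The same identity explains why sparse historical data hurts: with few timestamps, both μ and n are poorly constrained, so the forecast inherits their uncertainty.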

Is it legal to query CISA KEV, NVD, and Censys data? Yes. NVD is a US government public database. CISA KEV is a public advisory catalog. Censys publishes internet scan data as a public resource. All 15 data sources used by this MCP are publicly accessible. See Apify's legal guide for more context.

How is this different from commercial threat intelligence platforms like Recorded Future? Recorded Future and similar platforms provide analyst-curated, human-labeled threat intelligence with proprietary dark web and HUMINT sources. This MCP applies mathematical modeling — game theory, stochastic processes, extreme value theory — to open-source data. They complement each other: use commercial platforms for attribution and actor intelligence, this MCP for quantitative risk modeling and resource allocation.

What happens if one of the 15 upstream actors fails? The runActorsParallel() function catches errors per actor and returns an empty array for that source. The attack graph is built from the remaining sources. The graphSummary in the response reflects how many nodes and edges were actually constructed, giving you visibility into data completeness.
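The per-actor error isolation described above follows a common asyncio pattern. Here is a minimal sketch under stated assumptions: the actor names and the fetch_source stand-in are invented for illustration and do not reflect the server's actual code.

```python
import asyncio

async def fetch_source(name: str):
    """Stand-in for one upstream actor call; 'censys' simulates a failure."""
    if name == "censys":
        raise RuntimeError("upstream rate limit")
    return [{"source": name}]

async def run_sources_parallel(names):
    """Run all sources concurrently; a failed source contributes an
    empty list so the graph is built from whatever succeeded."""
    results = await asyncio.gather(
        *(fetch_source(n) for n in names), return_exceptions=True
    )
    return [r if not isinstance(r, Exception) else [] for r in results]

rows = asyncio.run(run_sources_parallel(["nvd", "censys", "github"]))
print(rows)
```

The key detail is return_exceptions=True: one failing source yields an exception object in place of its results instead of cancelling the whole gather, so graphSummary can still report exactly which sources contributed.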

Can I run all 8 tools on the same target in parallel? Yes. Each tool is an independent MCP call. An AI agent can call all 8 in parallel and aggregate the results. A full parallel assessment of one target costs $0.72 and completes in 60–180 seconds (bounded by the slowest parallel data fetch, not the sum).

How do I interpret the gameValue from simulate_attack_defense_posg? gameValue is the Nash equilibrium value of the attack-defense game on a 0–1 scale, where 1 represents full attacker advantage and 0 represents full defender advantage. A value above 0.6 indicates the attacker has structural advantage on the current attack surface and defense allocation. Use optimalDefenseAllocation to identify where budget reallocation would most reduce the game value.
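The interpretation above can be encoded as a small triage helper. The 0.6 cutoff comes from this documentation; the lower 0.4 band and the label strings are assumptions added for illustration.

```python
def interpret_game_value(game_value: float) -> str:
    """Map a POSG game value (0 = defender advantage, 1 = attacker
    advantage) to a coarse posture label; 0.6 cutoff per the docs,
    the 0.4 band is an illustrative assumption."""
    if not 0.0 <= game_value <= 1.0:
        raise ValueError("gameValue is defined on [0, 1]")
    if game_value > 0.6:
        return "attacker structural advantage: reallocate defense budget"
    if game_value < 0.4:
        return "defender advantage"
    return "contested: monitor and re-run after changes"

print(interpret_game_value(0.72))
```

An agent can branch on this label, e.g. only fetching optimalDefenseAllocation and opening a ticket when the attacker-advantage band is hit.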

Can I use this MCP in an autonomous AI agent pipeline? Yes, this is the primary intended use. The server runs in Apify Standby mode, meaning it is always available for agent calls without cold start latency. It works with any MCP-compatible agent framework including LangChain agents, AutoGen, CrewAI, and custom agent loops.

Help us improve

If you encounter issues, enable run sharing in your Apify account to help us debug faster:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom security integrations or enterprise deployments, reach out through the Apify platform.

How it works

01

Configure

Set your parameters in the Apify Console or pass them via API.

02

Run

Click Start, trigger via API, webhook, or set up a schedule.

03

Get results

Download as JSON, CSV, or Excel. Integrate with 1,000+ apps.

Use cases

Security Engineers

Quantify attack surface risk and prioritize defense budget allocation.

Threat Analysts

Forecast vulnerability emergence and model adversary behavior.

Data Teams

Automate cyber risk intelligence pipelines with scheduled runs.

Developers

Integrate via REST API or use as an MCP tool in AI workflows.

Ready to try Autonomous Cyber Red Team MCP?

Start for free on Apify. No credit card required.

Open on Apify Store