Autonomous Cyber Red Team MCP
Autonomous cyber red team intelligence for AI agents — this MCP server gives Claude, GPT-4o, and any MCP-compatible agent the ability to run quantitative attack simulations, synthesize exploit chains, forecast vulnerability emergence, and model adversary behavior using eight rigorously implemented mathematical frameworks. It is built for security engineers, threat analysts, and AI agents that need structured, reproducible cyber risk intelligence from a single tool call.
Pricing
Pay Per Event model. You only pay for what you use.
| Event | Description | Price |
|---|---|---|
| simulate-attack-defense-posg | POSG belief-space HSVI2 planning | $0.12 |
| synthesize-exploit-chains | AND-OR A* with CVSS heuristic | $0.10 |
| predict-vulnerability-emergence | Hawkes power-law vulnerability clustering | $0.08 |
| optimize-defender-allocation | Colonel Blotto Nash equilibrium | $0.08 |
| model-adaptive-adversary | Exp3 adversarial multi-armed bandit | $0.08 |
| compute-lateral-movement-risk | Absorbing Markov chain fundamental matrix | $0.08 |
| assess-zero-day-tail-risk | GPD extreme value tail modeling | $0.08 |
| forecast-threat-landscape-evolution | Replicator dynamics evolutionary game | $0.10 |
Example: 100 events = $12.00 · 1,000 events = $120.00
Connect to your AI agent
Add this MCP server to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.
```json
{
  "mcpServers": {
    "autonomous-cyber-red-team-mcp": {
      "url": "https://ryanclinton--autonomous-cyber-red-team-mcp.apify.actor/mcp"
    }
  }
}
```
Documentation
The server aggregates live data from 15 cybersecurity sources in parallel — NVD, CISA KEV, Censys, DNS, SSL transparency logs, WHOIS, GitHub, StackExchange, Hacker News, OFAC, OpenSanctions, IP geolocation, tech stack detection, website monitoring, and FRED economic data — assembles a weighted attack graph, then applies algorithms from game theory, stochastic processes, and extreme value theory to produce quantitative outputs that go well beyond keyword searches or CVSS score lookups.
What data can you access?
| Data Point | Source | Example |
|---|---|---|
| 📋 CVE records with CVSS base scores and CWE classifications | NVD CVE Search | CVE-2021-44228, CVSS 10.0 (Log4Shell) |
| 🚨 Known exploited vulnerabilities with CISA remediation deadlines | CISA KEV Catalog | 1,000+ actively exploited CVEs |
| 🌐 Exposed hosts, open ports, and service banners | Censys Search | 443/tcp nginx 1.18.0, cert CN=*.target.com |
| 🔍 DNS record enumeration (A, AAAA, MX, TXT, CNAME, NS) | DNS Lookup | mail.acmecorp.com → 203.0.113.42 |
| 🔒 SSL/TLS certificates and transparency log entries | SSL Certificate Search | *.acmecorp.com, SAN=admin.acmecorp.com |
| 📝 Domain registration data, registrar, and expiry | WHOIS Lookup | Registered 2008, expires 2026, GoDaddy |
| 📍 IP-to-ASN mapping with country and ISP | IP Geolocation | AS14618 Amazon, us-east-1 |
| 🛠️ Technology stack identified from HTTP headers and HTML | Tech Stack Detector | Apache 2.4.51, PHP 7.4, WordPress 5.8 |
| 💻 Proof-of-concept exploit repositories and security tools | GitHub Repo Search | CVE-2021-44228 POC, 3.2K stars |
| 💬 Security community Q&A and technique discussions | StackExchange (Security.SE) | "How does Pass-the-Hash work on AD?" |
| 📰 Vulnerability disclosures and security news | Hacker News Search | "New 0-day in OpenSSL—patch now" |
| 🏛️ US Treasury OFAC Specially Designated Nationals | OFAC Sanctions Search | Threat actor entity screening |
| 🌍 Multi-jurisdiction sanctions and watchlists | OpenSanctions Search | 100+ programs, EU, UN, Interpol |
| 📡 Security advisory and policy page changes | Website Change Monitor | CISA advisory page deltas |
| 💰 Cybersecurity market spend and cyber insurance indices | FRED Economic Data | US cybersecurity GDP component |
MCP tools
| Tool | Price | Algorithm | Best for |
|---|---|---|---|
| `simulate_attack_defense_posg` | $0.045 | POSG via HSVI2 point-based value iteration, alpha-vector pruning | Optimal attacker/defender strategy, game value computation |
| `synthesize_exploit_chains` | $0.040 | AND-OR A* with CVSS admissible heuristic h(n) = max CVSS on path | Multi-step attack path discovery, remediation prioritization |
| `predict_vulnerability_emergence` | $0.035 | Hawkes self-exciting process with power-law kernel, thinning simulation | 30/90-day CVE forecasting, patch cycle planning |
| `optimize_defender_allocation` | $0.040 | Colonel Blotto game, Nash equilibrium via fictitious play | Security budget allocation, dominated-strategy elimination |
| `model_adaptive_adversary` | $0.035 | Exp3 multi-armed bandit, importance-weighted reward updates | APT technique prediction, proactive defense planning |
| `compute_lateral_movement_risk` | $0.035 | Absorbing Markov chain, fundamental matrix N=(I−Q)^-1, epidemic threshold | Network propagation risk, segmentation validation |
| `assess_zero_day_tail_risk` | $0.035 | GPD extreme value theory, probability-weighted moments | VaR/CVaR quantification, cyber insurance pricing |
| `forecast_threat_landscape_evolution` | $0.030 | Replicator dynamics dx_i/dt = x_i(f_i − φ̄), ESS identification | Long-term threat strategy, EMERGING/DECLINING classification |
Why use the Autonomous Cyber Red Team MCP?
Security teams running manual threat assessments spend hours correlating CVE feeds, mapping attack paths on whiteboards, and guessing at budget allocation. Consultancies charge $15,000–50,000 for a red team engagement that covers a fraction of the attack surface and produces results that are stale within weeks.
This MCP server automates the quantitative layer of that work. An AI agent calls a single tool, the server fans out across 15 data sources in parallel, builds a live attack graph, and returns mathematically grounded outputs — game values, probability distributions, expected steps to compromise, and tail risk estimates — within 60–180 seconds.
Benefits of running on the Apify platform:
- Scheduling — run weekly threat landscape forecasts on a cron schedule to track how the risk posture changes over time
- API access — integrate tool outputs directly into SIEM, SOAR, or custom dashboards via the Apify API
- Spending limits — set a maximum credit spend per session so AI agents cannot exceed a cost budget
- Webhooks — trigger alerts to Slack or PagerDuty when vulnerability burst probability exceeds a threshold
- No infrastructure — the MCP server runs in Apify standby mode with zero self-hosted infrastructure
Features
- 8 distinct mathematical frameworks — POSG, AND-OR A*, Hawkes process, Colonel Blotto, Exp3 bandit, absorbing Markov chain, GPD extreme value theory, and replicator dynamics — each implemented from first principles in TypeScript
- 15 live data sources queried in parallel — every tool call fans out to NVD, CISA KEV, Censys, DNS, SSL, WHOIS, IP geolocation, tech stack, GitHub, StackExchange, Hacker News, OFAC, OpenSanctions, website monitor, and FRED simultaneously
- Attack graph construction from heterogeneous inputs — `buildAttackGraph()` normalizes all 15 source outputs into a unified weighted directed graph with `VulnNode` and `AttackEdge` types, CVSS-weighted edges, AND/OR prerequisite flags, and technique labels
- HSVI2 belief-space planning — `simulateAttackDefensePOSG()` solves the NEXPTIME-complete POSG on the belief simplex using point-based value iteration with alpha-vector pruning, returning converged game values and belief-state strategies
- CVSS-admissible A* heuristic — `synthesizeExploitChains()` guarantees optimal-cost attack paths via h(n) = max CVSS score on remaining path; AND nodes require all prerequisites satisfied, OR nodes need any single one
- Power-law Hawkes kernel — `predictVulnerabilityEmergence()` uses λ(t) = μ + Σ_i α(1 + (t−t_i)/c)^(−(1+ω)) to capture long-memory CVE clustering; thinning simulation forecasts 30-day and 90-day counts
- Fictitious play Nash equilibrium — `optimizeDefenderAllocation()` runs the Colonel Blotto game on security domain battlefields; each iteration best-responds to the empirical opponent distribution until convergence
- Exp3 importance-weighted updates — `modelAdaptiveAdversary()` applies w_i(t+1) = w_i(t) × exp(η × r̂_i / K) with optimal η = √(2 ln K / T) to track which attack techniques an adversary will favor
- Gauss-Jordan fundamental matrix — `computeLateralMovementRisk()` inverts (I−Q) using full pivoting to compute N = (I−Q)^-1, giving expected steps from every transient state to absorption; epidemic threshold β_c = ⟨k⟩/⟨k²⟩ determines supercritical spread
- Probability-weighted moments GPD fitting — `assessZeroDayTailRisk()` fits a Generalized Pareto Distribution to CVSS exceedances above threshold, returning VaR(95%), VaR(99%), CVaR(95%), CVaR(99%), and return periods
- Replicator dynamics with ESS detection — `forecastThreatLandscapeEvolution()` integrates dx_i/dt = x_i(f_i − φ̄) and classifies each technique as EMERGING, GROWING, MATURE, or DECLINING based on growth rate and fitness; evolutionary stable strategies are identified by invasion resistance
- Seeded PRNG for reproducibility — mulberry32 PRNG with query-derived seed ensures deterministic outputs for the same input across runs
- Spend limit enforcement — every tool calls `Actor.charge()` and checks `eventChargeLimitReached` before executing, respecting per-session budget caps set by the caller
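The seeded PRNG is the piece that makes runs reproducible. A minimal sketch of the standard mulberry32 generator, assuming the query-derived seed has already been reduced to a 32-bit integer:

```typescript
// Standard mulberry32: a tiny 32-bit seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Identical seeds yield identical sequences, so graph construction
// (edge weights, sampled probabilities) is identical across runs.
const rngA = mulberry32(42);
const rngB = mulberry32(42);
console.log(rngA() === rngB()); // true
```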
Use cases for autonomous cyber red team intelligence
Penetration testing preparation
Security engineers preparing for a red team engagement use synthesize_exploit_chains to map the entire AND-OR attack graph for a target environment before the test begins. The A* search surfaces multi-step paths that human testers might miss — for example, a chain from an exposed legacy SSH service through a misconfigured jump host into a domain controller — with total CVSS cost and estimated execution time for each path.
Security budget allocation and CISO reporting
CISOs and security directors use optimize_defender_allocation to translate threat data into resource allocation recommendations. The Colonel Blotto output shows which security domains (perimeter, identity, endpoint, cloud, network) are under-defended relative to the Nash equilibrium, and identifies dominated strategies where current spending provides no marginal protection.
Vulnerability management and patch prioritization
Vulnerability management teams use predict_vulnerability_emergence to forecast CVE disclosure rates for specific technologies — Linux kernel, Apache, Windows Active Directory — over 30 and 90-day horizons. Hawkes process burst probability identifies when a clustering event is likely, enabling teams to pre-position remediation resources before a wave of disclosures.
Cyber insurance underwriting and actuarial pricing
Underwriters and actuaries use assess_zero_day_tail_risk to quantify the severity distribution of vulnerabilities in a policyholder's technology stack. GPD-fitted VaR and CVaR metrics provide a statistically grounded basis for maximum loss estimates and premium calculations, replacing rule-of-thumb cyber risk scoring.
APT tracking and threat intelligence
Threat intelligence analysts use model_adaptive_adversary to model how known APT groups — APT29, Lazarus, Sandworm — adapt their technique selection against specific defensive postures. The Exp3 bandit output predicts which MITRE ATT&CK techniques will be prioritized in future campaigns based on historical adaptation speed and estimated reward.
Strategic security planning
Security architects and long-range planners use forecast_threat_landscape_evolution to understand which attack techniques are gaining evolutionary fitness and which are declining. Replicator dynamics identifies evolutionary stable strategies — techniques that, once dominant, resist displacement by alternatives — enabling investment in defenses against tomorrow's threat landscape, not last year's.
How to connect this MCP server
Step 1 — Get your Apify API token
Sign up at apify.com and copy your API token from Account Settings. The free plan includes $5 of monthly credits.
Step 2 — Add to your MCP client
Claude Desktop — add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "autonomous-cyber-red-team": {
      "url": "https://autonomous-cyber-red-team-mcp.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}
```
Cursor / Windsurf / Cline — add the same URL and Authorization header in your MCP server settings.
Step 3 — Call a tool
Ask your agent: "Synthesize exploit chains for apache log4j corporate network" or "Model an APT29 adaptive adversary targeting financial services." The agent calls the tool; the server aggregates the 15 data sources and returns structured JSON with the quantitative analysis.
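For reference, MCP clients wrap a tool call in a JSON-RPC 2.0 `tools/call` request. A sketch of the message shape for this server (the envelope follows the MCP specification; your client constructs and sends it for you):

```typescript
// Shape of the JSON-RPC 2.0 message an MCP client sends for one tool call.
// The tool name and arguments match this server's documented parameters.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

const request: ToolCallRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "synthesize_exploit_chains",
    arguments: { query: "apache log4j corporate network", maxResults: 30 },
  },
};

console.log(JSON.stringify(request));
```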
Step 4 — Interpret the results
Each tool returns a JSON object with algorithm-specific fields alongside a graphSummary showing how many nodes and edges were constructed from the live data. Higher node and edge counts indicate richer signal from the data sources.
Tool parameters
All 8 tools share the same two input parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | string | Yes | — | Target domain, technology, threat actor, or scenario (e.g. "apache log4j corporate network", "APT29 nation-state attack", "ransomware healthcare sector") |
| `maxResults` | number | No | 30–50 (varies by tool) | Maximum results to fetch per actor. Range: 5–100. Higher values improve model quality but increase response time. Tools 3 and 7 default to 50 for better statistical estimation. |
Input examples
Exploit chain synthesis for a specific CVE:
```json
{
  "query": "apache log4j CVE-2021-44228 corporate network",
  "maxResults": 30
}
```
APT adversary modeling with high data volume:
```json
{
  "query": "APT29 nation-state attack financial sector",
  "maxResults": 50
}
```
Zero-day tail risk for a technology portfolio:
```json
{
  "query": "linux kernel remote code execution",
  "maxResults": 50
}
```
Rapid threat landscape forecast:
```json
{
  "query": "ransomware healthcare sector",
  "maxResults": 20
}
```
Input tips
- Be specific in your query — "windows active directory pass-the-hash" produces a more targeted attack graph than "windows security." The query is passed to all 15 data sources, so domain-specific terminology extracts more relevant signal.
- Increase maxResults for statistical tools — `predict_vulnerability_emergence` and `assess_zero_day_tail_risk` rely on sample size for distribution fitting. Use maxResults ≥ 50 for those two tools.
- Use technology names with version context — "Apache HTTP Server 2.4" returns more precise CVE and Censys data than "web server."
- Combine tools for full assessments — running all 8 tools on the same query builds a complete picture: exploit paths, defender allocation, adversary adaptation, and tail risk for a total of $0.295 per target.
Output examples
simulate_attack_defense_posg output
```json
{
  "gameValue": 0.623,
  "attackerValue": 0.741,
  "defenderValue": 0.259,
  "converged": true,
  "iterations": 847,
  "alphaVectorCount": 23,
  "optimalAttackPath": [
    "CVE-2021-44228-entry",
    "corp-jumphost-pivot",
    "ad-domain-controller-target"
  ],
  "optimalDefenseAllocation": {
    "perimeter": 0.38,
    "identity": 0.31,
    "endpoint": 0.18,
    "cloud": 0.13
  },
  "beliefStates": [
    { "state": "undetected", "probability": 0.67 },
    { "state": "partial-detection", "probability": 0.24 },
    { "state": "full-detection", "probability": 0.09 }
  ],
  "graphSummary": { "nodes": 47, "edges": 83 }
}
```
synthesize_exploit_chains output
```json
{
  "attackSurfaceScore": 8.14,
  "chainCount": 12,
  "criticalPath": {
    "path": ["log4j-rce-entry", "lateral-smb-pivot", "lsass-dump-target", "dc-sync-exfil"],
    "totalCvss": 34.7,
    "probability": 0.43,
    "andNodes": ["lsass-dump-target"],
    "orNodes": ["log4j-rce-entry", "lateral-smb-pivot"],
    "techniques": ["T1190", "T1021.002", "T1003.001", "T1003.006"],
    "estimatedTime": 4.2
  },
  "averageChainLength": 3.8,
  "maxCvssChain": 34.7,
  "chains": [
    {
      "path": ["log4j-rce-entry", "lateral-smb-pivot", "lsass-dump-target", "dc-sync-exfil"],
      "totalCvss": 34.7,
      "probability": 0.43,
      "techniques": ["T1190", "T1021.002", "T1003.001", "T1003.006"],
      "estimatedTime": 4.2
    }
  ],
  "graphSummary": { "nodes": 47, "edges": 83 }
}
```
predict_vulnerability_emergence output
```json
{
  "baselineIntensity": 0.032,
  "currentIntensity": 0.187,
  "hawkesParams": {
    "alpha": 0.71,
    "omega": 1.23,
    "c": 0.45
  },
  "predictedEvents30d": 8,
  "predictedEvents90d": 21,
  "burstProbability": 0.68,
  "criticalVulnForecast": 3,
  "clusterSizes": [2, 4, 3, 5, 2],
  "intensityTimeline": [
    { "t": 0, "lambda": 0.187 },
    { "t": 7, "lambda": 0.143 },
    { "t": 14, "lambda": 0.112 },
    { "t": 30, "lambda": 0.089 }
  ],
  "graphSummary": { "nodes": 61, "edges": 104 }
}
```
assess_zero_day_tail_risk output
```json
{
  "gpdParameters": {
    "shape": 0.14,
    "scale": 1.82,
    "threshold": 7.0
  },
  "var95": 8.6,
  "var99": 9.4,
  "cvar95": 9.1,
  "cvar99": 9.7,
  "maxObserved": 10.0,
  "tailIndex": 0.14,
  "portfolioRisk": 0.83,
  "exceedanceProbabilities": [
    { "severity": 8.0, "probability": 0.31 },
    { "severity": 9.0, "probability": 0.12 },
    { "severity": 9.5, "probability": 0.04 }
  ],
  "returnPeriods": [
    { "severity": 9.0, "returnPeriodDays": 28 },
    { "severity": 9.5, "returnPeriodDays": 84 }
  ],
  "graphSummary": { "nodes": 58, "edges": 96 }
}
```
Output fields by tool
simulate_attack_defense_posg
| Field | Type | Description |
|---|---|---|
| `gameValue` | number | Nash equilibrium game value (0–1), higher = attacker advantage |
| `attackerValue` | number | Attacker's expected utility under optimal strategy |
| `defenderValue` | number | Defender's expected utility under optimal strategy |
| `converged` | boolean | Whether HSVI2 iteration converged within tolerance |
| `iterations` | number | Number of point-based value iteration cycles |
| `alphaVectorCount` | number | Alpha vectors maintained after pruning |
| `optimalAttackPath` | string[] | Node IDs on the optimal attack path |
| `optimalDefenseAllocation` | object | Normalized budget allocation per security domain |
| `beliefStates` | array | Detection state probabilities under optimal play |
| `graphSummary.nodes` | number | Total nodes in the constructed attack graph |
| `graphSummary.edges` | number | Total edges in the constructed attack graph |
synthesize_exploit_chains
| Field | Type | Description |
|---|---|---|
| `attackSurfaceScore` | number | Aggregate attack surface severity score |
| `chainCount` | number | Total exploit chains discovered |
| `criticalPath` | object | Lowest-cost (highest-CVSS) exploit chain |
| `criticalPath.path` | string[] | Node IDs from entry point to target |
| `criticalPath.totalCvss` | number | Sum of CVSS scores along the chain |
| `criticalPath.probability` | number | Estimated success probability |
| `criticalPath.andNodes` | string[] | Nodes requiring all prerequisites (AND logic) |
| `criticalPath.orNodes` | string[] | Nodes requiring any prerequisite (OR logic) |
| `criticalPath.techniques` | string[] | MITRE ATT&CK technique IDs |
| `criticalPath.estimatedTime` | number | Estimated attack execution time (hours) |
| `averageChainLength` | number | Mean number of steps across all chains |
| `maxCvssChain` | number | Maximum cumulative CVSS across all chains |
| `chains` | array | Up to 10 exploit chains, ordered by CVSS cost |
predict_vulnerability_emergence
| Field | Type | Description |
|---|---|---|
| `baselineIntensity` | number | Background CVE arrival rate μ |
| `currentIntensity` | number | Current intensity λ(t) including excitation |
| `hawkesParams.alpha` | number | Excitation magnitude parameter |
| `hawkesParams.omega` | number | Power-law decay exponent |
| `hawkesParams.c` | number | Power-law offset parameter |
| `predictedEvents30d` | number | Expected CVE count in next 30 days |
| `predictedEvents90d` | number | Expected CVE count in next 90 days |
| `burstProbability` | number | Probability of a clustering burst event |
| `criticalVulnForecast` | number | Forecast critical (CVSS ≥ 9.0) CVE count |
| `clusterSizes` | number[] | Observed historical cluster sizes |
| `intensityTimeline` | array | Intensity decay curve over time |
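To make the intensity fields concrete, here is a sketch that evaluates the power-law kernel λ(t) = μ + Σ_{t_i ≤ t} α(1 + (t − t_i)/c)^(−(1+ω)). The parameter names mirror `hawkesParams` above; the event times are hypothetical:

```typescript
// Evaluate the power-law Hawkes intensity at time t (days), given past event times.
function hawkesIntensity(
  t: number,
  events: number[], // event times t_i, in days
  mu: number,       // baselineIntensity
  alpha: number,    // excitation magnitude
  omega: number,    // power-law decay exponent
  c: number         // power-law offset
): number {
  return events
    .filter((ti) => ti <= t)
    .reduce((lam, ti) => lam + alpha * Math.pow(1 + (t - ti) / c, -(1 + omega)), mu);
}

// Hypothetical CVE disclosure times: a recent event spikes λ, which then
// decays back toward the baseline μ with the long power-law memory.
const events = [0, 2, 3, 10];
const now = hawkesIntensity(10, events, 0.032, 0.71, 1.23, 0.45);
const later = hawkesIntensity(40, events, 0.032, 0.71, 1.23, 0.45);
console.log(now > later && later > 0.032); // true
```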
optimize_defender_allocation
| Field | Type | Description |
|---|---|---|
| `defenderBudget` | number | Defender resource units |
| `attackerBudget` | number | Attacker resource units |
| `battlefields` | array | Per-domain allocation and win probabilities |
| `battlefields[].name` | string | Security domain name |
| `battlefields[].defenderAlloc` | number | Defender units allocated |
| `battlefields[].attackerAlloc` | number | Attacker units allocated |
| `battlefields[].defenderWinProb` | number | Defender win probability on this battlefield |
| `nashEquilibrium` | object | Nash equilibrium allocation by domain |
| `defenderExpectedPayoff` | number | Defender expected payoff at Nash |
| `attackerExpectedPayoff` | number | Attacker expected payoff at Nash |
| `dominatedStrategies` | string[] | Strategies eliminated by iterated dominance |
| `iterations` | number | Fictitious play convergence iterations |
model_adaptive_adversary
| Field | Type | Description |
|---|---|---|
| `optimalArm` | string | Attack technique the adversary has converged on |
| `predictedNextAction` | string | Predicted next attack technique |
| `adaptationSpeed` | number | Rate of weight concentration across rounds |
| `regret` | number | Cumulative regret vs. best fixed strategy |
| `explorationRate` | number | Current exploration probability |
| `rounds` | number | Exp3 simulation rounds |
| `arms` | array | Per-technique weight, probability, reward, pull count |
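A toy illustration of the Exp3 update behind these fields, with hypothetical rewards: repeatedly paying off one technique concentrates the sampling distribution on it, which is what the per-arm probabilities above report.

```typescript
// Exp3 sampling distribution: mix the normalized weights with uniform exploration.
function exp3Probs(weights: number[], gamma: number): number[] {
  const total = weights.reduce((s, w) => s + w, 0);
  const K = weights.length;
  return weights.map((w) => (1 - gamma) * (w / total) + gamma / K);
}

// One Exp3 weight update: w_i(t+1) = w_i(t) · exp(η · r̂_i / K),
// where r̂ = r / p is the importance-weighted reward for the pulled arm only.
function exp3Update(
  weights: number[],
  probs: number[],
  pulled: number,
  reward: number, // observed reward in [0, 1]
  eta: number
): number[] {
  const K = weights.length;
  const rHat = reward / probs[pulled];
  return weights.map((w, i) => (i === pulled ? w * Math.exp((eta * rHat) / K) : w));
}

// Arm 0 ("the technique that keeps working") is always rewarded:
// its probability mass grows while exploration keeps the others alive.
let w = [1, 1, 1];
for (let t = 0; t < 200; t++) {
  const p = exp3Probs(w, 0.1);
  w = exp3Update(w, p, 0, 1.0, 0.05);
}
console.log(exp3Probs(w, 0.1)[0] > 0.8); // true: arm 0 dominates
```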
compute_lateral_movement_risk
| Field | Type | Description |
|---|---|---|
| `epidemicThreshold` | number | β_c = ⟨k⟩/⟨k²⟩ critical transmission threshold |
| `currentBeta` | number | Estimated current transmission rate |
| `supercritical` | boolean | True if currentBeta > epidemicThreshold (compromise spreads) |
| `meanDegree` | number | Mean network degree ⟨k⟩ |
| `meanSquareDegree` | number | Second moment ⟨k²⟩ |
| `expectedSteps` | object | Expected steps to compromise per starting node |
| `absorptionProbabilities` | object | Probability of reaching each target from each source |
| `highRiskPaths` | array | Top 10 paths by expected-steps-to-compromise |
| `fundamentalMatrixSize` | number | Dimension of N = (I-Q)^-1 |
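A small worked example of the fundamental-matrix computation these fields come from, on a hypothetical two-host network. `expectedSteps` corresponds to the row sums of N = (I − Q)^-1 (this sketch uses plain Gauss-Jordan without pivot search, which is fine for a well-conditioned toy matrix):

```typescript
// Invert a small matrix by Gauss-Jordan elimination on an augmented [M | I] block.
function invert(m: number[][]): number[][] {
  const n = m.length;
  const a = m.map((row, i) => [...row, ...row.map((_, j) => (i === j ? 1 : 0))]);
  for (let col = 0; col < n; col++) {
    const pivot = a[col][col];
    for (let j = 0; j < 2 * n; j++) a[col][j] /= pivot;
    for (let r = 0; r < n; r++) {
      if (r === col) continue;
      const f = a[r][col];
      for (let j = 0; j < 2 * n; j++) a[r][j] -= f * a[col][j];
    }
  }
  return a.map((row) => row.slice(n));
}

// Transient-to-transient transition block Q for two hosts A and B; the
// remaining probability mass in each row flows to the absorbing
// "compromised" state.
const Q = [
  [0.5, 0.3], // host A → {A, B}
  [0.2, 0.4], // host B → {A, B}
];
const IminusQ = Q.map((row, i) => row.map((q, j) => (i === j ? 1 - q : -q)));
const N = invert(IminusQ); // fundamental matrix
const expectedSteps = N.map((row) => row.reduce((s, x) => s + x, 0));
console.log(expectedSteps); // ≈ [3.75, 2.92]: mean moves before compromise from A, B
```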
assess_zero_day_tail_risk
| Field | Type | Description |
|---|---|---|
| `gpdParameters.shape` | number | GPD shape parameter ξ (tail heaviness) |
| `gpdParameters.scale` | number | GPD scale parameter σ |
| `gpdParameters.threshold` | number | Exceedance threshold u |
| `var95` | number | CVSS Value at Risk at 95th percentile |
| `var99` | number | CVSS Value at Risk at 99th percentile |
| `cvar95` | number | Conditional VaR (Expected Shortfall) at 95% |
| `cvar99` | number | Conditional VaR (Expected Shortfall) at 99% |
| `maxObserved` | number | Maximum CVSS score observed in the dataset |
| `tailIndex` | number | Tail heaviness index (positive = heavy tail) |
| `portfolioRisk` | number | Aggregate portfolio cyber risk score 0–1 |
| `exceedanceProbabilities` | array | P(X > x) at severity thresholds |
| `returnPeriods` | array | Expected days between severity exceedances |
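The exceedance fields follow from the GPD survival function. A sketch using the example parameters above; note this evaluates the probability conditional on already exceeding the threshold u, whereas the server's reported figures may additionally account for the empirical rate of exceeding u:

```typescript
// GPD conditional tail probability: P(X > x | X > u) = (1 + ξ(x − u)/σ)^(−1/ξ), ξ ≠ 0.
function gpdExceedance(x: number, u: number, shape: number, scale: number): number {
  if (x <= u) return 1; // below the threshold the conditional survival is trivially 1
  return Math.pow(1 + (shape * (x - u)) / scale, -1 / shape);
}

// Example parameters from the output above: ξ = 0.14, σ = 1.82, u = 7.0.
const p9 = gpdExceedance(9.0, 7.0, 0.14, 1.82);
const p95 = gpdExceedance(9.5, 7.0, 0.14, 1.82);
console.log(p9 > p95); // true: heavier severities are strictly rarer
```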
forecast_threat_landscape_evolution
| Field | Type | Description |
|---|---|---|
| `ess` | string[] | Evolutionary stable strategies (invasion-resistant techniques) |
| `emergingThreats` | string[] | Techniques classified as EMERGING |
| `decliningThreats` | string[] | Techniques classified as DECLINING |
| `replicatorConvergence` | number | Convergence measure of replicator dynamics (0–1) |
| `timeToEquilibrium` | number | Estimated time steps to stable distribution |
| `landscapeEntropy` | number | Shannon entropy of technique frequency distribution |
| `techniques` | array | Per-technique name, frequency, fitness, phase, growthRate |
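A compact sketch of the replicator step behind the phase classification, with hypothetical fitness values: techniques with above-average fitness gain share and the rest decline, which is the EMERGING/DECLINING split.

```typescript
// One Euler step of replicator dynamics: x_i ← x_i + dt · x_i (f_i − φ̄),
// where φ̄ = Σ_j x_j f_j is the population mean fitness.
function replicatorStep(x: number[], fitness: number[], dt: number): number[] {
  const phiBar = x.reduce((s, xi, i) => s + xi * fitness[i], 0);
  const next = x.map((xi, i) => xi + dt * xi * (fitness[i] - phiBar));
  const total = next.reduce((s, v) => s + v, 0); // renormalize against Euler drift
  return next.map((v) => v / total);
}

// Three hypothetical techniques with constant fitness: the fittest takes over.
let shares = [0.34, 0.33, 0.33];
const fitness = [1.4, 1.0, 0.6];
for (let t = 0; t < 200; t++) shares = replicatorStep(shares, fitness, 0.05);
console.log(shares[0] > 0.9 && shares[2] < 0.01); // true
```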
How much does autonomous red team intelligence cost?
Each tool uses pay-per-event pricing — you pay only when a tool call is successfully initiated. Prices vary by computational complexity.
| Tool | Price per call | 10 calls | 50 calls |
|---|---|---|---|
| `simulate_attack_defense_posg` | $0.045 | $0.45 | $2.25 |
| `synthesize_exploit_chains` | $0.040 | $0.40 | $2.00 |
| `predict_vulnerability_emergence` | $0.035 | $0.35 | $1.75 |
| `optimize_defender_allocation` | $0.040 | $0.40 | $2.00 |
| `model_adaptive_adversary` | $0.035 | $0.35 | $1.75 |
| `compute_lateral_movement_risk` | $0.035 | $0.35 | $1.75 |
| `assess_zero_day_tail_risk` | $0.035 | $0.35 | $1.75 |
| `forecast_threat_landscape_evolution` | $0.030 | $0.30 | $1.50 |
| Full 8-tool assessment (one target) | $0.295 | — | — |
You can set a maximum spending limit per session in your MCP client or via Apify's built-in spend controls. The server checks eventChargeLimitReached before executing each tool and returns a clean error if the budget is reached.
Apify's free plan includes $5 of monthly credits — enough for 16 full 8-tool assessments at no cost.
Compare this to commercial threat intelligence platforms like Recorded Future or Mandiant Advantage, which charge $30,000–150,000 per year for analyst-curated feeds without the quantitative modeling layer.
How the Autonomous Cyber Red Team MCP works
Phase 1 — Data aggregation (parallel, 60–150 seconds)
Every tool call dispatches up to 15 actor calls in parallel via runActorsParallel() in actor-client.ts. Each actor is called with the user's query and a per-actor maxResults cap. The 180-second actor timeout ensures the parallel fan-out completes before the attack graph is assembled. Failures in individual data sources are gracefully handled — that source returns an empty array and the graph is built from the remaining signals.
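The fan-out-with-graceful-failure pattern described above can be sketched as follows; `callActor` and the source names here are stand-ins for the server's internals, not its real identifiers:

```typescript
// Parallel fan-out across data sources where a failed source degrades to an
// empty result set instead of failing the whole tool call.
type SourceResult = { source: string; items: unknown[] };

async function fanOut(
  sources: string[],
  callActor: (source: string, query: string) => Promise<unknown[]>,
  query: string
): Promise<SourceResult[]> {
  // Promise.allSettled never rejects, so one flaky source cannot sink the batch.
  const settled = await Promise.allSettled(sources.map((s) => callActor(s, query)));
  return settled.map((r, i) => ({
    source: sources[i],
    items: r.status === "fulfilled" ? r.value : [], // failed source → empty signal
  }));
}
```

The attack graph is then built from whatever signals survived, exactly as the text above describes.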
The 15 sources are selected to cover the full kill chain: NVD and CISA KEV provide vulnerability nodes, Censys and tech stack detection provide asset nodes, DNS and SSL expose the network perimeter, GitHub surfaces active exploit code, StackExchange and Hacker News provide context on exploitation technique feasibility, OFAC and OpenSanctions handle threat actor attribution, and FRED provides economic context for risk quantification.
Phase 2 — Attack graph construction
buildAttackGraph() in scoring.ts normalizes all 15 source outputs into a unified AttackGraph with typed VulnNode and AttackEdge structures. Nodes carry CVSS score, exploitability, impact, KEV flag, tech stack label, category (entry/pivot/target/exfil), and detection difficulty. Edges carry technique name, success probability, CVSS weight, and AND/OR prerequisite type. A seeded mulberry32 PRNG (seed=42) ensures deterministic graph construction for identical inputs. The resulting weighted adjacency matrix drives all downstream algorithms.
Phase 3 — Algorithm application
Each tool applies a distinct algorithm to the same attack graph:
- POSG (Tool 1) — HSVI2 iterates over belief-space points, computing upper and lower bounds on value until the gap is below tolerance. Alpha vectors are pruned at each iteration using domination checking. The converged values represent the Nash equilibrium of the infinite-horizon discounted game.
- AND-OR A* (Tool 2) — A priority queue explores the AND-OR graph with f(n) = g(n) + h(n), where g is cumulative CVSS cost and h = max remaining CVSS serves as an admissible heuristic. AND nodes are expanded only when all prerequisites are satisfied; OR nodes expand on any single satisfied prerequisite.
- Hawkes process (Tool 3) — Power-law kernel λ(t) = μ + Σ_i α(1 + (t−t_i)/c)^(−(1+ω)) is calibrated from CVE timestamp data. Thinning simulation generates sample paths for the 30-day and 90-day forecasts.
- Colonel Blotto (Tool 4) — Fictitious play best-responds to the current empirical opponent distribution across security battlefields each iteration until the mixed strategies converge to Nash equilibrium.
- Exp3 (Tool 5) — Exponential-weight update w_i(t+1) = w_i(t) × exp(η × r̂_i / K) with importance-weighted estimator r̂_i = r_i / p_i tracks adversary technique selection. Optimal learning rate η = √(2 ln K / T) is computed analytically.
- Absorbing Markov (Tool 6) — Gauss-Jordan inversion computes N = (I−Q)^-1 where Q is the transient-to-transient sub-matrix. Absorption matrix B = NR gives compromise probabilities. Epidemic threshold β_c = ⟨k⟩/⟨k²⟩ from mean-field theory determines supercritical spread.
- GPD (Tool 7) — Probability-weighted moments fit GPD parameters (shape ξ, scale σ) to CVSS exceedances above threshold u. P(X > x | X > u) = (1 + ξ(x−u)/σ)^(−1/ξ) gives exceedance probabilities for VaR and CVaR computation.
- Replicator dynamics (Tool 8) — Euler integration of dx_i/dt = x_i(f_i − φ̄) where φ̄ = Σ_j x_j f_j is mean fitness. Techniques are classified by growth rate sign and fitness percentile. ESS candidates are identified by perturbation stability.
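As a toy illustration of the fictitious-play loop in Tool 4, here is a two-strategy zero-sum game (matching pennies, standing in for the much larger Blotto game): each side best-responds to the opponent's empirical mixed strategy, and the empirical frequencies converge to the 50/50 equilibrium.

```typescript
// Row player's payoffs in matching pennies: +1 on a match, −1 on a mismatch.
// The unique mixed Nash equilibrium is 50/50 for both players.
const payoff = [
  [1, -1],
  [-1, 1],
];

const rowCounts = [1, 0]; // how often each pure strategy has been played
const colCounts = [0, 1];

function bestResponseRow(colFreq: number[]): number {
  const ev = payoff.map((row) => row[0] * colFreq[0] + row[1] * colFreq[1]);
  return ev[0] >= ev[1] ? 0 : 1;
}
function bestResponseCol(rowFreq: number[]): number {
  // Column player minimizes the row player's expected payoff.
  const ev = [0, 1].map((j) => rowFreq[0] * payoff[0][j] + rowFreq[1] * payoff[1][j]);
  return ev[0] <= ev[1] ? 0 : 1;
}

for (let t = 0; t < 10000; t++) {
  const colTotal = colCounts[0] + colCounts[1];
  rowCounts[bestResponseRow([colCounts[0] / colTotal, colCounts[1] / colTotal])]++;
  const rowTotal = rowCounts[0] + rowCounts[1];
  colCounts[bestResponseCol([rowCounts[0] / rowTotal, rowCounts[1] / rowTotal])]++;
}

const rowMix = rowCounts[0] / (rowCounts[0] + rowCounts[1]);
console.log(Math.abs(rowMix - 0.5) < 0.05); // true: empirical play nears 50/50
```

Fictitious play is guaranteed to converge in empirical frequencies for zero-sum games, which is why it is a reasonable equilibrium solver for the Blotto allocation.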
Phase 4 — Response serialization
Results are serialized to JSON via the json() helper and returned as MCP CallToolResult with content[0].type = 'text'. All numeric fields are rounded for readability. The graphSummary field is appended to every response so callers can assess data richness.
Tips for best results
- Query with technology plus context — "apache log4j 2.14 corporate internal network" builds a richer graph than "log4j." The query is passed verbatim to all 15 actors, so specificity multiplies across sources.
- Use `predict_vulnerability_emergence` proactively — run it weekly on your highest-risk technology components. When `burstProbability` exceeds 0.6, pre-position your patch team. A weekly run costs $0.035 and can prevent an emergency response.
- Sequence POSG before Blotto for budget discussions — `simulate_attack_defense_posg` gives you the game value (attacker advantage), then `optimize_defender_allocation` tells you how to redistribute budget to counter it. Present both outputs together for CISO-level conversations.
- Set maxResults ≥ 50 for tail risk and Hawkes tools — GPD fitting and Hawkes calibration improve substantially with more samples. For `assess_zero_day_tail_risk` and `predict_vulnerability_emergence`, always use maxResults: 50 or higher.
- Run `forecast_threat_landscape_evolution` quarterly — replicator dynamics show technique lifecycle stages. Techniques classified as EMERGING now will be GROWING in 6–12 months. Quarterly forecasts give you 2–3 quarters of lead time to build defenses against the next dominant technique.
- Combine with WHOIS Domain Lookup and Website Tech Stack Detector — these actors are embedded in the MCP's data aggregation, but running them standalone first lets you validate that the target is correctly identified before a full 8-tool assessment.
- Export results to a dataset for trending — use Apify's dataset storage to persist tool outputs over time. Comparing `gameValue` from `simulate_attack_defense_posg` month-over-month gives an objective measure of how your attack surface is changing.
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Website Tech Stack Detector | Detect technology stack first, then pass identified technologies as queries to synthesize_exploit_chains for targeted exploit chain analysis |
| WHOIS Domain Lookup | Enumerate subdomains and registration data, then feed domain list into simulate_attack_defense_posg for perimeter attack surface modeling |
| Company Deep Research | Generate a company intelligence report, extract technology mentions, then run predict_vulnerability_emergence on each identified technology |
| Website Change Monitor | Monitor CISA advisory and vendor security pages; trigger synthesize_exploit_chains via webhook when a new advisory is detected |
| Cyber Attack Surface Report | Run a full passive attack surface report, then feed the output into compute_lateral_movement_risk for network propagation analysis |
| B2B Lead Qualifier | Screen prospects by technology stack, then use assess_zero_day_tail_risk on their stack to identify high-risk leads for cybersecurity products |
| Company Due Diligence Report | Enrich a due diligence report with assess_zero_day_tail_risk and forecast_threat_landscape_evolution for acquisition cyber risk sections |
Limitations
- Passive analysis only — this server never sends packets to target infrastructure. It accesses only publicly available data from government registries, certificate transparency logs, DNS, and community databases. It cannot discover vulnerabilities that are not publicly documented.
- Attack graph quality depends on data source coverage — queries for niche or proprietary technologies may return sparse NVD and GitHub data, resulting in smaller attack graphs and less statistically reliable model outputs. Check `graphSummary.nodes`: graphs with fewer than 10 nodes should be interpreted cautiously.
- Mathematical models are probabilistic, not deterministic — POSG game values, Hawkes forecasts, and GPD tail estimates are statistical outputs grounded in the available data, not ground-truth assessments. They quantify risk; they do not certify it.
- No runtime exploit execution — this is not an automated exploitation framework. It identifies and prioritizes attack paths; executing those paths requires a human red team.
- 15 data sources have individual rate limits — during high-traffic periods, individual actors (especially Censys and GitHub) may return fewer results than `maxResults` due to upstream rate limiting. This is handled gracefully but may reduce graph density.
- Hawkes calibration requires historical CVE timestamps — if NVD returns few results for a niche query, the Hawkes model will have limited historical data to calibrate against and will fall back to baseline intensity estimates.
- GPD fitting requires exceedances above threshold — if fewer than 5 CVSS scores exceed the threshold (default u = 7.0), the GPD shape and scale estimates will be unreliable. `maxResults: 50` mitigates this.
- No authentication bypass or credential testing — the server does not attempt to log in to systems, test default credentials, or execute authenticated scans.
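The graph-size and exceedance-count limitations above can be checked programmatically before trusting a result. This is a sketch — only the `graphSummary.nodes` field and the 10-node / 5-exceedance rules of thumb come from this documentation; the helper itself is hypothetical:

```python
def reliability_flags(response, cvss_scores, threshold=7.0):
    """Flag conditions under which model outputs should be read cautiously.

    `response` is assumed to carry the graphSummary described above; the
    thresholds mirror the documented rules of thumb (10 nodes, 5 exceedances).
    """
    flags = []
    nodes = response.get("graphSummary", {}).get("nodes", 0)
    if nodes < 10:
        flags.append(f"sparse attack graph ({nodes} nodes < 10)")
    exceedances = [s for s in cvss_scores if s > threshold]
    if len(exceedances) < 5:
        flags.append(f"only {len(exceedances)} CVSS exceedances above u={threshold}")
    return flags

flags = reliability_flags({"graphSummary": {"nodes": 6}}, [9.8, 7.5, 5.0])
print(flags)  # both cautions fire: sparse graph, too few exceedances
```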
Integrations
- Apify API — call individual tools directly from Python or Node.js applications, or integrate MCP responses into SIEM/SOAR pipelines via REST
- Webhooks — trigger downstream actions (PagerDuty alert, Jira ticket, Slack notification) when `burstProbability` or `gameValue` exceeds a threshold
- Zapier — schedule weekly threat landscape forecasts and push results to Google Sheets or a Notion database for tracking over time
- Make — build multi-step automation: website change detected → run exploit chain synthesis → push findings to security ticketing system
- Google Sheets — export weekly VaR, CVaR, and game value metrics to a spreadsheet for board-level cyber risk reporting
- LangChain / LlamaIndex — use MCP tool outputs as structured context in RAG pipelines for AI-assisted threat report generation
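The threshold-based webhook pattern above reduces to a small gating function. The metric names come from the tool outputs documented on this page; the threshold values are illustrative and should be tuned to your own risk tolerance:

```python
# Illustrative thresholds — tune to your own risk tolerance.
ALERT_RULES = {
    "burstProbability": 0.7,  # from predict_vulnerability_emergence
    "gameValue": 0.6,         # from simulate_attack_defense_posg
}

def alerts_for(result):
    """Return the metrics in a tool result that crossed their alert threshold."""
    return {
        metric: result[metric]
        for metric, limit in ALERT_RULES.items()
        if result.get(metric, 0.0) > limit
    }

triggered = alerts_for({"gameValue": 0.68, "burstProbability": 0.31})
for metric, value in triggered.items():
    # Here you would call PagerDuty, Jira, or Slack with the payload.
    print(f"ALERT {metric}={value:.2f} exceeded {ALERT_RULES[metric]}")
```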
Troubleshooting
Empty or very small attack graph (graphSummary.nodes < 5) — the query returned few results from NVD and related sources. Try a broader query: use the technology name without version number, or add the word "vulnerability." Also check that maxResults is at least 20. For very niche technologies, the attack graph will be sparse regardless of maxResults.
Tool returns { "error": true, "message": "Spending limit reached" } — the session's per-event charge limit was reached before the tool executed. Increase the spending limit in your Apify account settings or in your MCP client's session configuration. This is a safety feature, not a bug.
Response takes longer than 3 minutes — one or more of the 15 upstream actors hit the 180-second timeout. This typically happens with Censys or GitHub under high load. The result is still returned with data from the actors that completed successfully. Retry the call during off-peak hours or reduce maxResults to speed up actor calls.
converged: false in POSG output — HSVI2 did not converge within the iteration budget. This happens when the attack graph has many nodes and the belief simplex is large. The returned game value is a bound, not the exact Nash value. Use the result directionally rather than as a precise figure.
GPD shape parameter ξ is extreme (> 1 or < -0.5) — this indicates very few exceedances above the threshold, leading to unstable moment estimates. Increase maxResults to 50+ and ensure the query targets technologies with known high-severity CVEs. A shape parameter in [−0.5, 0.5] is typical for CVSS data.
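To see why few exceedances destabilize the fit, here is a minimal method-of-moments GPD sketch. The server's actual estimator is not documented, so this only loosely mirrors the described behavior; the score values are made up:

```python
from statistics import mean, pvariance

def gpd_moment_fit(cvss_scores, u=7.0):
    """Method-of-moments GPD fit to CVSS exceedances over threshold u.

    Returns (xi, sigma, n_exceedances), or None when there are fewer than
    5 exceedances — matching the documented reliability cutoff.
    """
    excess = [s - u for s in cvss_scores if s > u]
    if len(excess) < 5:
        return None
    m, v = mean(excess), pvariance(excess)
    xi = 0.5 * (1.0 - m * m / v)          # shape estimate
    sigma = 0.5 * m * (m * m / v + 1.0)   # scale estimate
    return xi, sigma, len(excess)

scores = [9.8, 8.6, 7.8, 7.4, 7.2, 7.1, 6.5, 5.0]
print(gpd_moment_fit(scores))       # 6 exceedances -> a usable estimate
print(gpd_moment_fit(scores[:4]))   # only 4 exceedances -> None
```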
Responsible use
- This server accesses only publicly available vulnerability data, DNS records, certificate transparency logs, and community-generated content.
- Do not use this tool to plan, facilitate, or execute unauthorized access to computer systems. Passive analysis of publicly available data is lawful; unauthorized penetration testing is not.
- Comply with your organization's security policy and applicable computer fraud laws (CFAA, Computer Misuse Act, EU NIS2) when conducting assessments.
- Results should be reviewed by qualified security professionals before being used to make operational decisions.
- For guidance on the legality of security research using public data, see Apify's guide on web scraping legality.
FAQ
How many tool calls can I make before the free tier runs out? The Apify free plan includes $5 of monthly credits. At the listed event prices of $0.08–$0.12 per call, that covers roughly 41–62 individual tool calls, or about 6 full 8-tool assessments ($0.72 each). Credits reset monthly.
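These counts follow directly from the pricing table at the top of this page:

```python
# Per-event prices from the pricing table above.
PRICES = {
    "simulate-attack-defense-posg": 0.12,
    "synthesize-exploit-chains": 0.10,
    "predict-vulnerability-emergence": 0.08,
    "optimize-defender-allocation": 0.08,
    "model-adaptive-adversary": 0.08,
    "compute-lateral-movement-risk": 0.08,
    "assess-zero-day-tail-risk": 0.08,
    "forecast-threat-landscape-evolution": 0.10,
}

CREDITS = 5.00  # Apify free plan monthly credits
full_assessment = round(sum(PRICES.values()), 2)
cheapest, priciest = min(PRICES.values()), max(PRICES.values())

print(f"Full 8-tool assessment: ${full_assessment:.2f}")                      # $0.72
print(f"Assessments per month on free credits: {int(CREDITS // full_assessment)}")
print(f"Single calls per month: {int(CREDITS // priciest)}-{int(CREDITS // cheapest)}")
```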
Does autonomous red team analysis work on internal network targets?
The tools work on any query string, but results are limited to publicly available data. For an internal network (192.168.x.x addresses or internal hostnames), NVD and Censys will return nothing. The tools are most effective for internet-facing infrastructure, technology stacks, known threat actors, and CVE categories.
How is this different from running a manual CVSS lookup? A CVSS lookup gives you a single severity score for a single CVE. This MCP builds a multi-node attack graph from 15 sources and applies game theory, stochastic processes, and extreme value statistics to that graph. The output is an attacker-defender game value, optimal allocation recommendations, exploit chains with AND-OR prerequisites, and tail risk estimates — not a list of scores.
Can I use this to fulfill a compliance or audit requirement? The outputs can inform and support compliance work — particularly risk quantification sections of SOC 2, ISO 27001, and NIST CSF assessments — but they do not constitute a formal penetration test or audit. Compliance frameworks typically require testing conducted by certified professionals with explicit written authorization.
How accurate is the Hawkes vulnerability forecast?
The Hawkes process captures temporal clustering in CVE disclosures for a given technology category. Predictions are probabilistic — predictedEvents30d: 8 means the model expects approximately 8 CVEs in 30 days, with uncertainty that grows with the forecast horizon. Accuracy improves with higher maxResults (more historical timestamps for calibration) and for technologies with many documented CVEs.
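A toy version of this forecast makes the mechanics concrete. The server's actual kernel and calibrated parameters are not exposed, so the power-law kernel shape and all parameter values below are assumptions for illustration only:

```python
# Minimal Hawkes sketch with a power-law (Omori-style) kernel.
def hawkes_intensity(t, events, mu=0.1, alpha=0.3, c=1.0, p=1.5):
    """Conditional intensity lambda(t): baseline plus power-law decayed excitation."""
    return mu + sum(alpha * (c + t - ti) ** (-p) for ti in events if ti < t)

def expected_events(events, t0, horizon=30.0, step=0.1):
    """Riemann-sum estimate of the expected event count over [t0, t0 + horizon]."""
    n_steps = int(horizon / step)
    return sum(hawkes_intensity(t0 + i * step, events) * step for i in range(n_steps))

# Hypothetical CVE disclosure times (in days) for one technology: the recent
# burst around day 58-59 lifts the 30-day forecast above the baseline mu * 30 = 3.
cve_days = [2.0, 40.0, 41.0, 43.0, 58.0, 59.0]
forecast = expected_events(cve_days, t0=60.0)
print(f"expected CVEs in next 30 days: {forecast:.1f}")
```

This also shows why clustering matters: with no recent events the forecast collapses to the baseline rate, which is the fallback behavior described in the Limitations section.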
Is it legal to query CISA KEV, NVD, and Censys data? Yes. NVD is a US government public database. CISA KEV is a public advisory catalog. Censys publishes internet scan data as a public resource. All 15 data sources used by this MCP are publicly accessible. See Apify's legal guide for more context.
How is this different from commercial threat intelligence platforms like Recorded Future? Recorded Future and similar platforms provide analyst-curated, human-labeled threat intelligence with proprietary dark web and HUMINT sources. This MCP applies mathematical modeling — game theory, stochastic processes, extreme value theory — to open-source data. They complement each other: use commercial platforms for attribution and actor intelligence, this MCP for quantitative risk modeling and resource allocation.
What happens if one of the 15 upstream actors fails?
The runActorsParallel() function catches errors per actor and returns an empty array for that source. The attack graph is built from the remaining sources. The graphSummary in the response reflects how many nodes and edges were actually constructed, giving you visibility into data completeness.
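The per-source error isolation described above can be mirrored in Python with `asyncio` (the server's `runActorsParallel()` is its own implementation; the fetchers below are stand-ins, and one deliberately fails):

```python
import asyncio

async def fetch_nvd(query):
    return [{"cve": "CVE-2024-0001", "cvss": 9.8}]

async def fetch_censys(query):
    raise TimeoutError("upstream rate limited")  # simulate a flaky source

async def run_sources_parallel(query, fetchers):
    """Fetch all sources concurrently; a failed source degrades to []."""
    async def safe(name, fetcher):
        try:
            return name, await fetcher(query)
        except Exception:
            return name, []  # matches the documented graceful degradation
    results = await asyncio.gather(*(safe(n, f) for n, f in fetchers.items()))
    return dict(results)

data = asyncio.run(
    run_sources_parallel("nginx", {"nvd": fetch_nvd, "censys": fetch_censys})
)
print(data)  # censys failed -> [], nvd returned its results
```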
Can I run all 8 tools on the same target in parallel? Yes. Each tool is an independent MCP call. An AI agent can call all 8 in parallel and aggregate the results. A full parallel assessment of one target costs $0.72 at the listed event prices and completes in 60–180 seconds (bounded by the slowest parallel data fetch, not the sum of all fetches).
How do I interpret the gameValue from simulate_attack_defense_posg?
gameValue is the Nash equilibrium value of the attack-defense game on a 0–1 scale, where 1 represents full attacker advantage and 0 represents full defender advantage. A value above 0.6 indicates the attacker has structural advantage on the current attack surface and defense allocation. Use optimalDefenseAllocation to identify where budget reallocation would most reduce the game value.
Can I use this MCP in an autonomous AI agent pipeline? Yes, this is the primary intended use. The server runs in Apify Standby mode, meaning it is always available for agent calls without cold start latency. It works with any MCP-compatible agent framework including LangChain agents, AutoGen, CrewAI, and custom agent loops.
Help us improve
If you encounter issues, enable run sharing in your Apify account to help us debug faster:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom security integrations or enterprise deployments, reach out through the Apify platform.
How it works
Configure
Set your parameters in the Apify Console or pass them via API.
Run
Click Start, trigger via API, webhook, or set up a schedule.
Get results
Download as JSON, CSV, or Excel. Integrate with 1,000+ apps.
Use cases
Sales Teams
Build targeted lead lists with verified contact data.
Marketing
Research competitors and identify outreach opportunities.
Data Teams
Automate data collection pipelines with scheduled runs.
Developers
Integrate via REST API or use as an MCP tool in AI workflows.
Related actors
Bulk Email Verifier
Verify email deliverability at scale. MX record validation, SMTP mailbox checks, disposable and role-based detection, catch-all flagging, and confidence scoring. No external API costs.
GitHub Repository Search
Search GitHub repositories by keyword, language, topic, stars, forks. Sort by stars, forks, or recently updated. Returns metadata, topics, license, owner info, URLs. Free API, optional token for higher limits.
Website Content to Markdown
Convert any website to clean Markdown for RAG pipelines, LLM training, and AI apps. Crawls pages, strips boilerplate, preserves headings, tables, and code blocks. GFM support.
Website Tech Stack Detector
Detect 100+ web technologies on any website. Identifies CMS, frameworks, analytics, marketing tools, chat widgets, CDNs, payment systems, hosting, and more. Batch-analyze multiple sites with version detection and confidence scoring.
Ready to try Autonomous Cyber Red Team MCP?
Start for free on Apify. No credit card required.
Open on Apify Store