Academic Research Intelligence MCP Server is an MCP (Model Context Protocol) server on ApifyForge. MCP server for multi-database academic literature search. Wraps 6 specialized actors: PubMed (biomedical), Semantic Scholar (all disciplines with AI summaries), ArXiv (preprints), Crossref (DOI metadata with... It costs $0.05 per search-pubmed. It exposes 7 tools: search-pubmed, search-semantic-scholar, search-arxiv, search-crossref, search-openalex, find-researcher, literature-review. Best for AI developers and agent builders who need structured real-world data inside Claude, Cursor, or other MCP-compatible clients. Not ideal for non-AI workflows or use cases that don't involve an MCP-compatible client. Maintenance pulse: 90/100. Last verified March 27, 2026. Built by Ryan Clinton (ryanclinton on Apify).
Academic Research Intelligence MCP Server
Academic Research Intelligence MCP Server is an MCP (Model Context Protocol) server available on ApifyForge at $0.05 per search-pubmed. MCP server for multi-database academic literature search. Wraps 6 specialized actors: PubMed (biomedical), Semantic Scholar (all disciplines with AI summaries), ArXiv (preprints), Crossref (DOI metadata with citations/funders), OpenAlex (250M+ works), and ORCID (researcher profiles). Includes a unified literature review tool with cross-database deduplication.
Best for AI developers and agent builders who need structured real-world data inside Claude, Cursor, or other MCP-compatible clients.
Not ideal for non-AI workflows or use cases that don't involve an MCP-compatible client.
Tools exposed
Each pricing event corresponds to a tool your AI agent can call through MCP.
- search-pubmed: Search biomedical literature on PubMed. · $0.05/call
- search-semantic-scholar: Search academic papers on Semantic Scholar. · $0.05/call
- search-arxiv: Search preprint papers on ArXiv. · $0.05/call
- search-crossref: Search academic papers via Crossref DOI registry. · $0.05/call
- search-openalex: Search research works on OpenAlex. · $0.05/call
- find-researcher: Look up researcher profiles via ORCID. · $0.05/call
- literature-review: Composite literature review across multiple academic databases. · $0.15/call
Example prompts
Natural language queries you can ask your AI assistant that would trigger this MCP server.
What to know
- Requires an MCP-compatible client (Claude Desktop, Cursor, Windsurf, or similar).
- Tool call results depend on the availability of upstream public APIs.
- Requires an Apify account and API token for authentication.
Maintenance Pulse
90/100
Cost Estimate
How many results do you need?
Pricing
Pay Per Event model. You only pay for what you use.
| Event | Description | Price |
|---|---|---|
| search-pubmed | Search biomedical literature on PubMed. | $0.05 |
| search-semantic-scholar | Search academic papers on Semantic Scholar. | $0.05 |
| search-arxiv | Search preprint papers on ArXiv. | $0.05 |
| search-crossref | Search academic papers via Crossref DOI registry. | $0.05 |
| search-openalex | Search research works on OpenAlex. | $0.05 |
| find-researcher | Look up researcher profiles via ORCID. | $0.05 |
| literature-review | Composite literature review across multiple academic databases. | $0.15 |
Example (at $0.05 per event): 100 events = $5.00 · 1,000 events = $50.00. literature-review events are billed at $0.15 each.
Documentation

Academic Research Intelligence MCP is multi-database scholarly search infrastructure for AI agents and research workflows.
It wraps six free academic data sources (PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, ORCID) behind one MCP endpoint and adds a composite literature review tool that queries multiple databases in parallel and deduplicates results by DOI. No API keys required. Built for AI research assistants, systematic-review teams, biomedical analysts, science writers, and any agent that needs a single tool for "find me the papers."
The category
Academic Research Intelligence MCP is multi-database scholarly search infrastructure. Unlike single-database wrappers (which force the agent to pick the right source before searching) or scraping tools (which break when site HTML changes), it exposes six official academic APIs as one MCP toolset and adds DOI-based cross-database deduplication. Agents see one literature surface instead of six, and the composite literature review returns coverage statistics so a thin search is never mistaken for a complete one.
In one sentence
Search PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, and ORCID through a single MCP server, or run one composite literature review that queries three databases in parallel and deduplicates by DOI.
Category: Academic research MCP. Multi-database literature search. AI agent tooling.
Primary use case: Give an AI agent one tool that covers all the major free academic databases. Can also be used for systematic-review compilation, researcher discovery, and DOI cross-referencing.
Also known as: academic research MCP, scholarly search MCP server, multi-database literature search, PubMed MCP, ArXiv MCP, Semantic Scholar MCP, OpenAlex MCP, Crossref MCP, ORCID MCP, literature review agent tool.
What this actor does
- What it is: A standby-mode MCP server exposing 8 tools that wrap 6 free academic data sources.
- What it checks: Biomedical literature, all-discipline papers with AI summaries, preprints, DOI metadata, broad academic works, and researcher profiles.
- What it returns: Structured JSON paper lists with titles, authors, year, journal, DOI, citations, abstracts, plus cross-database coverage stats for the composite literature review tool.
- What it does NOT do: No full-text PDF download, no peer-review verdict, no plagiarism checking, no integrity scoring (see Research Integrity Screening MCP for that).
- Who it's for: AI research assistants, systematic-review teams, biomedical analysts, science journalists, R&D analysts, academic recruiters.
What you get from one call
research_literature_review fans out to PubMed, Semantic Scholar, and Crossref (and optionally ArXiv) in parallel and returns:
- papers[]: ranked by source count then citation count (papers found in multiple databases sort first)
- coverage.sourcesSearched: the exact list of databases that ran
- coverage.resultsPerSource: per-database hit counts
- coverage.totalBeforeDedup: raw result count across all sources
- coverage.uniquePapersWithDoi: deduplicated paper count
- coverage.papersFoundInMultipleSources: corroboration count (the strongest "this paper is real and relevant" signal)
- papersWithoutDoi[]: results that lacked a DOI, kept separate so dedup confidence stays clean
Each paper carries doi, title, authors, year, journal, citationCount, abstract (or AI TLDR from Semantic Scholar), isOpenAccess, url, plus foundIn[] listing every database that returned it.

What makes this different
- Six databases, one MCP toolset, zero credentials. PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, and ORCID all run through one endpoint with no API keys to provision.
- DOI-based cross-database deduplication. research_literature_review collapses the same paper appearing in three databases into one ranked record with sourceCount, so coverage and corroboration are visible.
- Coverage-honest. Every literature review returns sourcesSearched and resultsPerSource, so a thin search (one database delivered, two empty) is never mistaken for a complete one.
Before vs after
| Without this MCP | With this MCP |
|---|---|
| Open PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, and ORCID in six tabs | One MCP endpoint, six databases reachable as tools |
| Provision API keys per service (Semantic Scholar, Crossref polite pool, etc.) | No keys required, all sources are free public APIs |
| Manually deduplicate paper lists by title across databases | DOI-based dedup with multi-source confidence ranking |
| Write custom retry and rate-limit code per database | Parallel Promise.all fan-out with per-source timeouts |
| Lose track of which databases were actually checked | coverage.sourcesSearched and resultsPerSource returned every time |
Architecture
agent prompt
↓
MCP /mcp endpoint (StreamableHTTP)
↓
8 registered tools (6 single-source + literature review + list sources)
↓
6 sub-actors called in parallel via apify-client
↓
PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, ORCID
↓
literature review: DOI normalization → dedup map → multi-source ranking
↓
structured JSON response with coverage stats
The MCP runs in Apify Standby mode with a configurable idle-shutdown window (default 300s) so platform compute stops billing when no tools are firing.
Built for
- AI research assistants embedded in Claude Desktop, Cursor, Windsurf, or LangChain pipelines
- systematic-review teams running PubMed-plus-others searches under Cochrane guidelines
- biomedical analysts and science journalists who need a single "find me the papers" tool
- academic recruiters and tenure committees verifying publication records through ORCID
- R&D and competitive-intelligence teams tracking ArXiv preprints and conference proceedings
- agent builders who want one academic-search interface instead of six
This server runs in Standby mode on the Apify platform. It orchestrates six official academic APIs through six sibling Apify actors, normalizes the response shapes, and returns structured JSON your AI agent reasons over directly.
Questions this MCP answers
- "Find me all papers on CAR-T cell therapy efficacy across PubMed, Semantic Scholar, and Crossref."
- "What are the latest ArXiv preprints on transformer attention mechanisms?"
- "Which biomedical papers cite this NIH-funded study?"
- "Has this researcher published in predatory venues? Show their full publication record from ORCID."
- "Give me a literature review of CRISPR off-target effects with cross-database coverage stats."
- "What is the AI TLDR for this Semantic Scholar paper, and how many influential citations does it have?"
- "Find me Crossref-registered papers from this journal between 2020 and 2024 with funder information."
- "Which authors at MIT have published on protein folding since 2022?"
For AI agents
- First tool to reach for when a prompt mentions academic papers, scholarly literature, systematic reviews, citations, DOIs, preprints, or researcher publication records.
- Call research_list_sources first (it is free). It enumerates the 8 tools, the 6 sources, and the record counts behind each, so the agent can decide which single-database tool to call before paying for a multi-database review.
- Use research_literature_review when the user wants coverage, not just speed. It runs PubMed + Semantic Scholar + Crossref in parallel and deduplicates by DOI, so one call replaces three. Add include_arxiv: true if the topic is physics, math, CS, or stats.
- Use a single-database tool when the database is named. "Find PubMed papers on..." goes to research_search_pubmed, not the composite review. Single-database calls are $0.05 vs $0.15 for the review.
- Read coverage.sourcesSearched and papersFoundInMultipleSources before summarizing. A literature review where only one of three databases returned papers is low-corroboration; treat it as preliminary.
- For researcher lookups, use research_find_researcher. Set fetch_works: true only when the user explicitly asks for the full publication list (it is slower).
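The coverage check described in the list above can be sketched as a small helper. This is a minimal illustration, not the server's own logic: the function name `assess_coverage` and the rule for what counts as "preliminary" (at most one delivering source, or no multi-source papers) are assumptions; only the `coverage` field names come from the documented response shape.

```python
from typing import Any


def assess_coverage(review: dict[str, Any]) -> dict[str, Any]:
    """Summarize how well corroborated a literature review response is."""
    coverage = review.get("coverage", {})
    searched = coverage.get("sourcesSearched", [])
    per_source = coverage.get("resultsPerSource", {})

    # Sources that ran but returned no papers.
    empty = [name for name in searched if per_source.get(name, 0) == 0]
    delivering = len(searched) - len(empty)
    corroborated = coverage.get("papersFoundInMultipleSources", 0)

    return {
        "sourcesDelivering": delivering,
        "emptySources": empty,
        "corroboratedPapers": corroborated,
        # Assumed rule: at most one delivering source, or no overlap at all,
        # means the result should be reported as preliminary.
        "preliminary": delivering <= 1 or corroborated == 0,
    }
```

An agent could call this on the parsed review before summarizing and mention the empty sources in its answer when `preliminary` is true.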
Use this MCP when an AI agent needs to:
- run a literature review across multiple academic databases
- look up biomedical, AI/ML, physics, or cross-discipline papers
- find researcher profiles, affiliations, and external IDs
- pull DOI metadata with funder, ORCID, and licensing information
- discover ArXiv preprints before journal publication
- get AI-generated paper summaries and influential citation counts
- build a corroborated paper list with cross-database confidence scores
What data can you access?
| Data Point | Source | Example |
|---|---|---|
| 🧬 Biomedical citations with MeSH terms and abstracts | PubMed / MEDLINE | 36M+ citations, "CRISPR gene editing" returns ~12k papers |
| 📄 All-discipline papers with AI TLDR and influential citations | Semantic Scholar | 200M+ papers, TLDR field is one-sentence AI summary |
| 📑 Open-access preprints in physics, math, CS, biology, stats | ArXiv | 2.4M+ preprints, prefix syntax (ti:, au:, abs:, cat:) |
| 🔗 DOI-registered works with funders and ORCID-linked authors | Crossref | 150M+ works, funder names + grant numbers + license URLs |
| 📚 Broad academic index with concept tagging and institution data | OpenAlex | 250M+ works, concept hierarchy + institution affiliations |
| 👤 Researcher profiles with career history and external IDs | ORCID | 18M+ profiles, Scopus / ResearcherID linkage, employment history |
| 🧠 AI-generated TLDR (one-sentence paper summary) | Semantic Scholar | "Transformers replace recurrence with self-attention for sequence modeling" |
| 📈 Influential citation count (citations central to the citing paper) | Semantic Scholar | influentialCitationCount: 1,247 (of 18,392 total citations) |
| 🏛️ Author affiliations and institution employment history | ORCID + OpenAlex | "Dr. Y. Bengio, Mila / Université de Montréal, 2016-present" |
| 💰 Funder names with grant numbers and licensing URLs | Crossref | "NIH R01CA123456, CC-BY 4.0" |
Why use Academic Research Intelligence MCP?
Most agent-driven academic search is:
- single-database (the agent calls only the source it knows about, missing cross-database corroboration)
- credential-heavy (Semantic Scholar API keys, Crossref polite pool, ORCID token rotation, all separately provisioned)
- inconsistent in response shape (every database returns a different paper schema)
- silent on coverage (the agent does not know which databases actually returned data)
This MCP turns that into one tool surface. A single literature-review call queries three databases in parallel, normalizes the paper shapes, deduplicates by DOI, and returns explicit coverage stats your agent acts on directly. Single-database tools stay available when the user names the database.
- Scheduling: run periodic literature sweeps on Apify Scheduler; pipe new papers to Slack or email via webhooks
- API access: trigger searches from Python, JavaScript, or any HTTP client using standard MCP protocol
- Parallel fan-out: literature review queries three databases simultaneously, not sequentially
- No API keys: all six data sources are free public academic APIs, no credentials to provision
- Integrations: pipe results into Notion, Airtable, Google Sheets, or any webhook-compatible knowledge base
Features
Multi-database search (8 MCP tools, 6 academic sources)
- PubMed (36M+ biomedical citations) with field-tag syntax ([Title], [MeSH Terms], [Author]), boolean AND/OR/NOT, article-type filter, date range.
- Semantic Scholar (200M+ papers) with AI-generated TLDR summaries, influential citation counts, venue and field-of-study filters, citation-sorted results.
- ArXiv (2.4M+ preprints) with prefix syntax (ti:, au:, abs:, cat:), category filter, submission-date sort. Rate-limited at the source (1 request per 3s).
- Crossref (150M+ DOI-registered works) with funder names, grant numbers, ORCID author IDs, publication-type filter (journal-article, book-chapter, dataset, etc.).
- OpenAlex (250M+ works) with concept tagging, institution affiliations, citation-count sort, open-access filter.
- ORCID (18M+ researchers) with name, affiliation, keyword, or raw Lucene query. Optional fetch_works flag pulls full publication lists.
Composite literature review
- research_literature_review queries PubMed + Semantic Scholar + Crossref in parallel (and optionally ArXiv).
- DOI normalization (strips https://doi.org/ prefix, lowercases) before dedup.
- Multi-source ranking: papers found in more databases sort higher, ties broken by citation count.
- Coverage stats (sourcesSearched, resultsPerSource, totalBeforeDedup, uniquePapersWithDoi, papersFoundInMultipleSources) returned on every call.
- Papers without a DOI kept in a separate papersWithoutDoi[] array so the dedup count stays honest.
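The normalize, dedup, and rank steps listed above can be illustrated with a short sketch. It is an assumption-laden approximation rather than the server's implementation: the helper names `normalize_doi` and `dedupe_and_rank` are hypothetical, and keeping the highest citation count when sources disagree is a choice made here for illustration.

```python
from typing import Any


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip the https://doi.org/ prefix."""
    doi = doi.strip().lower()
    prefix = "https://doi.org/"
    return doi[len(prefix):] if doi.startswith(prefix) else doi


def dedupe_and_rank(
    results_by_source: dict[str, list[dict[str, Any]]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Merge per-source paper lists by DOI and rank by corroboration."""
    merged: dict[str, dict[str, Any]] = {}
    without_doi: list[dict[str, Any]] = []

    for source, papers in results_by_source.items():
        for paper in papers:
            raw = paper.get("doi")
            if not raw:
                # No DOI: keep the record apart so dedup counts stay clean.
                without_doi.append({**paper, "foundIn": [source]})
                continue
            key = normalize_doi(raw)
            record = merged.setdefault(key, {**paper, "doi": key, "foundIn": []})
            if source not in record["foundIn"]:
                record["foundIn"].append(source)
            # Assumption: keep the highest citation count seen across sources.
            record["citationCount"] = max(
                record.get("citationCount") or 0, paper.get("citationCount") or 0
            )

    for record in merged.values():
        record["sourceCount"] = len(record["foundIn"])

    # More sources first, then more citations.
    ranked = sorted(
        merged.values(), key=lambda r: (-r["sourceCount"], -r["citationCount"])
    )
    return ranked, without_doi
```

Keying the map on the normalized DOI is what lets `https://doi.org/10.1/ABC` from one database collapse with `10.1/abc` from another.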
Operational layer
- Apify Standby mode with configurable idle-shutdown (default 300s, env var STANDBY_IDLE_TIMEOUT_SECS).
- Failure-webhook registration on every container start, customer-side failures push to the operator's webhook handler automatically.
- Per-tool timeouts (120s default, 180s for Semantic Scholar and ORCID, 300s for ArXiv) so a slow source degrades the result instead of blocking it.
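The "slow source degrades the result instead of blocking it" behavior can be sketched in Python with asyncio. The server itself is described as using a Promise.all fan-out; the version below is only an analogue, and the function name `fan_out`, its signature, and the choice to return an empty list for a failed source are assumptions.

```python
import asyncio
from typing import Awaitable, Callable


async def fan_out(
    sources: dict[str, Callable[[], Awaitable[list]]],
    timeouts: dict[str, float],
    default_timeout: float = 120.0,
) -> dict[str, list]:
    """Run every source concurrently; a slow or failing source yields []."""

    async def run_one(name: str, fetch: Callable[[], Awaitable[list]]):
        try:
            limit = timeouts.get(name, default_timeout)
            return name, await asyncio.wait_for(fetch(), timeout=limit)
        except Exception:
            # Timeout or upstream error: degrade to an empty result set
            # instead of failing the whole review.
            return name, []

    pairs = await asyncio.gather(*(run_one(n, f) for n, f in sources.items()))
    return dict(pairs)
```

With the documented defaults, `timeouts` would carry something like `{"Semantic Scholar": 180, "ORCID": 180, "ArXiv": 300}` while every other source falls back to 120 seconds.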
Quickstart workflows
Systematic review (Cochrane-style)
topic from user
→ research_literature_review (PubMed + S2 + Crossref + ArXiv)
→ sort papers by sourceCount desc, then citationCount desc
→ for each paper in top 50: include if foundIn.length >= 2
→ export DOIs + titles + abstracts for full-text retrieval
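The screening steps of this workflow could be applied client-side to a parsed review response, roughly as follows. The function name `screen_candidates` and its defaults are hypothetical; the field names (`sourceCount`, `citationCount`, `foundIn`, `doi`, `title`, `abstract`) follow the documented per-paper shape.

```python
from typing import Any


def screen_candidates(
    papers: list[dict[str, Any]], top_n: int = 50, min_sources: int = 2
) -> list[dict[str, Any]]:
    """Keep corroborated papers from the top of the ranked list."""
    ranked = sorted(
        papers,
        key=lambda p: (-p.get("sourceCount", 0), -(p.get("citationCount") or 0)),
    )
    shortlist = [
        p for p in ranked[:top_n] if len(p.get("foundIn", [])) >= min_sources
    ]
    # Export only what full-text retrieval needs.
    return [
        {"doi": p.get("doi"), "title": p.get("title"), "abstract": p.get("abstract")}
        for p in shortlist
    ]
```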
Single-database lookup (named source)
user names a database, e.g. "find me PubMed papers on..."
→ research_search_pubmed with field tags
→ return papers with PubMed-specific fields (MeSH terms, article type, pubmed URL)
Researcher discovery
researcher name from user
→ research_find_researcher (family_name + affiliation)
→ if disambiguation needed: fetch_works=true on the top match
→ return ORCID profile + career history + publication list
Use cases for multi-database academic search
AI research assistants embedded in chat interfaces
Conversational AI assistants embedded in Claude Desktop, Cursor, ChatGPT, or custom agents need a single tool for "find me the papers." Without this MCP, an agent has to pick one database before searching and may miss corroborating results from other sources. The research_literature_review tool queries three databases in parallel, returns a unified DOI-deduplicated paper list with source coverage stats, and lets the agent answer "what does the literature say about X" with cross-database confidence. One MCP call replaces three database lookups plus a manual dedup step.
Systematic-review and meta-analysis teams
Systematic reviewers working under Cochrane or PRISMA guidelines must search at least three databases (typically PubMed plus two others) and document the search strategy per database. research_literature_review returns coverage.sourcesSearched and coverage.resultsPerSource on every call, so the audit trail is produced automatically. DOI-based dedup means the team starts the screening phase with a unique-record list, not a raw union with duplicates. Adding include_arxiv: true extends coverage into preprints, useful for fast-moving fields like AI/ML and bioRxiv-style biology research.
Biomedical analysts and science journalists
Journalists tracking a breaking science story or analysts compiling a clinical-evidence brief need PubMed (peer-reviewed biomedical), Crossref (DOI metadata with funders), and Semantic Scholar (AI TLDRs for fast skimming) in one place. The composite review tool runs all three in parallel, ranks papers by cross-database corroboration, and surfaces highly-cited or open-access work first. The Semantic Scholar TLDR field is particularly useful when triaging dozens of paper titles into a shortlist.
R&D and competitive-intelligence teams tracking preprints
Industry R&D teams in biotech, pharma, AI, and quantum need early signals from preprints before journal publication. research_search_arxiv with sort_by: submittedDate returns the latest preprints in a category (cs.AI, cs.CL, math.OC, q-bio.GN). Pair with research_search_semantic_scholar filtered to venue: ArXiv and sort_by: citationCount to find which preprints are already accumulating citations, the strongest signal that a result will land. Schedule weekly via Apify Scheduler and pipe new high-citation preprints to Slack.
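As a rough sketch of the preprint-tracking call described above, the helpers below build an ArXiv prefix query and the JSON-RPC body for research_search_arxiv. The helper names are hypothetical; the argument names (`query`, `category`, `sort_by`, `sort_order`) and the prefix syntax are taken from the tool reference in this document.

```python
from typing import Any, Iterable, Optional


def build_arxiv_query(
    title_terms: Iterable[str] = (),
    authors: Iterable[str] = (),
    abstract_terms: Iterable[str] = (),
) -> str:
    """Join ArXiv prefix clauses (ti:, au:, abs:) with AND."""
    parts = [f"ti:{t}" for t in title_terms]
    parts += [f"au:{a}" for a in authors]
    parts += [f"abs:{t}" for t in abstract_terms]
    return " AND ".join(parts)


def latest_preprints_call(
    category: str, query: Optional[str] = None, request_id: int = 1
) -> dict[str, Any]:
    """Build a tools/call body that asks for the newest preprints first."""
    arguments: dict[str, Any] = {
        "category": category,
        "sort_by": "submittedDate",
        "sort_order": "descending",
    }
    if query:
        arguments["query"] = query
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": "research_search_arxiv", "arguments": arguments},
        "id": request_id,
    }
```

The returned dictionary can be posted to the /mcp endpoint in the same way as the connection examples later in this document.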
Academic recruiters and tenure committees
Recruiters, provosts, and tenure committees screening candidates need verified publication records, not LinkedIn summaries. research_find_researcher queries ORCID by name and affiliation, and with fetch_works: true returns the full publication list with external IDs (Scopus, ResearcherID) for cross-referencing. Combine with research_search_crossref filtered to the candidate's author name for funder and grant data. The ORCID profile carries employment history, useful for verifying claimed positions and start dates.
Knowledge-base builders for RAG pipelines
Teams building retrieval-augmented generation pipelines for a research domain need a clean, structured paper corpus, not scraped HTML. research_literature_review returns title, authors, year, journal, abstract (or Semantic Scholar TLDR), DOI, citation count, and open-access status as structured JSON ready to chunk and embed. Schedule monthly to keep the corpus fresh. The papersFoundInMultipleSources count is a useful pre-filter for trust-weighted retrieval.
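One way to turn a review response into embeddable documents is sketched below. The output layout (`id`, `text`, `metadata`) and the function name `to_embedding_docs` are assumptions about a downstream pipeline, not part of the MCP response; the input field names follow the documented per-paper shape.

```python
from typing import Any


def to_embedding_docs(
    review: dict[str, Any], min_sources: int = 1
) -> list[dict[str, Any]]:
    """Convert review papers into documents ready to chunk and embed."""
    docs = []
    for paper in review.get("papers", []):
        # Skip weakly corroborated papers and papers with nothing to embed.
        if paper.get("sourceCount", 0) < min_sources or not paper.get("abstract"):
            continue
        docs.append(
            {
                "id": paper.get("doi"),
                "text": f"{paper.get('title', '')}\n\n{paper['abstract']}",
                "metadata": {
                    "year": paper.get("year"),
                    "journal": paper.get("journal"),
                    "citationCount": paper.get("citationCount"),
                    "isOpenAccess": paper.get("isOpenAccess"),
                    "sourceCount": paper.get("sourceCount"),
                },
            }
        )
    return docs
```

Raising `min_sources` to 2 gives a simple trust-weighted pre-filter: only papers returned by more than one database reach the index.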
How to connect this academic research MCP
Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "academic-research": {
      "url": "https://academic-research-mcp.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}
Cursor, Windsurf, or Cline
Use the same URL and token in your MCP server settings panel. The server communicates via standard MCP protocol over HTTP POST to /mcp.
Python (via requests)
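import requests

# Call the composite literature review tool with a JSON-RPC "tools/call" request.
response = requests.post(
    "https://academic-research-mcp.apify.actor/mcp",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_APIFY_TOKEN",
    },
    json={
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
            "name": "research_literature_review",
            "arguments": {
                "query": "CRISPR off-target effects",
                "year_from": 2022,
                "max_per_source": 50,
                "include_arxiv": False,
            },
        },
        "id": 1,
    },
    timeout=300,  # avoid hanging forever on a slow upstream source
)
response.raise_for_status()  # surface HTTP errors before parsing
result = response.json()
# The tool output is a JSON string in the first content item.
review = result["result"]["content"][0]["text"]
print(review)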
JavaScript
const response = await fetch(
  "https://academic-research-mcp.apify.actor/mcp",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer YOUR_APIFY_TOKEN"
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      method: "tools/call",
      params: {
        name: "research_search_semantic_scholar",
        arguments: {
          query: "transformer attention mechanism",
          year_from: 2020,
          min_citations: 50,
          sort_by: "citationCount",
          max_results: 25
        }
      },
      id: 1
    })
  }
);
const data = await response.json();
const review = JSON.parse(data.result.content[0].text);
console.log(`Found ${review.total} papers from ${review.source}`);
cURL
# Run a multi-database literature review
curl -X POST "https://academic-research-mcp.apify.actor/mcp" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "research_literature_review",
      "arguments": {
        "query": "CAR-T cell therapy efficacy",
        "year_from": 2021,
        "year_to": 2025,
        "max_per_source": 50,
        "include_arxiv": false
      }
    },
    "id": 1
  }'
Environment variables
All six data sources are free public academic APIs and need no key.
| Variable | Required | Purpose |
|---|---|---|
| STANDBY_IDLE_TIMEOUT_SECS | Optional | Standby idle-shutdown window in seconds (default 300, minimum 60). The instance exits after this idle period to release platform compute; the next request cold-starts a fresh one. |
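How the default and minimum might be applied is sketched below. This is a guess at reasonable behavior, not the server's code: the function name `idle_timeout_secs` is hypothetical, and both clamping low values up to 60 and falling back to 300 on unparseable input are assumptions.

```python
import os
from typing import Mapping, Optional


def idle_timeout_secs(
    env: Optional[Mapping[str, str]] = None, default: int = 300, minimum: int = 60
) -> int:
    """Resolve the standby idle-shutdown window from the environment."""
    env = os.environ if env is None else env
    raw = env.get("STANDBY_IDLE_TIMEOUT_SECS")
    try:
        value = int(raw) if raw is not None else default
    except ValueError:
        # Assumption: unparseable values fall back to the default.
        value = default
    # Assumption: values below the documented minimum are clamped up to it.
    return max(value, minimum)
```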
MCP tools

| Tool | PPE event | Price | What it returns |
|---|---|---|---|
| research_search_pubmed | search-pubmed | $0.05 | PubMed biomedical citations with title, authors, journal, MeSH terms, abstract, pubmed URL. Field tags ([Title], [MeSH Terms], [Author]), boolean AND/OR/NOT, article-type filter, date range, max 500. |
| research_search_semantic_scholar | search-semantic-scholar | $0.05 | Semantic Scholar papers with AI-generated TLDR, influentialCitationCount, venue, field-of-study, open-access PDF link. Year range, venue, field, min_citations, sort by citationCount or publicationDate, max 500. |
| research_search_arxiv | search-arxiv | $0.05 | ArXiv preprints with title, authors, abstract, category, PDF URL. Prefix syntax (ti:, au:, abs:, cat:), category filter (cs.AI, math.CO, stat.ML, physics.hep-th, etc.), sort by relevance or date, max 500. Rate-limited at source (1 req per 3s). |
| research_search_crossref | search-crossref | $0.05 | Crossref DOI-registered works with funders, grant numbers, ORCID author IDs, licensing, publication type. Filter by query / author / journal / DOI prefix / type, year range, sort by relevance / citation count / publication date, max 500. |
| research_search_openalex | search-openalex | $0.05 | OpenAlex works with concept tagging, institution affiliations, citation counts, open-access status. Year filter, min_citations, sort by relevance / cited_by_count / publication_date, max 500. |
| research_find_researcher | find-researcher | $0.05 | ORCID researcher profiles with career history, employment, external IDs (Scopus, ResearcherID). Family name, given names, affiliation, keyword, or raw Lucene query. Optional fetch_works: true pulls full publication list. Max 100. |
| research_literature_review | literature-review | $0.15 | Composite review: queries PubMed + Semantic Scholar + Crossref in parallel (+ optional ArXiv), deduplicates by DOI, ranks by source count then citation count. Returns papers[] with foundIn[] and sourceCount, plus coverage block (sourcesSearched, resultsPerSource, totalBeforeDedup, uniquePapersWithDoi, papersFoundInMultipleSources). |
| research_list_sources | (none, free) | Free | Enumerates the 8 tools, 6 sources, and record counts. No upstream fetch, no charge. Useful for agent planning before paying for a search. |
Tool input reference
| Tool | Parameter | Type | Required | Description |
|---|---|---|---|---|
| research_search_pubmed | query | string | No (one of three) | Search query, supports PubMed field tags (e.g. "diabetes AND metformin[MeSH Terms]") |
| research_search_pubmed | author | string | No | Author name (e.g. "Doudna JA") |
| research_search_pubmed | journal | string | No | Journal name (e.g. "Nature", "JAMA", "Lancet") |
| research_search_pubmed | date_from / date_to | string | No | YYYY/MM/DD or YYYY |
| research_search_pubmed | article_type | enum | No | Review / Clinical Trial / Randomized Controlled Trial / Meta-Analysis / Systematic Review / Case Reports |
| research_search_pubmed | sort_by | enum | No | relevance (default) or pub_date |
| research_search_pubmed | max_results | number | No | 1 to 500, default 50 |
| research_search_semantic_scholar | query | string | Yes | Search query (e.g. "transformer attention mechanism") |
| research_search_semantic_scholar | year_from / year_to | number | No | Year range |
| research_search_semantic_scholar | venue | string | No | Journal or conference (e.g. "NeurIPS", "Nature") |
| research_search_semantic_scholar | field | enum | No | Computer Science / Medicine / Biology / Physics / Chemistry / Mathematics / Engineering / Economics / Psychology / Sociology |
| research_search_semantic_scholar | open_access_only | boolean | No | Only papers with free PDF, default false |
| research_search_semantic_scholar | min_citations | number | No | Minimum citation count |
| research_search_semantic_scholar | sort_by | enum | No | relevance / citationCount / publicationDate |
| research_search_arxiv | query | string | No (one of two) | Search query with optional prefixes (e.g. "ti:attention AND au:vaswani") |
| research_search_arxiv | category | string | No | ArXiv category (e.g. "cs.AI", "math.CO", "stat.ML") |
| research_search_arxiv | sort_by | enum | No | relevance / lastUpdatedDate / submittedDate |
| research_search_arxiv | sort_order | enum | No | descending (default) or ascending |
| research_search_crossref | query | string | No (one of four) | Full-text search across title and abstract |
| research_search_crossref | author | string | No | Author name filter |
| research_search_crossref | journal | string | No | Journal or conference name |
| research_search_crossref | doi_prefix | string | No | DOI prefix (e.g. "10.1038" for Nature, "10.1126" for Science) |
| research_search_crossref | type | enum | No | journal-article / book-chapter / proceedings-article / posted-content / book / dataset / report |
| research_search_crossref | year_from / year_to | number | No | Year range |
| research_search_crossref | sort_by | enum | No | relevance / is-referenced-by-count / published |
| research_search_openalex | query | string | Yes | Search query across title, abstract, and full text |
| research_search_openalex | year | number | No | Filter to a single publication year |
| research_search_openalex | min_citations | number | No | Minimum citation count |
| research_search_openalex | open_access_only | boolean | No | Only open-access papers, default false |
| research_search_openalex | sort_by | enum | No | relevance_score:desc / cited_by_count:desc / publication_date:desc |
| research_find_researcher | family_name | string | No (one of five) | Last name (e.g. "Hinton", "LeCun") |
| research_find_researcher | given_names | string | No | First name(s) |
| research_find_researcher | affiliation | string | No | University or organization (e.g. "MIT", "Google DeepMind") |
| research_find_researcher | keyword | string | No | Research keyword (e.g. "deep learning", "CRISPR") |
| research_find_researcher | query | string | No | Raw ORCID Lucene query (overrides individual fields) |
| research_find_researcher | fetch_works | boolean | No | Fetch full publication list per researcher (slower), default false |
| research_find_researcher | max_results | number | No | 1 to 100, default 25 |
| research_literature_review | query | string | Yes | Research topic or question (e.g. "CAR-T cell therapy efficacy") |
| research_literature_review | year_from / year_to | number | No | Year range applied to all databases |
| research_literature_review | max_per_source | number | No | 1 to 200, default 50 |
| research_literature_review | include_arxiv | boolean | No | Also search ArXiv preprints (adds time due to source rate limit), default false |
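The documented limits for research_literature_review can be checked client-side before a paid call, for example with a small validator like the one below. The function name `literature_review_args` is hypothetical, and rejecting a reversed year range is an extra sanity check added here rather than a documented rule.

```python
from typing import Any, Optional


def literature_review_args(
    query: str,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
    max_per_source: int = 50,
    include_arxiv: bool = False,
) -> dict[str, Any]:
    """Validate and assemble arguments for research_literature_review."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if not 1 <= max_per_source <= 200:
        raise ValueError("max_per_source must be between 1 and 200")
    if year_from is not None and year_to is not None and year_from > year_to:
        # Extra sanity check, not taken from the tool reference.
        raise ValueError("year_from must not be later than year_to")

    args: dict[str, Any] = {
        "query": query.strip(),
        "max_per_source": max_per_source,
        "include_arxiv": include_arxiv,
    }
    # Optional fields are only sent when the caller sets them.
    if year_from is not None:
        args["year_from"] = year_from
    if year_to is not None:
        args["year_to"] = year_to
    return args
```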
Output example
{
  "query": "CRISPR off-target effects",
  "yearRange": { "from": 2022, "to": "present" },
  "coverage": {
    "sourcesSearched": ["PubMed", "Semantic Scholar", "Crossref"],
    "resultsPerSource": {
      "PubMed": 47,
      "Semantic Scholar": 50,
      "Crossref": 50
    },
    "totalBeforeDedup": 147,
    "uniquePapersWithDoi": 112,
    "papersWithoutDoi": 3,
    "papersFoundInMultipleSources": 28
  },
  "papers": [
    {
      "doi": "10.1038/s41587-023-01918-1",
      "title": "Prime editing with genome-wide off-target evaluation",
      "authors": "Anzalone AV, Gao XD, Podracky CJ, Nelson AT, Koblan LW, Raguram A, Levy JM, Mercer JAM, Liu DR",
      "year": 2023,
      "journal": "Nature Biotechnology",
      "citationCount": 482,
      "abstract": "Prime editing enables precise installation of substitutions, insertions, and deletions without requiring double-strand breaks. Here we develop a high-throughput off-target evaluation pipeline...",
      "isOpenAccess": false,
      "url": "https://pubmed.ncbi.nlm.nih.gov/37640944/",
      "foundIn": ["PubMed", "Semantic Scholar", "Crossref"],
      "sourceCount": 3
    },
    {
      "doi": "10.1016/j.cell.2022.10.012",
      "title": "Genome-wide specificity profiling of CRISPR-Cas9 base editors in human cells",
      "authors": "Kim D, Lim K, Kim S, Yoon S, Kim JS",
      "year": 2022,
      "journal": "Cell",
      "citationCount": 318,
      "abstract": "Base editors enable targeted single-nucleotide conversions. We profile genome-wide off-target activity of cytosine and adenine base editors using GUIDE-seq and Digenome-seq...",
      "isOpenAccess": true,
      "url": "https://www.semanticscholar.org/paper/abc123",
      "foundIn": ["PubMed", "Semantic Scholar"],
      "sourceCount": 2
    },
    {
      "doi": "10.1126/science.add8643",
      "title": "Engineered Cas12a variants with reduced off-target activity",
      "authors": "Liu Y, Wang J, Zhang H, Chen L, Doudna JA",
      "year": 2023,
      "journal": "Science",
      "citationCount": 156,
      "abstract": "We engineer Cas12a variants through directed evolution to reduce off-target cleavage while preserving on-target efficiency...",
      "isOpenAccess": false,
      "url": "https://doi.org/10.1126/science.add8643",
      "foundIn": ["Crossref"],
      "sourceCount": 1
    }
  ],
  "papersWithoutDoi": [
    {
      "title": "Conference talk: CRISPR safety in clinical translation",
      "authors": "Chen L, Doudna JA",
      "year": 2023,
      "journal": "ASH Annual Meeting Abstracts",
      "citationCount": 4,
      "url": "https://www.semanticscholar.org/paper/xyz789",
      "foundIn": ["Semantic Scholar"]
    }
  ]
}
The papers[] array sorts by sourceCount descending (papers in more databases first), with citation count as the tiebreaker. papersWithoutDoi is capped at the first 20 entries so the response stays compact. Single-database tools (research_search_pubmed, research_search_semantic_scholar, etc.) return a simpler { total, source, papers } shape with the source-specific field set.
Output fields
Coverage block (returned by research_literature_review only)
| Field | Type | Description |
|---|---|---|
| coverage.sourcesSearched | string[] | Exact list of databases that ran (e.g. ["PubMed", "Semantic Scholar", "Crossref"]) |
| coverage.resultsPerSource | object | Per-database hit count; branch on this to detect thin searches |
| coverage.totalBeforeDedup | number | Raw count across all sources before DOI dedup |
| coverage.uniquePapersWithDoi | number | Deduplicated paper count |
| coverage.papersWithoutDoi | number | Count of results that lacked a DOI |
| coverage.papersFoundInMultipleSources | number | Corroboration count: papers returned by more than one database, the strongest "this is real" signal |
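As a minimal sketch of branching on coverage.resultsPerSource, the helper below lists the searched sources that came back thin. It assumes resultsPerSource is keyed by the same database names that appear in sourcesSearched; the thinSources name and the minHits threshold are hypothetical and not part of the MCP.

```typescript
// Subset of the coverage block used by this sketch.
interface Coverage {
  sourcesSearched: string[];
  // Assumption: keys match the names listed in sourcesSearched.
  resultsPerSource: Record<string, number>;
}

// Return the searched sources whose hit count is below minHits.
// A source missing from resultsPerSource is treated as zero hits.
function thinSources(coverage: Coverage, minHits = 1): string[] {
  return coverage.sourcesSearched.filter(
    (source) => (coverage.resultsPerSource[source] ?? 0) < minHits,
  );
}

// Illustrative values only, not real search output.
const exampleCoverage: Coverage = {
  sourcesSearched: ["PubMed", "Semantic Scholar", "Crossref"],
  resultsPerSource: { PubMed: 12, "Semantic Scholar": 0, Crossref: 7 },
};

const thin = thinSources(exampleCoverage);
```

With the illustrative values, thin contains only "Semantic Scholar", which an agent could then retry through the single-database tool.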
Per-paper fields (composite review)
| Field | Type | Description |
|---|---|---|
| doi | string | DOI (normalized: lowercased, no https://doi.org/ prefix) |
| title / authors / year / journal | string / string / number / string | Standard bibliographic fields |
| citationCount | number | Citation count from the most-detailed source (Semantic Scholar or OpenAlex preferred) |
| abstract | string | Full abstract, or the Semantic Scholar AI TLDR when only Semantic Scholar returned the paper |
| isOpenAccess | boolean or null | Open-access flag (null when no source returned it) |
| url | string | Best landing URL (priority: pubmedUrl > semanticScholarUrl > crossref url > arxiv absUrl) |
| foundIn | string[] | List of databases that returned this paper |
| sourceCount | number | Length of foundIn; the primary sort key |
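The URL priority in the table can be read as a simple fallback chain. The sketch below assumes a merged record that may carry any of the four source URLs under the field names given above; the SourceUrls type and pickBestUrl function are hypothetical names used for illustration, not the server's actual code.

```typescript
// Hypothetical merged view of the URLs each source may contribute.
interface SourceUrls {
  pubmedUrl?: string;          // PubMed landing page
  semanticScholarUrl?: string; // Semantic Scholar paper page
  url?: string;                // Crossref landing URL
  absUrl?: string;             // ArXiv abstract page
}

// Apply the documented priority:
// pubmedUrl > semanticScholarUrl > crossref url > arxiv absUrl.
// Returns null when no source supplied a URL.
function pickBestUrl(urls: SourceUrls): string | null {
  return urls.pubmedUrl ?? urls.semanticScholarUrl ?? urls.url ?? urls.absUrl ?? null;
}

const best = pickBestUrl({
  semanticScholarUrl: "https://www.semanticscholar.org/paper/abc123",
  url: "https://doi.org/10.1126/science.add8643",
});
```

Note that the nullish-coalescing chain only skips missing values; an empty string from a source would be returned as-is, so a real implementation may want to filter those first.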
Single-database tool envelope
| Field | Type | Description |
|---|---|---|
| total | number | Number of papers returned (post status-message filter) |
| source | string | Database name (e.g. "PubMed", "Semantic Scholar") |
| papers | object[] | Raw paper records with source-specific field sets |
How much does it cost to run academic research searches?
Academic Research Intelligence MCP uses pay-per-event pricing: $0.05 per single-database search and $0.15 per multi-database literature review; research_list_sources is free. Platform compute is included.
| Scenario | Tool calls | Cost per call | Total cost |
|---|---|---|---|
| Quick test, single-database lookup | 1 | $0.05 | $0.05 |
| Multi-database literature review (3 sources) | 1 | $0.15 | $0.15 |
| Multi-database review with ArXiv (4 sources) | 1 | $0.15 | $0.15 |
| Systematic review pipeline: review + 2 single-source supplemental | 3 | mixed | $0.25 |
| Researcher discovery + full works fetch | 1 | $0.05 | $0.05 |
| 50 literature reviews per month (active research team) | 50 | $0.15 | $7.50 |
| 500 single-database lookups per month (RAG indexer) | 500 | $0.05 | $25.00 |
You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached, returning a structured error your pipeline can handle gracefully.
Apify's free tier includes $5 of monthly platform credits: enough for 100 single-database searches or 33 multi-database reviews before you need to add payment.
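The arithmetic behind the table and the free-tier figures can be reproduced with a short sketch. Prices are held in integer cents to avoid floating-point drift; the function and constant names are hypothetical, and only the $0.05 and $0.15 prices and the $5 credit come from this page.

```typescript
// Per-event prices from the pricing section, in cents.
const PRICE_CENTS = { singleSearch: 5, literatureReview: 15 } as const;

// Total cost in cents for a mix of single-database searches and reviews.
function totalCostCents(singleSearches: number, reviews: number): number {
  return (
    singleSearches * PRICE_CENTS.singleSearch +
    reviews * PRICE_CENTS.literatureReview
  );
}

// How many whole calls fit inside a budget at a given price.
function callsWithinBudget(budgetCents: number, priceCents: number): number {
  return Math.floor(budgetCents / priceCents);
}

// Systematic review pipeline: 1 review + 2 single-source searches = 25 cents.
const pipelineCents = totalCostCents(2, 1);

// $5 of free credits: 100 single searches, or 33 reviews.
const freeTierSearches = callsWithinBudget(500, PRICE_CENTS.singleSearch);
const freeTierReviews = callsWithinBudget(500, PRICE_CENTS.literatureReview);
```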
How it works
- Standby request received. Apify routes the MCP POST to `/mcp` on the standby instance. The activity timer resets, so the idle-shutdown countdown restarts.
- MCP tool dispatch. The McpServer matches the tool name (`research_literature_review`, `research_search_pubmed`, etc.) and validates input against the Zod schema. Invalid inputs return a structured `{ error: ... }` response without charging.
- PPE charge. `Actor.charge({ eventName })` fires before any upstream call (so a failed sub-actor still bills, matching Apify PPE semantics).
- Sub-actor call via apify-client. Each tool calls one or more sibling actors (`ryanclinton/pubmed-research-search`, `ryanclinton/semantic-scholar-search`, etc.) with `memory: 256` and a per-tool `waitSecs` timeout (120s default, 180s for S2 / ORCID, 300s for ArXiv).
- Literature review fan-out. `research_literature_review` runs PubMed + Semantic Scholar + Crossref (+ optional ArXiv) in parallel via `Promise.all`. Each sub-actor returns a dataset; the MCP iterates items, filters out status-message rows, and normalizes the paper shape.
- DOI deduplication. DOIs are normalized (strip `https?://doi.org/` prefix, lowercase, trim), then collapsed in a `Map<doi, { paper, sources[] }>`. Papers without a DOI go to a separate `papersWithoutDoi` list so they do not pollute the dedup count.
- Multi-source ranking. Deduplicated papers sort by `sources.length` descending, then `citationCount` descending. Coverage stats (`sourcesSearched`, `resultsPerSource`, `papersFoundInMultipleSources`) are computed from the raw per-source counts.
- Idle shutdown. A 30-second interval checks `Date.now() - lastRequestAt`. If the gap exceeds `STANDBY_IDLE_TIMEOUT_SECS` (default 300), the actor calls `Actor.exit()` to release platform compute. The next request cold-starts a fresh instance.
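The DOI deduplication and multi-source ranking steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the server's source: the RawPaper and MergedPaper types, the dedupeAndRank name, and the choice to keep the highest citationCount seen across sources are all hypothetical, and the DOIs in the example are made up.

```typescript
interface RawPaper {
  doi?: string | null;
  title: string;
  citationCount?: number;
}

interface MergedPaper {
  doi: string;
  title: string;
  citationCount: number;
  foundIn: string[];
  sourceCount: number;
}

// Strip the https?://doi.org/ prefix, lowercase, and trim.
function normalizeDoi(doi: string): string {
  return doi.trim().replace(/^https?:\/\/doi\.org\//i, "").toLowerCase();
}

function dedupeAndRank(
  results: Array<{ source: string; papers: RawPaper[] }>,
): { papers: MergedPaper[]; papersWithoutDoi: RawPaper[] } {
  const byDoi = new Map<string, MergedPaper>();
  const papersWithoutDoi: RawPaper[] = [];

  for (const { source, papers } of results) {
    for (const paper of papers) {
      const key = paper.doi ? normalizeDoi(paper.doi) : "";
      if (key === "") {
        // No DOI: cannot be deduplicated, so keep it in a separate list.
        papersWithoutDoi.push(paper);
        continue;
      }
      const existing = byDoi.get(key);
      if (existing) {
        if (!existing.foundIn.includes(source)) existing.foundIn.push(source);
        existing.sourceCount = existing.foundIn.length;
        // Assumption: keep the largest citation count seen so far.
        existing.citationCount = Math.max(existing.citationCount, paper.citationCount ?? 0);
      } else {
        byDoi.set(key, {
          doi: key,
          title: paper.title,
          citationCount: paper.citationCount ?? 0,
          foundIn: [source],
          sourceCount: 1,
        });
      }
    }
  }

  // Most-corroborated first, then most-cited.
  const papers = Array.from(byDoi.values()).sort(
    (a, b) => b.sourceCount - a.sourceCount || b.citationCount - a.citationCount,
  );
  return { papers, papersWithoutDoi };
}

// Made-up DOIs for illustration only.
const merged = dedupeAndRank([
  { source: "PubMed", papers: [{ doi: "https://doi.org/10.1000/ABC", title: "A", citationCount: 5 }] },
  {
    source: "Semantic Scholar",
    papers: [
      { doi: "10.1000/abc", title: "A", citationCount: 9 },
      { doi: "10.1000/xyz", title: "B", citationCount: 100 },
      { title: "C" },
    ],
  },
]);
```

In the example, paper A ranks ahead of paper B despite fewer citations because two sources returned it, and paper C lands in papersWithoutDoi.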
Tips for best results
- Call `research_list_sources` once per session. It is free, and the agent gets a current map of which database covers which discipline. This saves a wasted paid call when the user asks for a database that does not match the topic (e.g. ArXiv for biomedical-only research).
- Use `research_literature_review` for "what does the literature say" prompts. One review covers three databases for $0.15, the same price as three single-database calls, and adds dedup and coverage stats. Single-database tools are for when the user names the database.
- Add `include_arxiv: true` only for STEM topics. ArXiv covers physics, math, CS, q-bio, q-fin, and stats. For biomedical, chemistry-only, or social science queries, ArXiv adds time (1 req per 3s rate limit) without adding coverage.
- Use field tags in PubMed queries for precision. `"BRCA1[Gene Symbol] AND breast cancer[MeSH Terms]"` returns far fewer false positives than `"BRCA1 breast cancer"`. The PubMed source supports the full field-tag and boolean syntax.
- Sort Semantic Scholar by `citationCount` for established topics, `publicationDate` for emerging ones. Citation-sort surfaces canonical papers in a mature field; date-sort surfaces fresh work in fast-moving areas like LLM research.
- Use ORCID IDs when known, not just names. Common researcher names ("Wei Zhang", "Sarah Kim") return mixed results from multiple people. `research_find_researcher` with `family_name + affiliation` disambiguates; passing the raw ORCID ID via the `query` parameter is even more precise.
- Set `fetch_works: false` for researcher discovery, `true` for verification. Discovery (finding the right person) only needs profile metadata. Verification (confirming publication record) needs the full works list. Default is false so casual lookups stay fast.
- Tune `STANDBY_IDLE_TIMEOUT_SECS` for your traffic pattern. Bursty agent traffic benefits from a longer idle window (600-900s) to avoid cold starts. Always-on workloads can use the 300s default.
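For the PubMed field-tag tip, a small query builder keeps tagged terms consistent. This is a hypothetical client-side helper (buildPubMedQuery and TaggedTerm are illustrative names); it only assembles the query string and does not call the tool.

```typescript
// A search term with an optional PubMed field tag.
interface TaggedTerm {
  term: string;
  field?: string; // e.g. "MeSH Terms"
}

// Join terms as term[Field], combined with a single boolean operator.
function buildPubMedQuery(terms: TaggedTerm[], operator: "AND" | "OR" = "AND"): string {
  return terms
    .map(({ term, field }) => (field ? `${term}[${field}]` : term))
    .join(` ${operator} `);
}

// Rebuilds the query string from the tip above.
const pubmedQuery = buildPubMedQuery([
  { term: "BRCA1", field: "Gene Symbol" },
  { term: "breast cancer", field: "MeSH Terms" },
]);
```

The sketch handles one operator per query; nested boolean groups would need parentheses that this helper does not add.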
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| PubMed Research Search | The biomedical sub-actor. Call directly for batch jobs that do not need the MCP overhead, or to pull full datasets for downstream processing. |
| Semantic Scholar Search | The all-discipline sub-actor with AI TLDRs. Use directly when you need citation-sorted results across all fields with paper summaries. |
| ArXiv Paper Search | The preprint sub-actor. Call directly for high-volume preprint indexing where the 1-req-per-3s rate limit needs its own run isolation. |
| Crossref Paper Search | The DOI metadata sub-actor with funder and ORCID data. Use directly for funder-tracking or grant-paper-linkage workflows. |
| OpenAlex Research Search | The broad academic index sub-actor with concept tagging. Use directly for concept-based exploration and institution-level analytics. |
| ORCID Researcher Search | The researcher profile sub-actor. Use directly for batch researcher verification across a candidate list. |
| Research Integrity Screening MCP | The companion integrity tool. Use this MCP for literature discovery, then pipe candidate authors into the integrity MCP for retraction, paper-mill, and citation-anomaly screening. |
| NIH Research Grants | Cross-reference paper authors against NIH PI records to surface funding context for biomedical literature reviews. |
| Company Deep Research | Pair when researching biotech, pharma, or AI companies whose leadership has academic publishing histories. |
Limitations
- No full-text PDF download. This MCP returns metadata, abstracts, and source URLs. To retrieve PDFs, follow the `url` field per paper or use a separate full-text retrieval tool.
- ArXiv rate limit at the source. ArXiv enforces 1 request per 3 seconds. Large `max_results` values on `research_search_arxiv` are correspondingly slow (a 300-result query takes ~15 minutes). The sub-actor handles the pacing, but the wait is real.
- DOI coverage varies by database. Crossref always has DOIs (it is the DOI registry). Semantic Scholar and OpenAlex have high DOI coverage. PubMed has DOIs on most modern records, but they are missing on older citations and some grey literature. ArXiv results usually carry a DOI only after the corresponding journal publication.
- Single-database tools return database-native field sets. A PubMed paper has `mesh`, `articleType`, `pubmedUrl`; a Semantic Scholar paper has `tldr`, `influentialCitationCount`. Only `research_literature_review` normalizes to a unified shape; single-database calls preserve source-specific fields.
- ORCID `fetch_works: true` is slow. Pulling the full publication list adds one upstream call per researcher. For 25 researchers with `fetch_works: true`, expect 60-180 seconds.
- Child sub-actor timeout is 120-300 seconds depending on tool. If a source is slow, it returns an empty array and the composite review still completes with available data. `coverage.resultsPerSource` shows which sources delivered, so a thin result is visible.
- Semantic Scholar polite-pool rate limits apply. High-volume callers (1000+ requests per hour) may see throttling at the source. Spread calls across runs or schedule them rather than bursting.
- OpenAlex indexes content from other sources. OpenAlex pulls from Crossref, PubMed, and others, so a literature review including OpenAlex may surface duplicates that the DOI dedup will collapse, but the raw `resultsPerSource` count will look inflated.
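The ArXiv wait quoted in the list above follows from the rate limit. The sketch below assumes one upstream request per result, which is what the ~15-minute figure for 300 results implies; the actual batching inside the sub-actor is not documented here, so treat the estimate as an upper-bound illustration. The function name is hypothetical.

```typescript
// Estimated wait in seconds at ArXiv's 1-request-per-3-seconds limit.
// Assumption: each result costs one upstream request.
function estimateArxivWaitSeconds(requests: number, secondsPerRequest = 3): number {
  return requests * secondsPerRequest;
}

// 300 requests * 3 s = 900 s, i.e. 15 minutes.
const waitMinutes = estimateArxivWaitSeconds(300) / 60;
```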
Integrations
- Apify API: trigger academic searches programmatically from research-knowledge-base builders, literature-monitoring tools, or systematic-review software.
- Webhooks: push new high-citation papers to Slack, email, or knowledge-base ingestion pipelines the moment a scheduled search completes.
- Zapier: connect to Airtable or Google Sheets paper trackers; auto-log new literature-review results when topics are added.
- Make: build research-monitoring workflows that re-run searches weekly and diff new papers against the prior run.
- LangChain / LlamaIndex: embed the MCP as a tool in agent pipelines for automated literature search, RAG ingestion, and research synthesis.
Troubleshooting
Literature review returns coverage.resultsPerSource with one or two sources at 0. One or more sub-actors timed out or returned no matches. Check coverage.sourcesSearched to confirm which databases ran. If a specific source consistently times out, call its single-database tool directly with the same query to isolate the issue.
Tool returns { "error": "Provide at least one of: query, author, or journal" }. The tool requires at least one search field. PubMed, ArXiv, Crossref, and ORCID accept multiple optional fields but at least one must be set. Semantic Scholar and OpenAlex require query.
research_find_researcher returns many unrelated profiles. The name was too common. Add affiliation ("MIT", "Google DeepMind") or pass the ORCID ID via the query parameter for exact match.
ArXiv search is slow. Source rate limit (1 req per 3s). Lower max_results or run the search as a scheduled job. The actor handles the pacing; the wait is at ArXiv, not the MCP.
Cold-start delay on first call. Standby mode shuts the instance down after idle (default 300s) to release platform compute. First request after idle takes ~10-20 seconds to spin up. Subsequent calls in the same window are instant. Increase STANDBY_IDLE_TIMEOUT_SECS if you need longer warm windows.
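To reason about when a cold start will happen, the idle check described under How it works reduces to one comparison. This is a sketch of that comparison only; shouldShutDown is a hypothetical name, and the real server wires the check to a 30-second interval and Actor.exit().

```typescript
// True when the gap since the last request exceeds the idle timeout.
// idleTimeoutSecs mirrors STANDBY_IDLE_TIMEOUT_SECS (default 300).
function shouldShutDown(
  nowMs: number,
  lastRequestAtMs: number,
  idleTimeoutSecs = 300,
): boolean {
  return nowMs - lastRequestAtMs > idleTimeoutSecs * 1000;
}

// 301 s after the last request with the default timeout: instance exits.
const exitsAtDefault = shouldShutDown(301000, 0);

// Same gap with a 900 s window: instance stays warm.
const exitsWithLongWindow = shouldShutDown(301000, 0, 900);
```

Because the check runs on an interval, the actual shutdown can lag the timeout by up to one interval.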
Tool returns { "error": true, "message": "Spending limit reached" }. Your Apify run has hit the maximum charge limit configured for the run. Increase maxTotalChargeUsd in your run configuration, or purchase additional platform credits.
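A pipeline can detect this case with a small type guard before deciding whether to retry or stop. The sketch assumes the tool result has already been parsed into a JavaScript value with the shape shown above; ToolError and isSpendingLimitError are hypothetical names.

```typescript
// Error shape shown in the troubleshooting entry above.
interface ToolError {
  error: true;
  message: string;
}

// Narrow an unknown parsed result to the spending-limit error.
function isSpendingLimitError(payload: unknown): payload is ToolError {
  if (typeof payload !== "object" || payload === null) return false;
  const candidate = payload as { error?: unknown; message?: unknown };
  return (
    candidate.error === true &&
    typeof candidate.message === "string" &&
    candidate.message.includes("Spending limit")
  );
}

const limitHit = isSpendingLimitError({ error: true, message: "Spending limit reached" });
```

Validation errors such as the "Provide at least one of" response carry a string in error rather than true, so they do not match this guard.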
Responsible use
- All data accessed by this server comes from publicly available academic databases (PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, ORCID) operating under open or polite-pool access policies.
- Citation counts, AI TLDRs, and paper rankings are computed signals, not endorsements of paper quality. Always read the underlying papers before drawing conclusions.
- When using ORCID profile data, comply with applicable data-protection regulations in your jurisdiction (GDPR, CCPA, etc.) when storing or sharing researcher information.
- Do not use this tool to harass, defame, or discriminate against researchers based on publication record alone.
- For guidance on web scraping and data use legality, see Apify's guide.
FAQ
What is the difference between this MCP and calling the six sub-actors directly?
The sub-actors return source-specific paper shapes (different field names, different field sets). This MCP normalizes them into a unified schema, adds DOI-based deduplication via research_literature_review, and exposes everything as MCP tools so AI agents (Claude, Cursor, Windsurf, custom agents) can discover and call them through the standard MCP protocol. If you are running batch jobs from a fixed script, the sub-actors directly may be a better fit; if you are building agent workflows, this MCP is the integration surface.
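For context, an MCP client invokes a tool by sending a JSON-RPC tools/call request. The sketch below builds such a payload for research_literature_review. The envelope follows the MCP specification; the query argument name and the example topic are assumptions for illustration (include_arxiv is the flag described on this page), and buildToolCall is a hypothetical helper.

```typescript
// JSON-RPC envelope for an MCP tools/call request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Assumption: the review tool accepts a `query` string argument.
const reviewRequest = buildToolCall(1, "research_literature_review", {
  query: "CRISPR off-target effects",
  include_arxiv: false,
});

// Serialized body an MCP client would POST to the server.
const reviewBody = JSON.stringify(reviewRequest);
```

In practice an MCP client library builds this envelope for you; the sketch only shows what travels over the wire.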
Why does research_literature_review only query three databases (PubMed, Semantic Scholar, Crossref) by default?
These three give the highest cross-database coverage for most topics with the lowest latency. PubMed covers biomedical, Semantic Scholar covers all disciplines with AI summaries, Crossref covers DOI-registered publications across all fields. Adding ArXiv (set include_arxiv: true) is useful for STEM topics but adds noticeable time due to ArXiv's 1-req-per-3s rate limit. OpenAlex and ORCID are skipped from the composite review because OpenAlex aggregates from Crossref / PubMed (mostly duplicates) and ORCID is researcher-focused, not paper-focused.
How does the DOI deduplication work?
Each paper's DOI is normalized (stripped of https?://doi.org/ prefix, lowercased, trimmed) and used as a map key. When the same DOI appears from multiple sources, the source names are appended to a sources[] array. The final papers[] is sorted by sources.length descending (most-corroborated first), then citationCount descending. Papers without a DOI cannot be deduplicated and go to a separate papersWithoutDoi[] list to keep the dedup count honest.
Do I need any API keys to use this MCP?
No. All six data sources (PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, ORCID) are free public academic APIs with no authentication required for the access patterns this MCP uses. You only need an Apify token (for billing) configured in your MCP client.
How long does a typical literature review take?
The default three-database review (PubMed + Semantic Scholar + Crossref) typically completes in 60-120 seconds, depending on result counts and source response times. Adding ArXiv extends this to 90-300 seconds for the same query (ArXiv rate limit). Single-database searches typically complete in 20-60 seconds.
Can I get the full PDF of each paper?
Not directly. This MCP returns paper metadata, abstracts (or AI TLDRs from Semantic Scholar), and source URLs. The url field per paper points to the landing page (PubMed, Semantic Scholar, ArXiv abstract, Crossref DOI). For open-access papers, follow the URL to the PDF; for paywalled papers, you will hit the publisher's access wall. A separate full-text retrieval step is required for actual PDF content.
How is this MCP different from web search or Google Scholar?
Web search returns ranked pages, not structured paper records. Google Scholar returns paper records but has no API and is unfriendly to programmatic access. This MCP returns clean structured JSON from six official academic data APIs, with cross-database deduplication and explicit coverage stats. For RAG pipelines, agent workflows, and systematic reviews, structured API access is more reliable than scraping search results.
Can I schedule searches to run periodically?
Yes. Use the Apify Scheduler to trigger the actor on a daily, weekly, or monthly cadence. Configure a webhook to push new papers (or papers above a citation threshold) to your notification system or knowledge base. This is useful for literature monitoring, RAG corpus refresh, and competitive-intelligence tracking of preprints.
Is it legal to use this tool for academic research?
All six underlying data sources (PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, ORCID) are publicly available academic databases that explicitly support programmatic access. Accessing and analyzing public scholarly records is a standard practice in academic research, systematic reviews, and grant management. See Apify's guide on web scraping legality for broader context.
Why does the agent need to call research_list_sources if it is free?
It saves wasted paid calls. The agent can check which databases cover which disciplines (e.g. ArXiv has little biomedical content outside q-bio, PubMed has little CS content) before paying for a search that would return few or no results from a mismatched source. This is especially useful for the first call in a session, before the agent has built a mental model of the toolset.
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.
Ready to try Academic Research Intelligence MCP Server?
This actor is coming soon to the Apify Store.