Academic Research Intelligence MCP Server is an MCP (Model Context Protocol) server on ApifyForge. MCP server for multi-database academic literature search. Wraps 6 specialized actors: PubMed (biomedical), Semantic Scholar (all disciplines with AI summaries), ArXiv (preprints), Crossref (DOI metadata with... It costs $0.05 per search-pubmed. It exposes 7 tools: search-pubmed, search-semantic-scholar, search-arxiv, search-crossref, search-openalex, find-researcher, literature-review. Best for AI developers and agent builders who need structured real-world data inside Claude, Cursor, or other MCP-compatible clients. Not ideal for non-AI workflows or use cases that don't involve an MCP-compatible client. Maintenance pulse: 90/100. Last verified March 27, 2026. Built by Ryan Clinton (ryanclinton on Apify).

Academic Research Intelligence MCP Server

Academic Research Intelligence MCP Server is an MCP (Model Context Protocol) server available on ApifyForge at $0.05 per search-pubmed. MCP server for multi-database academic literature search. Wraps 6 specialized actors: PubMed (biomedical), Semantic Scholar (all disciplines with AI summaries), ArXiv (preprints), Crossref (DOI metadata with citations/funders), OpenAlex (250M+ works), and ORCID (researcher profiles). Includes a unified literature review tool with cross-database deduplication.

Best for AI developers and agent builders who need structured real-world data inside Claude, Cursor, or other MCP-compatible clients.

Not ideal for non-AI workflows or use cases that don't involve an MCP-compatible client.

Coming soon on Apify Store
$0.05 per event

Tools exposed

Each pricing event corresponds to a tool your AI agent can call through MCP.

search-pubmed: Search biomedical literature on PubMed. · $0.05/call
search-semantic-scholar: Search academic papers on Semantic Scholar. · $0.05/call
search-arxiv: Search preprint papers on ArXiv. · $0.05/call
search-crossref: Search academic papers via Crossref DOI registry. · $0.05/call
search-openalex: Search research works on OpenAlex. · $0.05/call
find-researcher: Look up researcher profiles via ORCID. · $0.05/call
literature-review: Composite literature review across multiple academic databases. · $0.15/call

Example prompts

Natural language queries you can ask your AI assistant that would trigger this MCP server.

"Search PubMed for CAR-T cell therapy efficacy and summarize the findings"
"Can you search Semantic Scholar for papers on transformer attention mechanisms and list the most cited ones?"
"What tools does the Academic Research Intelligence MCP Server have available?"

Last verified: March 27, 2026
Maintenance Pulse: 90 (Actively maintained)
Price: $0.05 per event

What to know

  • Requires an MCP-compatible client (Claude Desktop, Cursor, Windsurf, or similar).
  • Tool call results depend on the availability of upstream public APIs.
  • Requires an Apify account and API token for authentication.

Maintenance Pulse

90/100
Last Build: Today
Last Version: 1d ago
Builds (30d): 8
Issue Response: N/A

Pricing

Pay Per Event model. You only pay for what you use.

| Event | Description | Price |
| --- | --- | --- |
| search-pubmed | Search biomedical literature on PubMed. | $0.05 |
| search-semantic-scholar | Search academic papers on Semantic Scholar. | $0.05 |
| search-arxiv | Search preprint papers on ArXiv. | $0.05 |
| search-crossref | Search academic papers via Crossref DOI registry. | $0.05 |
| search-openalex | Search research works on OpenAlex. | $0.05 |
| find-researcher | Look up researcher profiles via ORCID. | $0.05 |
| literature-review | Composite literature review across multiple academic databases. | $0.15 |

Example: 100 events = $5.00 · 1,000 events = $50.00
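
The arithmetic behind these examples can be sketched in a few lines. The prices are copied from the table above; the `estimate_cost` helper and its input shape are illustrative only and are not part of the server. Note that the example figures assume $0.05 events, while a literature-review call costs three times as much.

```python
from decimal import Decimal

# Per-event prices in USD, copied from the pricing table above.
PRICES = {
    "search-pubmed": Decimal("0.05"),
    "search-semantic-scholar": Decimal("0.05"),
    "search-arxiv": Decimal("0.05"),
    "search-crossref": Decimal("0.05"),
    "search-openalex": Decimal("0.05"),
    "find-researcher": Decimal("0.05"),
    "literature-review": Decimal("0.15"),
}


def estimate_cost(calls: dict[str, int]) -> Decimal:
    """Return the total for a mix of events, e.g. {"search-pubmed": 100}."""
    return sum(
        (PRICES[event] * count for event, count in calls.items()),
        Decimal("0"),
    )


# 100 single-database searches, then a mixed workload.
print(estimate_cost({"search-pubmed": 100}))
print(estimate_cost({"search-pubmed": 20, "literature-review": 10}))
```

Decimal is used instead of float so that repeated additions of $0.05 do not accumulate rounding error.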

Documentation

Academic Research Intelligence MCP, multi-database academic literature search for AI agents

Academic Research Intelligence MCP is multi-database scholarly search infrastructure for AI agents and research workflows.

It wraps six free academic data sources (PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, ORCID) behind one MCP endpoint and adds a composite literature review tool that queries multiple databases in parallel and deduplicates results by DOI. No API keys required. Built for AI research assistants, systematic-review teams, biomedical analysts, science writers, and any agent that needs a single tool for "find me the papers."

The category

Academic Research Intelligence MCP is multi-database scholarly search infrastructure. Unlike single-database wrappers (which force the agent to pick the right source before searching) or scraping tools (which break when site HTML changes), it exposes six official academic APIs as one MCP toolset and adds DOI-based cross-database deduplication. Agents see one literature surface instead of six, and the composite literature review returns coverage statistics so a thin search is never mistaken for a complete one.

In one sentence

Search PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, and ORCID through a single MCP server, or run one composite literature review that queries three databases in parallel and deduplicates by DOI.

Category: Academic research MCP, multi-database literature search, AI agent tooling.
Primary use case: Give an AI agent one tool that covers all the major free academic databases. Can also be used for systematic-review compilation, researcher discovery, and DOI cross-referencing.

Also known as: academic research MCP, scholarly search MCP server, multi-database literature search, PubMed MCP, ArXiv MCP, Semantic Scholar MCP, OpenAlex MCP, Crossref MCP, ORCID MCP, literature review agent tool.

What this actor does

  • What it is: A standby-mode MCP server exposing 8 tools that wrap 6 free academic data sources.
  • What it checks: Biomedical literature, all-discipline papers with AI summaries, preprints, DOI metadata, broad academic works, and researcher profiles.
  • What it returns: Structured JSON paper lists with titles, authors, year, journal, DOI, citations, abstracts, plus cross-database coverage stats for the composite literature review tool.
  • What it does NOT do: No full-text PDF download, no peer-review verdict, no plagiarism checking, no integrity scoring (see Research Integrity Screening MCP for that).
  • Who it's for: AI research assistants, systematic-review teams, biomedical analysts, science journalists, R&D analysts, academic recruiters.

What you get from one call

research_literature_review fans out to PubMed, Semantic Scholar, and Crossref (and optionally ArXiv) in parallel and returns:

  • papers[]: ranked by source count, then citation count (papers found in multiple databases sort first)
  • coverage.sourcesSearched: the exact list of databases that ran
  • coverage.resultsPerSource: per-database hit counts
  • coverage.totalBeforeDedup: raw result count across all sources
  • coverage.uniquePapersWithDoi: deduplicated paper count
  • coverage.papersFoundInMultipleSources: corroboration count (the strongest "this paper is real and relevant" signal)
  • papersWithoutDoi[]: results that lacked a DOI, kept separate so dedup confidence stays clean

Each paper carries doi, title, authors, year, journal, citationCount, abstract (or AI TLDR from Semantic Scholar), isOpenAccess, url, plus foundIn[] listing every database that returned it.

What you also get: 6 free databases, no API keys, DOI deduplication, AI TLDRs from Semantic Scholar, parallel fan-out

What makes this different

  • Six databases, one MCP toolset, zero credentials. PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, and ORCID all run through one endpoint with no API keys to provision.
  • DOI-based cross-database deduplication. research_literature_review collapses the same paper appearing in three databases into one ranked record with sourceCount, so coverage and corroboration are visible.
  • Coverage-honest. Every literature review returns sourcesSearched and resultsPerSource, so a thin search (one database delivered, two empty) is never mistaken for a complete one.

Before vs after

| Without this MCP | With this MCP |
| --- | --- |
| Open PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, and ORCID in six tabs | One MCP endpoint, six databases reachable as tools |
| Provision API keys per service (Semantic Scholar, Crossref polite pool, etc.) | No keys required, all sources are free public APIs |
| Manually deduplicate paper lists by title across databases | DOI-based dedup with multi-source confidence ranking |
| Write custom retry and rate-limit code per database | Parallel Promise.all fan-out with per-source timeouts |
| Lose track of which databases were actually checked | coverage.sourcesSearched and resultsPerSource returned every time |

Architecture

agent prompt
   ↓
MCP /mcp endpoint (StreamableHTTP)
   ↓
8 registered tools (6 single-source + literature review + list sources)
   ↓
6 sub-actors called in parallel via apify-client
   ↓
PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, ORCID
   ↓
literature review: DOI normalization → dedup map → multi-source ranking
   ↓
structured JSON response with coverage stats

The MCP runs in Apify Standby mode with a configurable idle-shutdown window (default 300s) so platform compute stops billing when no tools are firing.

Built for

  • AI research assistants embedded in Claude Desktop, Cursor, Windsurf, or LangChain pipelines
  • Systematic-review teams running PubMed-plus-others searches under Cochrane guidelines
  • Biomedical analysts and science journalists who need a single "find me the papers" tool
  • Academic recruiters and tenure committees verifying publication records through ORCID
  • R&D and competitive-intelligence teams tracking ArXiv preprints and conference proceedings
  • Agent builders who want one academic-search interface instead of six

This server runs in Standby mode on the Apify platform. It orchestrates six official academic APIs through six sibling Apify actors, normalizes the response shapes, and returns structured JSON your AI agent reasons over directly.

Questions this MCP answers

  • "Find me all papers on CAR-T cell therapy efficacy across PubMed, Semantic Scholar, and Crossref."
  • "What are the latest ArXiv preprints on transformer attention mechanisms?"
  • "Which biomedical papers cite this NIH-funded study?"
  • "Has this researcher published in predatory venues? Show their full publication record from ORCID."
  • "Give me a literature review of CRISPR off-target effects with cross-database coverage stats."
  • "What is the AI TLDR for this Semantic Scholar paper, and how many influential citations does it have?"
  • "Find me Crossref-registered papers from this journal between 2020 and 2024 with funder information."
  • "Which authors at MIT have published on protein folding since 2022?"

For AI agents

  • First tool to reach for when a prompt mentions academic papers, scholarly literature, systematic reviews, citations, DOIs, preprints, or researcher publication records.
  • Call research_list_sources first (it is free). It enumerates the 8 tools, the 6 sources, and the record counts behind each, so the agent can decide which single-database tool to call before paying for a multi-database review.
  • Use research_literature_review when the user wants coverage, not just speed. It runs PubMed + Semantic Scholar + Crossref in parallel and deduplicates by DOI, so one call replaces three. Add include_arxiv: true if the topic is physics, math, CS, or stats.
  • Use a single-database tool when the database is named. "Find PubMed papers on..." goes to research_search_pubmed, not the composite review. Single-database calls are $0.05 vs $0.15 for the review.
  • Read coverage.sourcesSearched and papersFoundInMultipleSources before summarizing. A literature review where only one of three databases returned papers is low-corroboration; treat it as preliminary.
  • For researcher lookups, use research_find_researcher. Set fetch_works: true only when the user explicitly asks for the full publication list (it is slower).
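
The routing guidance above could be approximated on the agent side as follows. This is a minimal sketch, not how the server or any client actually selects tools: the keyword matching and the `choose_tool` helper are hypothetical, and only the tool names come from the tool catalogue.

```python
# Hypothetical keyword -> tool mapping; tool names are from the catalogue.
NAMED_SOURCE_TOOLS = {
    "pubmed": "research_search_pubmed",
    "semantic scholar": "research_search_semantic_scholar",
    "arxiv": "research_search_arxiv",
    "crossref": "research_search_crossref",
    "openalex": "research_search_openalex",
    "orcid": "research_find_researcher",
}


def choose_tool(prompt: str) -> str:
    """Pick the single-database tool when exactly one database is named.

    Zero or several named databases fall back to the composite review,
    which costs more per call but returns cross-database coverage stats.
    """
    text = prompt.lower()
    named = [tool for key, tool in NAMED_SOURCE_TOOLS.items() if key in text]
    if len(named) == 1:
        return named[0]
    return "research_literature_review"


print(choose_tool("Find PubMed papers on sepsis biomarkers"))
print(choose_tool("What does the literature say about CRISPR off-target effects?"))
```

A real agent would usually let the model pick the tool from the tool descriptions; the sketch only makes the cost trade-off explicit.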

Use this MCP when an AI agent needs to:

  • run a literature review across multiple academic databases
  • look up biomedical, AI/ML, physics, or cross-discipline papers
  • find researcher profiles, affiliations, and external IDs
  • pull DOI metadata with funder, ORCID, and licensing information
  • discover ArXiv preprints before journal publication
  • get AI-generated paper summaries and influential citation counts
  • build a corroborated paper list with cross-database confidence scores

What data can you access?

| Data Point | Source | Example |
| --- | --- | --- |
| 🧬 Biomedical citations with MeSH terms and abstracts | PubMed / MEDLINE | 36M+ citations, "CRISPR gene editing" returns ~12k papers |
| 📄 All-discipline papers with AI TLDR and influential citations | Semantic Scholar | 200M+ papers, TLDR field is one-sentence AI summary |
| 📑 Open-access preprints in physics, math, CS, biology, stats | ArXiv | 2.4M+ preprints, prefix syntax (ti:, au:, abs:, cat:) |
| 🔗 DOI-registered works with funders and ORCID-linked authors | Crossref | 150M+ works, funder names + grant numbers + license URLs |
| 📚 Broad academic index with concept tagging and institution data | OpenAlex | 250M+ works, concept hierarchy + institution affiliations |
| 👤 Researcher profiles with career history and external IDs | ORCID | 18M+ profiles, Scopus / ResearcherID linkage, employment history |
| 🧠 AI-generated TLDR (one-sentence paper summary) | Semantic Scholar | "Transformers replace recurrence with self-attention for sequence modeling" |
| 📈 Influential citation count (citations central to the citing paper) | Semantic Scholar | influentialCitationCount: 1,247 (of 18,392 total citations) |
| 🏛️ Author affiliations and institution employment history | ORCID + OpenAlex | "Dr. Y. Bengio, Mila / Université de Montréal, 2016-present" |
| 💰 Funder names with grant numbers and licensing URLs | Crossref | "NIH R01CA123456, CC-BY 4.0" |

Why use Academic Research Intelligence MCP?

Most agent-driven academic search is:

  • single-database (the agent calls only the source it knows about, missing cross-database corroboration)
  • credential-heavy (Semantic Scholar API keys, Crossref polite pool, ORCID token rotation, all separately provisioned)
  • inconsistent in response shape (every database returns a different paper schema)
  • silent on coverage (the agent does not know which databases actually returned data)

This MCP turns that into one tool surface. A single literature-review call queries three databases in parallel, normalizes the paper shapes, deduplicates by DOI, and returns explicit coverage stats your agent acts on directly. Single-database tools stay available when the user names the database.

  • Scheduling: run periodic literature sweeps on Apify Scheduler; pipe new papers to Slack or email via webhooks
  • API access: trigger searches from Python, JavaScript, or any HTTP client using standard MCP protocol
  • Parallel fan-out: literature review queries three databases simultaneously, not sequentially
  • No API keys: all six data sources are free public academic APIs, no credentials to provision
  • Integrations: pipe results into Notion, Airtable, Google Sheets, or any webhook-compatible knowledge base

Features

Multi-database search (8 MCP tools, 6 academic sources)

  • PubMed (36M+ biomedical citations) with field-tag syntax ([Title], [MeSH Terms], [Author]), boolean AND/OR/NOT, article-type filter, date range.
  • Semantic Scholar (200M+ papers) with AI-generated TLDR summaries, influential citation counts, venue and field-of-study filters, citation-sorted results.
  • ArXiv (2.4M+ preprints) with prefix syntax (ti:, au:, abs:, cat:), category filter, submission-date sort. Rate-limited at the source (1 request per 3s).
  • Crossref (150M+ DOI-registered works) with funder names, grant numbers, ORCID author IDs, publication-type filter (journal-article, book-chapter, dataset, etc.).
  • OpenAlex (250M+ works) with concept tagging, institution affiliations, citation-count sort, open-access filter.
  • ORCID (18M+ researchers) with name, affiliation, keyword, or raw Lucene query. Optional fetch_works flag pulls full publication lists.

Composite literature review

  • research_literature_review queries PubMed + Semantic Scholar + Crossref in parallel (and optionally ArXiv).
  • DOI normalization (strips https://doi.org/ prefix, lowercases) before dedup.
  • Multi-source ranking: papers found in more databases sort higher, ties broken by citation count.
  • Coverage stats (sourcesSearched, resultsPerSource, totalBeforeDedup, uniquePapersWithDoi, papersFoundInMultipleSources) returned on every call.
  • Papers without a DOI kept in a separate papersWithoutDoi[] array so the dedup count stays honest.
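
The dedup and ranking steps listed above can be sketched as below. This is an illustration under stated assumptions, not the server's implementation: `normalize_doi` and `merge_results` are hypothetical names, and keeping the highest citation count seen across sources is an assumption made for the sketch.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase the DOI and strip a leading https://doi.org/ prefix."""
    return doi.strip().lower().removeprefix("https://doi.org/")


def merge_results(results_by_source: dict[str, list[dict]]) -> dict:
    """Collapse per-source paper lists into one DOI-keyed, ranked list."""
    merged: dict[str, dict] = {}  # normalized DOI -> merged record
    without_doi: list[dict] = []
    total = 0
    for source, papers in results_by_source.items():
        for paper in papers:
            total += 1
            if not paper.get("doi"):
                # No DOI: keep the record separate so dedup counts stay clean.
                without_doi.append({**paper, "foundIn": [source]})
                continue
            doi = normalize_doi(paper["doi"])
            record = merged.setdefault(doi, {**paper, "doi": doi, "foundIn": []})
            if source not in record["foundIn"]:
                record["foundIn"].append(source)
            # Assumption: keep the highest citation count seen across sources.
            record["citationCount"] = max(
                record.get("citationCount") or 0, paper.get("citationCount") or 0
            )
    for record in merged.values():
        record["sourceCount"] = len(record["foundIn"])
    # More sources first; citation count breaks ties.
    papers = sorted(
        merged.values(), key=lambda p: (-p["sourceCount"], -p["citationCount"])
    )
    return {
        "coverage": {
            "totalBeforeDedup": total,
            "uniquePapersWithDoi": len(papers),
            "papersFoundInMultipleSources": sum(p["sourceCount"] > 1 for p in papers),
        },
        "papers": papers,
        "papersWithoutDoi": without_doi,
    }
```

`str.removeprefix` needs Python 3.9 or later. Lowercasing before comparison matters because DOIs are case-insensitive, so the same paper can arrive with different casing from different databases.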

Operational layer

  • Apify Standby mode with configurable idle-shutdown (default 300s, env var STANDBY_IDLE_TIMEOUT_SECS).
  • Failure-webhook registration on every container start, so customer-side failures are pushed to the operator's webhook handler automatically.
  • Per-tool timeouts (120s default, 180s for Semantic Scholar and ORCID, 300s for ArXiv) so a slow source degrades the result instead of blocking it.
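
The "degrade instead of block" behaviour of per-source timeouts can be illustrated with a small sketch. Python's asyncio stands in here for the Promise.all fan-out the server is described as using; `fan_out`, `run_with_timeout`, and `fake_source` are hypothetical names, and the timeouts are shortened so the demo finishes quickly.

```python
import asyncio
from typing import Any, Awaitable, Callable

Search = Callable[[], Awaitable[Any]]


async def run_with_timeout(name: str, search: Search, seconds: float):
    """Run one source; return (name, result), or (name, None) on timeout."""
    try:
        return name, await asyncio.wait_for(search(), timeout=seconds)
    except asyncio.TimeoutError:
        return name, None


async def fan_out(searches: dict[str, tuple[Search, float]]) -> dict[str, Any]:
    """Query every source concurrently and keep only those that finished."""
    pairs = await asyncio.gather(
        *(run_with_timeout(name, fn, secs) for name, (fn, secs) in searches.items())
    )
    return {name: result for name, result in pairs if result is not None}


async def fake_source(delay: float, hits: int) -> list[str]:
    # Stand-in for a real upstream call: sleeps, then returns dummy records.
    await asyncio.sleep(delay)
    return [f"paper-{i}" for i in range(hits)]


results = asyncio.run(
    fan_out({
        "fast": (lambda: fake_source(0.01, 3), 1.0),
        "slow": (lambda: fake_source(1.0, 3), 0.05),  # exceeds its timeout
    })
)
print(sorted(results))  # only sources that met their deadline remain
```

Because each source has its own deadline, the slow source is dropped from the result while the fast one is still returned, which is what lets the coverage block report exactly which databases contributed.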

Quickstart workflows

Systematic review (Cochrane-style)

topic from user
 → research_literature_review (PubMed + S2 + Crossref + ArXiv)
 → sort papers by sourceCount desc, then citationCount desc
 → for each paper in top 50: include if foundIn.length >= 2
 → export DOIs + titles + abstracts for full-text retrieval
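
The filtering steps of this workflow might look like the following once the review response has been parsed. It is a sketch that assumes the papers[] shape from the output example; the `shortlist` helper is hypothetical and the sample records are trimmed placeholders.

```python
def shortlist(papers: list[dict], top_n: int = 50, min_sources: int = 2) -> list[dict]:
    """Rank by source count then citations; keep corroborated papers in the top N."""
    ranked = sorted(
        papers,
        key=lambda p: (-len(p["foundIn"]), -(p.get("citationCount") or 0)),
    )
    return [
        {"doi": p["doi"], "title": p["title"], "abstract": p.get("abstract")}
        for p in ranked[:top_n]
        if len(p["foundIn"]) >= min_sources
    ]


# Placeholder records in the same shape as papers[] in the output example.
sample = [
    {"doi": "10.1000/c", "title": "C", "citationCount": 156, "foundIn": ["Crossref"]},
    {"doi": "10.1000/a", "title": "A", "citationCount": 482,
     "foundIn": ["PubMed", "Semantic Scholar", "Crossref"]},
    {"doi": "10.1000/b", "title": "B", "citationCount": 318,
     "foundIn": ["PubMed", "Semantic Scholar"]},
]
for row in shortlist(sample):
    print(row["doi"], row["title"])
```

The exported rows carry only the fields needed for full-text retrieval; everything else stays in the original response.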

Single-database lookup (named source)

user names a database, e.g. "find me PubMed papers on..."
 → research_search_pubmed with field tags
 → return papers with PubMed-specific fields (MeSH terms, article type, pubmed URL)

Researcher discovery

researcher name from user
 → research_find_researcher (family_name + affiliation)
 → if disambiguation needed: fetch_works=true on the top match
 → return ORCID profile + career history + publication list

Use cases for multi-database academic search

AI research assistants embedded in chat interfaces

Conversational AI assistants embedded in Claude Desktop, Cursor, ChatGPT, or custom agents need a single tool for "find me the papers." Without this MCP, an agent has to pick one database before searching and may miss corroborating results from other sources. The research_literature_review tool queries three databases in parallel, returns a unified DOI-deduplicated paper list with source coverage stats, and lets the agent answer "what does the literature say about X" with cross-database confidence. One MCP call replaces three database lookups plus a manual dedup step.

Systematic-review and meta-analysis teams

Systematic reviewers working under Cochrane or PRISMA guidelines must search at least three databases (typically PubMed plus two others) and document the search strategy per database. research_literature_review returns coverage.sourcesSearched and coverage.resultsPerSource on every call, so the audit trail is produced automatically. DOI-based dedup means the team starts the screening phase with a unique-record list, not a raw union with duplicates. Adding include_arxiv: true extends coverage into preprints, useful for fast-moving fields like AI/ML and bioRxiv-style biology research.

Biomedical analysts and science journalists

Journalists tracking a breaking science story or analysts compiling a clinical-evidence brief need PubMed (peer-reviewed biomedical), Crossref (DOI metadata with funders), and Semantic Scholar (AI TLDRs for fast skimming) in one place. The composite review tool runs all three in parallel, ranks papers by cross-database corroboration, and surfaces highly-cited or open-access work first. The Semantic Scholar TLDR field is particularly useful when triaging dozens of paper titles into a shortlist.

R&D and competitive-intelligence teams tracking preprints

Industry R&D teams in biotech, pharma, AI, and quantum need early signals from preprints before journal publication. research_search_arxiv with sort_by: submittedDate returns the latest preprints in a category (cs.AI, cs.CL, math.OC, q-bio.GN). Pair with research_search_semantic_scholar filtered to venue: ArXiv and sort_by: citationCount to find which preprints are already accumulating citations, the strongest signal that a result will land. Schedule weekly via Apify Scheduler and pipe new high-citation preprints to Slack.

Academic recruiters and tenure committees

Recruiters, provosts, and tenure committees screening candidates need verified publication records, not LinkedIn summaries. research_find_researcher queries ORCID by name and affiliation, and with fetch_works: true returns the full publication list with external IDs (Scopus, ResearcherID) for cross-referencing. Combine with research_search_crossref filtered to the candidate's author name for funder and grant data. The ORCID profile carries employment history, useful for verifying claimed positions and start dates.

Knowledge-base builders for RAG pipelines

Teams building retrieval-augmented generation pipelines for a research domain need a clean, structured paper corpus, not scraped HTML. research_literature_review returns title, authors, year, journal, abstract (or Semantic Scholar TLDR), DOI, citation count, and open-access status as structured JSON ready to chunk and embed. Schedule monthly to keep the corpus fresh. The papersFoundInMultipleSources count is a useful pre-filter for trust-weighted retrieval.
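
As a rough illustration of the "ready to chunk and embed" step, the sketch below turns parsed papers[] records into text documents with metadata. The `to_documents` helper and the document shape are assumptions for this example; the field names follow the output example, and the embedding step itself is left out.

```python
def to_documents(papers: list[dict], min_sources: int = 1) -> list[dict]:
    """Build one embeddable document per paper that has an abstract or TLDR."""
    documents = []
    for paper in papers:
        abstract = paper.get("abstract")
        # Skip papers with nothing to embed or too little corroboration.
        if not abstract or len(paper.get("foundIn", [])) < min_sources:
            continue
        documents.append({
            "id": paper["doi"],
            "text": f"{paper['title']}\n\n{abstract}",
            "metadata": {
                "year": paper.get("year"),
                "journal": paper.get("journal"),
                "citationCount": paper.get("citationCount"),
                "isOpenAccess": paper.get("isOpenAccess"),
                "sources": paper.get("foundIn", []),
            },
        })
    return documents


# Placeholder records for illustration only.
papers = [
    {"doi": "10.1000/a", "title": "Paper A", "abstract": "Summary of A.",
     "year": 2023, "foundIn": ["PubMed", "Crossref"]},
    {"doi": "10.1000/b", "title": "Paper B", "abstract": "", "foundIn": ["Crossref"]},
]
docs = to_documents(papers, min_sources=2)
print(len(docs), docs[0]["id"])
```

Raising `min_sources` to 2 restricts the corpus to papers returned by more than one database, which is one way to apply the trust-weighted pre-filter described above.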

How to connect this academic research MCP

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "academic-research": {
      "url": "https://academic-research-mcp.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Cursor, Windsurf, or Cline

Use the same URL and token in your MCP server settings panel. The server communicates via standard MCP protocol over HTTP POST to /mcp.

Python (via requests)

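import requests

# Send a JSON-RPC 2.0 "tools/call" request to the MCP endpoint.
response = requests.post(
    "https://academic-research-mcp.apify.actor/mcp",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
    },
    json={
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
            "name": "research_literature_review",
            "arguments": {
                "query": "CRISPR off-target effects",
                "year_from": 2022,
                "max_per_source": 50,
                "include_arxiv": False
            }
        },
        "id": 1
    },
    timeout=300  # the review waits on several upstream sources
)
response.raise_for_status()  # fail loudly on HTTP errors such as a bad token
result = response.json()
# The tool result is returned as a JSON string in the first content item.
review = result["result"]["content"][0]["text"]
print(review)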

JavaScript

// Top-level await needs an ES module; otherwise wrap this in an async function.
const response = await fetch(
  "https://academic-research-mcp.apify.actor/mcp",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer YOUR_APIFY_TOKEN"
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      method: "tools/call",
      params: {
        name: "research_search_semantic_scholar",
        arguments: {
          query: "transformer attention mechanism",
          year_from: 2020,
          min_citations: 50,
          sort_by: "citationCount",
          max_results: 25
        }
      },
      id: 1
    })
  }
);
if (!response.ok) {
  throw new Error(`MCP request failed: HTTP ${response.status}`);
}
const data = await response.json();
// Single-database tools return a { total, source, papers } object as a JSON string.
const searchResult = JSON.parse(data.result.content[0].text);
console.log(`Found ${searchResult.total} papers from ${searchResult.source}`);

cURL

# Run a multi-database literature review
curl -X POST "https://academic-research-mcp.apify.actor/mcp" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "research_literature_review",
      "arguments": {
        "query": "CAR-T cell therapy efficacy",
        "year_from": 2021,
        "year_to": 2025,
        "max_per_source": 50,
        "include_arxiv": false
      }
    },
    "id": 1
  }'

Environment variables

All six data sources are free public academic APIs and need no key.

| Variable | Required | Purpose |
| --- | --- | --- |
| STANDBY_IDLE_TIMEOUT_SECS | Optional | Standby idle-shutdown window in seconds (default 300, minimum 60). The instance exits after this idle period to release platform compute; the next request cold-starts a fresh one. |
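
For illustration, a default-and-minimum rule like the one in the table could be read as follows. This is a sketch, not the server's code: the `idle_timeout_secs` helper is hypothetical, and clamping low values to the minimum and falling back to the default on unparseable input are assumptions about behaviour the documentation does not specify.

```python
import os
from typing import Mapping

DEFAULT_IDLE_SECS = 300  # documented default
MIN_IDLE_SECS = 60       # documented minimum


def idle_timeout_secs(env: Mapping[str, str] = os.environ) -> int:
    """Read STANDBY_IDLE_TIMEOUT_SECS, applying the default and the minimum."""
    raw = env.get("STANDBY_IDLE_TIMEOUT_SECS")
    if raw is None:
        return DEFAULT_IDLE_SECS
    try:
        value = int(raw)
    except ValueError:
        # Assumption: unparseable values fall back to the default.
        return DEFAULT_IDLE_SECS
    # Assumption: values below the minimum are clamped rather than rejected.
    return max(value, MIN_IDLE_SECS)


print(idle_timeout_secs({"STANDBY_IDLE_TIMEOUT_SECS": "600"}))
```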

MCP tools

Available tools: the full MCP tool catalogue with per-call pricing

| Tool | PPE event | Price | What it returns |
| --- | --- | --- | --- |
| research_search_pubmed | search-pubmed | $0.05 | PubMed biomedical citations with title, authors, journal, MeSH terms, abstract, pubmed URL. Field tags ([Title], [MeSH Terms], [Author]), boolean AND/OR/NOT, article-type filter, date range, max 500. |
| research_search_semantic_scholar | search-semantic-scholar | $0.05 | Semantic Scholar papers with AI-generated TLDR, influentialCitationCount, venue, field-of-study, open-access PDF link. Year range, venue, field, min_citations, sort by citationCount or publicationDate, max 500. |
| research_search_arxiv | search-arxiv | $0.05 | ArXiv preprints with title, authors, abstract, category, PDF URL. Prefix syntax (ti:, au:, abs:, cat:), category filter (cs.AI, math.CO, stat.ML, physics.hep-th, etc.), sort by relevance or date, max 500. Rate-limited at source (1 req per 3s). |
| research_search_crossref | search-crossref | $0.05 | Crossref DOI-registered works with funders, grant numbers, ORCID author IDs, licensing, publication type. Filter by query / author / journal / DOI prefix / type, year range, sort by relevance / citation count / publication date, max 500. |
| research_search_openalex | search-openalex | $0.05 | OpenAlex works with concept tagging, institution affiliations, citation counts, open-access status. Year filter, min_citations, sort by relevance / cited_by_count / publication_date, max 500. |
| research_find_researcher | find-researcher | $0.05 | ORCID researcher profiles with career history, employment, external IDs (Scopus, ResearcherID). Family name, given names, affiliation, keyword, or raw Lucene query. Optional fetch_works: true pulls full publication list. Max 100. |
| research_literature_review | literature-review | $0.15 | Composite review: queries PubMed + Semantic Scholar + Crossref in parallel (+ optional ArXiv), deduplicates by DOI, ranks by source count then citation count. Returns papers[] with foundIn[] and sourceCount, plus coverage block (sourcesSearched, resultsPerSource, totalBeforeDedup, uniquePapersWithDoi, papersFoundInMultipleSources). |
| research_list_sources | (none, free) | Free | Enumerates the 8 tools, 6 sources, and record counts. No upstream fetch, no charge. Useful for agent planning before paying for a search. |

Tool input reference

| Tool | Parameter | Type | Required | Description |
| --- | --- | --- | --- | --- |
| research_search_pubmed | query | string | No (one of three) | Search query, supports PubMed field tags (e.g. "diabetes AND metformin[MeSH Terms]") |
| research_search_pubmed | author | string | No | Author name (e.g. "Doudna JA") |
| research_search_pubmed | journal | string | No | Journal name (e.g. "Nature", "JAMA", "Lancet") |
| research_search_pubmed | date_from / date_to | string | No | YYYY/MM/DD or YYYY |
| research_search_pubmed | article_type | enum | No | Review / Clinical Trial / Randomized Controlled Trial / Meta-Analysis / Systematic Review / Case Reports |
| research_search_pubmed | sort_by | enum | No | relevance (default) or pub_date |
| research_search_pubmed | max_results | number | No | 1 to 500, default 50 |
| research_search_semantic_scholar | query | string | Yes | Search query (e.g. "transformer attention mechanism") |
| research_search_semantic_scholar | year_from / year_to | number | No | Year range |
| research_search_semantic_scholar | venue | string | No | Journal or conference (e.g. "NeurIPS", "Nature") |
| research_search_semantic_scholar | field | enum | No | Computer Science / Medicine / Biology / Physics / Chemistry / Mathematics / Engineering / Economics / Psychology / Sociology |
| research_search_semantic_scholar | open_access_only | boolean | No | Only papers with free PDF, default false |
| research_search_semantic_scholar | min_citations | number | No | Minimum citation count |
| research_search_semantic_scholar | sort_by | enum | No | relevance / citationCount / publicationDate |
| research_search_arxiv | query | string | No (one of two) | Search query with optional prefixes (e.g. "ti:attention AND au:vaswani") |
| research_search_arxiv | category | string | No | ArXiv category (e.g. "cs.AI", "math.CO", "stat.ML") |
| research_search_arxiv | sort_by | enum | No | relevance / lastUpdatedDate / submittedDate |
| research_search_arxiv | sort_order | enum | No | descending (default) or ascending |
| research_search_crossref | query | string | No (one of four) | Full-text search across title and abstract |
| research_search_crossref | author | string | No | Author name filter |
| research_search_crossref | journal | string | No | Journal or conference name |
| research_search_crossref | doi_prefix | string | No | DOI prefix (e.g. "10.1038" for Nature, "10.1126" for Science) |
| research_search_crossref | type | enum | No | journal-article / book-chapter / proceedings-article / posted-content / book / dataset / report |
| research_search_crossref | year_from / year_to | number | No | Year range |
| research_search_crossref | sort_by | enum | No | relevance / is-referenced-by-count / published |
| research_search_openalex | query | string | Yes | Search query across title, abstract, and full text |
| research_search_openalex | year | number | No | Filter to a single publication year |
| research_search_openalex | min_citations | number | No | Minimum citation count |
| research_search_openalex | open_access_only | boolean | No | Only open-access papers, default false |
| research_search_openalex | sort_by | enum | No | relevance_score:desc / cited_by_count:desc / publication_date:desc |
| research_find_researcher | family_name | string | No (one of five) | Last name (e.g. "Hinton", "LeCun") |
| research_find_researcher | given_names | string | No | First name(s) |
| research_find_researcher | affiliation | string | No | University or organization (e.g. "MIT", "Google DeepMind") |
| research_find_researcher | keyword | string | No | Research keyword (e.g. "deep learning", "CRISPR") |
| research_find_researcher | query | string | No | Raw ORCID Lucene query (overrides individual fields) |
| research_find_researcher | fetch_works | boolean | No | Fetch full publication list per researcher (slower), default false |
| research_find_researcher | max_results | number | No | 1 to 100, default 25 |
| research_literature_review | query | string | Yes | Research topic or question (e.g. "CAR-T cell therapy efficacy") |
| research_literature_review | year_from / year_to | number | No | Year range applied to all databases |
| research_literature_review | max_per_source | number | No | 1 to 200, default 50 |
| research_literature_review | include_arxiv | boolean | No | Also search ArXiv preprints (adds time due to source rate limit), default false |
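
A client may want to check arguments against this reference before paying for a call. The sketch below does so for research_search_pubmed only. The `validate_pubmed_args` helper is hypothetical, and reading "one of three" as "at least one of query, author, or journal must be set" is an interpretation of the table, not confirmed server behaviour.

```python
ARTICLE_TYPES = {
    "Review", "Clinical Trial", "Randomized Controlled Trial",
    "Meta-Analysis", "Systematic Review", "Case Reports",
}


def validate_pubmed_args(args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    errors = []
    # Assumption: "one of three" means query, author, or journal must be set.
    if not any(args.get(key) for key in ("query", "author", "journal")):
        errors.append("provide at least one of: query, author, journal")
    max_results = args.get("max_results", 50)
    if not isinstance(max_results, int) or not 1 <= max_results <= 500:
        errors.append("max_results must be an integer from 1 to 500")
    article_type = args.get("article_type")
    if article_type is not None and article_type not in ARTICLE_TYPES:
        errors.append(f"unknown article_type: {article_type}")
    if args.get("sort_by", "relevance") not in ("relevance", "pub_date"):
        errors.append("sort_by must be relevance or pub_date")
    return errors


print(validate_pubmed_args({"query": "diabetes AND metformin[MeSH Terms]"}))
print(validate_pubmed_args({"max_results": 900}))
```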

Output example

{
  "query": "CRISPR off-target effects",
  "yearRange": { "from": 2022, "to": "present" },
  "coverage": {
    "sourcesSearched": ["PubMed", "Semantic Scholar", "Crossref"],
    "resultsPerSource": {
      "PubMed": 47,
      "Semantic Scholar": 50,
      "Crossref": 50
    },
    "totalBeforeDedup": 147,
    "uniquePapersWithDoi": 112,
    "papersWithoutDoi": 3,
    "papersFoundInMultipleSources": 28
  },
  "papers": [
    {
      "doi": "10.1038/s41587-023-01918-1",
      "title": "Prime editing with genome-wide off-target evaluation",
      "authors": "Anzalone AV, Gao XD, Podracky CJ, Nelson AT, Koblan LW, Raguram A, Levy JM, Mercer JAM, Liu DR",
      "year": 2023,
      "journal": "Nature Biotechnology",
      "citationCount": 482,
      "abstract": "Prime editing enables precise installation of substitutions, insertions, and deletions without requiring double-strand breaks. Here we develop a high-throughput off-target evaluation pipeline...",
      "isOpenAccess": false,
      "url": "https://pubmed.ncbi.nlm.nih.gov/37640944/",
      "foundIn": ["PubMed", "Semantic Scholar", "Crossref"],
      "sourceCount": 3
    },
    {
      "doi": "10.1016/j.cell.2022.10.012",
      "title": "Genome-wide specificity profiling of CRISPR-Cas9 base editors in human cells",
      "authors": "Kim D, Lim K, Kim S, Yoon S, Kim JS",
      "year": 2022,
      "journal": "Cell",
      "citationCount": 318,
      "abstract": "Base editors enable targeted single-nucleotide conversions. We profile genome-wide off-target activity of cytosine and adenine base editors using GUIDE-seq and Digenome-seq...",
      "isOpenAccess": true,
      "url": "https://www.semanticscholar.org/paper/abc123",
      "foundIn": ["PubMed", "Semantic Scholar"],
      "sourceCount": 2
    },
    {
      "doi": "10.1126/science.add8643",
      "title": "Engineered Cas12a variants with reduced off-target activity",
      "authors": "Liu Y, Wang J, Zhang H, Chen L, Doudna JA",
      "year": 2023,
      "journal": "Science",
      "citationCount": 156,
      "abstract": "We engineer Cas12a variants through directed evolution to reduce off-target cleavage while preserving on-target efficiency...",
      "isOpenAccess": false,
      "url": "https://doi.org/10.1126/science.add8643",
      "foundIn": ["Crossref"],
      "sourceCount": 1
    }
  ],
  "papersWithoutDoi": [
    {
      "title": "Conference talk: CRISPR safety in clinical translation",
      "authors": "Chen L, Doudna JA",
      "year": 2023,
      "journal": "ASH Annual Meeting Abstracts",
      "citationCount": 4,
      "url": "https://www.semanticscholar.org/paper/xyz789",
      "foundIn": ["Semantic Scholar"]
    }
  ]
}

The papers[] array is sorted by sourceCount descending (papers found in more databases first), with citationCount as the tiebreaker. papersWithoutDoi is capped at the first 20 entries so the response stays compact. Single-database tools (research_search_pubmed, research_search_semantic_scholar, etc.) return a simpler { total, source, papers } envelope in which each paper keeps its source-specific fields.
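
A client that calls both kinds of tool has to tell the two response shapes apart. The sketch below is one way to do that in TypeScript, narrowing on the coverage block that only the composite review returns. The type names and the summarize helper are hypothetical; only the field names come from the output example above.

```typescript
// Field names follow the output example; the type names are illustrative only.
interface ReviewResponse {
  query: string;
  coverage: { sourcesSearched: string[]; resultsPerSource: Record<string, number> };
  papers: Array<{ doi: string; title: string; sourceCount: number; citationCount: number }>;
}

interface SingleSourceResponse {
  total: number;
  source: string;
  papers: Array<Record<string, unknown>>; // source-specific fields, left untyped here
}

type ToolResponse = ReviewResponse | SingleSourceResponse;

// Only the composite review carries a `coverage` block, so its presence
// is used as the discriminator.
function isReviewResponse(r: ToolResponse): r is ReviewResponse {
  return "coverage" in r;
}

// Hypothetical helper: a one-line summary for either response shape.
function summarize(r: ToolResponse): string {
  if (isReviewResponse(r)) {
    const corroborated = r.papers.filter((p) => p.sourceCount >= 2).length;
    return `${r.papers.length} papers, ${corroborated} in 2+ sources`;
  }
  return `${r.total} papers from ${r.source}`;
}
```

Narrowing on a structural difference keeps the client independent of tool names, which is convenient when an agent forwards results without recording which tool produced them.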

Output fields

Coverage block (returned by research_literature_review only)

| Field | Type | Description |
| --- | --- | --- |
| coverage.sourcesSearched | string[] | Exact list of databases that ran (e.g. ["PubMed", "Semantic Scholar", "Crossref"]) |
| coverage.resultsPerSource | object | Per-database hit count; branch on this to detect thin searches |
| coverage.totalBeforeDedup | number | Raw count across all sources before DOI dedup |
| coverage.uniquePapersWithDoi | number | Deduplicated paper count |
| coverage.papersWithoutDoi | number | Count of results that lacked a DOI |
| coverage.papersFoundInMultipleSources | number | Corroboration count: papers returned by two or more databases, the strongest "this is real" signal |
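
Branching on coverage.resultsPerSource can be as small as the sketch below. The thinSources helper and its minHits threshold are hypothetical names; treating a source that is missing from resultsPerSource as zero hits is an assumption, not documented behavior.

```typescript
interface Coverage {
  sourcesSearched: string[];
  resultsPerSource: Record<string, number>;
}

// Returns the searched sources whose hit count is below `minHits`.
// Assumption: a source absent from resultsPerSource counts as 0 hits.
function thinSources(coverage: Coverage, minHits = 1): string[] {
  return coverage.sourcesSearched.filter(
    (name) => (coverage.resultsPerSource[name] ?? 0) < minHits,
  );
}
```

An agent could retry the single-database tool for each name this returns, or tell the user which databases contributed nothing.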

Per-paper fields (composite review)

| Field | Type | Description |
| --- | --- | --- |
| doi | string | DOI (normalized, no https://doi.org/ prefix, lowercased) |
| title / authors / year / journal | string / string / number / string | Standard bibliographic fields |
| citationCount | number | Citation count from the most-detailed source (Semantic Scholar or OpenAlex preferred) |
| abstract | string | Full abstract, or Semantic Scholar AI TLDR when only S2 returned the paper |
| isOpenAccess | boolean or null | Open-access flag (null when no source returned it) |
| url | string | Best landing URL (pubmedUrl > semanticScholarUrl > crossref url > arxiv absUrl) |
| foundIn | string[] | List of databases that returned this paper |
| sourceCount | number | Length of foundIn; the rank-sort key |
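
The url precedence in the table can be read as a simple fallback chain. The sketch below assumes the per-source URLs have already been collected into one object. The SourceUrls shape and the crossrefUrl key (standing in for the Crossref record's url field) are hypothetical.

```typescript
// Hypothetical container for the landing URLs gathered from each source.
interface SourceUrls {
  pubmedUrl?: string;
  semanticScholarUrl?: string;
  crossrefUrl?: string; // stands in for the Crossref record's `url` field
  absUrl?: string;      // ArXiv abstract page
}

// Walks the documented precedence: PubMed, then Semantic Scholar,
// then Crossref, then ArXiv. Empty strings are skipped like missing values.
function bestUrl(u: SourceUrls): string | undefined {
  return u.pubmedUrl || u.semanticScholarUrl || u.crossrefUrl || u.absUrl || undefined;
}
```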

Single-database tool envelope

| Field | Type | Description |
| --- | --- | --- |
| total | number | Number of papers returned (post status-message filter) |
| source | string | Database name (e.g. "PubMed", "Semantic Scholar") |
| papers | object[] | Raw paper records with source-specific field sets |

How much does it cost to run academic research searches?

Academic Research Intelligence MCP uses pay-per-event pricing: $0.05 per single-database search, $0.15 per multi-database literature review, free for research_list_sources. Platform compute is included.

| Scenario | Tool calls | Cost per call | Total cost |
| --- | --- | --- | --- |
| Quick test, single-database lookup | 1 | $0.05 | $0.05 |
| Multi-database literature review (3 sources) | 1 | $0.15 | $0.15 |
| Multi-database review with ArXiv (4 sources) | 1 | $0.15 | $0.15 |
| Systematic review pipeline: review + 2 single-source supplemental | 3 | mixed | $0.25 |
| Researcher discovery + full works fetch | 1 | $0.05 | $0.05 |
| 50 literature reviews per month (active research team) | 50 | $0.15 | $7.50 |
| 500 single-database lookups per month (RAG indexer) | 500 | $0.05 | $25.00 |

You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached, returning a structured error your pipeline can handle gracefully.

Apify's free tier includes $5 of monthly platform credits: enough for 100 single-database searches or 33 multi-database reviews before you need to add payment.
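
The arithmetic behind the pricing figures above can be reproduced with a small estimator. This is a sketch using the two event prices stated in this section; the function names are hypothetical, and prices are kept in integer cents to avoid floating-point drift.

```typescript
const CENTS_PER_SEARCH = 5;  // $0.05 per single-database search
const CENTS_PER_REVIEW = 15; // $0.15 per multi-database literature review

// Total cost in USD for a mix of single-database searches and reviews.
function estimateCostUsd(singleSearches: number, reviews: number): number {
  return (singleSearches * CENTS_PER_SEARCH + reviews * CENTS_PER_REVIEW) / 100;
}

// How many whole calls of a given price fit into a budget.
function callsWithinBudget(budgetUsd: number, centsPerCall: number): number {
  return Math.floor(Math.round(budgetUsd * 100) / centsPerCall);
}
```

For example, estimateCostUsd(2, 1) gives 0.25 for the systematic-review pipeline row, and callsWithinBudget(5, CENTS_PER_REVIEW) gives 33 for the free-tier credit.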

How it works

  1. Standby request received. Apify routes the MCP POST to /mcp on the standby instance. The activity timer resets, so the idle-shutdown countdown restarts.
  2. MCP tool dispatch. The McpServer matches the tool name (research_literature_review, research_search_pubmed, etc.) and validates input against the Zod schema. Invalid inputs return a structured { error: ... } response without charging.
  3. PPE charge. Actor.charge({ eventName }) fires before any upstream call (so a failed sub-actor still bills, matching Apify PPE semantics).
  4. Sub-actor call via apify-client. Each tool calls one or more sibling actors (ryanclinton/pubmed-research-search, ryanclinton/semantic-scholar-search, etc.) with memory: 256 and per-tool waitSecs timeout (120s default, 180s for S2 / ORCID, 300s for ArXiv).
  5. Literature review fan-out. research_literature_review runs PubMed + Semantic Scholar + Crossref (+ optional ArXiv) in parallel via Promise.all. Each sub-actor returns a dataset; the MCP iterates items, filters out status-message rows, and normalizes the paper shape.
  6. DOI deduplication. DOIs are normalized (strip https?://doi.org/ prefix, lowercase, trim), then collapsed in a Map<doi, { paper, sources[] }>. Papers without a DOI go to a separate papersWithoutDoi list so they do not pollute the dedup count.
  7. Multi-source ranking. Deduplicated papers sort by sources.length descending, then citationCount descending. Coverage stats (sourcesSearched, resultsPerSource, papersFoundInMultipleSources) are computed from the raw per-source counts.
  8. Idle shutdown. A 30-second interval checks Date.now() - lastRequestAt. If the gap exceeds STANDBY_IDLE_TIMEOUT_SECS (default 300), the actor calls Actor.exit() to release platform compute. The next request cold-starts a fresh instance.
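
Steps 6 and 7 can be sketched as follows. This is a minimal illustration of DOI-keyed deduplication and multi-source ranking, not the server's actual code. The RawPaper and MergedPaper shapes are hypothetical, and keeping the highest citation count when sources disagree is an assumption of this sketch.

```typescript
interface RawPaper {
  doi?: string | null;
  title: string;
  citationCount?: number;
}

interface MergedPaper {
  doi: string;
  title: string;
  citationCount: number;
  foundIn: string[];
}

// Trim, lowercase, and strip an http(s)://doi.org/ prefix.
function normalizeDoi(doi: string): string {
  return doi.trim().toLowerCase().replace(/^https?:\/\/doi\.org\//, "");
}

function dedupeAndRank(bySource: Record<string, RawPaper[]>) {
  const merged = new Map<string, MergedPaper>();
  const papersWithoutDoi: Array<RawPaper & { foundIn: string[] }> = [];

  for (const [source, papers] of Object.entries(bySource)) {
    for (const p of papers) {
      const key = p.doi ? normalizeDoi(p.doi) : "";
      if (!key) {
        // No usable DOI: keep the paper out of the dedup map.
        papersWithoutDoi.push({ ...p, foundIn: [source] });
        continue;
      }
      const existing = merged.get(key);
      if (existing) {
        if (!existing.foundIn.includes(source)) existing.foundIn.push(source);
        // Assumption: keep the highest citation count seen across sources.
        existing.citationCount = Math.max(existing.citationCount, p.citationCount ?? 0);
      } else {
        merged.set(key, {
          doi: key,
          title: p.title,
          citationCount: p.citationCount ?? 0,
          foundIn: [source],
        });
      }
    }
  }

  // Most-corroborated first, then most-cited.
  const papers = Array.from(merged.values()).sort(
    (a, b) => b.foundIn.length - a.foundIn.length || b.citationCount - a.citationCount,
  );
  return { papers, papersWithoutDoi };
}
```

Normalizing before the lookup is what lets "10.1/A" from one source and "https://doi.org/10.1/a" from another collapse into a single entry.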

Tips for best results

  1. Call research_list_sources once per session. It is free, and the agent gets a current map of which database covers which discipline. Saves a wasted paid call when the user asks for a database that does not match the topic (e.g. ArXiv for biomedical-only research).

  2. Use research_literature_review for "what does the literature say" prompts. One review covers three databases for $0.15, the same price as three single-database calls, and adds deduplication and coverage stats. Single-database tools are for when the user names the database.

  3. Add include_arxiv: true only for STEM topics. ArXiv covers physics, math, CS, q-bio, q-fin, and stats. For biomedical, chemistry-only, or social science queries, ArXiv adds time (1 req per 3s rate limit) without adding coverage.

  4. Use field tags in PubMed queries for precision. "BRCA1[Gene Symbol] AND breast cancer[MeSH Terms]" returns far fewer false positives than "BRCA1 breast cancer". The PubMed source supports the full field-tag and boolean syntax.

  5. Sort Semantic Scholar by citationCount for established topics, publicationDate for emerging ones. Citation-sort surfaces canonical papers in a mature field; date-sort surfaces fresh work in fast-moving areas like LLM research.

  6. Use ORCID IDs when known, not just names. Common researcher names ("Wei Zhang", "Sarah Kim") return mixed results from multiple people. research_find_researcher with family_name + affiliation disambiguates; passing the raw ORCID ID via the query parameter is even more precise.

  7. Set fetch_works: false for researcher discovery, true for verification. Discovery (finding the right person) only needs profile metadata. Verification (confirming publication record) needs the full works list. Default is false so casual lookups stay fast.

  8. Tune STANDBY_IDLE_TIMEOUT_SECS to your traffic pattern. Bursty agent traffic benefits from a longer idle window (600-900s) to avoid cold starts. Always-on workloads can use the 300s default.
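
For tip 4, a small helper can assemble field-tagged PubMed queries so an agent does not have to hand-format the brackets. This is a sketch; TaggedTerm and buildPubMedQuery are hypothetical names, and the tags used are the ones from the tip's own example.

```typescript
// One search term plus the PubMed field tag it should be restricted to.
interface TaggedTerm {
  value: string;
  tag: string;
}

// Renders each term as `value[Tag]` and joins the terms with a boolean operator.
function buildPubMedQuery(terms: TaggedTerm[], op: "AND" | "OR" = "AND"): string {
  return terms.map((t) => `${t.value.trim()}[${t.tag}]`).join(` ${op} `);
}

// Reproduces the query string from tip 4.
const brcaQuery = buildPubMedQuery([
  { value: "BRCA1", tag: "Gene Symbol" },
  { value: "breast cancer", tag: "MeSH Terms" },
]);
```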

Combine with other Apify actors

| Actor | How to combine |
| --- | --- |
| PubMed Research Search | The biomedical sub-actor. Call directly for batch jobs that do not need the MCP overhead, or to pull full datasets for downstream processing. |
| Semantic Scholar Search | The all-discipline sub-actor with AI TLDRs. Use directly when you need citation-sorted results across all fields with paper summaries. |
| ArXiv Paper Search | The preprint sub-actor. Call directly for high-volume preprint indexing where the 1-req-per-3s rate limit needs its own run isolation. |
| Crossref Paper Search | The DOI metadata sub-actor with funder and ORCID data. Use directly for funder-tracking or grant-paper-linkage workflows. |
| OpenAlex Research Search | The broad academic index sub-actor with concept tagging. Use directly for concept-based exploration and institution-level analytics. |
| ORCID Researcher Search | The researcher profile sub-actor. Use directly for batch researcher verification across a candidate list. |
| Research Integrity Screening MCP | The companion integrity tool. Use this MCP for literature discovery, then pipe candidate authors into the integrity MCP for retraction, paper-mill, and citation-anomaly screening. |
| NIH Research Grants | Cross-reference paper authors against NIH PI records to surface funding context for biomedical literature reviews. |
| Company Deep Research | Pair when researching biotech, pharma, or AI companies whose leadership has academic publishing histories. |

Limitations

  • No full-text PDF download. This MCP returns metadata, abstracts, and source URLs. To retrieve PDFs, follow the url field per paper or use a separate full-text retrieval tool.
  • ArXiv rate limit at the source. ArXiv enforces 1 request per 3 seconds. Large max_results values on research_search_arxiv are correspondingly slow (a 300-result query takes ~15 minutes). The sub-actor handles the pacing, but the wait is real.
  • DOI coverage varies by database. Crossref always has DOIs (it is the DOI registry). Semantic Scholar and OpenAlex have high DOI coverage. PubMed has DOI coverage on most modern records, missing on older citations and some grey literature. ArXiv preprints have DOIs only after the corresponding journal publication.
  • Single-database tools return database-native field sets. A PubMed paper has mesh, articleType, pubmedUrl; a Semantic Scholar paper has tldr, influentialCitationCount. Only research_literature_review normalizes to a unified shape; single-database calls preserve source-specific fields.
  • ORCID fetch_works: true is slow. Pulling the full publication list adds one upstream call per researcher. For 25 researchers with fetch_works: true, expect 60-180 seconds.
  • Child sub-actor timeout is 120-300 seconds depending on tool. If a source is slow, it returns an empty array and the composite review still completes with available data. coverage.resultsPerSource shows which sources delivered, so a thin result is visible.
  • Semantic Scholar polite-pool rate limits apply. High-volume callers (1000+ requests per hour) may see throttling at the source. Spread calls across runs or schedule rather than burst.
  • OpenAlex indexes content from other sources. OpenAlex pulls from Crossref, PubMed, and others, so a literature review including OpenAlex may surface duplicates that the DOI dedup will collapse, but the raw resultsPerSource count will look inflated.
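
The ArXiv wait quoted above follows directly from the rate limit. The sketch below reproduces the ~15 minute figure under the assumption that each result costs one paced request. That ratio is inferred from the numbers in this section, not confirmed behavior of the sub-actor, so resultsPerRequest is exposed as a parameter.

```typescript
const ARXIV_SECONDS_PER_REQUEST = 3; // 1 request per 3 seconds at the source

// Assumption: resultsPerRequest = 1 matches the "300 results ~ 15 minutes" figure.
function estimateArxivSeconds(maxResults: number, resultsPerRequest = 1): number {
  const requests = Math.ceil(maxResults / resultsPerRequest);
  return requests * ARXIV_SECONDS_PER_REQUEST;
}

const minutesFor300 = estimateArxivSeconds(300) / 60; // 900 s, i.e. 15 minutes
```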

Integrations

  • Apify API: trigger academic searches programmatically from research-knowledge-base builders, literature-monitoring tools, or systematic-review software.
  • Webhooks: push new high-citation papers to Slack, email, or knowledge-base ingestion pipelines the moment a scheduled search completes.
  • Zapier: connect to Airtable or Google Sheets paper trackers; auto-log new literature-review results when topics are added.
  • Make: build research-monitoring workflows that re-run searches weekly and diff new papers against the prior run.
  • LangChain / LlamaIndex: embed the MCP as a tool in agent pipelines for automated literature search, RAG ingestion, and research synthesis.
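
The "diff new papers against the prior run" step in a monitoring workflow can be sketched as a DOI set difference. PaperRef and newSince are hypothetical names, and the sketch assumes both runs were saved as arrays of papers with the normalized doi field described under Output fields.

```typescript
interface PaperRef {
  doi: string;
  title: string;
}

// Returns the papers in `current` whose DOI did not appear in `previous`.
// DOIs are lowercased defensively even though the review already normalizes them.
function newSince(previous: PaperRef[], current: PaperRef[]): PaperRef[] {
  const seen = new Set(previous.map((p) => p.doi.toLowerCase()));
  return current.filter((p) => !seen.has(p.doi.toLowerCase()));
}
```

Only the returned papers then need to be pushed to Slack, a sheet, or an ingestion pipeline.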

Troubleshooting

Literature review returns coverage.resultsPerSource with one or two sources at 0. One or more sub-actors timed out or returned no matches. Check coverage.sourcesSearched to confirm which databases ran. If a specific source consistently times out, call its single-database tool directly with the same query to isolate the issue.

Tool returns { "error": "Provide at least one of: query, author, or journal" }. The tool requires at least one search field. PubMed, ArXiv, Crossref, and ORCID accept multiple optional fields but at least one must be set. Semantic Scholar and OpenAlex require query.
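
The "at least one field" rule can be pre-checked on the client to avoid a round trip. The server validates with Zod, which this dependency-free sketch does not reproduce. The SearchInput shape is hypothetical and covers only the three fields named in the error message.

```typescript
interface SearchInput {
  query?: string;
  author?: string;
  journal?: string;
}

// Mirrors the documented rule: at least one non-blank search field must be set.
function validateSearchInput(input: SearchInput): { ok: true } | { error: string } {
  const hasField = [input.query, input.author, input.journal].some(
    (v) => typeof v === "string" && v.trim().length > 0,
  );
  return hasField
    ? { ok: true as const }
    : { error: "Provide at least one of: query, author, or journal" };
}
```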

research_find_researcher returns many unrelated profiles. The name was too common. Add affiliation ("MIT", "Google DeepMind") or pass the ORCID ID via the query parameter for exact match.

ArXiv search is slow. Source rate limit (1 req per 3s). Lower max_results or run the search as a scheduled job. The actor handles the pacing; the wait is at ArXiv, not the MCP.

Cold-start delay on first call. Standby mode shuts the instance down after idle (default 300s) to release platform compute. First request after idle takes ~10-20 seconds to spin up. Subsequent calls in the same window are instant. Increase STANDBY_IDLE_TIMEOUT_SECS if you need longer warm windows.
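
The idle check behind this behavior amounts to one comparison. The sketch below shows that comparison plus a defensive parse of the timeout setting. Function names are hypothetical, and falling back to 300 for missing or invalid values is an assumption based on the documented default.

```typescript
const DEFAULT_IDLE_TIMEOUT_SECS = 300;

// Parses a raw STANDBY_IDLE_TIMEOUT_SECS value; invalid or missing input
// falls back to the documented default (an assumption of this sketch).
function parseIdleTimeoutSecs(raw: string | undefined): number {
  const n = Number(raw);
  return raw !== undefined && isFinite(n) && n > 0 ? n : DEFAULT_IDLE_TIMEOUT_SECS;
}

// True once the gap since the last request exceeds the idle timeout.
function shouldShutDown(nowMs: number, lastRequestAtMs: number, idleTimeoutSecs: number): boolean {
  return nowMs - lastRequestAtMs > idleTimeoutSecs * 1000;
}
```

With a 900-second timeout, a request arriving 10 minutes after the previous one still reaches a warm instance; with the 300-second default it pays the cold start.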

Tool returns { "error": true, "message": "Spending limit reached" }. Your Apify run has hit the maximum charge limit configured for the run. Increase maxTotalChargeUsd in your run configuration, or purchase additional platform credits.
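
The two error payloads shown in this section have different shapes: a string under error in one case, and error: true with a separate message in the other. A client can normalize both, as in this sketch. extractError is a hypothetical helper, and the "Unknown error" fallback is an assumption.

```typescript
// Returns a human-readable error message, or null when the response is not an error.
function extractError(resp: unknown): string | null {
  if (typeof resp !== "object" || resp === null || !("error" in resp)) return null;
  const r = resp as { error?: unknown; message?: unknown };
  if (typeof r.error === "string") return r.error; // { "error": "..." }
  if (r.error === true) {
    // { "error": true, "message": "..." }
    return typeof r.message === "string" ? r.message : "Unknown error";
  }
  return null;
}
```

A pipeline can then match on the returned text, for example pausing further paid calls when it equals "Spending limit reached".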

Responsible use

  • All data accessed by this server comes from publicly available academic databases (PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, ORCID) operating under open or polite-pool access policies.
  • Citation counts, AI TLDRs, and paper rankings are computed signals, not endorsements of paper quality. Always read the underlying papers before drawing conclusions.
  • When using ORCID profile data, comply with applicable data-protection regulations in your jurisdiction (GDPR, CCPA, etc.) when storing or sharing researcher information.
  • Do not use this tool to harass, defame, or discriminate against researchers based on publication record alone.
  • For guidance on web scraping and data use legality, see Apify's guide.

FAQ

What is the difference between this MCP and calling the six sub-actors directly? The sub-actors return source-specific paper shapes (different field names, different field sets). This MCP normalizes them into a unified schema, adds DOI-based deduplication via research_literature_review, and exposes everything as MCP tools so AI agents (Claude, Cursor, Windsurf, custom agents) can discover and call them through the standard MCP protocol. If you are running batch jobs from a fixed script, the sub-actors directly may be a better fit; if you are building agent workflows, this MCP is the integration surface.

Why does research_literature_review only query three databases (PubMed, Semantic Scholar, Crossref) by default? These three give the highest cross-database coverage for most topics with the lowest latency. PubMed covers biomedical literature, Semantic Scholar covers all disciplines with AI summaries, and Crossref covers DOI-registered publications across all fields. Adding ArXiv (set include_arxiv: true) is useful for STEM topics but adds noticeable time due to ArXiv's 1-req-per-3s rate limit. OpenAlex and ORCID are excluded from the composite review because OpenAlex aggregates from Crossref and PubMed (so its results are mostly duplicates) and ORCID is researcher-focused, not paper-focused.

How does the DOI deduplication work? Each paper's DOI is normalized (stripped of https?://doi.org/ prefix, lowercased, trimmed) and used as a map key. When the same DOI appears from multiple sources, the source names are appended to a sources[] array. The final papers[] is sorted by sources.length descending (most-corroborated first), then citationCount descending. Papers without a DOI cannot be deduplicated and go to a separate papersWithoutDoi[] list to keep the dedup count honest.

Do I need any API keys to use this MCP? No. All six data sources (PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, ORCID) are free public academic APIs with no authentication required for the access patterns this MCP uses. You only need an Apify token (for billing) configured in your MCP client.

How long does a typical literature review take? The default three-database review (PubMed + Semantic Scholar + Crossref) typically completes in 60-120 seconds, depending on result counts and source response times. Adding ArXiv extends this to 90-300 seconds for the same query (ArXiv rate limit). Single-database searches typically complete in 20-60 seconds.

Can I get the full PDF of each paper? Not directly. This MCP returns paper metadata, abstracts (or AI TLDRs from Semantic Scholar), and source URLs. The url field per paper points to the landing page (PubMed, Semantic Scholar, ArXiv abstract, Crossref DOI). For open-access papers, follow the URL to the PDF; for paywalled papers, you will hit the publisher's access wall. A separate full-text retrieval step is required for actual PDF content.

How is this MCP different from web search or Google Scholar? Web search returns ranked pages, not structured paper records. Google Scholar returns paper records but has no API and is unfriendly to programmatic access. This MCP returns clean structured JSON from six official academic data APIs, with cross-database deduplication and explicit coverage stats. For RAG pipelines, agent workflows, and systematic reviews, structured API access is more reliable than scraping search results.

Can I schedule searches to run periodically? Yes. Use the Apify Scheduler to trigger the actor on a daily, weekly, or monthly cadence. Configure a webhook to push new papers (or papers above a citation threshold) to your notification system or knowledge base. This is useful for literature monitoring, RAG corpus refresh, and competitive-intelligence tracking of preprints.

Is it legal to use this tool for academic research? All six underlying data sources (PubMed, Semantic Scholar, ArXiv, Crossref, OpenAlex, ORCID) are publicly available academic databases that explicitly support programmatic access. Accessing and analyzing public scholarly records is a standard practice in academic research, systematic reviews, and grant management. See Apify's guide on web scraping legality for broader context.

Why does the agent need to call research_list_sources if it is free? It saves wasted paid calls. The agent can check which databases cover which disciplines (e.g. ArXiv has no biomedical content, PubMed has no CS content) before paying for a search that would return zero from a mismatched source. Especially useful for the first call in a session before the agent has built a mental model of the toolset.

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.

Last verified: March 27, 2026
