
Research Integrity Screening MCP Server

Research integrity screening MCP that connects Claude, Cursor, and other AI agents to academic fraud detection across 7 live data sources. Screen researchers, detect paper mill output, flag citation manipulation using Benford's law analysis, assess journal quality, and audit NIH grant-publication linkages — all in a single tool call from your AI agent. Returns a composite **Integrity Score (0-100)** with a CLEAR / MINOR_CONCERNS / INVESTIGATION_NEEDED / HIGH_RISK verdict.


Pricing

Pay Per Event model. You only pay for what you use.

| Event | Description | Price |
|---|---|---|
| screen_researcher_integrity | OpenAlex + ORCID + PubMed + Semantic Scholar integrity check. | $0.15 |
| check_publication_flags | Paper mill detection + template patterns. | $0.10 |
| assess_journal_quality | Citation impact + open access + source diversity. | $0.10 |
| detect_citation_anomalies | Benford's law + citation distribution analysis. | $0.08 |
| audit_grant_research_link | NIH grants + publication linkage + funding risk. | $0.12 |
| compare_institutional_integrity | Side-by-side quality + funding comparison. | $0.20 |
| generate_integrity_report | All 7 sources, 4 scoring models, CLEAR/HIGH_RISK verdict. | $0.35 |

Example: 100 screen_researcher_integrity events = $15.00 · 1,000 events = $150.00

Connect to your AI agent

Add this MCP server to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.

MCP Endpoint
https://ryanclinton--research-integrity-screening-mcp.apify.actor/mcp
Claude Desktop Config
```json
{
  "mcpServers": {
    "research-integrity-screening-mcp": {
      "url": "https://ryanclinton--research-integrity-screening-mcp.apify.actor/mcp"
    }
  }
}
```

Documentation


This server runs in Standby mode on the Apify platform, responding to MCP requests without cold-start delays. It orchestrates OpenAlex, ORCID, PubMed, Semantic Scholar, Crossref, CORE, and NIH Research Grants in parallel, applies four independent scoring models, and returns structured JSON that your AI agent can reason over directly. Grant reviewers, journal editors, and research integrity officers get consistent, reproducible scores rather than ad hoc judgement calls.

What data can you access?

| Data Point | Source | Example |
|---|---|---|
| 📄 Publication metadata, citation counts, DOIs | OpenAlex | 247 papers, avg 18.3 citations |
| 👤 Researcher profiles, affiliations, employment history | ORCID | Dr. M. Petrov, MIT 2018-present |
| 🔬 Biomedical literature, MeSH terms, abstracts | PubMed | "Expression of Concern: oncology study" |
| 📊 AI citation analysis, influence scores, paper embeddings | Semantic Scholar | Influence score 94, 12 highly-cited papers |
| 🔗 DOI metadata, reference lists, journal metadata | Crossref | 10.1016/j.cell.2023.04.021 |
| 📂 Open access full-text repository coverage | CORE | 61% OA ratio across publication set |
| 💰 Federal grant awards, PI names, funding amounts | NIH Grants | R01CA123456, $1.2M, University of Chicago |
| 🚩 Retraction / correction / expression of concern flags | OpenAlex + PubMed | 3 retraction flags, 2 corrections detected |
| 📈 Publication velocity by year, year-over-year spike detection | OpenAlex + PubMed | 47 papers in 2022 — velocity spike flagged |
| 🏦 Funding concentration index (HHI), terminated grant flags | NIH Grants | HHI 0.82 — single-source dependency risk |

Why use Research Integrity Screening MCP?

Manual research integrity review is slow and inconsistent. Checking a single researcher across OpenAlex, ORCID, PubMed, Semantic Scholar, Crossref, and NIH Grants separately takes 2-3 hours per subject. Applying Benford's law to citation distributions requires spreadsheet work most reviewers skip entirely. Paper mill template detection across dozens of papers is impractical without automation. And the results of manual review are rarely comparable across screeners or repeatable over time.

This MCP automates the entire workflow. A single tool call queries all seven sources in parallel, applies four scoring algorithms, and returns a structured verdict in under 2 minutes. The MCP format means your AI agent calls these tools mid-conversation — ask Claude to screen a grant applicant and it invokes the tool, interprets the score, and explains the findings without you opening a separate application.

  • Scheduling — run periodic integrity sweeps on Apify Scheduler; flag new anomalies automatically
  • API access — trigger screenings from Python, JavaScript, or any HTTP client using standard MCP protocol
  • Parallel data fetching — all seven data sources queried simultaneously, not sequentially
  • Monitoring — receive Slack or email alerts when HIGH_RISK verdicts are returned via Apify webhooks
  • Integrations — pipe results into Notion, Airtable, or any webhook-compatible grant management system

Features

  • Benford's law citation analysis — computes leading-digit frequency distribution across a researcher's full citation set and flags deviation from the expected logarithmic distribution (digit 1 expected at 30.1%)
  • Coefficient of variation check — detects suspiciously uniform citation distributions where CV < 0.3 across 10+ papers, a statistical proxy for citation ring or self-citation manipulation
  • Paper mill template detection — extracts the first 5-word prefix of each paper title and flags patterns that repeat 3 or more times across the publication set
  • Journal concentration scoring — identifies when more than 50% of a researcher's papers appear in a single journal, a known paper mill indicator
  • Author diversity analysis — computes the unique author-set ratio across all papers; low diversity below 30% with 10+ papers triggers a flag
  • Publication velocity monitoring — flags any calendar year with more than 30 publications, and detects year-over-year spikes of 3x or greater with at least 10 papers
  • ORCID verification scoring — penalises missing profiles, empty works lists, and absent affiliation records as identity-unverified risk signals
  • Retraction and correction detection — scans publication titles and document types for "retract", "correction", "erratum", and "expression of concern" keywords across OpenAlex, PubMed, and Semantic Scholar
  • NIH grant-to-paper ratio — computes publications-per-grant ratio; ratios above 20:1 flag potential output padding; ratios below 1:1 flag low productivity
  • Funding concentration HHI — applies the Herfindahl-Hirschman Index to funding sources; concentration above 0.7 with 3+ grants signals single-source dependency
  • Terminated grant detection — scans NIH grant records for "terminated", "withdrawn", and "suspended" status text
  • Four independent scoring models — Researcher Integrity (max 100), Paper Mill (max 100), Journal Quality (max 100, positive scale), and Funding Risk (max 100), each producing a standalone score
  • Weighted composite score — combines all four models: Integrity 30% + Paper Mill 25% + (100 minus Journal Quality) 25% + Funding Risk 20%
  • Five-tier verdicts per model — each sub-model uses domain-appropriate labels (CLEAN through CRITICAL for integrity; UNLIKELY through CONFIRMED_MILL for paper mills; PREDATORY through ELITE for journals)
  • Hard override logic — CRITICAL integrity level or CONFIRMED_MILL verdict forces HIGH_RISK regardless of composite score
  • Deterministic required actions — the requiredActions list is generated from specific threshold triggers, not the composite score, ensuring concrete next steps even when the overall score is borderline
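Several of the heuristics above are simple enough to restate in code. As an illustrative sketch of the title-template check (the server's internals are not published; `template_flags` and the sample titles are hypothetical), the 5-word-prefix rule looks like this:

```python
from collections import Counter

def template_flags(titles, prefix_words=5, min_repeats=3):
    """Flag leading-word templates that repeat across paper titles.

    Mirrors the documented rule: take the first 5 words of each title
    (lowercased) and report any prefix occurring 3 or more times.
    """
    prefixes = Counter(
        " ".join(title.lower().split()[:prefix_words]) for title in titles
    )
    return {p: n for p, n in prefixes.items() if n >= min_repeats}

titles = [
    "Role of inflammation in the pathogenesis of cardiac fibrosis",
    "Role of inflammation in the progression of hepatic injury",
    "Role of inflammation in the development of renal disease",
    "A novel biomarker for early cancer detection",
]
print(template_flags(titles))  # → {'role of inflammation in the': 3}
```

A legitimate multi-part series can trip the same rule, which is why the documentation treats template flags as one signal among several rather than a verdict on their own.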

Use cases for research integrity screening

Pre-award grant screening

Grant programme officers at federal agencies and private foundations need to vet principal investigators before committing funds. A single screen_researcher_integrity call cross-references the applicant's publication record across four databases, applies Benford's law to their citation history, checks for retraction history, and verifies their ORCID profile. Officers get a scored, reproducible result they can attach to the application file, replacing hours of manual database lookups with a 90-second workflow.

Journal submission integrity review

Peer review coordinators can run check_publication_flags against a submitted manuscript's author list or topic before assigning reviewers. The paper mill detection model checks for repeated title templates, journal over-concentration in the author's history, and author-group uniformity — the three most reliable early indicators of paper mill output. A PROBABLE or higher mill score routes the submission to an integrity editor rather than standard peer review.

Faculty hiring due diligence

Provosts and department chairs screening candidates can run generate_integrity_report to receive a full composite view before making offers. The tool verifies ORCID identity, assesses publication velocity for implausible output rates, checks for retraction history, and evaluates whether the candidate's journal choices reflect credible venues. This takes 90 seconds rather than three days of reference checking.

Research institution partnership assessment

Before formalising a collaboration, compliance teams can run compare_institutional_integrity to benchmark two institutions side-by-side on journal quality and funding risk. The tool queries OpenAlex and NIH Grants for both entities simultaneously and returns a structured comparison with a quality advantage indicator — useful for partnership decision memos.

Funding portfolio audit

Agencies managing large research portfolios use audit_grant_research_link to identify grants where the paper-to-grant ratio is anomalously high or low, where grants have been terminated, or where funding concentration risk is elevated. Batch screening surfaces the highest-risk items for prioritised review without manually checking each grant record.

Citation manipulation investigation

When a researcher is under investigation for suspected citation ring participation, detect_citation_anomalies returns the full Benford's law digit-by-digit comparison with observed vs. expected percentages and deviation scores for digits 1-9. This provides the statistical evidence base that integrity committees need before escalating to formal misconduct proceedings.

How to connect this research integrity screening MCP

Claude Desktop

Add to your claude_desktop_config.json:

```json
{
  "mcpServers": {
    "research-integrity-screening": {
      "url": "https://research-integrity-screening-mcp.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}
```

Cursor, Windsurf, or Cline

Use the same URL and token in your MCP server settings panel. The server communicates via standard MCP protocol over HTTP POST to /mcp.

Python (via requests)

```python
import requests

response = requests.post(
    "https://research-integrity-screening-mcp.apify.actor/mcp",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
    },
    json={
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
            "name": "generate_integrity_report",
            "arguments": {"entity": "Dr. Marcus Webb Global Health Institute"}
        },
        "id": 1
    }
)
result = response.json()
report = result["result"]["content"][0]["text"]
print(report)
```

JavaScript

```javascript
const response = await fetch(
  "https://research-integrity-screening-mcp.apify.actor/mcp",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer YOUR_APIFY_TOKEN"
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      method: "tools/call",
      params: {
        name: "screen_researcher_integrity",
        arguments: { researcher: "Dr. Elena Sokolova 0000-0002-7831-4412" }
      },
      id: 1
    })
  }
);
const data = await response.json();
const report = JSON.parse(data.result.content[0].text);
console.log(`Integrity level: ${report.researcherIntegrity.integrityLevel}`);
console.log(`Score: ${report.researcherIntegrity.score}/100`);
```

cURL

```bash
# Generate a full integrity report for a researcher
curl -X POST "https://research-integrity-screening-mcp.apify.actor/mcp" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "generate_integrity_report",
      "arguments": { "entity": "Dr. James Chen Stanford University" }
    },
    "id": 1
  }'
```

MCP tools

| Tool | Input | Price | What it returns |
|---|---|---|---|
| screen_researcher_integrity | researcher — name or ORCID ID | $0.045 | Integrity Score 0-100, retraction flags, citation anomaly score, velocity red flags, ORCID verification status, CLEAN-to-CRITICAL level |
| check_publication_flags | query — title, DOI, researcher, or topic | $0.045 | Paper mill score 0-100, template flags, journal concentration, author diversity, UNLIKELY-to-CONFIRMED_MILL verdict |
| assess_journal_quality | query — journal name, topic, or researcher | $0.045 | Quality score 0-100, citation impact, open access ratio, source diversity, PREDATORY-to-ELITE verdict |
| detect_citation_anomalies | researcher — name or institution | $0.045 | Benford's law digit 1-9 analysis: observed %, expected %, deviation per digit; citation min/max/mean |
| audit_grant_research_link | researcher — PI name; topic — optional filter | $0.045 | Funding risk score, grant list, paper-to-grant ratio, HHI concentration, terminated grant flags, LOW-to-CRITICAL level |
| compare_institutional_integrity | institution_a, institution_b | $0.045 | Side-by-side journal quality and funding risk for two institutions, quality advantage indicator |
| generate_integrity_report | entity — researcher, institution, or topic | $0.045 | Full composite report: all 4 model scores, weighted composite 0-100, CLEAR-to-HIGH_RISK verdict, all signals, required actions |

Tool input reference

| Tool | Parameter | Type | Required | Description |
|---|---|---|---|---|
| screen_researcher_integrity | researcher | string | Yes | Researcher name (e.g. "Dr. Wei Zhang Beijing University") or ORCID ID (e.g. "0000-0002-1234-5678") |
| check_publication_flags | query | string | Yes | Paper title, DOI, researcher name, or research topic |
| assess_journal_quality | query | string | Yes | Journal name, research topic, or researcher name |
| detect_citation_anomalies | researcher | string | Yes | Researcher name or institution name |
| audit_grant_research_link | researcher | string | Yes | Principal investigator name or institution |
| audit_grant_research_link | topic | string | No | Research topic to narrow the NIH grant search |
| compare_institutional_integrity | institution_a | string | Yes | First institution name (e.g. "Stanford University") |
| compare_institutional_integrity | institution_b | string | Yes | Second institution name (e.g. "Duke University") |
| generate_integrity_report | entity | string | Yes | Researcher name, institution, or paper topic for full cross-source screening |

Output example

```json
{
  "entity": "Dr. Marcus Webb Global Health Institute",
  "compositeScore": 58,
  "verdict": "INVESTIGATION_NEEDED",
  "researcherIntegrity": {
    "score": 42,
    "publicationCount": 89,
    "retractionFlags": 6,
    "citationAnomalies": 5,
    "velocityRedFlags": 3,
    "integrityLevel": "SUSPICIOUS",
    "signals": [
      "Retraction/correction flags detected — 6 points",
      "Citation distribution suspiciously uniform — potential citation manipulation",
      "47 publications in 2021 — suspiciously high output",
      "Publication spike: 12 → 47 papers (2020 → 2021)"
    ]
  },
  "paperMill": {
    "score": 44,
    "suspiciousPatterns": 3,
    "templateFlags": 4,
    "millLevel": "PROBABLE",
    "signals": [
      "Repeated title pattern (4x): \"role of inflammation in...\" — possible template",
      "31 papers in single journal \"international journal of molecular biology\" — over-concentration",
      "Low author diversity — same author groups across many papers"
    ]
  },
  "journalQuality": {
    "score": 34,
    "totalPapers": 89,
    "highCitationPapers": 8,
    "openAccessRatio": 0.28,
    "qualityLevel": "LOW",
    "signals": [
      "Low citation impact: avg 1.7 — possible predatory venue"
    ]
  },
  "fundingRisk": {
    "score": 38,
    "grantCount": 3,
    "flaggedGrants": 1,
    "fundingConcentration": 0.78,
    "riskLevel": "ELEVATED",
    "signals": [
      "High funding concentration — single-source dependency risk",
      "1 terminated/withdrawn grants — compliance concern"
    ]
  },
  "allSignals": [
    "Retraction/correction flags detected — 6 points",
    "Citation distribution suspiciously uniform — potential citation manipulation",
    "47 publications in 2021 — suspiciously high output",
    "Publication spike: 12 → 47 papers (2020 → 2021)",
    "Repeated title pattern (4x): \"role of inflammation in...\" — possible template",
    "31 papers in single journal \"international journal of molecular biology\" — over-concentration",
    "Low author diversity — same author groups across many papers",
    "Low citation impact: avg 1.7 — possible predatory venue",
    "High funding concentration — single-source dependency risk",
    "1 terminated/withdrawn grants — compliance concern"
  ],
  "requiredActions": [
    "Review retracted publications — determine scope of affected research",
    "Paper mill indicators — investigate authorship and submission patterns",
    "Publications in suspected predatory venues — verify journal legitimacy",
    "Review terminated/withdrawn grants for compliance issues",
    "Citation pattern anomalies — check for citation rings or manipulation"
  ]
}
```

Output fields

| Field | Type | Description |
|---|---|---|
| entity | string | The researcher, institution, or topic queried |
| compositeScore | number | Weighted composite integrity risk score 0-100 (higher = more concern) |
| verdict | string | CLEAR / MINOR_CONCERNS / INVESTIGATION_NEEDED / HIGH_RISK |
| researcherIntegrity.score | number | Integrity sub-score 0-100 |
| researcherIntegrity.publicationCount | number | Total papers found across all sources |
| researcherIntegrity.retractionFlags | number | Raw retraction/correction points accumulated |
| researcherIntegrity.citationAnomalies | number | Citation anomaly raw score |
| researcherIntegrity.velocityRedFlags | number | Publication velocity raw points |
| researcherIntegrity.integrityLevel | string | CLEAN / MINOR_FLAGS / SUSPICIOUS / HIGH_RISK / CRITICAL |
| researcherIntegrity.signals | string[] | Human-readable descriptions of each flag triggered |
| paperMill.score | number | Paper mill risk score 0-100 |
| paperMill.suspiciousPatterns | number | Count of suspicious journal and author patterns |
| paperMill.templateFlags | number | Raw template title repetition count |
| paperMill.millLevel | string | UNLIKELY / POSSIBLE / PROBABLE / LIKELY_MILL / CONFIRMED_MILL |
| paperMill.signals | string[] | Specific patterns flagged with counts and journal names |
| journalQuality.score | number | Journal quality score 0-100 (higher = better quality) |
| journalQuality.totalPapers | number | Papers analysed for journal quality |
| journalQuality.highCitationPapers | number | Papers with 10 or more citations |
| journalQuality.openAccessRatio | number | Fraction of papers that are open access (0.0-1.0) |
| journalQuality.qualityLevel | string | PREDATORY / LOW / MODERATE / HIGH / ELITE |
| journalQuality.signals | string[] | Citation impact and open access transparency signals |
| fundingRisk.score | number | Funding risk score 0-100 |
| fundingRisk.grantCount | number | NIH grants found |
| fundingRisk.flaggedGrants | number | Terminated, withdrawn, or suspended grants |
| fundingRisk.fundingConcentration | number | Herfindahl-Hirschman Index 0-1 for funding source concentration |
| fundingRisk.riskLevel | string | LOW / MODERATE / ELEVATED / HIGH / CRITICAL |
| fundingRisk.signals | string[] | Specific funding anomalies identified |
| allSignals | string[] | Combined signals from all four scoring models |
| requiredActions | string[] | Recommended next steps derived from triggered threshold flags |

How much does it cost to screen researchers?

Research Integrity Screening MCP uses pay-per-event pricing — you pay $0.045 per tool call. Platform compute costs are included.

| Scenario | Tool calls | Cost per call | Total cost |
|---|---|---|---|
| Quick test — single researcher screen | 1 | $0.045 | $0.045 |
| Integrity check plus citation anomaly analysis | 2 | $0.045 | $0.09 |
| Full integrity report (all 7 sources, 4 models) | 1 | $0.045 | $0.045 |
| Screen 10 grant applicants | 10 | $0.045 | $0.45 |
| Monthly journal submission workflow — 200 submissions | 200 | $0.045 | $9.00 |

You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached, returning a structured error your pipeline can handle gracefully.

Apify's free tier includes $5 of monthly platform credits — enough for over 100 tool calls before you need to add payment. Compare this to institutional research integrity software priced at $3,000-15,000 per year. Most teams using this MCP spend under $20/month with no subscription commitment.

How Research Integrity Screening MCP works

Data collection phase

Each tool call triggers parallel execution of between 2 and 7 downstream Apify actors via Promise.allSettled, ensuring that a slow or failing data source does not block the response. The seven data sources are hardcoded to specific trusted actor IDs: openalex-research-papers (AfAA3gEDtEiU9Zf5s), orcid-researcher-search (Nuq9OYuSRgU3DKFYz), pubmed-research-search (AwPvHhEjcgAd6hcvG), semantic-scholar-search (LgVeUXmTsWl9Gl2Tb), crossref-paper-search (b6ReNaLwZXInCFeMr), core-academic-search (Jh4Y6VfuSZkxkF8eq), and nih-research-grants (dGvWHX8Oa5vRK9pNb). Each child actor runs at 256 MB memory with a 120-second timeout. Failed sources return empty arrays rather than crashing the scoring phase.
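The documentation describes the fan-out as Promise.allSettled in the server's JavaScript. For readers building similar orchestration, the same fail-soft pattern in Python is asyncio.gather with return_exceptions=True; the source names and the fetch stub below are illustrative stand-ins, not the actual actor client:

```python
import asyncio

SOURCES = ["openalex", "orcid", "pubmed", "semantic-scholar",
           "crossref", "core", "nih-grants"]

async def fetch_source(name):
    # Stand-in for a child-actor call; the real server runs each child
    # at 256 MB memory with a 120-second timeout.
    if name == "core":
        raise TimeoutError("simulated slow source")
    return [{"source": name}]

async def gather_all(sources):
    # return_exceptions=True mirrors Promise.allSettled: one failing
    # source yields an exception object instead of cancelling the rest.
    results = await asyncio.gather(
        *(fetch_source(s) for s in sources), return_exceptions=True
    )
    # Failed sources degrade to empty arrays, as the scoring phase expects.
    return {s: (r if isinstance(r, list) else [])
            for s, r in zip(sources, results)}

merged = asyncio.run(gather_all(SOURCES))
print(sum(len(v) for v in merged.values()))  # → 6 (the simulated "core" timeout yields [])
```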

Scoring model phase

Four independent scoring functions operate on the merged dataset. The Researcher Integrity scorer caps at 100 points: retraction and correction keyword scanning across all papers (max 35 pts), citation distribution Benford's law heuristic (max 25 pts), velocity spike detection (max 20 pts), and ORCID completeness (max 20 pts). The Paper Mill detector accumulates from: repeated 5-word title prefix frequency at 3+ occurrences (max 30 pts), single-journal concentration above 50% (max 25 pts), low author-set diversity below 30% with 10+ papers (max 25 pts), and raw volume anomaly above 50 papers (max 20 pts). The Journal Quality scorer is a positive score (higher = better): average citation impact per paper (max 35 pts), open access ratio (max 20 pts), source diversity across unique venues (max 25 pts), and volume health (max 20 pts). The Funding Risk scorer uses: paper-to-grant ratio outlier detection (max 30 pts), HHI funding concentration index (max 25 pts), terminated and withdrawn grant flags at 8 points each (max 25 pts), and a no-grant penalty of 10 points.
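As one concrete example of the sub-checks above, the Herfindahl-Hirschman Index used by the Funding Risk scorer is simply the sum of squared funding shares. A minimal sketch (the function name is hypothetical) shows why a single dominant funder pushes the index past the documented 0.7 threshold:

```python
def funding_hhi(amounts):
    """Herfindahl-Hirschman Index over funding sources: the sum of
    squared shares. 1.0 means a single source; an even split across
    n sources gives 1/n."""
    total = sum(amounts)
    if total == 0:
        return 0.0
    return sum((a / total) ** 2 for a in amounts)

concentrated = funding_hhi([900_000, 50_000, 50_000])  # one dominant grant
balanced = funding_hhi([250_000] * 4)                  # four equal grants
print(concentrated > 0.7, balanced > 0.7)  # → True False
```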

Benford's law implementation

For citation anomaly detection, the server builds a Map<number, number> of leading-digit counts across all citation values above zero. The detect_citation_anomalies tool returns the full per-digit breakdown: observed percentage, expected percentage (digit 1 at 30.1%, digit 2 at 17.6%, down to digit 9 at 4.6%), and absolute deviation for each digit 1 through 9. The screen_researcher_integrity tool uses a faster heuristic: if digit 1 falls below 15% or above 50%, it adds 5 points to the citation anomaly sub-score. It also applies a coefficient of variation check — if CV < 0.3 across 10+ papers, the citation distribution is flagged as suspiciously uniform.
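Both checks in that paragraph are standard statistics and can be restated as a short sketch (illustrative only; function names and the sample citation counts are hypothetical, and the server's exact code is not published):

```python
from math import log10
from statistics import mean, pstdev

# Expected leading-digit frequencies under Benford's law: P(d) = log10(1 + 1/d)
BENFORD = {d: log10(1 + 1 / d) for d in range(1, 10)}  # digit 1 ≈ 0.301

def benford_table(citations):
    """Per-digit observed vs expected leading-digit frequencies."""
    leading = [int(str(c)[0]) for c in citations if c > 0]
    n = len(leading)
    return {d: {"observed": leading.count(d) / n,
                "expected": BENFORD[d],
                "deviation": abs(leading.count(d) / n - BENFORD[d])}
            for d in range(1, 10)}

def uniformity_flag(citations, cv_threshold=0.3, min_papers=10):
    """Coefficient-of-variation check: flag suspiciously uniform
    citation counts across 10+ papers."""
    if len(citations) < min_papers:
        return False
    return pstdev(citations) / mean(citations) < cv_threshold

cites = [12, 11, 13, 12, 12, 11, 13, 12, 11, 12]  # implausibly uniform
print(uniformity_flag(cites))  # → True (CV well below 0.3)
print(benford_table(cites)[1]["observed"])  # → 1.0 (digit 1 at 100%, above the 50% trigger)
```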

Composite scoring and verdict assignment

The final composite score weights the four models: Researcher Integrity × 0.30 + Paper Mill × 0.25 + (100 minus Journal Quality) × 0.25 + Funding Risk × 0.20. Journal Quality is inverted so poor journal quality contributes positively to overall risk. Verdict thresholds: CLEAR (composite below 20), MINOR_CONCERNS (20-39), INVESTIGATION_NEEDED (40-64), HIGH_RISK (65 and above). Two hard overrides apply: CRITICAL integrity level from the researcher model and CONFIRMED_MILL from the paper mill model both force HIGH_RISK regardless of composite score. The requiredActions list is generated deterministically from specific threshold triggers, ensuring concrete next steps even when the overall score is borderline.
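The arithmetic above reads as follows when transcribed into a sketch (the function name and example inputs are hypothetical, not the server's code):

```python
def composite_verdict(integrity, paper_mill, journal_quality, funding_risk,
                      integrity_level="CLEAN", mill_level="UNLIKELY"):
    """Weighted composite plus verdict thresholds and hard overrides."""
    composite = (integrity * 0.30 + paper_mill * 0.25
                 + (100 - journal_quality) * 0.25 + funding_risk * 0.20)
    # Hard overrides trump the composite score entirely.
    if integrity_level == "CRITICAL" or mill_level == "CONFIRMED_MILL":
        return round(composite), "HIGH_RISK"
    if composite < 20:
        verdict = "CLEAR"
    elif composite < 40:
        verdict = "MINOR_CONCERNS"
    elif composite < 65:
        verdict = "INVESTIGATION_NEEDED"
    else:
        verdict = "HIGH_RISK"
    return round(composite), verdict

print(composite_verdict(20, 12, 80, 15))  # → (17, 'CLEAR')
print(composite_verdict(20, 12, 80, 15, mill_level="CONFIRMED_MILL"))
# → (17, 'HIGH_RISK'): the override fires despite a clean composite
```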

Tips for best results

  1. Include institution in the researcher query. "Dr. Sarah Kim" returns less precise results than "Dr. Sarah Kim Yale School of Medicine". Disambiguation improves all seven data sources simultaneously.

  2. Use ORCID IDs when available. If you have a researcher's ORCID ID (format: 0000-0002-XXXX-XXXX), pass it as the researcher parameter. The ORCID actor returns the exact profile rather than a name search, eliminating false positives from common names.

  3. Run detect_citation_anomalies before generate_integrity_report for formal investigations. The citation anomaly tool returns the full Benford's law table — useful for building an evidence dossier. The full integrity report returns only the summary flag.

  4. Set spending limits for batch workflows. When screening 50 or more grant applicants, set a maxTotalChargeUsd on the Apify run to cap exposure. The server returns a structured error when the limit is reached so your pipeline can handle it gracefully.

  5. Use compare_institutional_integrity for partnership due diligence. Rather than running two separate full reports, this tool queries both institutions in parallel and returns a structured side-by-side comparison in a single call.

  6. Add a topic filter to grant audits. Pair audit_grant_research_link with a specific topic parameter to narrow NIH grant results. "Elizabeth Torres CRISPR" is more targeted than "Elizabeth Torres" alone when a researcher has a large grant portfolio.

  7. Treat PROBABLE mill level as an escalation trigger, not a verdict. Title template patterns can arise from legitimate research programmes — clinical trial series, systematic reviews, and multi-part studies all use consistent naming conventions. Use the tool output to route submissions to a specialist reviewer.

Combine with other Apify actors

| Actor | How to combine |
|---|---|
| Researcher Integrity Check | Run this actor for deep single-researcher profiling; use the MCP for conversational AI workflows and batch screening |
| Company Deep Research | Combine when screening biotech or pharma companies whose leadership has academic publishing histories |
| ORCID Researcher Search | Pull raw ORCID profile data directly before feeding into integrity analysis |
| PubMed Research Search | Pull full biomedical literature sets for manual review of flagged publications |
| NIH Research Grants | Query NIH grants independently for portfolio-level funding analysis outside the MCP |
| SEC EDGAR Filing Analyzer | Cross-reference publicly traded biotech and pharma researchers against disclosed financial conflicts |
| B2B Lead Qualifier | Score academic-to-industry transition candidates using both research integrity and commercial signals |

Limitations

  • Citation data completeness depends on source coverage. OpenAlex and Semantic Scholar have strong coverage for STEM fields; humanities and social sciences have lower citation indexing rates. Scores for non-STEM researchers may understate citation activity.
  • Benford's law requires at least 10 cited papers to be statistically meaningful. Early-career researchers with fewer than 10 cited papers will not trigger the citation distribution check. Scores for these researchers are computed from the other three models only.
  • Paper mill template detection uses 5-word title prefix matching. Legitimate research programmes such as multi-paper clinical trial series with consistent naming conventions can produce false positives. Always read the signals array to assess whether the flagged pattern reflects the researcher's actual methodology.
  • NIH grant data covers US federal funding only. Researchers primarily funded by European, Asian, or private foundation grants will receive a "no grants found" penalty. Use the topic parameter in audit_grant_research_link to reduce false positives for international researchers.
  • ORCID completeness scoring penalises sparse profiles. Some senior researchers maintain sparse ORCID profiles by choice. A low ORCID sub-score alone should not drive a negative verdict without corroborating signals.
  • This tool detects statistical anomalies, not fraud. Anomalous patterns have legitimate explanations. The output is a screening signal for human review, not a forensic determination.
  • Query resolution depends on name disambiguation. Common researcher names without institutional context return mixed results from multiple individuals. Always include institution name or ORCID ID for definitive identification.
  • Child actor timeout is 120 seconds. If a data source is slow, it times out and returns an empty array for that source. The composite score is computed on available data; missing sources reduce confidence in low-risk verdicts.

Integrations

  • Apify API — trigger integrity screenings programmatically from grant management systems or editorial workflow software
  • Webhooks — push HIGH_RISK verdicts to Slack, email, or case management tools the moment a screening completes
  • Zapier — connect to Airtable or Google Sheets grant trackers; auto-log integrity scores when new applicants are added
  • Make — build editorial submission workflows that auto-route papers to integrity review queues based on the mill level verdict
  • Google Sheets — export batch researcher screening results to shared spreadsheets for committee review
  • LangChain / LlamaIndex — embed research integrity screening as a tool in LLM agent pipelines for automated due diligence workflows

Troubleshooting

Composite score is 0 despite querying a well-known researcher. This usually means the researcher's name did not resolve correctly across the data sources and all actors returned empty datasets. Try adding the institution name to the query. For common names (e.g. "Wei Zhang"), add the specific field ("Wei Zhang computational biology MIT") or use the ORCID ID directly.

Tool returns "error": true, "message": "Spending limit reached". Your Apify run has hit the maximum charge limit configured for the run. Increase maxTotalChargeUsd in your run configuration, or purchase additional platform credits in the Apify console.

Paper mill score is high for a legitimate systematic review programme. The template detection model flags repeated title prefixes. Systematic reviews and meta-analyses legitimately use series naming conventions ("Systematic review of X in Y: Part 1, 2, 3"). Check the specific pattern in signals — if it matches the researcher's known methodology, contextualise the flag as a false positive in your assessment.

audit_grant_research_link shows no grants for a funded researcher. NIH data covers only US federal grants. Non-NIH funding (NSF, DOD, DARPA, private foundations, international funders) will not appear. Add a topic parameter to improve NIH search relevance and note the limitation in your assessment.

Child actor timeouts causing incomplete results. Empty data arrays in the detailed output indicate that one or more source actors timed out. Re-run the same query — subsequent runs typically succeed. If a specific source consistently times out, treat the generate_integrity_report result as partial and note which sources were unavailable.

Responsible use

  • All data accessed by this server comes from publicly available academic databases and federal grant records.
  • Research integrity screening results are statistical indicators, not forensic findings. Do not use scores as the sole basis for adverse employment, funding, or publication decisions without independent expert review.
  • Comply with applicable data protection regulations when storing or sharing screening outputs that include personal information about researchers.
  • Do not use this tool to harass, defame, or discriminate against researchers based on screening scores alone.
  • For guidance on web scraping and data use legality, see Apify's guide.

FAQ

How does research integrity screening with Benford's law work? Benford's law predicts that in naturally occurring numerical datasets, the digit 1 appears as the leading digit about 30.1% of the time, digit 2 about 17.6%, down to digit 9 at 4.6%. Citation counts across a large publication set follow this distribution naturally. When a researcher's citation counts deviate significantly — particularly when digit 1 appears far less than expected — it can indicate artificial inflation of specific citation counts. The detect_citation_anomalies tool returns the per-digit observed vs. expected comparison so your team can assess the magnitude of deviation for each digit independently.
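The per-digit comparison described above can be sketched in a few lines. This is an illustrative reimplementation, not the actor's actual code; the function name and input shape are assumptions.

```python
import math
from collections import Counter

# Expected leading-digit frequencies under Benford's law: P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit_distribution(citation_counts):
    """Compare observed leading-digit frequencies against Benford's law.

    citation_counts: iterable of per-paper citation counts (zeros are skipped,
    since they have no leading digit). Returns observed vs. expected shares.
    """
    digits = [int(str(c)[0]) for c in citation_counts if c > 0]
    total = len(digits)
    observed = Counter(digits)
    return {
        d: {"observed": observed.get(d, 0) / total, "expected": BENFORD[d]}
        for d in range(1, 10)
    }
```

A large shortfall in the observed share of digit 1 relative to the ~30.1% expectation is the pattern the FAQ describes as a possible inflation signal.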

What is a paper mill and how does research integrity screening detect it? Paper mills are commercial services that produce fake or plagiarised academic manuscripts for sale to researchers who need publication credits. Their output tends to share structural fingerprints: repeated title templates across papers, excessive concentration in a small number of accepting journals, and the same tight author groups appearing across many unrelated papers. The check_publication_flags tool detects all three patterns using title prefix frequency analysis (5-word prefixes appearing 3+ times), journal concentration scoring (above 50% in one journal), and author-set uniqueness ratios (below 30% unique across 10+ papers).
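The three structural fingerprints can be sketched as follows, using the thresholds quoted above (5-word prefixes appearing 3+ times, >50% concentration in one journal, <30% unique author sets across 10+ papers). The function name and record shape are illustrative assumptions, not the actor's internal API.

```python
from collections import Counter

def paper_mill_flags(papers):
    """papers: list of dicts with 'title', 'journal', and 'authors' fields.
    Returns which of the three structural fingerprints are present."""
    # 1. Repeated title templates: same 5-word prefix on 3+ papers.
    prefixes = Counter(" ".join(p["title"].lower().split()[:5]) for p in papers)
    repeated_prefix = any(n >= 3 for n in prefixes.values())

    # 2. Journal concentration: more than half the papers in one venue.
    journals = Counter(p["journal"] for p in papers)
    journal_concentration = max(journals.values()) / len(papers) > 0.5

    # 3. Author-set uniqueness: same tight groups recurring across papers.
    author_sets = {tuple(sorted(p["authors"])) for p in papers}
    low_author_uniqueness = (
        len(papers) >= 10 and len(author_sets) / len(papers) < 0.3
    )

    return {
        "repeated_title_prefix": repeated_prefix,
        "journal_concentration": journal_concentration,
        "low_author_uniqueness": low_author_uniqueness,
    }
```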

Can research integrity screening definitively identify academic fraud? No. The MCP identifies statistical anomalies and red flags associated with integrity concerns. Abnormal patterns have legitimate explanations — a researcher who publishes exclusively in one journal may be the editor of that journal, or may work in a narrow field with few suitable venues. Use the output to prioritise human review, not to substitute for it. The requiredActions field identifies the specific checks a human reviewer should perform next.

How accurate is the publication velocity check? The velocity model flags any year where a researcher's publication count exceeds 30 papers, and flags year-over-year increases of 3x or more with at least 10 papers in the latter year. These thresholds are calibrated against typical academic output (most researchers publish 3-8 papers per year). A single high-output year is possible for researchers leading large collaborative projects or clinical trial consortia. Consistent multi-year high velocity is a stronger signal.
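The two velocity rules above reduce to a short function. This is a sketch of the stated thresholds (more than 30 papers in a year; a 3x-or-greater year-over-year jump landing at 10+ papers), with an assumed input shape.

```python
def velocity_flags(papers_per_year):
    """papers_per_year: dict mapping year -> publication count.
    Returns the years flagged by each of the two velocity rules."""
    # Rule 1: any single year exceeding 30 papers.
    high_output = [y for y, n in papers_per_year.items() if n > 30]

    # Rule 2: a 3x-or-greater jump over the previous year, with at
    # least 10 papers in the latter year.
    spikes = []
    for year in sorted(papers_per_year):
        prev = papers_per_year.get(year - 1, 0)
        curr = papers_per_year[year]
        if prev > 0 and curr >= 3 * prev and curr >= 10:
            spikes.append(year)

    return {"high_output_years": high_output, "velocity_spikes": spikes}
```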

How long does a typical research integrity screening take? The screen_researcher_integrity and check_publication_flags tools query 3-4 data sources in parallel and typically complete in 60-90 seconds. The generate_integrity_report tool queries all 7 sources in parallel and typically completes in 90-150 seconds, depending on API response times from the underlying academic databases.

How many researchers can I screen in one session? There is no hard per-session limit. For batch screening, run tool calls sequentially or in parallel depending on your client's concurrency. Each call is priced independently per event, from $0.08 to $0.35 depending on the tool (see the pricing table above). For screening 200 or more researchers, consider scheduling Apify runs via the API and processing results asynchronously.
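For batch screening, a helper that builds one MCP request per researcher might look like this. The JSON-RPC envelope follows the standard MCP tools/call shape; the argument name ("researcher") is an illustrative assumption — check the tool's schema from your client before relying on it.

```python
import json

# MCP endpoint from this page.
ENDPOINT = "https://ryanclinton--research-integrity-screening-mcp.apify.actor/mcp"

def build_screen_request(researcher: str, request_id: int) -> str:
    """Build a JSON-RPC 2.0 tools/call request body for one researcher.

    The 'researcher' argument key is assumed for illustration; verify it
    against the tool schema your MCP client reports.
    """
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "screen_researcher_integrity",
            "arguments": {"researcher": researcher},
        },
    }
    return json.dumps(payload)
```

Posting the bodies sequentially (or with bounded concurrency) to ENDPOINT gives one independently priced event per researcher.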

How is research integrity screening different from iThenticate or Turnitin? iThenticate and Turnitin detect textual plagiarism by comparing manuscript text against known sources. This MCP does not analyse manuscript text — it analyses publication metadata, citation patterns, grant records, and researcher profiles. The two approaches are complementary: use plagiarism detection for submitted manuscripts and this MCP for researcher-level pattern analysis and pre-award screening.

Does research integrity screening detect conflicts of interest? Not directly. The funding risk model identifies grant-publication linkages and funding concentration, which can reveal financial relationships. For explicit conflict of interest screening — including industry payments, equity holdings, or consulting relationships — combine with financial disclosure databases. The SEC EDGAR Filing Analyzer can identify researchers with disclosed financial interests in publicly traded companies.

Is it legal to use this tool for researcher screening? All underlying data sources are publicly available: OpenAlex, ORCID, PubMed, Crossref, CORE, and NIH Grants are open academic databases. Accessing and analysing public records for research integrity purposes is a standard practice in academic administration and grant management. See Apify's guide on web scraping legality for broader context.

Can I schedule research integrity screening to run periodic sweeps? Yes. Use the Apify Scheduler to trigger the actor on a daily, weekly, or monthly cadence. Configure a webhook to push HIGH_RISK verdicts to your notification system. This is useful for ongoing monitoring of a research portfolio — alerting when a previously CLEAR researcher's score moves to INVESTIGATION_NEEDED after new publications appear.

What happens when a data source returns no results? The scoring functions handle empty arrays gracefully — each source defaults to an empty array when its actor times out or returns no matches. Scores are computed on available data. When ORCID returns nothing, the completeness model adds a 10-point penalty to the integrity sub-score and emits a "No ORCID profile found" signal. Treat results where multiple sources returned empty with more caution than results backed by all seven sources.
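The empty-source behaviour described above can be sketched as a small scoring helper. The function name, source keys, and the flat 10-point ORCID penalty structure are illustrative; only the penalty value and the "No ORCID profile found" signal come from this page.

```python
def completeness_penalty(sources):
    """sources: dict mapping source name -> list of records (possibly
    empty or None when that source's actor timed out).

    Each missing source defaults to an empty list; an absent ORCID
    profile costs 10 points on the completeness sub-score.
    """
    # Default every source to an empty list so scoring never crashes.
    data = {name: (records or []) for name, records in sources.items()}

    score = 100
    signals = []
    if not data.get("orcid"):
        score -= 10
        signals.append("No ORCID profile found")

    # Surface which sources were empty so callers can weigh the result.
    empty = [name for name, records in data.items() if not records]
    return {"score": score, "signals": signals, "empty_sources": empty}
```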

How is research integrity screening different from running each academic database search manually? Manual cross-database research takes 2-3 hours per researcher. This MCP queries all seven sources in parallel in under 2 minutes, applies four scoring algorithms automatically, and returns machine-readable structured output that your AI agent can reason over. It eliminates transcription errors between databases and produces consistent, comparable scores across every researcher you screen.

Help us improve

If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:

  1. Go to Account Settings > Privacy
  2. Enable Share runs with public Actor creators

This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.

Support

Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.

How it works

01

Configure

Set your parameters in the Apify Console or pass them via API.

02

Run

Click Start, trigger via API, webhook, or set up a schedule.

03

Get results

Download as JSON, CSV, or Excel. Integrate with 1,000+ apps.

Use cases

Research Offices

Run pre-award screening and due diligence on grant applicants and new hires.

Journal Editors

Flag paper mill patterns and citation anomalies during editorial review.

Grant Administrators

Audit NIH grant-publication linkages and monitor a funded portfolio over time.

Developers

Integrate via REST API or use as an MCP tool in AI workflows.

Ready to try Research Integrity Screening MCP Server?

Start for free on Apify. No credit card required.

Open on Apify Store