Research Integrity Screening MCP Server
Research integrity screening MCP that connects Claude, Cursor, and other AI agents to academic fraud detection across 7 live data sources. Screen researchers, detect paper mill output, flag citation manipulation using Benford's law analysis, assess journal quality, and audit NIH grant-publication linkages — all in a single tool call from your AI agent. Returns a composite **Integrity Score (0-100)** with a CLEAR / MINOR_CONCERNS / INVESTIGATION_NEEDED / HIGH_RISK verdict.
Pricing
Pay Per Event model. You only pay for what you use.
| Event | Description | Price |
|---|---|---|
| screen_researcher_integrity | OpenAlex + ORCID + PubMed + Semantic Scholar integrity check. | $0.15 |
| check_publication_flags | Paper mill detection + template patterns. | $0.10 |
| assess_journal_quality | Citation impact + open access + source diversity. | $0.10 |
| detect_citation_anomalies | Benford's law + citation distribution analysis. | $0.08 |
| audit_grant_research_link | NIH grants + publication linkage + funding risk. | $0.12 |
| compare_institutional_integrity | Side-by-side quality + funding comparison. | $0.20 |
| generate_integrity_report | All 7 sources, 4 scoring models, CLEAR/HIGH_RISK verdict. | $0.35 |
Example: 100 screen_researcher_integrity events = $15.00 · 1,000 events = $150.00
Connect to your AI agent
Add this MCP server to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.
```json
{
  "mcpServers": {
    "research-integrity-screening-mcp": {
      "url": "https://ryanclinton--research-integrity-screening-mcp.apify.actor/mcp"
    }
  }
}
```
Documentation
This server runs in Standby mode on the Apify platform, responding to MCP requests without cold-start delays. It orchestrates OpenAlex, ORCID, PubMed, Semantic Scholar, Crossref, CORE, and NIH Research Grants in parallel, applies four independent scoring models, and returns structured JSON that your AI agent can reason over directly. Grant reviewers, journal editors, and research integrity officers get consistent, reproducible scores rather than ad hoc judgement calls.
What data can you access?
| Data Point | Source | Example |
|---|---|---|
| 📄 Publication metadata, citation counts, DOIs | OpenAlex | 247 papers, avg 18.3 citations |
| 👤 Researcher profiles, affiliations, employment history | ORCID | Dr. M. Petrov, MIT 2018-present |
| 🔬 Biomedical literature, MeSH terms, abstracts | PubMed | "Expression of Concern: oncology study" |
| 📊 AI citation analysis, influence scores, paper embeddings | Semantic Scholar | Influence score 94, 12 highly-cited papers |
| 🔗 DOI metadata, reference lists, journal metadata | Crossref | 10.1016/j.cell.2023.04.021 |
| 📂 Open access full-text repository coverage | CORE | 61% OA ratio across publication set |
| 💰 Federal grant awards, PI names, funding amounts | NIH Grants | R01CA123456, $1.2M, University of Chicago |
| 🚩 Retraction / correction / expression of concern flags | OpenAlex + PubMed | 3 retraction flags, 2 corrections detected |
| 📈 Publication velocity by year, year-over-year spike detection | OpenAlex + PubMed | 47 papers in 2022 — velocity spike flagged |
| 🏦 Funding concentration index (HHI), terminated grant flags | NIH Grants | HHI 0.82 — single-source dependency risk |
Why use Research Integrity Screening MCP?
Manual research integrity review is slow and inconsistent. Checking a single researcher across OpenAlex, ORCID, PubMed, Semantic Scholar, Crossref, and NIH Grants separately takes 2-3 hours per subject. Applying Benford's law to citation distributions requires spreadsheet work most reviewers skip entirely. Paper mill template detection across dozens of papers is impractical without automation. And the results of manual review are rarely comparable across screeners or repeatable over time.
This MCP automates the entire workflow. A single tool call queries all seven sources in parallel, applies four scoring algorithms, and returns a structured verdict in under 2 minutes. The MCP format means your AI agent calls these tools mid-conversation — ask Claude to screen a grant applicant and it invokes the tool, interprets the score, and explains the findings without you opening a separate application.
- Scheduling — run periodic integrity sweeps on Apify Scheduler; flag new anomalies automatically
- API access — trigger screenings from Python, JavaScript, or any HTTP client using standard MCP protocol
- Parallel data fetching — all seven data sources queried simultaneously, not sequentially
- Monitoring — receive Slack or email alerts when HIGH_RISK verdicts are returned via Apify webhooks
- Integrations — pipe results into Notion, Airtable, or any webhook-compatible grant management system
Features
- Benford's law citation analysis — computes leading-digit frequency distribution across a researcher's full citation set and flags deviation from the expected logarithmic distribution (digit 1 expected at 30.1%)
- Coefficient of variation check — detects suspiciously uniform citation distributions where CV < 0.3 across 10+ papers, a statistical proxy for citation ring or self-citation manipulation
- Paper mill template detection — extracts the first 5-word prefix of each paper title and flags patterns that repeat 3 or more times across the publication set
- Journal concentration scoring — identifies when more than 50% of a researcher's papers appear in a single journal, a known paper mill indicator
- Author diversity analysis — computes the unique author-set ratio across all papers; low diversity below 30% with 10+ papers triggers a flag
- Publication velocity monitoring — flags any calendar year with more than 30 publications, and detects year-over-year spikes of 3x or greater with at least 10 papers
- ORCID verification scoring — penalises missing profiles, empty works lists, and absent affiliation records as identity-unverified risk signals
- Retraction and correction detection — scans publication titles and document types for "retract", "correction", "erratum", and "expression of concern" keywords across OpenAlex, PubMed, and Semantic Scholar
- NIH grant-to-paper ratio — computes publications-per-grant ratio; ratios above 20:1 flag potential output padding; ratios below 1:1 flag low productivity
- Funding concentration HHI — applies the Herfindahl-Hirschman Index to funding sources; concentration above 0.7 with 3+ grants signals single-source dependency
- Terminated grant detection — scans NIH grant records for "terminated", "withdrawn", and "suspended" status text
- Four independent scoring models — Researcher Integrity (max 100), Paper Mill (max 100), Journal Quality (max 100, positive scale), Funding Risk (max 100) each produce standalone scores
- Weighted composite score — combines all four models: Integrity 30% + Paper Mill 25% + (100 minus Journal Quality) 25% + Funding Risk 20%
- Five-tier verdicts per model — each sub-model uses domain-appropriate labels (CLEAN through CRITICAL for integrity; UNLIKELY through CONFIRMED_MILL for paper mills; PREDATORY through ELITE for journals)
- Hard override logic — CRITICAL integrity level or CONFIRMED_MILL verdict forces HIGH_RISK regardless of composite score
- Deterministic required actions — the `requiredActions` list is generated from specific threshold triggers, not the composite score, ensuring concrete next steps even when the overall score is borderline
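The paper mill heuristics above (title-template repetition, journal concentration, author-set diversity) can be sketched from the stated thresholds. This is an illustrative reconstruction, not the server's actual code; the function names and inputs are hypothetical.

```python
from collections import Counter

def template_flags(titles: list[str], min_repeats: int = 3) -> list[tuple[str, int]]:
    """Flag 5-word title prefixes that repeat min_repeats or more times."""
    prefixes = Counter(" ".join(t.lower().split()[:5]) for t in titles)
    return [(p, n) for p, n in prefixes.items() if n >= min_repeats]

def journal_concentration(journals: list[str]) -> float:
    """Share of papers in the single most common journal (above 0.5 is a mill indicator)."""
    if not journals:
        return 0.0
    _, top_count = Counter(journals).most_common(1)[0]
    return top_count / len(journals)

def author_diversity(author_sets: list[frozenset[str]]) -> float:
    """Unique author-set ratio; below 0.3 with 10+ papers triggers a flag."""
    if not author_sets:
        return 1.0
    return len(set(author_sets)) / len(author_sets)
```

A repeated prefix such as "effects of mir-21 on proliferation" appearing across three papers would be flagged, while one-off titles pass through untouched.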
Use cases for research integrity screening
Pre-award grant screening
Grant programme officers at federal agencies and private foundations need to vet principal investigators before committing funds. A single screen_researcher_integrity call cross-references the applicant's publication record across four databases, applies Benford's law to their citation history, checks for retraction history, and verifies their ORCID profile. Officers get a scored, reproducible result they can attach to the application file, replacing hours of manual database lookups with a 90-second workflow.
Journal submission integrity review
Peer review coordinators can run check_publication_flags against a submitted manuscript's author list or topic before assigning reviewers. The paper mill detection model checks for repeated title templates, journal over-concentration in the author's history, and author-group uniformity — the three most reliable early indicators of paper mill output. A PROBABLE or higher mill score routes the submission to an integrity editor rather than standard peer review.
Faculty hiring due diligence
Provosts and department chairs screening candidates can run generate_integrity_report to receive a full composite view before making offers. The tool verifies ORCID identity, assesses publication velocity for implausible output rates, checks for retraction history, and evaluates whether the candidate's journal choices reflect credible venues. This takes 90 seconds rather than three days of reference checking.
Research institution partnership assessment
Before formalising a collaboration, compliance teams can run compare_institutional_integrity to benchmark two institutions side-by-side on journal quality and funding risk. The tool queries OpenAlex and NIH Grants for both entities simultaneously and returns a structured comparison with a quality advantage indicator — useful for partnership decision memos.
Funding portfolio audit
Agencies managing large research portfolios use audit_grant_research_link to identify grants where the paper-to-grant ratio is anomalously high or low, where grants have been terminated, or where funding concentration risk is elevated. Batch screening surfaces the highest-risk items for prioritised review without manually checking each grant record.
Citation manipulation investigation
When a researcher is under investigation for suspected citation ring participation, detect_citation_anomalies returns the full Benford's law digit-by-digit comparison with observed vs. expected percentages and deviation scores for digits 1-9. This provides the statistical evidence base that integrity committees need before escalating to formal misconduct proceedings.
How to connect this research integrity screening MCP
Claude Desktop
Add to your claude_desktop_config.json:
```json
{
  "mcpServers": {
    "research-integrity-screening": {
      "url": "https://research-integrity-screening-mcp.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}
```
Cursor, Windsurf, or Cline
Use the same URL and token in your MCP server settings panel. The server communicates via standard MCP protocol over HTTP POST to /mcp.
Python (via requests)
```python
import requests

response = requests.post(
    "https://research-integrity-screening-mcp.apify.actor/mcp",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
    },
    json={
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
            "name": "generate_integrity_report",
            "arguments": {"entity": "Dr. Marcus Webb Global Health Institute"}
        },
        "id": 1
    }
)

result = response.json()
report = result["result"]["content"][0]["text"]
print(report)
```
JavaScript
```javascript
const response = await fetch(
  "https://research-integrity-screening-mcp.apify.actor/mcp",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer YOUR_APIFY_TOKEN"
    },
    body: JSON.stringify({
      jsonrpc: "2.0",
      method: "tools/call",
      params: {
        name: "screen_researcher_integrity",
        arguments: { researcher: "Dr. Elena Sokolova 0000-0002-7831-4412" }
      },
      id: 1
    })
  }
);

const data = await response.json();
const report = JSON.parse(data.result.content[0].text);
console.log(`Integrity level: ${report.researcherIntegrity.integrityLevel}`);
console.log(`Score: ${report.researcherIntegrity.score}/100`);
```
cURL
```bash
# Generate a full integrity report for a researcher
curl -X POST "https://research-integrity-screening-mcp.apify.actor/mcp" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "generate_integrity_report",
      "arguments": { "entity": "Dr. James Chen Stanford University" }
    },
    "id": 1
  }'
```
MCP tools
| Tool | Input | Price | What it returns |
|---|---|---|---|
| screen_researcher_integrity | researcher — name or ORCID ID | $0.045 | Integrity Score 0-100, retraction flags, citation anomaly score, velocity red flags, ORCID verification status, CLEAN-to-CRITICAL level |
| check_publication_flags | query — title, DOI, researcher, or topic | $0.045 | Paper mill score 0-100, template flags, journal concentration, author diversity, UNLIKELY-to-CONFIRMED_MILL verdict |
| assess_journal_quality | query — journal name, topic, or researcher | $0.045 | Quality score 0-100, citation impact, open access ratio, source diversity, PREDATORY-to-ELITE verdict |
| detect_citation_anomalies | researcher — name or institution | $0.045 | Benford's law digit 1-9 analysis: observed %, expected %, deviation per digit; citation min/max/mean |
| audit_grant_research_link | researcher — PI name; topic — optional filter | $0.045 | Funding risk score, grant list, paper-to-grant ratio, HHI concentration, terminated grant flags, LOW-to-CRITICAL level |
| compare_institutional_integrity | institution_a, institution_b | $0.045 | Side-by-side journal quality and funding risk for two institutions, quality advantage indicator |
| generate_integrity_report | entity — researcher, institution, or topic | $0.045 | Full composite report: all 4 model scores, weighted composite 0-100, CLEAR-to-HIGH_RISK verdict, all signals, required actions |
Tool input reference
| Tool | Parameter | Type | Required | Description |
|---|---|---|---|---|
| screen_researcher_integrity | researcher | string | Yes | Researcher name (e.g. "Dr. Wei Zhang Beijing University") or ORCID ID (e.g. "0000-0002-1234-5678") |
| check_publication_flags | query | string | Yes | Paper title, DOI, researcher name, or research topic |
| assess_journal_quality | query | string | Yes | Journal name, research topic, or researcher name |
| detect_citation_anomalies | researcher | string | Yes | Researcher name or institution name |
| audit_grant_research_link | researcher | string | Yes | Principal investigator name or institution |
| audit_grant_research_link | topic | string | No | Research topic to narrow the NIH grant search |
| compare_institutional_integrity | institution_a | string | Yes | First institution name (e.g. "Stanford University") |
| compare_institutional_integrity | institution_b | string | Yes | Second institution name (e.g. "Duke University") |
| generate_integrity_report | entity | string | Yes | Researcher name, institution, or paper topic for full cross-source screening |
Output example
```json
{
  "entity": "Dr. Marcus Webb Global Health Institute",
  "compositeScore": 58,
  "verdict": "INVESTIGATION_NEEDED",
  "researcherIntegrity": {
    "score": 42,
    "publicationCount": 89,
    "retractionFlags": 6,
    "citationAnomalies": 5,
    "velocityRedFlags": 3,
    "integrityLevel": "SUSPICIOUS",
    "signals": [
      "Retraction/correction flags detected — 6 points",
      "Citation distribution suspiciously uniform — potential citation manipulation",
      "47 publications in 2021 — suspiciously high output",
      "Publication spike: 12 → 47 papers (2020 → 2021)"
    ]
  },
  "paperMill": {
    "score": 44,
    "suspiciousPatterns": 3,
    "templateFlags": 4,
    "millLevel": "PROBABLE",
    "signals": [
      "Repeated title pattern (4x): \"role of inflammation in...\" — possible template",
      "31 papers in single journal \"international journal of molecular biology\" — over-concentration",
      "Low author diversity — same author groups across many papers"
    ]
  },
  "journalQuality": {
    "score": 34,
    "totalPapers": 89,
    "highCitationPapers": 8,
    "openAccessRatio": 0.28,
    "qualityLevel": "LOW",
    "signals": [
      "Low citation impact: avg 1.7 — possible predatory venue"
    ]
  },
  "fundingRisk": {
    "score": 38,
    "grantCount": 3,
    "flaggedGrants": 1,
    "fundingConcentration": 0.78,
    "riskLevel": "ELEVATED",
    "signals": [
      "High funding concentration — single-source dependency risk",
      "1 terminated/withdrawn grants — compliance concern"
    ]
  },
  "allSignals": [
    "Retraction/correction flags detected — 6 points",
    "Citation distribution suspiciously uniform — potential citation manipulation",
    "47 publications in 2021 — suspiciously high output",
    "Publication spike: 12 → 47 papers (2020 → 2021)",
    "Repeated title pattern (4x): \"role of inflammation in...\" — possible template",
    "31 papers in single journal \"international journal of molecular biology\" — over-concentration",
    "Low author diversity — same author groups across many papers",
    "Low citation impact: avg 1.7 — possible predatory venue",
    "High funding concentration — single-source dependency risk",
    "1 terminated/withdrawn grants — compliance concern"
  ],
  "requiredActions": [
    "Review retracted publications — determine scope of affected research",
    "Paper mill indicators — investigate authorship and submission patterns",
    "Publications in suspected predatory venues — verify journal legitimacy",
    "Review terminated/withdrawn grants for compliance issues",
    "Citation pattern anomalies — check for citation rings or manipulation"
  ]
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| entity | string | The researcher, institution, or topic queried |
| compositeScore | number | Weighted composite integrity risk score 0-100 (higher = more concern) |
| verdict | string | CLEAR / MINOR_CONCERNS / INVESTIGATION_NEEDED / HIGH_RISK |
| researcherIntegrity.score | number | Integrity sub-score 0-100 |
| researcherIntegrity.publicationCount | number | Total papers found across all sources |
| researcherIntegrity.retractionFlags | number | Raw retraction/correction points accumulated |
| researcherIntegrity.citationAnomalies | number | Citation anomaly raw score |
| researcherIntegrity.velocityRedFlags | number | Publication velocity raw points |
| researcherIntegrity.integrityLevel | string | CLEAN / MINOR_FLAGS / SUSPICIOUS / HIGH_RISK / CRITICAL |
| researcherIntegrity.signals | string[] | Human-readable descriptions of each flag triggered |
| paperMill.score | number | Paper mill risk score 0-100 |
| paperMill.suspiciousPatterns | number | Count of suspicious journal and author patterns |
| paperMill.templateFlags | number | Raw template title repetition count |
| paperMill.millLevel | string | UNLIKELY / POSSIBLE / PROBABLE / LIKELY_MILL / CONFIRMED_MILL |
| paperMill.signals | string[] | Specific patterns flagged with counts and journal names |
| journalQuality.score | number | Journal quality score 0-100 (higher = better quality) |
| journalQuality.totalPapers | number | Papers analysed for journal quality |
| journalQuality.highCitationPapers | number | Papers with 10 or more citations |
| journalQuality.openAccessRatio | number | Fraction of papers that are open access (0.0-1.0) |
| journalQuality.qualityLevel | string | PREDATORY / LOW / MODERATE / HIGH / ELITE |
| journalQuality.signals | string[] | Citation impact and open access transparency signals |
| fundingRisk.score | number | Funding risk score 0-100 |
| fundingRisk.grantCount | number | NIH grants found |
| fundingRisk.flaggedGrants | number | Terminated, withdrawn, or suspended grants |
| fundingRisk.fundingConcentration | number | Herfindahl-Hirschman Index 0-1 for funding source concentration |
| fundingRisk.riskLevel | string | LOW / MODERATE / ELEVATED / HIGH / CRITICAL |
| fundingRisk.signals | string[] | Specific funding anomalies identified |
| allSignals | string[] | Combined signals from all four scoring models |
| requiredActions | string[] | Recommended next steps derived from triggered threshold flags |
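Pipelines consuming this JSON typically branch on the verdict. A minimal triage sketch (the `triage` helper is hypothetical, assuming the report arrives as a JSON string):

```python
import json

def triage(report_json: str) -> dict:
    """Route a report: escalate the two riskiest verdicts and surface retraction signals."""
    report = json.loads(report_json)
    return {
        "escalate": report["verdict"] in ("INVESTIGATION_NEEDED", "HIGH_RISK"),
        "retraction_signals": [
            s for s in report["allSignals"] if "retraction" in s.lower()
        ],
        "next_steps": report.get("requiredActions", []),
    }
```

A report with verdict INVESTIGATION_NEEDED would return `escalate: True` along with any retraction-related signals and the deterministic `requiredActions` list.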
How much does it cost to screen researchers?
Research Integrity Screening MCP uses pay-per-event pricing — you pay $0.045 per tool call. Platform compute costs are included.
| Scenario | Tool calls | Cost per call | Total cost |
|---|---|---|---|
| Quick test — single researcher screen | 1 | $0.045 | $0.045 |
| Integrity check plus citation anomaly analysis | 2 | $0.045 | $0.09 |
| Full integrity report (all 7 sources, 4 models) | 1 | $0.045 | $0.045 |
| Screen 10 grant applicants | 10 | $0.045 | $0.45 |
| Monthly journal submission workflow — 200 submissions | 200 | $0.045 | $9.00 |
You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached, returning a structured error your pipeline can handle gracefully.
Apify's free tier includes $5 of monthly platform credits — enough for over 100 tool calls before you need to add payment. Compare this to institutional research integrity software priced at $3,000-15,000 per year. Most teams using this MCP spend under $20/month with no subscription commitment.
How Research Integrity Screening MCP works
Data collection phase
Each tool call triggers parallel execution of between 2 and 7 downstream Apify actors via Promise.allSettled, ensuring that a slow or failing data source does not block the response. The seven data sources are hardcoded to specific trusted actor IDs: openalex-research-papers (AfAA3gEDtEiU9Zf5s), orcid-researcher-search (Nuq9OYuSRgU3DKFYz), pubmed-research-search (AwPvHhEjcgAd6hcvG), semantic-scholar-search (LgVeUXmTsWl9Gl2Tb), crossref-paper-search (b6ReNaLwZXInCFeMr), core-academic-search (Jh4Y6VfuSZkxkF8eq), and nih-research-grants (dGvWHX8Oa5vRK9pNb). Each child actor runs at 256 MB memory with a 120-second timeout. Failed sources return empty arrays rather than crashing the scoring phase.
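The fail-soft pattern described above (Promise.allSettled with empty-array fallbacks) has a close Python analogue in asyncio.gather with return_exceptions=True. This is an illustrative sketch, not the server's actual Node.js code; `fetch_source` is a hypothetical stand-in for a downstream actor call.

```python
import asyncio

async def fetch_source(name: str) -> list[dict]:
    """Hypothetical stand-in for one downstream actor call."""
    if name == "slow-or-broken-source":
        raise TimeoutError(f"{name} exceeded the 120s timeout")
    return [{"source": name, "records": 1}]

async def gather_sources(names: list[str]) -> dict[str, list[dict]]:
    # return_exceptions=True mirrors Promise.allSettled: one failing
    # source never rejects the whole batch
    results = await asyncio.gather(
        *(fetch_source(n) for n in names), return_exceptions=True
    )
    # Failed sources degrade to empty arrays instead of crashing the scoring phase
    return {
        name: (r if isinstance(r, list) else [])
        for name, r in zip(names, results)
    }
```

The scoring phase then operates on whatever data survived, which is why missing sources reduce confidence rather than abort the run.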
Scoring model phase
Four independent scoring functions operate on the merged dataset. The Researcher Integrity scorer caps at 100 points: retraction and correction keyword scanning across all papers (max 35 pts), citation distribution Benford's law heuristic (max 25 pts), velocity spike detection (max 20 pts), and ORCID completeness (max 20 pts). The Paper Mill detector accumulates from: repeated 5-word title prefix frequency at 3+ occurrences (max 30 pts), single-journal concentration above 50% (max 25 pts), low author-set diversity below 30% with 10+ papers (max 25 pts), and raw volume anomaly above 50 papers (max 20 pts). The Journal Quality scorer is a positive score (higher = better): average citation impact per paper (max 35 pts), open access ratio (max 20 pts), source diversity across unique venues (max 25 pts), and volume health (max 20 pts). The Funding Risk scorer uses: paper-to-grant ratio outlier detection (max 30 pts), HHI funding concentration index (max 25 pts), terminated and withdrawn grant flags at 8 points each (max 25 pts), and a no-grant penalty of 10 points.
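Two of the funding signals are simple to reproduce from the stated thresholds. The sketch below (function names are hypothetical, not the server's source) shows the HHI computation and the paper-to-grant ratio check:

```python
def funding_hhi(amounts_by_source: dict[str, float]) -> float:
    """Herfindahl-Hirschman Index over funding-source shares (0-1; 1.0 = single source)."""
    total = sum(amounts_by_source.values())
    if total == 0:
        return 0.0
    return sum((amount / total) ** 2 for amount in amounts_by_source.values())

def grant_ratio_flags(paper_count: int, grant_count: int) -> list[str]:
    """Ratio outliers: above 20:1 suggests output padding, below 1:1 low productivity."""
    if grant_count == 0:
        return ["no grants found"]
    ratio = paper_count / grant_count
    if ratio > 20:
        return [f"ratio {ratio:.1f}:1 - possible output padding"]
    if ratio < 1:
        return [f"ratio {ratio:.1f}:1 - low productivity relative to funding"]
    return []
```

A 90/10 funding split yields an HHI of 0.82, matching the single-source dependency example shown in the data table above.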
Benford's law implementation
For citation anomaly detection, the server builds a Map<number, number> of leading-digit counts across all citation values above zero. The detect_citation_anomalies tool returns the full per-digit breakdown: observed percentage, expected percentage (digit 1 at 30.1%, digit 2 at 17.6%, down to digit 9 at 4.6%), and absolute deviation for each digit 1 through 9. The screen_researcher_integrity tool uses a faster heuristic: if digit 1 falls below 15% or above 50%, it adds 5 points to the citation anomaly sub-score. It also applies a coefficient of variation check — if CV < 0.3 across 10+ papers, the citation distribution is flagged as suspiciously uniform.
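Under those definitions, the full per-digit table and the uniformity check can be sketched as follows. This is an illustrative reimplementation of the documented thresholds, not the server's source:

```python
import math
import statistics

# Benford's expected leading-digit frequencies: P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(n: int) -> int:
    while n >= 10:
        n //= 10
    return n

def benford_table(citations: list[int]) -> dict[int, dict[str, float]]:
    """Per-digit observed vs. expected percentages with absolute deviation."""
    values = [c for c in citations if c > 0]
    counts = {d: 0 for d in range(1, 10)}
    for c in values:
        counts[leading_digit(c)] += 1
    total = len(values) or 1
    return {
        d: {
            "observed": 100 * counts[d] / total,
            "expected": 100 * BENFORD[d],
            "deviation": abs(100 * counts[d] / total - 100 * BENFORD[d]),
        }
        for d in range(1, 10)
    }

def uniformity_flag(citations: list[int]) -> bool:
    """CV < 0.3 across 10+ cited papers flags a suspiciously uniform distribution."""
    values = [c for c in citations if c > 0]
    if len(values) < 10:
        return False
    mean = statistics.mean(values)
    return mean > 0 and statistics.pstdev(values) / mean < 0.3
```

The screening heuristic then only checks whether the digit-1 observed percentage falls outside the 15-50% band, while the dedicated anomaly tool returns the whole table.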
Composite scoring and verdict assignment
The final composite score weights the four models: Researcher Integrity × 0.30 + Paper Mill × 0.25 + (100 minus Journal Quality) × 0.25 + Funding Risk × 0.20. Journal Quality is inverted so poor journal quality contributes positively to overall risk. Verdict thresholds: CLEAR (composite below 20), MINOR_CONCERNS (20-39), INVESTIGATION_NEEDED (40-64), HIGH_RISK (65 and above). Two hard overrides apply: CRITICAL integrity level from the researcher model and CONFIRMED_MILL from the paper mill model both force HIGH_RISK regardless of composite score. The requiredActions list is generated deterministically from specific threshold triggers, ensuring concrete next steps even when the overall score is borderline.
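The weighting, thresholds, and override logic above can be expressed compactly. A sketch from the stated values (function names are illustrative):

```python
def composite(integrity: float, mill: float, journal_quality: float,
              funding_risk: float) -> float:
    """Weighted composite; journal quality is inverted so poor venues add risk."""
    return (integrity * 0.30 + mill * 0.25
            + (100 - journal_quality) * 0.25 + funding_risk * 0.20)

def verdict(score: float, integrity_level: str, mill_level: str) -> str:
    # Hard overrides take precedence over the composite thresholds
    if integrity_level == "CRITICAL" or mill_level == "CONFIRMED_MILL":
        return "HIGH_RISK"
    if score < 20:
        return "CLEAR"
    if score < 40:
        return "MINOR_CONCERNS"
    if score < 65:
        return "INVESTIGATION_NEEDED"
    return "HIGH_RISK"
```

For example, sub-scores of 42, 44, 34, and 38 with SUSPICIOUS integrity and PROBABLE mill levels land in the INVESTIGATION_NEEDED band.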
Tips for best results
- Include institution in the researcher query. "Dr. Sarah Kim" returns less precise results than "Dr. Sarah Kim Yale School of Medicine". Disambiguation improves all seven data sources simultaneously.
- Use ORCID IDs when available. If you have a researcher's ORCID ID (format: 0000-0002-XXXX-XXXX), pass it as the `researcher` parameter. The ORCID actor returns the exact profile rather than a name search, eliminating false positives from common names.
- Run `detect_citation_anomalies` before `generate_integrity_report` for formal investigations. The citation anomaly tool returns the full Benford's law table — useful for building an evidence dossier. The full integrity report returns only the summary flag.
- Set spending limits for batch workflows. When screening 50 or more grant applicants, set a `maxTotalChargeUsd` on the Apify run to cap exposure. The server returns a structured error when the limit is reached so your pipeline can handle it gracefully.
- Use `compare_institutional_integrity` for partnership due diligence. Rather than running two separate full reports, this tool queries both institutions in parallel and returns a structured side-by-side comparison in a single call.
- Add a topic filter to grant audits. Pair `audit_grant_research_link` with a specific `topic` parameter to narrow NIH grant results. "Elizabeth Torres CRISPR" is more targeted than "Elizabeth Torres" alone when a researcher has a large grant portfolio.
- Treat PROBABLE mill level as an escalation trigger, not a verdict. Title template patterns can arise from legitimate research programmes — clinical trial series, systematic reviews, and multi-part studies all use consistent naming conventions. Use the tool output to route submissions to a specialist reviewer.
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Researcher Integrity Check | Run this actor for deep single-researcher profiling; use the MCP for conversational AI workflows and batch screening |
| Company Deep Research | Combine when screening biotech or pharma companies whose leadership has academic publishing histories |
| ORCID Researcher Search | Pull raw ORCID profile data directly before feeding into integrity analysis |
| PubMed Research Search | Pull full biomedical literature sets for manual review of flagged publications |
| NIH Research Grants | Query NIH grants independently for portfolio-level funding analysis outside the MCP |
| SEC EDGAR Filing Analyzer | Cross-reference publicly traded biotech and pharma researchers against disclosed financial conflicts |
| B2B Lead Qualifier | Score academic-to-industry transition candidates using both research integrity and commercial signals |
Limitations
- Citation data completeness depends on source coverage. OpenAlex and Semantic Scholar have strong coverage for STEM fields; humanities and social sciences have lower citation indexing rates. Scores for non-STEM researchers may understate citation activity.
- Benford's law requires at least 10 cited papers to be statistically meaningful. Early-career researchers with fewer than 10 cited papers will not trigger the citation distribution check. Scores for these researchers are computed from the other three models only.
- Paper mill template detection uses 5-word title prefix matching. Legitimate research programmes such as multi-paper clinical trial series with consistent naming conventions can produce false positives. Always read the `signals` array to assess whether the flagged pattern reflects the researcher's actual methodology.
- NIH grant data covers US federal funding only. Researchers primarily funded by European, Asian, or private foundation grants will receive a "no grants found" penalty. Use the `topic` parameter in `audit_grant_research_link` to reduce false positives for international researchers.
- ORCID completeness scoring penalises sparse profiles. Some senior researchers maintain sparse ORCID profiles by choice. A low ORCID sub-score alone should not drive a negative verdict without corroborating signals.
- This tool detects statistical anomalies, not fraud. Anomalous patterns have legitimate explanations. The output is a screening signal for human review, not a forensic determination.
- Query resolution depends on name disambiguation. Common researcher names without institutional context return mixed results from multiple individuals. Always include institution name or ORCID ID for definitive identification.
- Child actor timeout is 120 seconds. If a data source is slow, it times out and returns an empty array for that source. The composite score is computed on available data; missing sources reduce confidence in low-risk verdicts.
Integrations
- Apify API — trigger integrity screenings programmatically from grant management systems or editorial workflow software
- Webhooks — push HIGH_RISK verdicts to Slack, email, or case management tools the moment a screening completes
- Zapier — connect to Airtable or Google Sheets grant trackers; auto-log integrity scores when new applicants are added
- Make — build editorial submission workflows that auto-route papers to integrity review queues based on the mill level verdict
- Google Sheets — export batch researcher screening results to shared spreadsheets for committee review
- LangChain / LlamaIndex — embed research integrity screening as a tool in LLM agent pipelines for automated due diligence workflows
Troubleshooting
Composite score is 0 despite querying a well-known researcher. This usually means the researcher's name did not resolve correctly across the data sources and all actors returned empty datasets. Try adding the institution name to the query. For common names (e.g. "Wei Zhang"), add the specific field ("Wei Zhang computational biology MIT") or use the ORCID ID directly.
Tool returns "error": true, "message": "Spending limit reached". Your Apify run has hit the maximum charge limit configured for the run. Increase maxTotalChargeUsd in your run configuration, or purchase additional platform credits in the Apify console.
Paper mill score is high for a legitimate systematic review programme. The template detection model flags repeated title prefixes. Systematic reviews and meta-analyses legitimately use series naming conventions ("Systematic review of X in Y: Part 1, 2, 3"). Check the specific pattern in signals — if it matches the researcher's known methodology, contextualise the flag as a false positive in your assessment.
audit_grant_research_link shows no grants for a funded researcher. NIH data covers only US federal grants. Non-NIH funding (NSF, DOD, DARPA, private foundations, international funders) will not appear. Add a topic parameter to improve NIH search relevance and note the limitation in your assessment.
Child actor timeouts causing incomplete results. Empty data arrays in the detailed output indicate that one or more source actors timed out. Re-run the same query — subsequent runs typically succeed. If a specific source consistently times out, treat the generate_integrity_report result as partial and note which sources were unavailable.
Responsible use
- All data accessed by this server comes from publicly available academic databases and federal grant records.
- Research integrity screening results are statistical indicators, not forensic findings. Do not use scores as the sole basis for adverse employment, funding, or publication decisions without independent expert review.
- Comply with applicable data protection regulations when storing or sharing screening outputs that include personal information about researchers.
- Do not use this tool to harass, defame, or discriminate against researchers based on screening scores alone.
- For guidance on web scraping and data use legality, see Apify's guide.
FAQ
How does research integrity screening with Benford's law work?
Benford's law predicts that in naturally occurring numerical datasets, the digit 1 appears as the leading digit about 30.1% of the time, digit 2 about 17.6%, down to digit 9 at 4.6%. Citation counts across a large publication set follow this distribution naturally. When a researcher's citation counts deviate significantly — particularly when digit 1 appears far less than expected — it can indicate artificial inflation of specific citation counts. The detect_citation_anomalies tool returns the per-digit observed vs. expected comparison so your team can assess the magnitude of deviation for each digit independently.
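The expected distribution and an observed comparison can be computed in a few lines of Python. This is an illustrative sketch, not the tool's actual implementation:

```python
import math
from collections import Counter

def benford_deviation(citation_counts):
    """Compare leading-digit frequencies of citation counts against
    Benford's law. Returns {digit: (observed_pct, expected_pct)}."""
    leading = [int(str(n)[0]) for n in citation_counts if n > 0]
    obs = Counter(leading)
    total = len(leading)
    result = {}
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d) * 100  # 30.1% for d=1, 4.6% for d=9
        observed = 100 * obs.get(d, 0) / total if total else 0.0
        result[d] = (round(observed, 1), round(expected, 1))
    return result
```

On a real publication set (hundreds of citation counts), a large shortfall in digit 1 relative to the 30.1% expectation is the pattern the tool surfaces per digit.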
What is a paper mill and how does research integrity screening detect it?
Paper mills are commercial services that produce fake or plagiarised academic manuscripts for sale to researchers who need publication credits. Their output tends to share structural fingerprints: repeated title templates across papers, excessive concentration in a small number of accepting journals, and the same tight author groups appearing across many unrelated papers. The check_publication_flags tool detects all three patterns using title prefix frequency analysis (5-word prefixes appearing 3+ times), journal concentration scoring (above 50% in one journal), and author-set uniqueness ratios (below 30% unique across 10+ papers).
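The three thresholds above can be sketched as plain Python heuristics. The record fields (`title`, `journal`, `authors`) are illustrative, not the tool's actual schema:

```python
from collections import Counter

def paper_mill_flags(papers):
    """Sketch of the three paper-mill heuristics: repeated 5-word title
    prefixes (3+ times), >50% journal concentration, and <30% unique
    author sets across 10+ papers."""
    flags = []
    # 1. Repeated 5-word title prefixes.
    prefixes = Counter(" ".join(p["title"].lower().split()[:5]) for p in papers)
    if any(count >= 3 for count in prefixes.values()):
        flags.append("repeated_title_template")
    # 2. Journal concentration above 50%.
    journals = Counter(p["journal"] for p in papers)
    if papers and journals.most_common(1)[0][1] / len(papers) > 0.5:
        flags.append("journal_concentration")
    # 3. Author-set uniqueness below 30% across 10+ papers.
    if len(papers) >= 10:
        unique_sets = {frozenset(p["authors"]) for p in papers}
        if len(unique_sets) / len(papers) < 0.3:
            flags.append("low_author_uniqueness")
    return flags
```

Each heuristic fires independently, so a legitimate series (see the systematic-review caveat in the troubleshooting notes) may trip the title check alone without the other two.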
Can research integrity screening definitively identify academic fraud?
No. The MCP identifies statistical anomalies and red flags associated with integrity concerns. Abnormal patterns have legitimate explanations — a researcher who publishes exclusively in one journal may be the editor of that journal, or may work in a narrow field with few suitable venues. Use the output to prioritise human review, not to substitute for it. The requiredActions field identifies the specific checks a human reviewer should perform next.
How accurate is the publication velocity check?
The velocity model flags any year where a researcher's publication count exceeds 30 papers, and flags year-over-year increases of 3x or more with at least 10 papers in the latter year. These thresholds are calibrated against typical academic output (most researchers publish 3-8 papers per year). A single high-output year is possible for researchers leading large collaborative projects or clinical trial consortia. Consistent multi-year high velocity is a stronger signal.
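The two thresholds can be sketched as a simple pass over yearly counts. Again a sketch, not the tool's actual code:

```python
def velocity_flags(papers_per_year):
    """Apply the velocity thresholds described above.
    papers_per_year maps year -> publication count."""
    flags = []
    for year, count in sorted(papers_per_year.items()):
        if count > 30:
            flags.append((year, "high_annual_output"))  # >30 papers in one year
        prev = papers_per_year.get(year - 1, 0)
        if prev and count >= 3 * prev and count >= 10:
            flags.append((year, "velocity_spike"))  # 3x+ jump, 10+ papers
    return flags
```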
How long does a typical research integrity screening take?
The screen_researcher_integrity and check_publication_flags tools query 3-4 data sources in parallel and typically complete in 60-90 seconds. The generate_integrity_report tool queries all 7 sources in parallel and typically completes in 90-150 seconds, depending on API response times from the underlying academic databases.
How many researchers can I screen in one session?
There is no hard per-session limit. For batch screening, run tool calls sequentially or in parallel depending on your client's concurrency. Each call is priced independently at the per-event rates listed in the pricing table above. For screening 200 or more researchers, consider scheduling Apify runs via the API and processing results asynchronously.
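A batch can be fanned out with a thread pool, where `screen_fn` is a placeholder for whatever wrapper your MCP client exposes around screen_researcher_integrity:

```python
from concurrent.futures import ThreadPoolExecutor

def screen_batch(researchers, screen_fn, max_workers=4):
    """Screen a list of researcher queries in parallel and return a
    {query: result} mapping. screen_fn is supplied by the caller."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(researchers, pool.map(screen_fn, researchers)))
```

Keep `max_workers` modest — each call bills independently, and the underlying academic APIs are the bottleneck, not the fan-out.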
How is research integrity screening different from iThenticate or Turnitin?
iThenticate and Turnitin detect textual plagiarism by comparing manuscript text against known sources. This MCP does not analyse manuscript text — it analyses publication metadata, citation patterns, grant records, and researcher profiles. The two approaches are complementary: use plagiarism detection for submitted manuscripts and this MCP for researcher-level pattern analysis and pre-award screening.
Does research integrity screening detect conflicts of interest?
Not directly. The funding risk model identifies grant-publication linkages and funding concentration, which can reveal financial relationships. For explicit conflict of interest screening — including industry payments, equity holdings, or consulting relationships — combine with financial disclosure databases. The SEC EDGAR Filing Analyzer can identify researchers with disclosed financial interests in publicly traded companies.
Is it legal to use this tool for researcher screening?
All underlying data sources are publicly available: OpenAlex, ORCID, PubMed, Crossref, CORE, and NIH Grants are open academic databases. Accessing and analysing public records for research integrity purposes is a standard practice in academic administration and grant management. See Apify's guide on web scraping legality for broader context.
Can I schedule research integrity screening to run periodic sweeps?
Yes. Use the Apify Scheduler to trigger the actor on a daily, weekly, or monthly cadence. Configure a webhook to push HIGH_RISK verdicts to your notification system. This is useful for ongoing monitoring of a research portfolio — alerting when a previously CLEAR researcher's score moves to INVESTIGATION_NEEDED after new publications appear.
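A minimal alerting rule for such a webhook handler might look like the following. The verdict strings are the four levels documented above; the change-detection logic itself is illustrative:

```python
ALERT_VERDICTS = {"INVESTIGATION_NEEDED", "HIGH_RISK"}

def verdict_changed(previous, current):
    """Alert only when a researcher's verdict newly enters an alert
    level, e.g. CLEAR -> INVESTIGATION_NEEDED after new publications."""
    return current in ALERT_VERDICTS and previous not in ALERT_VERDICTS
```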
What happens when a data source returns no results?
The scoring functions handle empty arrays gracefully — each source defaults to an empty array when its actor times out or returns no matches. Scores are computed on available data. When ORCID returns nothing, the completeness model adds a 10-point penalty to the integrity sub-score and emits a "No ORCID profile found" signal. Treat results where multiple sources returned empty with more caution than results backed by all seven sources.
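A simplified sketch of that behaviour — only the 10-point ORCID penalty and its signal text come from the description above; the multiple-empty-sources signal is illustrative:

```python
def completeness_penalty(sources):
    """Given {source_name: list_of_records}, return the completeness
    penalty and any signals. Empty lists stand in for timed-out or
    no-match sources."""
    penalty, signals = 0, []
    if not sources.get("orcid"):
        penalty += 10  # documented penalty for a missing ORCID profile
        signals.append("No ORCID profile found")
    empty = sorted(name for name, rows in sources.items() if not rows)
    if len(empty) > 1:  # illustrative caution signal, not documented
        signals.append("Multiple empty sources: " + ", ".join(empty))
    return penalty, signals
```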
How is research integrity screening different from running each academic database search manually?
Manual cross-database research takes 2-3 hours per researcher. This MCP queries all seven sources in parallel in under 2 minutes, applies four scoring algorithms automatically, and returns machine-readable structured output that your AI agent can reason over. It eliminates transcription errors between databases and produces consistent, comparable scores across every researcher you screen.
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom solutions or enterprise integrations, reach out through the Apify platform.
How it works
Configure
Set your parameters in the Apify Console or pass them via API.
Run
Click Start, trigger via API, webhook, or set up a schedule.
Get results
Download as JSON, CSV, or Excel. Integrate with 1,000+ apps.
Use cases
Research Offices
Screen grant applicants and new hires during pre-award and pre-appointment review.
Editorial Teams
Prioritise human review of authors and submissions flagged by statistical indicators.
Data Teams
Automate data collection pipelines with scheduled runs.
Developers
Integrate via REST API or use as an MCP tool in AI workflows.
Related actors
Bulk Email Verifier
Verify email deliverability at scale. MX record validation, SMTP mailbox checks, disposable and role-based detection, catch-all flagging, and confidence scoring. No external API costs.
GitHub Repository Search
Search GitHub repositories by keyword, language, topic, stars, forks. Sort by stars, forks, or recently updated. Returns metadata, topics, license, owner info, URLs. Free API, optional token for higher limits.
Website Content to Markdown
Convert any website to clean Markdown for RAG pipelines, LLM training, and AI apps. Crawls pages, strips boilerplate, preserves headings, tables, and code blocks. GFM support.
Website Tech Stack Detector
Detect 100+ web technologies on any website. Identifies CMS, frameworks, analytics, marketing tools, chat widgets, CDNs, payment systems, hosting, and more. Batch-analyze multiple sites with version detection and confidence scoring.
Ready to try Research Integrity Screening MCP Server?
Start for free on Apify. No credit card required.
Open on Apify Store