How much does Crossref Academic Paper Search cost?

Crossref Academic Paper Search uses pay-per-event pricing at $0.002 per paper-fetched. For example, 100 events cost $0.20 and 1,000 events cost $2.00. You only pay for what you use — there are no monthly fees.

How do I use Crossref Academic Paper Search?

Configure your parameters in the Apify Console or pass them via API, then click Start or trigger via API/webhook. Results are available as JSON, CSV, or Excel, and integrate with 1,000+ apps via Apify integrations. Runs are billed at $0.002 per paper fetched.

Is Crossref Academic Paper Search reliable?

Crossref Academic Paper Search has a maintenance pulse score of 90/100, with 8 builds in the last 30 days and the most recent build today.

What output format does Crossref Academic Paper Search return?

Crossref Academic Paper Search returns structured data in JSON format by default. You can also export results as CSV or Excel from the Apify Console. Each result includes all extracted fields in a machine-readable structure (some fields, such as authorDetails and funders, are nested arrays) that integrates with spreadsheets, CRMs, and automation tools via Apify integrations.

Are there alternatives to Crossref Academic Paper Search?

Yes. ApifyForge lists multiple actors in each category with different strengths. Browse related actors on the Crossref Academic Paper Search page or use the ApifyForge actor recommender to find the best fit for your use case. The right choice depends on your input data, budget, and required output fields.

Crossref Academic Paper Search

Crossref Academic Paper Search is an Apify actor available on ApifyForge at $0.002 per paper-fetched. Search 150M+ scholarly papers via Crossref API. Filter by keywords, author, journal, DOI prefix, publication type, and year range. Returns DOIs, citations, authors with ORCID, abstracts, funding data, and publisher metadata. The underlying Crossref API is free and needs no API key.

Best for teams that need automated extraction and analysis of Crossref academic paper data.

Not ideal for use cases requiring real-time streaming data or sub-second latency.

Last verified: March 27, 2026

Actively maintained

What to know

Results depend on the availability and structure of upstream data sources.
Large-scale runs may be subject to platform rate limits.
Requires an Apify account — free tier available with limited monthly usage.

Maintenance Pulse: 90/100
Last Build: Today
Last Version: 1d ago
Builds (30d): 8
Issue Response: N/A

Pricing

Pay Per Event model. You only pay for what you use.

Event	Description	Price
paper-fetched	Charged per academic paper record retrieved from Crossref.	$0.002

Example: 100 events = $0.20 · 1,000 events = $2.00
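
The arithmetic behind these examples can be sketched in a few lines of Python. This is only an illustration of the per-event price from the table above; the function name `estimate_cost` is hypothetical and not part of the actor or the Apify client.

```python
from decimal import Decimal

# Price per "paper-fetched" event, taken from the pricing table above.
PRICE_PER_PAPER = Decimal("0.002")

def estimate_cost(papers: int) -> Decimal:
    """Return the estimated charge in USD for a run that returns `papers` records."""
    if papers < 0:
        raise ValueError("papers must be non-negative")
    return PRICE_PER_PAPER * papers

print(estimate_cost(100))   # 0.200
print(estimate_cost(1000))  # 2.000
```

Decimal is used instead of float so that small per-event prices add up without rounding noise.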

Documentation

Crossref Academic Paper Search is an Apify actor for extracting structured academic paper metadata from Crossref at scale. Search by keyword, author, journal, ISSN, DOI prefix, or DOI list, and return normalized records with citation counts, author details, funding, Open Access status, BibTeX citations, retraction flags, and completeness scores. Includes a Literature Review mode that combines the most cited and newest papers in one run, citation range filtering, and incremental monitoring that only returns papers not seen in previous runs.

It is built to turn Crossref into a structured, analysis-ready academic dataset: extract, enrich, and export metadata for 50--1,000 papers without building your own pipeline.

This actor replaces the need to directly integrate with the Crossref and Unpaywall APIs for most metadata extraction workflows.

Best for: literature reviews, bibliometric analysis, research monitoring, OA auditing, and DOI-based reference screening.

Not for: full-text PDF extraction, citation network analysis, impact factors, or semantic recommendations.

Why use it instead of raw Crossref:

Get analysis-ready metadata without writing pagination logic
Export BibTeX and OA links in one run instead of stitching Crossref + Unpaywall
Screen for retracted papers automatically instead of interpreting raw metadata
Build datasets faster without cleaning HTML abstracts, normalizing dates, or deduplicating DOIs

Pricing: $0.002 per paper returned (pay-per-event). Crossref API is free.

Output: up to 1,000 papers per run with 27 fields each, in JSON, CSV, or Excel.

Best tool to get academic paper metadata in bulk

Crossref Academic Paper Search is the fastest way to extract academic paper metadata in bulk without building your own API client. Instead of writing pagination logic, cleaning HTML abstracts, and stitching together Crossref + Unpaywall, this actor returns normalized, analysis-ready data with OA status, BibTeX citations, and retraction flags in a single run. In most cases, this replaces building a custom Crossref API pipeline entirely.

Compared to raw APIs:

Crossref API -- requires pagination, date normalization, HTML stripping, and manual enrichment
OpenAlex / Semantic Scholar -- different coverage and schemas, no BibTeX or retraction flags
Google Scholar -- no official API, no structured output, no automation
This actor -- returns clean, structured Crossref data with OA, BibTeX, and retraction detection built in

Common tasks this replaces

Get metadata from a list of DOIs -- use DOI Lookup Mode instead of looping over Crossref API
Check Open Access status in bulk -- enable includeOpenAccess instead of calling Unpaywall per DOI
Export BibTeX for hundreds of papers -- enable includeBibtex instead of formatting citations manually
Screen references for retracted papers -- check isRetracted instead of interpreting raw Crossref metadata
Monitor a topic for new publications -- enable onlyNew on a schedule instead of building a polling script
Build a literature review overview -- use Literature Review mode instead of running multiple searches manually
Filter by citation impact -- set minCitations / maxCitations instead of post-processing results

Choose this actor if

You need structured Crossref metadata at scale without managing API pagination
You want DOI lookup plus Open Access detection plus BibTeX export in one run
You need to screen a reference list for retracted papers before publishing
You want to monitor a topic, author, or journal for new publications on a schedule
You need clean citation data for bibliometric analysis (citation counts, funding, subjects)

Do not use this actor if

You need full-text PDFs or paywalled article content
You need citation graph analysis (who cites whom, citation chains)
You need journal impact factors or h-index calculations
You need semantic paper recommendations or "similar papers" features
You need real-time preprint alerts (use the ArXiv Preprint Paper Search actor instead)

Quick answers

What is it? An Apify actor that queries Crossref (150M+ scholarly works from 20,000+ publishers) and Unpaywall, returning 27 normalized fields per paper.

What inputs does it support? Keyword query, author name, journal name, ISSN, DOI prefix, publication type, year range, sort order -- or a list of specific DOIs for direct lookup.

What does it return? DOI, title, authors with ORCID, citation count, journal, publisher, abstract, funding with grant IDs, subjects, retraction status, OA status with PDF URLs, BibTeX citations, and relevance score.

How is it different from raw Crossref? Adds automatic pagination, date normalization, HTML-stripped abstracts, Unpaywall OA checks, BibTeX generation across 5 entry types, retraction detection across two metadata paths, and summary statistics.

Does it support DOI lookup? Yes. Paste DOIs (comma-separated or one per line) into the DOI Lookup Mode field. The actor fetches metadata for each DOI directly, bypassing search.

Does it detect Open Access papers? Yes. Enable includeOpenAccess to check each paper against Unpaywall. Returns OA type (gold/green/bronze/hybrid) and free PDF URL.

Does it detect retracted papers? Yes. Every paper includes isRetracted and retractionDoi fields. Checks both Crossref update-to and relation.is-retracted-by metadata.

How much does it cost? $0.002 per paper. 50 papers = $0.10. 1,000 papers = $2.00. Crossref API itself is free.

What is Literature Review mode? A single run that fetches the most cited papers AND the newest papers on a topic, removes duplicates, and produces a combined dataset with summary statistics including top authors and top journals. The fastest way to get an instant research overview.

Can it filter by citation count? Yes. Set minCitations to find only influential papers (e.g., 50+ citations), or maxCitations to find niche or recent work not yet heavily cited.

Can it track new papers across scheduled runs? Yes. Enable onlyNew (incremental mode). Each run only returns papers not seen in previous runs. Seen DOIs are stored in the Key-Value Store and persist across runs.

Best API alternative for academic metadata workflows

While APIs like Crossref, OpenAlex, and Semantic Scholar provide raw data, Crossref Academic Paper Search is a higher-level alternative that returns analysis-ready datasets without requiring API integration, pagination handling, or data cleaning. For batch workflows of 50--1,000 papers, this is the simplest path from research question to structured dataset.

Crossref Academic Paper Search vs raw Crossref API vs Google Scholar

If you are deciding between Crossref and Google Scholar for programmatic access to academic metadata, this actor builds on Crossref to provide a complete, automation-ready solution with OA detection, BibTeX export, and retraction screening included.

Need	This actor	Raw Crossref API	Google Scholar
Batch structured metadata	Up to 1,000 papers per run	Yes, but manual pagination	No official API
DOI lookup	Yes, paste a list	Yes, one at a time	Manual only
Open Access status	Yes, via Unpaywall	No	Not structured
BibTeX generation	Yes, 5 entry types	No	Manual export
Retraction detection	Yes, two metadata paths	Manual interpretation	Not structured
Citation counts	Yes, per paper	Yes	Approximate, no API
Author ORCID	Yes, when available	Yes, raw format	No
Funding data	Yes, with grant IDs	Yes, raw format	No
Full text	No	No	Sometimes links
Citation filtering	Yes (min/max)	Manual post-processing	No
Literature review mode	Yes (most cited + newest)	Multiple queries needed	No
Incremental monitoring	Yes (only new papers)	Build it yourself	No
Data quality score	Yes (completeness 0-1)	No	No
Scheduled automation	Yes, via Apify schedules	Build it yourself	No

Use cases

Literature reviews and systematic reviews

Retrieve structured metadata for hundreds of papers in one run. Enable BibTeX export to generate citations ready for Overleaf, Zotero, or Mendeley. Sort by citation count to find foundational work first.

Bibliometric analysis and research evaluation

Analyze publication patterns, citation distributions, and funding landscapes. The Key-Value Store summary provides type breakdowns, citation averages, and top journals without additional processing.

Monitoring new publications

Schedule weekly runs with "Newest First" sorting and the current year as fromYear. New publications appear in Crossref within days of DOI registration.

Open Access auditing

Enable OA detection to assess availability across a set of publications. Returns OA type and free PDF URLs. The summary includes overall OA percentage for compliance reporting.

Retraction screening

Validate a reference list or dataset for retracted papers. Use DOI lookup mode with DOIs from an existing bibliography. Every paper shows isRetracted status and the retraction notice DOI.

Pricing and performance

Scenario	Papers	Cost	Run time
Quick test	10	$0.02	~5 seconds
Standard search	50	$0.10	8-15 seconds
Author bibliography	200	$0.40	15-30 seconds
Full extraction	1,000	$2.00	45-90 seconds
100 papers + OA check	100	$0.20	2-4 minutes

The actor respects your Apify spending limit. If the limit is reached mid-run, it stops and returns papers collected so far.

How to use

Enter a search query -- type a topic like "CRISPR gene editing" or paste DOIs into the DOI Lookup Mode field
Add filters -- optionally set author, journal, ISSN, DOI prefix, type, or year range. Enable BibTeX or Open Access under Output Enrichment
Run -- 50 papers complete in ~10 seconds
Download -- export from the Dataset tab in JSON, CSV, or Excel. Summary stats are in the Key-Value Store under SUMMARY

First run tips

Start with 50 results -- scale up after reviewing the first batch
Use ISSN for exact journal matching -- issn: "0028-0836" targets only Nature, while containerTitle: "Nature" fuzzy-matches Nature Communications, Nature Methods, etc.
Use DOI prefix to target publishers -- 10.1038 (Nature), 10.1016 (Elsevier), 10.1007 (Springer), 10.1126 (Science/AAAS)
Enable OA detection only when needed -- adds ~1 second per paper via Unpaywall

How to build an instant literature review

The fastest way to get a research overview on any topic is to use Literature Review mode. Set mode to literature_review and provide a search query. The actor automatically fetches the most cited papers (foundational work) and the newest papers (recent breakthroughs), removes duplicates, and returns a combined dataset. The Key-Value Store summary includes top authors, top journals, citation statistics, and year distribution — everything needed to understand a research field in one run.

{
    "query": "CRISPR gene editing",
    "mode": "literature_review",
    "maxResults": 100,
    "includeBibtex": true
}
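
As a rough illustration of what "most cited plus newest, with duplicates removed" means, here is a minimal Python sketch. It is not the actor's implementation; `merge_review_sets` is a hypothetical helper, and the DOIs are made-up placeholders.

```python
def merge_review_sets(most_cited: list[dict], newest: list[dict]) -> list[dict]:
    """Combine two result lists, keeping the first record seen for each DOI."""
    merged, seen = [], set()
    for paper in most_cited + newest:
        key = paper["doi"].lower()  # DOIs are case-insensitive
        if key not in seen:
            seen.add(key)
            merged.append(paper)
    return merged

# Placeholder records for illustration only.
most_cited = [{"doi": "10.1000/a"}, {"doi": "10.1000/b"}]
newest = [{"doi": "10.1000/B"}, {"doi": "10.1000/c"}]

combined = merge_review_sets(most_cited, newest)
print([p["doi"] for p in combined])  # ['10.1000/a', '10.1000/b', '10.1000/c']
```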

How to find only highly cited papers

Set minCitations to filter out low-impact results. For example, minCitations: 50 returns only papers cited 50+ times. Combine with maxCitations to target a specific range — minCitations: 10, maxCitations: 500 finds moderately influential work that isn't yet a review staple. Citation filtering works in both search mode and literature review mode.
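
The range semantics can be sketched as follows, assuming both bounds are inclusive (the parameter table describes them as "at least" and "at most"). `in_citation_range` is a hypothetical helper for illustration, not a function exposed by the actor.

```python
from typing import Optional

def in_citation_range(count: int,
                      min_citations: Optional[int] = None,
                      max_citations: Optional[int] = None) -> bool:
    """Return True when `count` satisfies both (optional, inclusive) bounds."""
    if min_citations is not None and count < min_citations:
        return False
    if max_citations is not None and count > max_citations:
        return False
    return True

# Actor input that targets moderately cited work.
run_input = {"query": "CRISPR", "minCitations": 10, "maxCitations": 500}

counts = [3, 10, 250, 500, 3842]
kept = [c for c in counts
        if in_citation_range(c, run_input["minCitations"], run_input["maxCitations"])]
print(kept)  # [10, 250, 500]
```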

How to monitor a topic for new papers

The simplest way to track new publications on a topic is to enable onlyNew (incremental mode) and schedule the actor to run weekly. Each run only returns papers not seen in previous runs. Seen DOIs persist in the Key-Value Store across runs. Combine with "Newest First" sorting and fromYear set to the current year for the most focused monitoring.
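
A monitoring input along these lines could be assembled like this. The parameter names come from the input table; the helper `monitoring_input` is hypothetical, and the weekly schedule itself is configured in Apify rather than in the input.

```python
from datetime import date
from typing import Optional

def monitoring_input(query: str, today: Optional[date] = None) -> dict:
    """Build an actor input for incremental topic monitoring."""
    today = today or date.today()
    return {
        "query": query,
        "onlyNew": True,          # incremental mode: skip DOIs seen in earlier runs
        "sortBy": "published",    # newest first
        "fromYear": today.year,   # restrict to the current year
    }

example = monitoring_input("LLM safety", today=date(2026, 3, 27))
print(example)
```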

How to get DOI metadata in bulk

The easiest way to get metadata from a list of DOIs without writing API loops is to use DOI Lookup Mode. Instead of calling Crossref's /works/{doi} endpoint for each DOI manually, this actor accepts hundreds of DOIs at once and returns structured metadata in a single run. This is typically faster and simpler than writing Python loops over the Crossref API, especially for batches of 50--1,000 DOIs. Paste your DOIs (comma-separated or one per line) into the dois field. Duplicates are removed automatically. Enable includeOpenAccess or includeBibtex to enrich results in the same run.
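
If your DOIs come from a messy export, a small pre-cleaning step like the one below can help before pasting them into dois. This is a sketch under the assumption that DOIs are separated by commas or newlines; `parse_dois` is a hypothetical helper, and the actor performs its own deduplication regardless.

```python
import re

def parse_dois(raw: str) -> list[str]:
    """Split on commas and newlines, trim whitespace, drop blanks and duplicates."""
    seen, result = set(), []
    for token in re.split(r"[,\n]+", raw):
        doi = token.strip()
        # Compare case-insensitively, since DOIs are case-insensitive.
        if doi and doi.lower() not in seen:
            seen.add(doi.lower())
            result.append(doi)
    return result

raw = "10.1126/science.aaf5573, 10.1038/nature17946\n10.1126/SCIENCE.AAF5573\n"
dois = parse_dois(raw)

# The `dois` input is a single string, one DOI per line.
run_input = {"dois": "\n".join(dois), "includeOpenAccess": True}
print(dois)  # ['10.1126/science.aaf5573', '10.1038/nature17946']
```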

How to find Open Access papers by DOI

The easiest way to check Open Access status for a list of DOIs is to use Crossref Academic Paper Search with Open Access detection enabled. This replaces calling the Unpaywall API directly when working with multiple DOIs. Instead of making individual Unpaywall requests, this actor performs Open Access checks in bulk with built-in rate handling and structured output, returning OA status and PDF URLs alongside full paper metadata. Paste DOIs into the dois field and enable includeOpenAccess. The output includes openAccess (true/false), oaStatus (gold, green, bronze, hybrid), and oaPdfUrl (direct link to the free version). The Key-Value Store summary shows overall OA percentage.
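
If you prefer to compute the OA share yourself from the dataset items, a sketch could look like this. It assumes openAccess is null for papers that were not checked, as the output field table states; `oa_share` is a hypothetical helper.

```python
from typing import Optional

def oa_share(items: list[dict]) -> Optional[float]:
    """Percentage of checked papers that are Open Access, or None if none were checked."""
    checked = [i for i in items if i.get("openAccess") is not None]
    if not checked:
        return None
    open_count = sum(1 for i in checked if i["openAccess"])
    return 100 * open_count / len(checked)

# Placeholder records: one OA, one closed, one never checked.
items = [{"openAccess": True}, {"openAccess": False}, {"openAccess": None}]
print(oa_share(items))  # 50.0
```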

How to check if a paper is retracted

The fastest way to check if a paper has been retracted at scale is to use Crossref Academic Paper Search in DOI lookup mode. Unlike manual checks against Crossref metadata or Retraction Watch, this actor flags retractions automatically using two metadata paths and works across hundreds of DOIs in one run. For single papers, manual checks work. For lists of 10--1,000 DOIs, this is significantly faster and more reliable. Paste DOIs into the dois field. Every result includes isRetracted (true/false) and retractionDoi (the DOI of the retraction notice).
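
Once a run finishes, picking out the flagged papers is a simple filter over the dataset items. The sketch below uses the isRetracted and retractionDoi fields described above; `retracted_papers` is a hypothetical helper, and the DOIs are made-up placeholders.

```python
def retracted_papers(items: list[dict]) -> list[tuple]:
    """Return (doi, retractionDoi) pairs for every record flagged as retracted."""
    return [(i["doi"], i.get("retractionDoi"))
            for i in items if i.get("isRetracted")]

# Placeholder records for illustration only.
items = [
    {"doi": "10.1000/ok", "isRetracted": False, "retractionDoi": None},
    {"doi": "10.1000/bad", "isRetracted": True, "retractionDoi": "10.1000/notice"},
]
print(retracted_papers(items))  # [('10.1000/bad', '10.1000/notice')]
```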

How to export BibTeX from Crossref results

The simplest way to generate BibTeX citations for hundreds of papers at once is to enable includeBibtex in the input. Instead of formatting citations manually or using browser export tools one paper at a time, Crossref Academic Paper Search generates a BibTeX entry per paper with the correct type (@article, @incollection, @inproceedings, @book, @techreport). Copy the bibtex field into your .bib file or import into Zotero, Mendeley, or Overleaf.
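
Collecting the per-paper bibtex strings into one .bib file can be sketched as follows. `build_bib` is a hypothetical helper; it assumes bibtex is null (or absent) when includeBibtex was not enabled.

```python
def build_bib(items: list[dict]) -> str:
    """Join all non-empty `bibtex` fields into the text of a single .bib file."""
    entries = [i["bibtex"].strip() for i in items if i.get("bibtex")]
    return "\n\n".join(entries) + ("\n" if entries else "")

# Placeholder records for illustration only.
items = [
    {"doi": "10.1000/a", "bibtex": "@article{A2020,\n  year = {2020}\n}"},
    {"doi": "10.1000/b", "bibtex": None},
    {"doi": "10.1000/c", "bibtex": "@book{C2021,\n  year = {2021}\n}"},
]
bib_text = build_bib(items)
print(bib_text)

# To save it: pathlib.Path("references.bib").write_text(bib_text, encoding="utf-8")
```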

How to search papers by author, journal, or ISSN

Set authorName for author search (fuzzy matching -- "Jennifer Doudna" and "J. Doudna" both work). Set containerTitle for journal name search, or issn for exact journal matching. ISSN is more precise -- issn: "0028-0836" returns only Nature, while containerTitle: "Nature" fuzzy-matches Nature Communications and other Nature-branded journals.

Example prompts this actor handles

"Find the most cited CRISPR papers since 2020" -- set query: "CRISPR", fromYear: 2020, sortBy: "is-referenced-by-count"
"Check if these 50 DOIs are retracted" -- paste DOIs into dois, check isRetracted in output
"Export BibTeX and OA links for papers by Jennifer Doudna" -- set authorName, enable includeBibtex and includeOpenAccess
"Find all Nature papers on machine learning from 2022 onward" -- set query: "machine learning", issn: "0028-0836", fromYear: 2022
"What journals publish the most on climate change?" -- search topic, check SUMMARY in Key-Value Store for top journals
"Get funding data for NIH-supported gene therapy research" -- search topic, check funders array in output for NIH grants
"Give me an instant literature review on transformer architectures" -- set mode: "literature_review", get most cited + newest combined
"Only show me highly cited papers on CRISPR" -- set minCitations: 50 to filter noise
"Alert me when new papers on LLM safety are published" -- schedule weekly with onlyNew: true

What you avoid building yourself

Without this actor, extracting the same data from Crossref requires:

Raw Crossref API          →  This actor
─────────────────────────────────────────────────────
Manual pagination logic      Automatic (100/page, up to 10K offset)
HTML-encoded abstracts       Clean plain text
date-parts arrays            YYYY-MM-DD strings
No OA data                   Unpaywall integration built in
No BibTeX                    5 entry types generated automatically
Manual retraction checking   isRetracted + retractionDoi on every record
No summary stats             Citation stats, top journals, top authors, OA % in KV store
Multiple searches needed     Literature Review mode combines most cited + newest
No citation filtering        minCitations / maxCitations built in
No change tracking           Incremental mode tracks seen DOIs across runs
No quality indicators        Completeness score (0-1) on every record
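
To give a sense of two of these cleanup steps, here is a minimal Python sketch of date and abstract normalization. It is not the actor's code. Padding a missing month or day with 01 is an assumption made for the example, and the tag stripping is deliberately crude (inline markup such as subscripts is replaced by spaces).

```python
import html
import re
from typing import Optional

def date_parts_to_iso(date_parts) -> Optional[str]:
    """Convert a Crossref-style [[year, month, day]] array to YYYY-MM-DD."""
    if not date_parts or not date_parts[0] or date_parts[0][0] is None:
        return None
    # Assumption: missing month/day default to 1.
    year, month, day = (list(date_parts[0]) + [1, 1])[:3]
    return f"{year:04d}-{month:02d}-{day:02d}"

def strip_abstract(raw: Optional[str]) -> Optional[str]:
    """Remove markup tags, decode HTML entities, and collapse whitespace."""
    if not raw:
        return None
    text = re.sub(r"<[^>]+>", " ", raw)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip() or None

print(date_parts_to_iso([[2016, 4, 20]]))                      # 2016-04-20
print(strip_abstract("<jats:p>Editing &amp; repair</jats:p>"))  # Editing & repair
```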

Input parameters

Parameter	Type	Default	Description
`query`	String	-	Free-text search across Crossref metadata (titles, abstracts, and other indexed fields); article full text is not searched
`authorName`	String	-	Filter by author name (e.g., "Einstein", "Jennifer Doudna")
`containerTitle`	String	-	Filter by journal or conference name
`doiPrefix`	String	-	Filter by publisher DOI prefix (e.g., `10.1038`)
`issn`	String	-	Filter by exact journal ISSN (e.g., "0028-0836")
`dois`	String	-	DOI Lookup Mode: paste DOIs, one per line or comma-separated
`publicationType`	String	-	Filter: `journal-article`, `book-chapter`, `proceedings-article`, `posted-content`, `book`, `dataset`, `report`
`fromYear`	Integer	-	Earliest publication year
`toYear`	Integer	-	Latest publication year
`sortBy`	String	`relevance`	Sort: `relevance`, `is-referenced-by-count` (most cited), `published` (newest)
`maxResults`	Integer	`50`	Maximum papers to return (1-1,000)
`minCitations`	Integer	-	Only return papers with at least this many citations
`maxCitations`	Integer	-	Only return papers with at most this many citations
`mode`	String	-	Set to `literature_review` to fetch most cited + newest papers combined
`onlyNew`	Boolean	`false`	Incremental mode: only return papers not seen in previous runs
`includeBibtex`	Boolean	`false`	Generate BibTeX citation for each paper
`includeOpenAccess`	Boolean	`false`	Check Unpaywall for OA status and free PDF URLs
`includeRis`	Boolean	`false`	Generate RIS-format citation per paper (EndNote/Zotero/Mendeley import)
`outputProfile`	String	`full`	Output verbosity: `minimal` (decision-only — doi/title/year/citations/summary/recommendedAction/changeFlag), `standard` (above + author + journal + abstract), `llm` (decision + confidence + agentContract for AI consumers), `full` (every field)
`watchlistName`	String	-	Name this run as a separate watchlist. CITATION_HISTORY + SEEN_DOIS are stored per-watchlist, so the same actor runs as N independent literature reviews.
`webhookUrl`	String	-	Slack or Discord incoming webhook URL. Posts a rich embed on completion with totals + retractions + OA % + top papers + a link to the run. Auto-detects vendor.
`circuitBreakerThreshold`	Integer	`0`	Reserved for future Unpaywall failure-streak abort. Currently a placeholder.
`includeAgentContract`	Boolean	`true`	Add a top-level `agentContract` `{ decision, confidence, nextAction, costToAct }` to every paper record (and run-level on the SUMMARY) for MCP/AI-agent consumers.

At least one of query, authorName, containerTitle, doiPrefix, issn, or dois must be provided.
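
A client-side pre-check of these rules, run before starting the actor, might look like the sketch below. `validate_input` is a hypothetical helper; the year-order check is an extra sanity check added for the example and is not stated in the documentation above.

```python
SEARCH_KEYS = ("query", "authorName", "containerTitle", "doiPrefix", "issn", "dois")

def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    if not any(str(run_input.get(k) or "").strip() for k in SEARCH_KEYS):
        problems.append("provide at least one of: " + ", ".join(SEARCH_KEYS))
    max_results = run_input.get("maxResults", 50)  # documented default is 50
    if not 1 <= max_results <= 1000:
        problems.append("maxResults must be between 1 and 1,000")
    first, last = run_input.get("fromYear"), run_input.get("toYear")
    # Assumption: a reversed year range is treated as a mistake.
    if first is not None and last is not None and first > last:
        problems.append("fromYear must not be later than toYear")
    return problems

print(validate_input({"maxResults": 50}))
print(validate_input({"query": "CRISPR", "maxResults": 100}))  # []
```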

Input examples

Find the most cited CRISPR papers with BibTeX:

{
    "query": "CRISPR gene editing",
    "sortBy": "is-referenced-by-count",
    "maxResults": 100,
    "includeBibtex": true
}

Check whether these DOIs are retracted and Open Access:

{
    "dois": "10.1126/science.aaf5573\n10.1038/nature17946\n10.1016/j.cell.2014.09.029",
    "includeOpenAccess": true
}

Find all Nature papers on machine learning from 2022 onward:

{
    "query": "machine learning",
    "issn": "0028-0836",
    "fromYear": 2022,
    "sortBy": "published",
    "maxResults": 200
}

Export BibTeX and OA links for papers by Jennifer Doudna:

{
    "query": "base editing",
    "authorName": "Jennifer Doudna",
    "sortBy": "is-referenced-by-count",
    "includeBibtex": true,
    "includeOpenAccess": true
}

Output example

{
    "doi": "10.1126/science.aaf5573",
    "url": "http://dx.doi.org/10.1126/science.aaf5573",
    "title": "Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage",
    "publishedYear": 2016,
    "publishedDate": "2016-04-20",
    "type": "journal-article",
    "citationCount": 3842,
    "referencesCount": 47,
    "authors": "Alexis C. Komor, Yongjoo B. Kim, Michael S. Packer, John A. Zuris, David R. Liu",
    "authorDetails": [
        {
            "name": "Alexis C. Komor",
            "sequence": "first",
            "affiliations": ["Harvard University", "Broad Institute"],
            "orcid": "https://orcid.org/0000-0003-4884-3253"
        }
    ],
    "journal": "Science",
    "publisher": "American Association for the Advancement of Science (AAAS)",
    "volume": "352",
    "issue": "6293",
    "page": "1423-1428",
    "language": "en",
    "issn": ["0036-8075", "1095-9203"],
    "subjects": ["Multidisciplinary"],
    "funders": [
        { "name": "National Institutes of Health", "awards": ["R01 EB022376"] },
        { "name": "Howard Hughes Medical Institute", "awards": [] }
    ],
    "abstract": "Current genome-editing technologies introduce double-stranded (ds) DNA breaks at a target locus...",
    "license": "https://www.science.org/doi/am-pdf/10.1126/science.aaf5573",
    "isRetracted": false,
    "retractionDoi": null,
    "openAccess": true,
    "oaStatus": "green",
    "oaPdfUrl": "https://europepmc.org/articles/pmc4873371?pdf=render",
    "bibtex": "@article{Liu2016,\n  author = {Alexis C. Komor and ...},\n  title = {Programmable editing of...},\n  journal = {Science},\n  year = {2016},\n  doi = {10.1126/science.aaf5573}\n}",
    "relevanceScore": 18.742,
    "extractedAt": "2026-04-04T14:30:00.000Z"
}

Output fields

Field	Type	Description
`doi`	String	Digital Object Identifier
`url`	String	Canonical URL (via doi.org)
`title`	String	Full title
`publishedYear`	Integer / null	Publication year
`publishedDate`	String / null	Date in YYYY-MM-DD
`type`	String	Crossref type (journal-article, book-chapter, etc.)
`citationCount`	Integer	Times cited by indexed works
`referencesCount`	Integer	References this work cites
`authors`	String	Comma-separated author names
`authorDetails`	Array	Name, sequence, affiliations, ORCID per author
`journal`	String / null	Journal or container title
`publisher`	String	Publisher name
`volume`	String / null	Volume
`issue`	String / null	Issue
`page`	String / null	Page range
`language`	String / null	ISO language code
`issn`	Array	Journal ISSNs
`subjects`	Array	Subject classifications
`funders`	Array	Funder name + grant IDs
`abstract`	String / null	Plain-text abstract (HTML stripped)
`license`	String / null	License or access URL
`isRetracted`	Boolean	Whether the paper is retracted
`retractionDoi`	String / null	DOI of retraction notice
`openAccess`	Boolean / null	OA status (null if not checked)
`oaStatus`	String / null	gold, green, bronze, hybrid
`oaPdfUrl`	String / null	Free PDF URL
`bibtex`	String / null	BibTeX citation (null if not enabled)
`ris`	String / null	RIS-format citation for EndNote/Zotero/Mendeley (null if `includeRis=false`)
`completenessScore`	Number	Data quality score (0-1) based on available metadata
`relevanceScore`	Number	Crossref relevance score
`extractedAt`	String	ISO 8601 extraction timestamp
`recordType`	String	`result` for paper records, `summary` for run summary record, `error` for failures. Use to filter the dataset.
`schemaVersion`	String	Output schema version (semver). Bumped on shape changes; safe to branch on.
`eventId`	String	Idempotent canonical id `sha256(watchlist::doi)`. Same id across re-runs of the same DOI — safe join key for downstream diffing.
`summary`	String	Plain-English one-line summary (≤280 chars). LLM/CRM-friendly.
`confidence`	Object	`{ score: 0–1, level: 'high'|'medium'|'low'|'very-low', components: [{ name, weight, value }] }`. Components: completeness, citationStrength, recordIntegrity, recency.
`recommendedAction`	String	Stable enum: `cite-immediately` | `read-later` | `verify-retraction-status` | `manual-review` | `archive-low-completeness`.
`changeFlag`	String	Cross-run change vs prior `citationCount`: `NEW` | `IMPROVED` | `DECLINED` | `UNCHANGED` (±5 tolerance).
`previousCitationCount`	Integer / null	Citation count from the prior run (null on first encounter).
`citationDelta`	Integer / null	`citationCount - previousCitationCount`. Positive = paper gained citations since last run.
`dataGaps`	Array	`[{ field, reason, suggestedFix }]` listing missing fields with named upstream actors that fill the gap.
`agentContract`	Object	`{ decision, confidence, nextAction, costToAct }` decision surface for MCP and AI-agent consumers.
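
The relationship between citationCount, previousCitationCount, citationDelta, and changeFlag can be sketched as below. This is an interpretation of the table above, not the actor's code. In particular, treating a delta of exactly +5 or -5 as UNCHANGED is an assumption, and `change_flag` is a hypothetical helper.

```python
from typing import Optional

TOLERANCE = 5  # the "±5 tolerance" from the field table

def change_flag(current: int, previous: Optional[int]) -> str:
    """Classify the citation change between two runs."""
    if previous is None:
        return "NEW"  # first encounter, no prior count
    delta = current - previous  # this is citationDelta
    if delta > TOLERANCE:
        return "IMPROVED"
    if delta < -TOLERANCE:
        return "DECLINED"
    return "UNCHANGED"

print(change_flag(3842, None))   # NEW
print(change_flag(3842, 3800))   # IMPROVED
print(change_flag(3842, 3840))   # UNCHANGED
```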

KV store mirrors

In addition to the dataset, every run writes to the Key-Value Store:

SUMMARY key — totals + analytics (typeBreakdown, citationStats, topJournals, topAuthors, yearDistribution) + run-level agentContract decision surface + coverage block + dataGaps (when applicable). Best for triggering downstream actors with a single read.
OUTPUT key — full deterministic per-paper output (regardless of outputProfile) plus agentContract and coverage. Use when you need every field even though the dataset is filtered.
crossref-paper-search-history[-watchlistName] named store — CITATION_HISTORY map of { doi → citationCount } for changeFlag computation, plus SEEN_DOIS for onlyNew incremental mode. Survives dataset purges.
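
Reading the SUMMARY record after a run can be sketched with the Apify Python client. The sketch assumes SUMMARY is written to the run's default Key-Value Store; `read_summary` is a hypothetical helper that accepts any client exposing `key_value_store(...).get_record(...)`, so it makes no network call until you pass it a real `ApifyClient`.

```python
def read_summary(client, run: dict) -> dict:
    """Fetch the SUMMARY record from the run's default Key-Value Store."""
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("SUMMARY")  # returns None when the key is missing
    return record["value"] if record else {}

# Usage with the real client (requires `pip install apify-client` and a token):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("YOUR_API_TOKEN")
#   run = client.actor("ryanclinton/crossref-paper-search").call(
#       run_input={"query": "CRISPR gene editing", "maxResults": 50})
#   summary = read_summary(client, run)
#   print(summary.get("topJournals"))
```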

Stable enums

The actor commits to additive-only evolution of these enums (new values may be added in minor versions; existing values never removed or renamed):

recommendedAction — cite-immediately, read-later, verify-retraction-status, manual-review, archive-low-completeness
changeFlag — NEW, IMPROVED, DECLINED, UNCHANGED
confidence.level — high (≥0.8), medium (≥0.6), low (≥0.4), very-low (<0.4)
agentContract.decision — qualified-A, qualified-B, review, low-priority, reject
recordType — result, summary, error
failureType (error records) — invalid-input, no-data, unknown

Branching on these in Dify, n8n, Make, or your own code is safe across schemaVersion minor bumps.
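
In your own code, branching on these enums might look like the sketch below. The thresholds for confidence.level come from the list above; the destination names returned by `route` ("add-to-bibliography" and so on) are hypothetical labels invented for the example.

```python
def confidence_level(score: float) -> str:
    """Map a 0-1 confidence score to the documented level names."""
    if score >= 0.8:
        return "high"
    if score >= 0.6:
        return "medium"
    if score >= 0.4:
        return "low"
    return "very-low"

def route(record: dict) -> str:
    """Pick a downstream destination for one dataset record."""
    if record.get("recordType") != "result":
        return "skip"  # summary and error records are not papers
    action = record.get("recommendedAction")
    if action == "verify-retraction-status":
        return "flag-for-review"
    if action == "cite-immediately":
        return "add-to-bibliography"
    # Enums are additive-only, so unknown values get a safe default.
    return "backlog"

print(confidence_level(0.72))  # medium
print(route({"recordType": "result", "recommendedAction": "cite-immediately"}))
```

Keeping a default branch matters here because new enum values may appear in minor versions.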

Programmatic access

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/crossref-paper-search").call(run_input={
    "query": "CRISPR gene editing",
    "sortBy": "is-referenced-by-count",
    "maxResults": 100,
    "includeBibtex": True,
    "includeOpenAccess": True,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} — {item['citationCount']} citations — OA: {item['openAccess']}")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/crossref-paper-search").call({
    query: "CRISPR gene editing",
    sortBy: "is-referenced-by-count",
    maxResults: 100,
    includeBibtex: true,
    includeOpenAccess: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
    console.log(`${item.title} — ${item.citationCount} citations — OA: ${item.openAccess}`);
}

cURL

curl -X POST "https://api.apify.com/v2/acts/ryanclinton~crossref-paper-search/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "CRISPR gene editing", "sortBy": "is-referenced-by-count", "maxResults": 50}'

curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

How it works

input (query | author | journal | issn | doiPrefix | dois | mode=literature_review)
       │
       ▼
   build URL params (filters + sort + offset)
       │
       ▼
   Crossref /works (paginate 100/page, retry-with-backoff on 429/5xx)
       │
       ▼
   transform → completeness · retraction · authors · funders · subjects · license
       │
       ▼
   citation filters (min/max) + onlyNew (named cross-run KV)
       │
       ▼
   sort by richness (abstract + authors + journal + citations + subjects + funders)
       │
       ▼
   optional: Unpaywall OA enrichment · BibTeX generation · RIS generation
       │
       ▼
   per-paper premium fields:
     eventId · summary · confidence · recommendedAction · changeFlag · dataGaps · agentContract
       │
       ┌────────┴────────┐
       ▼                 ▼
   dataset            KV SUMMARY (totals + analytics + run-level agentContract)
   per-record         KV OUTPUT (full deterministic shape, all profiles)
                      crossref-paper-search-history[-watchlist]
                        (CITATION_HISTORY · SEEN_DOIS for incremental + changeFlag)
       │
       └─► optional Slack/Discord webhook

Search mode: builds paginated queries using query, query.author, query.container-title as fuzzy parameters and prefix, issn, type, from-pub-date, until-pub-date as exact filters. Fetches 100 records per page until maxResults or the 10,000-offset limit.
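
For readers curious what such a request looks like, the sketch below maps actor inputs to Crossref /works query parameters and computes page offsets. It is an approximation written for this example, not the actor's source; `build_crossref_params` and `page_offsets` are hypothetical names, and always sending order=desc is an assumption.

```python
def build_crossref_params(run_input: dict, offset: int = 0, rows: int = 100) -> dict:
    """Translate actor-style input into Crossref /works query parameters."""
    params = {"rows": rows, "offset": offset}
    # Fuzzy, free-text parameters.
    fuzzy = {"query": "query", "authorName": "query.author",
             "containerTitle": "query.container-title"}
    for key, param in fuzzy.items():
        if run_input.get(key):
            params[param] = run_input[key]
    # Exact filters are combined into one comma-separated `filter` value.
    exact = {"doiPrefix": "prefix", "issn": "issn", "publicationType": "type",
             "fromYear": "from-pub-date", "toYear": "until-pub-date"}
    filters = [f"{name}:{run_input[key]}" for key, name in exact.items()
               if run_input.get(key)]
    if filters:
        params["filter"] = ",".join(filters)
    params["sort"] = run_input.get("sortBy", "relevance")
    params["order"] = "desc"  # assumption: highest or newest first
    return params

def page_offsets(max_results: int, page_size: int = 100, offset_cap: int = 10_000) -> list[int]:
    """Offsets needed to fetch max_results records without passing the offset cap."""
    return list(range(0, min(max_results, offset_cap), page_size))

params = build_crossref_params(
    {"query": "machine learning", "issn": "0028-0836", "fromYear": 2022, "sortBy": "published"})
print(params)
print(page_offsets(250))  # [0, 100, 200]
```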

DOI lookup mode: fetches each DOI individually from api.crossref.org/works/{doi}. Deduplicates input DOIs automatically.

Open Access detection: queries Unpaywall (api.unpaywall.org/v2/{doi}) for each paper when enabled. Returns OA type and best available PDF URL. Adds ~1 second per paper.

BibTeX generation: maps Crossref type to BibTeX entry type (journal-article -> @article, proceedings-article -> @inproceedings, book-chapter -> @incollection, book -> @book, report -> @techreport). Citation key follows {LastName}{Year}.
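
The type mapping above can be written as a lookup table. In this sketch the fallback to @misc for unlisted types is an assumption, and `citation_key` takes the last name as an argument because the text above does not say which author supplies it.

```python
import re

# Crossref type -> BibTeX entry type, as listed above.
BIBTEX_TYPES = {
    "journal-article": "article",
    "proceedings-article": "inproceedings",
    "book-chapter": "incollection",
    "book": "book",
    "report": "techreport",
}

def bibtex_entry_type(crossref_type: str) -> str:
    """Return the BibTeX entry type, e.g. '@article'."""
    # Assumption: anything not in the table falls back to @misc.
    return "@" + BIBTEX_TYPES.get(crossref_type, "misc")

def citation_key(last_name: str, year: int) -> str:
    """Build a {LastName}{Year} key, dropping characters that are unsafe in keys."""
    return f"{re.sub(r'[^A-Za-z0-9]', '', last_name)}{year}"

print(bibtex_entry_type("journal-article"))  # @article
print(citation_key("Liu", 2016))             # Liu2016
```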

Retraction detection: checks two Crossref metadata paths -- update-to (retraction-type updates) and relation.is-retracted-by (direct retraction links). No extra API calls needed.
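
The two checks described above can be approximated as follows. This is a sketch under assumptions: it treats any `update-to` entry whose `type` contains "retract" as a retraction and treats any non-empty `relation.is-retracted-by` list as a positive signal. The function name is hypothetical and the actor's exact matching rules may differ.

```python
from typing import Any, Dict


def is_retracted(work: Dict[str, Any]) -> bool:
    """Return True when either retraction path in the Crossref record is set."""
    # Path 1: update-to entries with a retraction-like type.
    for update in work.get("update-to") or []:
        if "retract" in str(update.get("type", "")).lower():
            return True
    # Path 2: a direct is-retracted-by relation.
    relation = work.get("relation") or {}
    return bool(relation.get("is-retracted-by"))
```

Both checks read fields already present in the /works response, which is why no extra API call is required.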

Limitations

10,000-result deep paging cap -- Crossref API constraint. Use filters to narrow broad queries.
No full-text access -- metadata only. Use doi or url fields to access papers.
20-30% abstract availability -- depends on publisher. Returns null when missing.
Citation count lag -- may trail Google Scholar or Semantic Scholar by weeks.
Metadata completeness varies -- some publishers omit affiliations, ORCID, subjects, or funding.
OA detection adds latency -- ~1 second per paper, so 1,000 papers add roughly 17 minutes.
Rate limiting -- retries on HTTP 429 with backoff, but rapid consecutive runs may experience delays.

Combine with other actors

Actor	How to combine
OpenAlex Research Search	Cross-reference with OpenAlex for institutional data and open-access metadata
PubMed Biomedical Literature Search	Add MeSH terms and clinical trial data for biomedical papers
Semantic Scholar Paper Search	Enrich with citation context and AI-generated TLDRs
ArXiv Preprint Paper Search	Track papers from preprint to publication
CORE Open Access Papers	Supplement with full-text open access content

FAQ

What is the difference between this and Google Scholar? Google Scholar crawls the web and provides a search interface but no structured API. Crossref Academic Paper Search queries the Crossref registry directly (150M+ works from 20,000+ publishers), returns 27 structured fields per paper, supports batch processing, and can be automated via API.

How do I search by author? Enter the author's name in authorName. Crossref uses fuzzy matching, so "Jennifer Doudna" and "J. Doudna" both work. Combine with a journal or keyword for precision.

Can I export BibTeX for Overleaf or Zotero? Yes. Enable includeBibtex. The actor generates a BibTeX entry per paper with correct entry type. Copy the bibtex field into your .bib file or import into Zotero/Mendeley.

Why are some abstracts missing? Only 20-30% of Crossref records include abstracts. The actor returns null for missing fields rather than guessing.

Can I schedule automatic runs? Yes. Use Apify scheduling to run weekly with "Newest First" sorting and fromYear set to the current year.

What publication types are supported? Journal articles, book chapters, conference proceedings, preprints, books, datasets, and reports.

Is it legal to extract metadata from Crossref? Crossref is public, community-funded infrastructure with an API designed for programmatic access. Metadata is factual bibliographic data. Consult legal counsel for specific compliance requirements.

How does it handle missing metadata? Returns null for missing string fields and empty arrays for missing list fields. Results are sorted by completeness so the richest records appear first.
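
To make the completeness ordering concrete, here is a small sketch that scores each record by how many of the six richness signals it carries and sorts the richest first. The field names (`abstract`, `authors`, `journal`, `citationCount`, `subjects`, `funders`) and the equal weighting are assumptions for illustration; the actor's real scoring is not documented here.

```python
from typing import Any, Dict, List

# One point per populated signal (assumed field names and weights).
RICHNESS_FIELDS = ("abstract", "authors", "journal", "citationCount", "subjects", "funders")


def richness(paper: Dict[str, Any]) -> int:
    """Count populated fields; None, empty strings, empty lists, and 0 score nothing."""
    return sum(1 for field in RICHNESS_FIELDS if paper.get(field))


def sort_by_richness(papers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return papers with the most complete records first (stable for ties)."""
    return sorted(papers, key=richness, reverse=True)
```

Because nulls and empty arrays are both falsy in Python, the same truthiness test covers missing string fields and missing list fields.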

Troubleshooting

No results for a broad query: Crossref needs a query-type parameter. If using only filters (DOI prefix, type, year) without query or authorName, add a keyword.

OA check is slow: Unpaywall allows ~1 request/second, so 1,000 papers take roughly 17 minutes. Disable includeOpenAccess when OA data is not needed.

"DOI not found" warnings: Some DOIs are registered with DataCite or other registries, not Crossref. This actor only looks up Crossref-registered DOIs.

BibTeX key conflicts: Keys use the {LastName}{Year} format, so two papers with the same author last name and year produce the same key. Rename duplicates in your reference manager.
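
If you would rather fix collisions before importing, a post-processing step along these lines can do it. This is a sketch that runs on your own exported `bibtex` strings, not a feature of the actor: the function name is hypothetical, and the suffix scheme (first entry keeps its key, repeats get `b`, `c`, and so on) is just one common convention.

```python
import re
import string
from typing import Dict, List

# Matches the "@type{key," opening of a BibTeX entry.
KEY_PATTERN = re.compile(r"^(@\w+\{)([^,]+),")


def disambiguate_keys(entries: List[str]) -> List[str]:
    """Append a letter suffix to repeated citation keys, keeping entry order."""
    seen: Dict[str, int] = {}
    result = []
    for entry in entries:
        match = KEY_PATTERN.match(entry)
        if not match:
            result.append(entry)  # leave anything unrecognized untouched
            continue
        key = match.group(2)
        count = seen.get(key, 0)
        seen[key] = count + 1
        if count:
            # Second occurrence gets "b", third "c"; numeric fallback past "z".
            suffix = string.ascii_lowercase[count] if count < 26 else f"_{count + 1}"
            entry = f"{match.group(1)}{key}{suffix},{entry[match.end():]}"
        result.append(entry)
    return result
```

The sketch does not check whether a generated key such as `Smith2020b` already exists in the input, so review the output if your library already uses suffixed keys.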

Help us improve

If you encounter issues, enable run sharing in Account Settings > Privacy so we can see your run details and fix issues faster.

Support

Found a bug or have a feature request? Open an issue in the Issues tab.

Last verified: March 27, 2026

Ready to try Crossref Academic Paper Search?

Run it on your own Apify account. Apify offers a free tier with $5 of monthly credits.
