Automated deduplication of literature searches for systematic reviews, scoping reviews, and meta-analyses. Upload files from multiple databases and receive a single deduplicated file.
Methods text
⚠ Submitted for peer review. This tool is not yet formally published. Use it at your own discretion; the source code is openly available at github.com/dpurkarthofer/deduplicate.it.
Deduplication is based on exact matching of both DOI and title. References without a valid DOI, or without an exactly matching title, pass through unchanged; this is by design. Manual review of both the deduplicated output and the exclusion file is required.
Frequently asked questions
References are matched by exact equality of both their normalised DOI
and normalised title. DOI normalisation follows the DOI Handbook
(ISO 26324 §3.4–3.8). Title normalisation applies a reproducible pipeline,
in this order: HTML entity decoding, trademark removal, Unicode NFC
normalisation, extended Latin transliteration (ä→a, æ→ae, ß→ss…),
NFD decomposition with diacritic stripping, Greek letter expansion
(α→alpha…), lowercasing, and reduction to the character set [a-z0-9 ].
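The title pipeline above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual code: the function name normalise_title, the three lookup tables (deliberately tiny subsets), and the choice to replace disallowed characters with a space and collapse runs of whitespace are all assumptions made for the example.

```python
import html
import re
import unicodedata

# Illustrative subsets only (assumptions); the tool's real tables are larger.
TRADEMARKS = str.maketrans("", "", "\u2122\u00ae\u2120")  # ™ ® ℠
LATIN = {"æ": "ae", "Æ": "AE", "ø": "o", "Ø": "O", "ß": "ss", "ł": "l", "Ł": "L"}
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta", "κ": "kappa"}


def normalise_title(title: str) -> str:
    """Apply the steps in the order listed in the text (hypothetical sketch)."""
    text = html.unescape(title)                        # HTML entity decoding
    text = text.translate(TRADEMARKS)                  # trademark removal
    text = unicodedata.normalize("NFC", text)          # NFC
    text = "".join(LATIN.get(ch, ch) for ch in text)   # Latin transliteration
    text = unicodedata.normalize("NFD", text)          # NFD ...
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # ... strip diacritics
    text = "".join(GREEK.get(ch, ch) for ch in text)   # Greek expansion
    text = text.lower()                                # lowercase
    text = re.sub(r"[^a-z0-9 ]+", " ", text)           # reduce to [a-z0-9 ] (assumed: replace with space)
    return " ".join(text.split())                      # collapse whitespace (assumption)


print(normalise_title("Effect of β-Blockers™ in Sjögren's Syndrome"))
```

Two records then count as the same title only if their normalised strings are identical, so a hyphen, an apostrophe, or an accent in one export does not prevent a match.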
When several references share a DOI and normalised title, a reference with an
abstract is preferred over one without. If that does not decide it, the reference
from the earliest uploaded file wins, with abstract length as the final tiebreaker.
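One way to read this rule is as a three-part sort key. The sketch below is an interpretation under stated assumptions, not the tool's implementation: the Reference class, its field names, and pick_survivor are hypothetical, and file_index is assumed to be the zero-based position of the source file in upload order.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    doi: str             # normalised DOI
    title: str           # normalised title
    abstract: str = ""   # empty string when the record has no abstract
    file_index: int = 0  # assumed: 0 = first uploaded file


def pick_survivor(group: list[Reference]) -> Reference:
    """Choose which record to keep from a group sharing DOI and title."""
    return min(
        group,
        key=lambda r: (
            not r.abstract.strip(),  # records with an abstract sort first
            r.file_index,            # then the earliest uploaded file
            -len(r.abstract),        # then the longest abstract
        ),
    )


group = [
    Reference("10.1000/x", "some title", "", 0),
    Reference("10.1000/x", "some title", "Short abstract.", 1),
    Reference("10.1000/x", "some title", "A considerably longer abstract.", 2),
]
print(pick_survivor(group).file_index)
```

Under this reading, the record from the second file is kept: it has an abstract, and it comes from an earlier file than the record with the longer abstract.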
When the same DOI maps to references with different normalised titles
(a DOI collision, common with conference abstract supplements),
those references are not merged and appear at the top of the output for manual review.
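Collision detection of this kind can be sketched by grouping normalised titles per DOI. The example below is illustrative only: split_collisions is a hypothetical name, records are simplified to (doi, title) pairs that are assumed to be normalised already, and the function only separates flagged records from the rest without performing any merging.

```python
from collections import defaultdict


def split_collisions(refs):
    """Separate records whose DOI is shared by more than one distinct title.

    refs: list of (normalised_doi, normalised_title) pairs; an empty DOI
    means the record has no valid DOI and is never flagged.
    Returns (flagged, rest), each in the original input order.
    """
    titles_by_doi = defaultdict(set)
    for doi, title in refs:
        if doi:
            titles_by_doi[doi].add(title)

    # A DOI collides when it maps to two or more different titles.
    colliding = {doi for doi, titles in titles_by_doi.items() if len(titles) > 1}

    flagged = [r for r in refs if r[0] in colliding]
    rest = [r for r in refs if r[0] not in colliding]
    return flagged, rest


refs = [
    ("10.1000/a", "trial of drug x"),
    ("10.1000/suppl", "abstract one"),
    ("10.1000/suppl", "abstract two"),
    ("", "record without doi"),
]
flagged, rest = split_collisions(refs)
print(flagged)
```

Writing the flagged list ahead of the remaining records reproduces the "top of the output" ordering described above.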
You can edit the flowchart directly in your browser by clicking on it and then selecting the pen icon in the context menu that appears at the bottom of the diagram. Alternatively, download the flowchart as an HTML file using the download button above and open it with the draw.io web app or the draw.io desktop application.
The algorithm and validation are described in the accompanying methodology paper, currently submitted for peer review (citation to be added upon publication). The full source code is openly available for inspection and reuse at github.com/dpurkarthofer/deduplicate.it.