Duplicate Records

Many 'works' (an article, a presentation, etc.) in Citebase are duplicated across or within the source archives. This causes problems for users when searching, because the same work may appear multiple times, and makes it harder to identify the true citation and download impact of a work (within the limits of Citebase).

Records are harvested from source archives and do not contain specific metadata to identify them as duplicates of other records. Citebase therefore uses a simple rule to de-duplicate (tie multiple records together as a single work): records are treated as a single work if they share the same first author and a similar title.
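The rule can be sketched as follows. This is only an illustration, not Citebase's actual implementation: the text above does not define what counts as a "similar" title, so the sketch assumes two titles are similar when they match after lower-casing and collapsing punctuation and whitespace. The function names and the record fields `id`, `first_author` and `title` are hypothetical.

```python
import re
from collections import defaultdict


def normalise_title(title):
    """Lower-case a title and collapse punctuation/whitespace (assumed similarity test)."""
    # Replace every run of non-word characters (and underscores) with one space.
    return re.sub(r"[\W_]+", " ", title.lower()).strip()


def work_key(record):
    """Key used to group records: first author plus normalised title."""
    return (record["first_author"].strip().lower(), normalise_title(record["title"]))


def group_into_works(records):
    """Tie records sharing a key together as one work; returns key -> list of record ids."""
    works = defaultdict(list)
    for record in records:
        works[work_key(record)].append(record["id"])
    return dict(works)


# Hypothetical records: the first two differ only in case and punctuation.
records = [
    {"id": "arxiv:1", "first_author": "Smith, J.", "title": "Open Access: A Study"},
    {"id": "eprints:7", "first_author": "smith, j.", "title": "Open access - a study"},
    {"id": "arxiv:2", "first_author": "Jones, A.", "title": "Open Access: A Study"},
]
works = group_into_works(records)
```

Here the first two records are grouped as one work, while the third stays separate because its first author differs. A real matcher would probably need a more tolerant title comparison and some handling of author name variants.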

Citation and Full-text Download Totals

The total numbers of citations to, and downloads of, a work are inherited by all duplicates of that work. These totals are used when ranking search results, and are displayed in the summary table on the abstract page.

Citations from Duplicates

Without identifying duplicates, references from the same work may be counted multiple times, or the same reference may be incorrectly linked to multiple works. This results in the citation impact of each record being overstated.

To count citations (links between works) more accurately, Citebase counts repeated references from the same work only once.
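One way to picture this is to count, for each cited work, the number of distinct citing works rather than the number of reference links. The sketch below assumes that records have already been mapped to works; the `work_of` mapping, the `(citing_record, cited_record)` pair format and the function name are hypothetical, not Citebase's data model.

```python
def count_citations(reference_links, work_of):
    """Count, for each cited work, the number of distinct works that cite it."""
    citing_works = {}
    for citing_record, cited_record in reference_links:
        citing = work_of[citing_record]
        cited = work_of[cited_record]
        # A set keeps each citing work once, however many duplicate
        # records of that work carry the same reference.
        citing_works.setdefault(cited, set()).add(citing)
    return {cited: len(citing) for cited, citing in citing_works.items()}


# Hypothetical data: records a1 and a2 are duplicates of work A.
work_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
reference_links = [("a1", "b1"), ("a2", "b1"), ("c1", "b1"), ("a1", "c1")]
citations = count_citations(reference_links, work_of)
```

Counting raw links would give record b1 three citations. At the work level, B is cited by two works (A and C), because the references from a1 and a2 come from the same work.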

Downloads of Duplicates

Where a work is available from multiple sources, user access will be spread across those instances of the work. While a work appearing in many places may increase its download impact, each record represents only part of the total download impact. This results in the download impact of each record being understated.

Citebase includes all downloads, to all duplicates, as the total full-text downloads for the work.
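As a rough sketch, the total for a work is the sum of the downloads of all its duplicate records, and each duplicate then reports that total. The `work_of` mapping, the per-record counts and the function name below are hypothetical and shown for illustration only.

```python
from collections import Counter


def total_downloads(record_downloads, work_of):
    """Sum the full-text downloads of all duplicate records into a per-work total."""
    totals = Counter()
    for record_id, downloads in record_downloads.items():
        totals[work_of[record_id]] += downloads
    return dict(totals)


# Hypothetical data: records a1 and a2 are duplicates of work A.
work_of = {"a1": "A", "a2": "A", "b1": "B"}
record_downloads = {"a1": 120, "a2": 30, "b1": 45}
totals = total_downloads(record_downloads, work_of)

# Each duplicate record inherits the total for its work.
per_record = {record_id: totals[work_of[record_id]] for record_id in record_downloads}
```

In this example both a1 and a2 report 150 downloads, the combined figure for work A, rather than 120 and 30 separately.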