Title: Scholarly context not found: One in five articles suffers from reference rot

Journal Article · PLoS ONE
 [1];  [1];  [1];  [1];  [1];  [2];  [2];  [3]
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  2. The Univ. of Edinburgh, Scotland (United Kingdom)
  3. Bar-Ilan Univ. (Israel)

The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. However, the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot, the combination of link rot and content drift to which references to web resources included in Science, Technology, and Medicine (STM) articles are subject. We investigate the extent to which reference rot impacts the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora that span publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find that one out of five STM articles suffers from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only STM articles that contain references to web resources are considered, this fraction increases to seven out of ten.
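
The two per-reference checks named in the abstract (is the URI still responsive, and does an archive hold a snapshot close to the reference date) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the rule that only a 2xx status counts as responsive, and the 14-day window for a "representative" snapshot are all assumptions made here for illustration. The inputs (a status code and a list of snapshot timestamps) are assumed to have been collected beforehand, so the sketch makes no network requests.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional

# Assumed tolerance for calling a snapshot "representative" of the
# referenced state. The value is illustrative, not taken from the record.
REPRESENTATIVE_WINDOW = timedelta(days=14)


def is_live(status_code: Optional[int]) -> bool:
    """Assumption: a final 2xx status means the URI is responsive.

    None stands for "no response at all" (timeout, DNS failure, ...).
    """
    return status_code is not None and 200 <= status_code < 300


def has_representative_snapshot(
    referenced_on: datetime,
    snapshot_times: Iterable[datetime],
    window: timedelta = REPRESENTATIVE_WINDOW,
) -> bool:
    """True if any archived snapshot falls within `window` of the reference date."""
    return any(abs(t - referenced_on) <= window for t in snapshot_times)


def classify_reference(
    status_code: Optional[int],
    referenced_on: datetime,
    snapshot_times: Iterable[datetime],
) -> Dict[str, bool]:
    """Report the two observations separately for one web reference."""
    return {
        "link_rot": not is_live(status_code),
        "representative_snapshot": has_representative_snapshot(
            referenced_on, snapshot_times
        ),
    }


if __name__ == "__main__":
    ref_date = datetime(2010, 5, 1)
    # Dead link, but a snapshot exists nine days after the reference date.
    print(classify_reference(404, ref_date, [datetime(2010, 5, 10)]))
```

The sketch reports the two observations separately rather than collapsing them into a single "reference rot" verdict, because this record does not spell out how the study combines them.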

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE
OSTI ID:
1201464
Journal Information:
PLoS ONE, Vol. 9, Issue 12; ISSN 1932-6203
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 57 works
Citation information provided by
Web of Science
