DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scholarly context not found: One in five articles suffers from reference rot

Abstract

The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. But, the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot, the combination of link rot and content drift to which references to web resources included in Science, Technology, and Medicine (STM) articles are subject. We investigate the extent to which reference rot impacts the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora that span publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find one out of five STM articles suffering from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only considering STM articles that contain references to web resources, this fraction increases to seven out of ten.
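
The two per-reference checks named in the abstract (is the URI still responsive on the live web, and is there an archived snapshot representative of the resource at the time it was referenced) could be recorded roughly as in the sketch below. This is an illustration only, not the authors' pipeline: the `WebReference` layout, the rule that a 2xx status counts as responsive, and the 14-day window used to call a snapshot representative are all assumptions introduced here, and the `example.org` URIs are placeholders. The sketch performs no network requests; it only classifies results that are assumed to have been collected already.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class WebReference:
    """One URI reference plus the raw outcome of both checks (hypothetical layout)."""
    uri: str
    referenced_on: date                  # publication date of the citing article
    live_status: Optional[int] = None    # HTTP status from a live-web request; None = no response
    snapshot_dates: List[date] = field(default_factory=list)  # capture dates found in web archives


def is_live(ref: WebReference) -> bool:
    # Assumption of this sketch: any 2xx status counts as "responsive".
    return ref.live_status is not None and 200 <= ref.live_status < 300


def has_representative_snapshot(ref: WebReference, window_days: int = 14) -> bool:
    # Assumption of this sketch: a capture within +/- window_days of the
    # reference date is treated as representative of the referenced state.
    window = timedelta(days=window_days)
    return any(abs(d - ref.referenced_on) <= window for d in ref.snapshot_dates)


def check(ref: WebReference) -> dict:
    # Report both checks side by side without merging them into one verdict.
    return {
        "uri": ref.uri,
        "live": is_live(ref),
        "representative_snapshot": has_representative_snapshot(ref),
    }


refs = [
    WebReference("http://example.org/a", date(2010, 5, 1), 200, [date(2010, 5, 8)]),
    WebReference("http://example.org/b", date(2010, 5, 1), 404, [date(2012, 1, 1)]),
    WebReference("http://example.org/c", date(2010, 5, 1), None, []),
]
results = [check(r) for r in refs]
```

Keeping the two results separate is deliberate: a URI that still responds may have drifted in content, so liveness alone does not show that the original context can be revisited.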

Authors:
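Klein, Martin [1]; Van de Sompel, Herbert [1]; Sanderson, Robert [1]; Shankar, Harihar [1]; Balakireva, Lyudmila [1]; Zhou, Ke [2]; Tobin, Richard [2]; Bar-Ilan, Judit [3]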
  1. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  2. The Univ. of Edinburgh, Scotland (United Kingdom)
  3. Bar-Ilan Univ. (Israel)
Publication Date:
2014-12-26
Research Org.:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1201464
Resource Type:
Accepted Manuscript
Journal Name:
PLoS ONE
Additional Journal Information:
Journal Volume: 9; Journal Issue: 12; Journal ID: ISSN 1932-6203
Publisher:
Public Library of Science
Country of Publication:
United States
Language:
English
Subject:
96 KNOWLEDGE MANAGEMENT AND PRESERVATION; Information Retrieval

Citation Formats

Klein, Martin, Van de Sompel, Herbert, Sanderson, Robert, Shankar, Harihar, Balakireva, Lyudmila, Zhou, Ke, Tobin, Richard, and Bar-Ilan, Judit. Scholarly context not found: One in five articles suffers from reference rot. United States: N. p., 2014. Web. doi:10.1371/journal.pone.0115253.
Klein, Martin, Van de Sompel, Herbert, Sanderson, Robert, Shankar, Harihar, Balakireva, Lyudmila, Zhou, Ke, Tobin, Richard, & Bar-Ilan, Judit. Scholarly context not found: One in five articles suffers from reference rot. United States. https://doi.org/10.1371/journal.pone.0115253
Klein, Martin, Van de Sompel, Herbert, Sanderson, Robert, Shankar, Harihar, Balakireva, Lyudmila, Zhou, Ke, Tobin, Richard, and Bar-Ilan, Judit. 2014. "Scholarly context not found: One in five articles suffers from reference rot". United States. https://doi.org/10.1371/journal.pone.0115253. https://www.osti.gov/servlets/purl/1201464.
@article{osti_1201464,
title = {Scholarly context not found: One in five articles suffers from reference rot},
author = {Klein, Martin and Van de Sompel, Herbert and Sanderson, Robert and Shankar, Harihar and Balakireva, Lyudmila and Zhou, Ke and Tobin, Richard and Bar-Ilan, Judit},
abstractNote = {The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. But, the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot, the combination of link rot and content drift to which references to web resources included in Science, Technology, and Medicine (STM) articles are subject. We investigate the extent to which reference rot impacts the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora that span publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find one out of five STM articles suffering from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only considering STM articles that contain references to web resources, this fraction increases to seven out of ten.},
doi = {10.1371/journal.pone.0115253},
journal = {PLoS ONE},
number = 12,
volume = 9,
place = {United States},
year = {2014},
month = dec
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 57 works
Citation information provided by
Web of Science

Figures / Tables:

Table 1: Number of articles per corpus after each filtering step.

Works referenced in this record:

Accessibility of online resources cited in scholarly LIS journals: A study of Emerald ISI‐ranked journals
journal, March 2012

  • Sadat‐Moosavi, Ali; Isfandyari‐Moghaddam, Alireza; Tajeddini, Oranus
  • Aslib Proceedings, Vol. 64, Issue 2
  • DOI: 10.1108/00012531211215196

404 not found: the stability and persistence of URLs published in MEDLINE
journal, January 2004


Moved but not gone: an evaluation of real-time methods for discovering replacement web pages
journal, February 2014

  • Klein, Martin; Nelson, Michael L.
  • International Journal on Digital Libraries, Vol. 14, Issue 1-2
  • DOI: 10.1007/s00799-014-0108-0

Disappearing act: decay of uniform resource locators in health care management journals
journal, April 2009

  • Wagner, Cassie; Gebremichael, Meseret D.; Taylor, Mary K.
  • Journal of the Medical Library Association : JMLA, Vol. 97, Issue 2
  • DOI: 10.3163/1536-5050.97.2.009

Web page change and persistence - A four-year longitudinal study
journal, January 2002

  • Koehler, Wallace
  • Journal of the American Society for Information Science and Technology, Vol. 53, Issue 2
  • DOI: 10.1002/asi.10018

Ecology in the information age: patterns of use and attrition rates of internet-based citations in ESA journals, 1997–2005
journal, April 2008

  • Duda, Jeffrey J.; Camp, Richard J.
  • Frontiers in Ecology and the Environment, Vol. 6, Issue 3
  • DOI: 10.1890/070022

Persistence of Web references in scientific research
journal, March 2001

  • Lawrence, S.; Pennock, D. M.; Flake, G. W.
  • Computer, Vol. 34, Issue 3
  • DOI: 10.1109/2.901164

The web changes everything: understanding the dynamics of web content
conference, January 2009

  • Adar, Eytan; Teevan, Jaime; Dumais, Susan T.
  • Proceedings of the Second ACM International Conference on Web Search and Data Mining - WSDM '09
  • DOI: 10.1145/1498759.1498837

Zoetrope: interacting with the ephemeral web
conference, January 2008

  • Adar, Eytan; Dontcheva, Mira; Fogarty, James
  • Proceedings of the 21st annual ACM symposium on User interface software and technology - UIST '08
  • DOI: 10.1145/1449715.1449756

A cross disciplinary study of link decay and the effectiveness of mitigation techniques
journal, October 2013


Keeping up with the changing Web
journal, May 2000

  • Brewington, B. E.; Cybenko, G.
  • Computer, Vol. 33, Issue 5
  • DOI: 10.1109/2.841784

The Prevalence and Inaccessibility of Internet References in the Biomedical Literature at the Time of Publication
journal, March 2007

  • Aronsky, D.; Madani, S.; Carnevale, R. J.
  • Journal of the American Medical Informatics Association, Vol. 14, Issue 2
  • DOI: 10.1197/jamia.M2243

Revisiting Lexical Signatures to (Re-)Discover Web Pages
book, January 2008


A large-scale study of the evolution of web pages
conference, January 2003

  • Fetterly, Dennis; Manasse, Mark; Najork, Marc
  • Proceedings of the twelfth international conference on World Wide Web - WWW '03
  • DOI: 10.1145/775152.775246

Librarians and Link Rot: A Comparative Analysis with Some Methodological Considerations
journal, January 2003

  • Tyler, David C.; McNeil, Beth
  • portal: Libraries and the Academy, Vol. 3, Issue 4
  • DOI: 10.1353/pla.2003.0098

Towards Robust Hyperlinks for Web-Based Scholarly Communication
book, January 2014


The half-life of internet references cited in communication journals
journal, October 2007


Profiling web archive coverage for top-level domain and content language
journal, June 2014

  • AlSum, Ahmed; Weigle, Michele C.; Nelson, Michael L.
  • International Journal on Digital Libraries, Vol. 14, Issue 3-4
  • DOI: 10.1007/s00799-014-0118-y

URL decay in MEDLINE--a 4-year follow-up study
journal, April 2008


INFORMATION SCIENCE: Going, Going, Gone: Lost Internet References
journal, October 2003


The decay and failures of web references
journal, January 2003


HTTP Framework for Time-Based Access to Resource States -- Memento
report, December 2013


Extraction and analysis of referenced web links in large-scale scholarly articles
conference, September 2014

  • Zhou, Ke; Tobin, Richard; Grover, Claire
  • 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL)
  • DOI: 10.1109/JCDL.2014.6970220

Special-Use Domain Names
report, February 2013


Research Objects: Towards Exchange and Reuse of Digital Knowledge
journal, July 2010


A large-scale study of the evolution of Web pages
journal, January 2004

  • Fetterly, Dennis; Manasse, Mark; Najork, Marc
  • Software: Practice and Experience, Vol. 34, Issue 2
  • DOI: 10.1002/spe.577

arXiv articles summary files
dataset, January 2014


arXiv Memento data
dataset, January 2014


Elsevier Memento data
dataset, January 2014


Part 2: Elsevier articles summary files
dataset, January 2014


Profiling Web Archive Coverage for Top-Level Domain and Content Language
book, January 2013

  • Alsum, Ahmed; Weigle, Michele C.; Nelson, Michael L.
  • Research and Advanced Technology for Digital Libraries
  • DOI: 10.1007/978-3-642-40501-3_7

PMC Memento data
dataset, January 2014


PMC articles summary files
dataset, January 2014


Live web test data
dataset, January 2014


Part 1: Elsevier articles summary files
dataset, January 2014


Works referencing / citing this record:

Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature
journal, May 2015

  • Howison, James; Bullard, Julia
  • Journal of the Association for Information Science and Technology, Vol. 67, Issue 9
  • DOI: 10.1002/asi.23538

The Dat Project, an open and decentralized research data tool
journal, October 2018

  • Robinson, Danielle C.; Hand, Joe A.; Madsen, Mathias Buus
  • Scientific Data, Vol. 5, Issue 1
  • DOI: 10.1038/sdata.2018.221

An open source web application for distributed geospatial data exploration
journal, February 2019


New Forms of Scholarship and a Serials (R)evolution
journal, July 2015


Qualitative Data Sharing: Data Repositories and Academic Libraries as Key Partners in Addressing Challenges
journal, June 2018

  • Mannheimer, Sara; Pienta, Amy; Kirilova, Dessislava
  • American Behavioral Scientist, Vol. 63, Issue 5
  • DOI: 10.1177/0002764218784991

Bioboxes: standardised containers for interchangeable bioinformatics software
journal, October 2015


Identifying PIDs playing FAIR
journal, November 2019


“As-You-Go” Instead of “After-the-Fact”: A Network Approach to Scholarly Communication and Evaluation
journal, April 2018


Cool DOI's
text, January 2016


Qualitative Data Sharing: Data Repositories and Academic Libraries as Key Partners in Addressing Challenges
posted_content, January 2018

  • Mannheimer, Sara; Pienta, Amy; Kirilova, Dessi
  • American Behavioral Scientist
  • DOI: 10.31235/osf.io/7z2bt

"As-you-go" instead of "after-the-fact": A network approach to scholarly communication and evaluation
posted_content, January 2018


The Cochrane Collaboration: institutional analysis of a knowledge commons
journal, February 2018


Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content
journal, December 2016