Scholarly context not found: One in five articles suffers from reference rot
Abstract
The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. But, the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot, the combination of link rot and content drift to which references to web resources included in Science, Technology, and Medicine (STM) articles are subject. We investigate the extent to which reference rot impacts the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora that span publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find one out of five STM articles suffering from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only considering STM articles that contain references to web resources, this fraction increases to seven out of ten.
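The two per-reference checks named in the abstract (is the URI still responsive on the live web, and does a web archive hold a representative snapshot) can be illustrated with a minimal sketch. This is not the authors' pipeline. The `WebReference` record, the function names, the rule that a 2xx status counts as "responsive", and the 14-day window used for "representative" are all assumptions made for illustration. The sketch works on already-collected data and makes no network requests.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class WebReference:
    """One URI reference extracted from an article (hypothetical record layout)."""
    uri: str
    referenced_at: datetime                 # publication date of the citing article
    live_status: Optional[int] = None       # final HTTP status today; None if no response
    snapshot_times: List[datetime] = field(default_factory=list)  # archive capture times


def is_live(ref: WebReference) -> bool:
    """Assumption: a 2xx final status counts as 'still responsive'."""
    return ref.live_status is not None and 200 <= ref.live_status < 300


def has_representative_snapshot(
    ref: WebReference, window: timedelta = timedelta(days=14)
) -> bool:
    """Assumption: a capture within `window` of the reference date is 'representative'."""
    return any(abs(t - ref.referenced_at) <= window for t in ref.snapshot_times)


def assess(ref: WebReference, window: timedelta = timedelta(days=14)) -> Dict[str, bool]:
    """Report both checks for one reference without combining them into a verdict."""
    return {
        "live": is_live(ref),
        "representative_snapshot": has_representative_snapshot(ref, window),
    }


if __name__ == "__main__":
    ref = WebReference(
        uri="http://example.org/data",
        referenced_at=datetime(2008, 3, 1),
        live_status=404,
        snapshot_times=[datetime(2008, 3, 9)],
    )
    print(assess(ref))  # {'live': False, 'representative_snapshot': True}
```

The sketch deliberately reports the two results separately, because the abstract does not state how they are combined into a per-article reference rot verdict.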
- Authors:
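- Klein, Martin; Van de Sompel, Herbert; Sanderson, Robert; Shankar, Harihar; Balakireva, Lyudmila; Zhou, Ke; Tobin, Richard; Bar-Ilan, Judit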
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- The Univ. of Edinburgh, Scotland (United Kingdom)
- Bar-Ilan Univ. (Israel)
- Publication Date:
- 2014-12-26
- Research Org.:
- Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
- Sponsoring Org.:
- USDOE
- OSTI Identifier:
- 1201464
- Resource Type:
- Accepted Manuscript
- Journal Name:
- PLoS ONE
- Additional Journal Information:
- Journal Volume: 9; Journal Issue: 12; Journal ID: ISSN 1932-6203
- Publisher:
- Public Library of Science
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 96 KNOWLEDGE MANAGEMENT AND PRESERVATION; Information Retrieval
Citation Formats
Klein, Martin, Van de Sompel, Herbert, Sanderson, Robert, Shankar, Harihar, Balakireva, Lyudmila, Zhou, Ke, Tobin, Richard, and Bar-Ilan, Judit. Scholarly context not found: One in five articles suffers from reference rot. United States: N. p., 2014.
Web. doi:10.1371/journal.pone.0115253.
Klein, Martin, Van de Sompel, Herbert, Sanderson, Robert, Shankar, Harihar, Balakireva, Lyudmila, Zhou, Ke, Tobin, Richard, & Bar-Ilan, Judit. Scholarly context not found: One in five articles suffers from reference rot. United States. https://doi.org/10.1371/journal.pone.0115253
Klein, Martin, Van de Sompel, Herbert, Sanderson, Robert, Shankar, Harihar, Balakireva, Lyudmila, Zhou, Ke, Tobin, Richard, and Bar-Ilan, Judit. 2014.
"Scholarly context not found: One in five articles suffers from reference rot". United States. https://doi.org/10.1371/journal.pone.0115253. https://www.osti.gov/servlets/purl/1201464.
@article{osti_1201464,
title = {Scholarly context not found: One in five articles suffers from reference rot},
author = {Klein, Martin and Van de Sompel, Herbert and Sanderson, Robert and Shankar, Harihar and Balakireva, Lyudmila and Zhou, Ke and Tobin, Richard and Bar-Ilan, Judit},
abstractNote = {The emergence of the web has fundamentally affected most aspects of information communication, including scholarly communication. The immediacy that characterizes publishing information to the web, as well as accessing it, allows for a dramatic increase in the speed of dissemination of scholarly knowledge. But, the transition from a paper-based to a web-based scholarly communication system also poses challenges. In this paper, we focus on reference rot, the combination of link rot and content drift to which references to web resources included in Science, Technology, and Medicine (STM) articles are subject. We investigate the extent to which reference rot impacts the ability to revisit the web context that surrounds STM articles some time after their publication. We do so on the basis of a vast collection of articles from three corpora that span publication years 1997 to 2012. For over one million references to web resources extracted from over 3.5 million articles, we determine whether the HTTP URI is still responsive on the live web and whether web archives contain an archived snapshot representative of the state the referenced resource had at the time it was referenced. We observe that the fraction of articles containing references to web resources is growing steadily over time. We find one out of five STM articles suffering from reference rot, meaning it is impossible to revisit the web context that surrounds them some time after their publication. When only considering STM articles that contain references to web resources, this fraction increases to seven out of ten.},
doi = {10.1371/journal.pone.0115253},
journal = {PLoS ONE},
number = 12,
volume = 9,
place = {United States},
year = {2014},
month = {12}
}
Works referenced in this record:
Accessibility of online resources cited in scholarly LIS journals: A study of Emerald ISI‐ranked journals
journal, March 2012
- Sadat‐Moosavi, Ali; Isfandyari‐Moghaddam, Alireza; Tajeddini, Oranus
- Aslib Proceedings, Vol. 64, Issue 2
404 not found: the stability and persistence of URLs published in MEDLINE
journal, January 2004
- Wren, J. D.
- Bioinformatics, Vol. 20, Issue 5
Moved but not gone: an evaluation of real-time methods for discovering replacement web pages
journal, February 2014
- Klein, Martin; Nelson, Michael L.
- International Journal on Digital Libraries, Vol. 14, Issue 1-2
Disappearing act: decay of uniform resource locators in health care management journals
journal, April 2009
- Wagner, Cassie; Gebremichael, Meseret D.; Taylor, Mary K.
- Journal of the Medical Library Association : JMLA, Vol. 97, Issue 2
Web page change and persistence - A four-year longitudinal study
journal, January 2002
- Koehler, Wallace
- Journal of the American Society for Information Science and Technology, Vol. 53, Issue 2
Ecology in the information age: patterns of use and attrition rates of internet-based citations in ESA journals, 1997–2005
journal, April 2008
- Duda, Jeffrey J.; Camp, Richard J.
- Frontiers in Ecology and the Environment, Vol. 6, Issue 3
Persistence of Web references in scientific research
journal, March 2001
- Lawrence, S.; Pennock, D. M.; Flake, G. W.
- Computer, Vol. 34, Issue 3
The web changes everything: understanding the dynamics of web content
conference, January 2009
- Adar, Eytan; Teevan, Jaime; Dumais, Susan T.
- Proceedings of the Second ACM International Conference on Web Search and Data Mining - WSDM '09
Zoetrope: interacting with the ephemeral web
conference, January 2008
- Adar, Eytan; Dontcheva, Mira; Fogarty, James
- Proceedings of the 21st annual ACM symposium on User interface software and technology - UIST '08
A cross disciplinary study of link decay and the effectiveness of mitigation techniques
journal, October 2013
- Hennessey, Jason; Ge, Steven Xijin
- BMC Bioinformatics, Vol. 14, Issue S14
Keeping up with the changing Web
journal, May 2000
- Brewington, B. E.; Cybenko, G.
- Computer, Vol. 33, Issue 5
The Prevalence and Inaccessibility of Internet References in the Biomedical Literature at the Time of Publication
journal, March 2007
- Aronsky, D.; Madani, S.; Carnevale, R. J.
- Journal of the American Medical Informatics Association, Vol. 14, Issue 2
Revisiting Lexical Signatures to (Re-)Discover Web Pages
book, January 2008
- Klein, Martin; Nelson, Michael L.
- Research and Advanced Technology for Digital Libraries
A large-scale study of the evolution of web pages
conference, January 2003
- Fetterly, Dennis; Manasse, Mark; Najork, Marc
- Proceedings of the twelfth international conference on World Wide Web - WWW '03
Librarians and Link Rot: A Comparative Analysis with Some Methodological Considerations
journal, January 2003
- Tyler, David C.; McNeil, Beth
- portal: Libraries and the Academy, Vol. 3, Issue 4
Towards Robust Hyperlinks for Web-Based Scholarly Communication
book, January 2014
- Van de Sompel, Herbert; Klein, Martin; Shankar, Harihar
- Lecture Notes in Computer Science
The half-life of internet references cited in communication journals
journal, October 2007
- Dimitrova, Daniela V.; Bugeja, Michael
- New Media & Society, Vol. 9, Issue 5
Profiling web archive coverage for top-level domain and content language
journal, June 2014
- AlSum, Ahmed; Weigle, Michele C.; Nelson, Michael L.
- International Journal on Digital Libraries, Vol. 14, Issue 3-4
URL decay in MEDLINE--a 4-year follow-up study
journal, April 2008
- Wren, J. D.
- Bioinformatics, Vol. 24, Issue 11
INFORMATION SCIENCE: Going, Going, Gone: Lost Internet References
journal, October 2003
- Dellavalle, R. P.
- Science, Vol. 302, Issue 5646
The decay and failures of web references
journal, January 2003
- Spinellis, Diomidis
- Communications of the ACM, Vol. 46, Issue 1
HTTP Framework for Time-Based Access to Resource States -- Memento
report, December 2013
- Van de Sompel, H.; Nelson, M.; Sanderson, R.
Extraction and analysis of referenced web links in large-scale scholarly articles
conference, September 2014
- Zhou, Ke; Tobin, Richard; Grover, Claire
- 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL)
Research Objects: Towards Exchange and Reuse of Digital Knowledge
journal, July 2010
- Bechhofer, Sean; De Roure, David; Gamble, Matthew
- Nature Precedings
A large-scale study of the evolution of Web pages
journal, January 2004
- Fetterly, Dennis; Manasse, Mark; Najork, Marc
- Software: Practice and Experience, Vol. 34, Issue 2
Profiling Web Archive Coverage for Top-Level Domain and Content Language
book, January 2013
- Alsum, Ahmed; Weigle, Michele C.; Nelson, Michael L.
- Research and Advanced Technology for Digital Libraries
Works referencing / citing this record:
Software in the scientific literature: Problems with seeing, finding, and using software mentioned in the biology literature
journal, May 2015
- Howison, James; Bullard, Julia
- Journal of the Association for Information Science and Technology, Vol. 67, Issue 9
The Dat Project, an open and decentralized research data tool
journal, October 2018
- Robinson, Danielle C.; Hand, Joe A.; Madsen, Mathias Buus
- Scientific Data, Vol. 5, Issue 1
An open source web application for distributed geospatial data exploration
journal, February 2019
- Curry, Patrick A.; Moosdorf, Nils
- Scientific Data, Vol. 6, Issue 1
New Forms of Scholarship and a Serials (R)evolution
journal, July 2015
- Borie, Juliya
- Serials Review, Vol. 41, Issue 3
Qualitative Data Sharing: Data Repositories and Academic Libraries as Key Partners in Addressing Challenges
journal, June 2018
- Mannheimer, Sara; Pienta, Amy; Kirilova, Dessislava
- American Behavioral Scientist, Vol. 63, Issue 5
Bioboxes: standardised containers for interchangeable bioinformatics software
journal, October 2015
- Belmann, Peter; Dröge, Johannes; Bremges, Andreas
- GigaScience, Vol. 4, Issue 1
Identifying PIDs playing FAIR
journal, November 2019
- Philipson, Joakim
- Data Science, Vol. 2, Issue 1-2
“As-You-Go” Instead of “After-the-Fact”: A Network Approach to Scholarly Communication and Evaluation
journal, April 2018
- Hartgerink, Chris; van Zelst, Marino
- Publications, Vol. 6, Issue 2
Verified, Shared, Modular, and Provenance Based Research Communication with the Dat Protocol
journal, June 2019
- Hartgerink, Chris
- Publications, Vol. 7, Issue 2
Qualitative Data Sharing: Data Repositories and Academic Libraries as Key Partners in Addressing Challenges
posted_content, January 2018
- Mannheimer, Sara; Pienta, Amy; Kirilova, Dessi
- American Behavioral Scientist
"As-you-go" instead of "after-the-fact": A network approach to scholarly communication and evaluation
posted_content, January 2018
- Hartgerink, Chris H. J.; van Zelst, Marino
- Peer J
"As-you-go" instead of "after-the-fact": A network approach to scholarly communication and evaluation
posted_content, March 2018
- Hartgerink, Chris H. J.; van Zelst, Marino
- Peer J
Qualitative Data Sharing: Data Repositories and Academic Libraries as Key Partners in Addressing Challenges
text, January 2017
- Mannheimer, Sara; Pienta, Amy; Kirilova, Dessi
- SocArXiv
The Cochrane Collaboration: institutional analysis of a knowledge commons
journal, February 2018
- Heywood, Peter; Stephani, Anne Marie; Garner, Paul
- Evidence & Policy, Vol. 14, Issue 1
Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data
journal, June 2017
- McMurry, Julie A.; Juty, Nick; Blomberg, Niklas
- PLOS Biology, Vol. 15, Issue 6
Scholarly Context Adrift: Three out of Four URI References Lead to Changed Content
journal, December 2016
- Jones, Shawn M.; Van de Sompel, Herbert; Shankar, Harihar
- PLOS ONE, Vol. 11, Issue 12