skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Filling in the GAPS: evaluating completeness and coverage of open-access biodiversity databases in the United States

Abstract

Primary biodiversity data constitute observations of particular species at given points in time and space. Open-access electronic databases provide unprecedented access to these data, but their usefulness in characterizing species distributions and patterns in biodiversity depend on how complete species inventories are at a given survey location and how uniformly distributed survey locations are along dimensions of time, space, and environment. Our aim was to compare completeness and coverage among three open-access databases representing ten taxonomic groups (amphibians, birds, freshwater bivalves, crayfish, freshwater fish, fungi, insects, mammals, plants, and reptiles) in the contiguous United States. We compiled occurrence records from the Global Biodiversity Information Facility (GBIF), the North American Breeding Bird Survey (BBS), and federally administered fish surveys (FFS). In this study, we aggregated occurrence records by 0.1° × 0.1° grid cells and computed three completeness metrics to classify each grid cell as well-surveyed or not. Next, we compared frequency distributions of surveyed grid cells to background environmental conditions in a GIS and performed Kolmogorov–Smirnov tests to quantify coverage through time, along two spatial gradients, and along eight environmental gradients. The three databases contributed >13.6 million reliable occurrence records distributed among >190,000 grid cells. The percent of well-surveyed grid cellsmore » was substantially lower for GBIF (5.2%) than for systematic surveys (BBS and FFS; 82.5%). Still, the large number of GBIF occurrence records produced at least 250 well-surveyed grid cells for six of nine taxonomic groups. Coverages of systematic surveys were less biased across spatial and environmental dimensions but were more biased in temporal coverage compared to GBIF data. GBIF coverages also varied among taxonomic groups, consistent with commonly recognized geographic, environmental, and institutional sampling biases. Lastly, this comprehensive assessment of biodiversity data across the contiguous United States provides a prioritization scheme to fill in the gaps by contributing existing occurrence records to the public domain and planning future surveys.« less

Authors:
 [1];  [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Environmental Sciences Division
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Energy Efficiency and Renewable Energy (EERE)
OSTI Identifier:
1286862
Grant/Contract Number:
AC05-00OR22725
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Ecology and Evolution
Additional Journal Information:
Journal Volume: 6; Journal Issue: 14; Journal ID: ISSN 2045-7758
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; 97 MATHEMATICS AND COMPUTING; Data; Database; Biodiversity; Fish; Open-Access

Citation Formats

Troia, Matthew J., and McManamay, Ryan A. Filling in the GAPS: evaluating completeness and coverage of open-access biodiversity databases in the United States. United States: N. p., 2016. Web. doi:10.1002/ece3.2225.
Troia, Matthew J., & McManamay, Ryan A. Filling in the GAPS: evaluating completeness and coverage of open-access biodiversity databases in the United States. United States. doi:10.1002/ece3.2225.
Troia, Matthew J., and McManamay, Ryan A. 2016. "Filling in the GAPS: evaluating completeness and coverage of open-access biodiversity databases in the United States". United States. doi:10.1002/ece3.2225. https://www.osti.gov/servlets/purl/1286862.
@article{osti_1286862,
title = {Filling in the GAPS: evaluating completeness and coverage of open-access biodiversity databases in the United States},
author = {Troia, Matthew J. and McManamay, Ryan A.},
abstractNote = {Primary biodiversity data constitute observations of particular species at given points in time and space. Open-access electronic databases provide unprecedented access to these data, but their usefulness in characterizing species distributions and patterns in biodiversity depend on how complete species inventories are at a given survey location and how uniformly distributed survey locations are along dimensions of time, space, and environment. Our aim was to compare completeness and coverage among three open-access databases representing ten taxonomic groups (amphibians, birds, freshwater bivalves, crayfish, freshwater fish, fungi, insects, mammals, plants, and reptiles) in the contiguous United States. We compiled occurrence records from the Global Biodiversity Information Facility (GBIF), the North American Breeding Bird Survey (BBS), and federally administered fish surveys (FFS). In this study, we aggregated occurrence records by 0.1° × 0.1° grid cells and computed three completeness metrics to classify each grid cell as well-surveyed or not. Next, we compared frequency distributions of surveyed grid cells to background environmental conditions in a GIS and performed Kolmogorov–Smirnov tests to quantify coverage through time, along two spatial gradients, and along eight environmental gradients. The three databases contributed >13.6 million reliable occurrence records distributed among >190,000 grid cells. The percent of well-surveyed grid cells was substantially lower for GBIF (5.2%) than for systematic surveys (BBS and FFS; 82.5%). Still, the large number of GBIF occurrence records produced at least 250 well-surveyed grid cells for six of nine taxonomic groups. Coverages of systematic surveys were less biased across spatial and environmental dimensions but were more biased in temporal coverage compared to GBIF data. GBIF coverages also varied among taxonomic groups, consistent with commonly recognized geographic, environmental, and institutional sampling biases. Lastly, this comprehensive assessment of biodiversity data across the contiguous United States provides a prioritization scheme to fill in the gaps by contributing existing occurrence records to the public domain and planning future surveys.},
doi = {10.1002/ece3.2225},
journal = {Ecology and Evolution},
number = 14,
volume = 6,
place = {United States},
year = 2016,
month = 6
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Save / Share:
  • This paper aims to address three issues: the lack of literature differentiating terrorism and violent extremist organizations (VEOs), terrorism incident databases, and the applicability of Risk Terrain Modeling (RTM) to terrorism. Current open source literature and publicly available government sources do not differentiate between terrorism and VEOs; furthermore, they fail to define them. Addressing the lack of a comprehensive comparison of existing terrorism data sources, a matrix comparing a dozen terrorism databases is constructed, providing insight toward the array of data available. RTM, a method for spatial risk analysis at a micro level, has some applicability to terrorism research, particularlymore » for studies looking at risk indicators of terrorism. Leveraging attack data from multiple databases, combined with RTM, offers one avenue for closing existing research gaps in terrorism literature.« less
  • Open-access databases with utility in fisheries science have grown exponentially in quantity and scope over the past decade, with profound impacts to our discipline. The management, distillation, and sharing of an exponentially growing stream of open-access data represents several fundamental challenges in fisheries science. Many of the currently available open-access resources may not be universally known among fisheries scientists. We therefore introduce many national- and global-scale open-access databases with applications in fisheries science and provide an example of how they can be harnessed to perform valuable analyses without additional field efforts. We also discuss how the development, maintenance, and utilizationmore » of open-access data are likely to pose technical, financial, and educational challenges to fisheries scientists. Such cultural implications that will coincide with the rapidly increasing availability of free data should compel the American Fisheries Society to actively address these problems now to help ease the forthcoming cultural transition.« less
  • Metabolomics have proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likelymore » to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectra than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.« less
  • This is a presentation of the United Kingdom's experience with power transmission open access. The topics of the presentation include the objectives of changing, commercial arrangements and economic drivers, long term effects, the effects of moving to a more competitive environment, and factors affecting open access such as political climate and market regulation.