OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: The NamesforLife Semantic Index of Phenotypic and Genotypic Data for Systems Biology

Technical Report
OSTI ID: 1467104

Purpose of Research: The research performed by NamesforLife, LLC and Michigan State University during the development of the Semantic Index of Genotypic and Phenotypic Data for Systems Biology addresses several key aspects of the “scientific reproducibility crisis” facing the field of microbiology and the scholarly publishing industry.

Research Carried Out in this Project: During the course of this project, a new method of Knowledge Organization was investigated for ontology and thesaurus construction, machine learning software was developed for Information Extraction (IE), and an extensive curatorial effort was undertaken to produce a lexicon of phenotypic terms that is backed by both an ontology and a thesaurus that support abstract query answering.

Findings and Results: Prior to this project, no electronic resource described the expressed traits and behavior (phenotypes) of all bacteria and archaea in a form that was readily accessible and searchable and that could support direct comparisons across all taxa. For the first time, a source of normalized phenotypic descriptions is available for nearly all validly named species of prokaryotes. These descriptions facilitate comparison of these organisms and validate groupings (taxa) inferred from sequence-based analyses. This project has also resulted in the discovery of a new language-independent semantic equivalence method that addresses problems arising from induced meaning in technical communications. This has broad implications for uniting the disciplines of information extraction and knowledge inference.

Potential Applications: The knowledge representation and document annotation methods developed during this project are broadly applicable and of general interest to knowledge workers and service providers who work with Mixed Precision Information (MPI). These methods enable integration of raw data (e.g., sensor data) with interpreted information to support knowledge inference and query answering at multiple levels of abstraction. Software developed by NamesforLife, LLC provides a major improvement in the detection and annotation of complex terms appearing in scientific, technical and medical (STM) literature, supporting the peer review process as well as the indexing and abstracting of STM literature. This software is built upon existing semantic web standards and recommendations, and the methods can be applied to many other fields to support development, refinement and validation of descriptive ontologies and terminologies.

The online resources built with the methods developed under this award can affect microbiology research in several ways. For the first time, the entire taxonomy of the prokaryotes may be analyzed and validated according to the reported characteristics of all member organisms. This may result in the discovery of novel taxa having novel properties and/or the abandonment of some previously described taxa for which there is no supporting observational data. These resources also enable new services that will improve genome, metagenome and microbiome annotation and interpretation.

Research Organization:
NamesforLife, LLC, East Lansing, MI (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Contributing Organization:
Michigan State University
DOE Contract Number:
SC0006191
OSTI ID:
1467104
Type / Phase:
STTR (Phase IIB)
Report Number(s):
DOE-MSU-6191
Country of Publication:
United States
Language:
English