OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information
  1. Perspective on taxonomic classification of uncultivated viruses

    Historically, virus taxonomy has been limited to describing viruses that were readily cultivated in the laboratory or emerging in natural biomes. Metagenomic analyses, single-particle sequencing, and database mining efforts have yielded new sequence data on an astounding number of previously unknown viruses. As metagenomes are relatively free of biases, these data provide an unprecedented insight into the vastness of the virosphere, but to properly value the extent of this diversity it is critical that the viruses are taxonomically classified. Inclusion of uncultivated viruses has already improved the process as well as the understanding of the taxa, viruses, and their evolutionary relationships. Here, we explain how the continuous development and testing of computational tools will be required to maintain a dynamic virus taxonomy that can accommodate the new discoveries.
  2. Philympics 2021: Prophage Predictions Perplex Programs

    Most bacterial genomes contain integrated bacteriophages—prophages—in various states of decay. Many are active and able to excise from the genome and replicate, while others are cryptic prophages, remnants of their former selves. Over the last two decades, many computational tools have been developed to identify the prophage components of bacterial genomes, and it is a particularly active area for the application of machine learning approaches. However, progress is hindered and comparisons thwarted because there are no manually curated bacterial genomes that can be used to test new prophage prediction algorithms. Here, we present a library of gold-standard bacterial genome annotations that include manually curated prophage annotations, and a computational framework to compare the predictions from different algorithms. We use this suite to compare all extant stand-alone prophage prediction algorithms to identify their strengths and weaknesses. We provide a FAIR dataset for prophage identification, and demonstrate the accuracy, precision, recall, and F1 score from the analysis of seven different algorithms for the prediction of prophages. We discuss caveats and concerns in this analysis and how those concerns may be mitigated.
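As an illustration of the four metrics named in the abstract above, the following is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed when a tool's predictions are compared with curated labels. It assumes each gene carries a binary prophage / not-prophage label; the function names `confusion_counts` and `classification_metrics` are hypothetical and are not taken from the published framework.

```python
def confusion_counts(truth, predicted):
    """Count TP, FP, TN, FN for two equal-length sequences of booleans.

    truth[i] is the curated label for gene i (True = prophage gene);
    predicted[i] is a tool's call for the same gene. (Assumed layout.)
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if t and p:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1; undefined ratios become 0.0."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Because most genes in a bacterial genome are typically not prophage genes, accuracy alone can look high for a tool that predicts very little, which is one reason to report precision, recall, and F1 alongside it.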
  3. Utilizing Amino Acid Composition and Entropy of Potential Open Reading Frames to Identify Protein-Coding Genes

    One of the main steps in gene-finding in prokaryotes is determining which open reading frames encode a protein, and which occur by chance alone. There are many different methods to differentiate the two; the most prevalent approach is using shared homology with a database of known genes. This method presents many pitfalls, most notably the catch that you only find genes that you have seen before. The four most popular prokaryotic gene-prediction programs (GeneMark, Glimmer, Prodigal, PHANOTATE) all use a protein-coding training model to predict protein-coding genes, with the latter three allowing for the training model to be created ab initio from the input genome. Different methods are available for creating the training model, and to increase the accuracy of such tools, we present here GOODORFS, a method for identifying protein-coding genes within a set of all possible open reading frames (ORFs). Our workflow begins with taking the amino acid frequencies of each ORF, calculating an entropy density profile (EDP), using KMeans to cluster the EDPs, and then selecting the cluster with the lowest variation as the coding ORFs. To test the efficacy of our method, we ran GOODORFS on 14,179 annotated phage genomes, and compared our results to the initial training-set creation step of four other similar methods (Glimmer, MED2, PHANOTATE, Prodigal). We found that GOODORFS was the most accurate (0.94) and had the best F1-score (0.85), while Glimmer had the highest precision (0.92) and PHANOTATE had the highest recall (0.96).
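The workflow described in the abstract above (amino acid frequencies, entropy density profile, clustering, selection of the lowest-variation cluster) can be sketched roughly as follows. This is not the GOODORFS implementation. It assumes one common definition of the EDP, s_i = -(p_i log p_i) / H with H the Shannon entropy of the amino acid frequencies; it uses a small hand-written k-means with a deterministic farthest-first initialisation rather than a library KMeans; and it takes "variation" to mean the mean squared distance of members to their cluster mean. The number of clusters `k` and all function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def entropy_density_profile(protein):
    """20-dimensional EDP of a translated ORF (assumed definition, see lead-in)."""
    counts = Counter(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    freqs = [counts[aa] / total for aa in AMINO_ACIDS]
    # Per-residue entropy terms; a frequency of zero contributes nothing.
    terms = [-p * math.log(p) if p > 0 else 0.0 for p in freqs]
    entropy = sum(terms)
    if entropy == 0.0:  # only one residue type present
        return [0.0] * len(AMINO_ACIDS)
    return [t / entropy for t in terms]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(members):
    return [sum(col) / len(members) for col in zip(*members)]


def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    # Deterministic farthest-first initialisation (an illustrative choice).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = mean_point(members)
    return labels


def select_coding_cluster(points, labels):
    """Indices of the points in the cluster with the lowest variation."""
    best_label, best_var = None, float("inf")
    for label in sorted(set(labels)):
        members = [p for p, l in zip(points, labels) if l == label]
        center = mean_point(members)
        var = sum(sq_dist(p, center) for p in members) / len(members)
        if var < best_var:
            best_label, best_var = label, var
    return [i for i, l in enumerate(labels) if l == best_label]
```

In use, one would compute `entropy_density_profile` for the translation of every candidate ORF, pass the resulting vectors to `kmeans`, and treat the indices returned by `select_coding_cluster` as the putative coding set.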
  4. NCBI’s Virus Discovery Codeathon: Building “FIVE” —The Federated Index of Viral Experiments API Index

    Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, some consequences of previously decentralized (unfederated) data are a lack of consensus on, or comparison between, feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was developed. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus–host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof of concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide easier alternatives for accessing or querying information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.
  5. Modeling of the Coral Microbiome: the Influence of Temperature and Microbial Network

    Host-associated microbial communities are shaped by factors extrinsic and intrinsic to the holobiont organism. Environmental factors and microbe-microbe interactions act simultaneously on the microbial community structure, making the microbiome dynamics challenging to predict. The coral microbiome is essential to the health of coral reefs and sensitive to environmental changes. Here, we develop a dynamic model to determine the microbial community structure associated with the surface mucus layer (SML) of corals using temperature as an extrinsic factor and microbial network as an intrinsic factor. The model was validated by comparing the predicted relative abundances of microbial taxa to the relative abundances of microbial taxa from the sample data. The SML microbiome from Pseudodiploria strigosa was collected across reef zones in Bermuda, where inner and outer reefs are exposed to distinct thermal profiles. A shotgun metagenomics approach was used to describe the taxonomic composition and the microbial network of the coral SML microbiome. By simulating the annual temperature fluctuations at each reef zone, the model output is statistically identical to the observed data. The model was further applied to six scenarios that combined different profiles of temperature and microbial network to investigate the influence of each of these two factors on the model accuracy. The SML microbiome was best predicted by model scenarios with the temperature profile that was closest to the local thermal environment, regardless of the microbial network profile. Our model shows that the SML microbiome of P. strigosa in Bermuda is primarily structured by seasonal fluctuations in temperature at a reef scale, while the microbial network is a secondary driver. Coral microbiome dysbiosis (i.e., shifts in the microbial community structure or complete loss of microbial symbionts) caused by environmental changes is a key player in the decline of coral health worldwide.
    Multiple factors in the water column and the surrounding biological community influence the dynamics of the coral microbiome. However, by including only temperature as an external factor, our model proved to be successful in describing the microbial community associated with the surface mucus layer (SML) of the coral P. strigosa. The dynamic model developed and validated in this study is a potential tool to predict the coral microbiome under different temperature conditions.
  6. Allelic variation contributes to bacterial host specificity

    Understanding the molecular parameters that regulate cross-species transmission and host adaptation of potential pathogens is crucial to control emerging infectious disease. Although microbial pathotype diversity is conventionally associated with gene gain or loss, the role of pathoadaptive nonsynonymous single-nucleotide polymorphisms (nsSNPs) has not been systematically evaluated. Here, our genome-wide analysis of core genes within Salmonella enterica serovar Typhimurium genomes reveals a high degree of allelic variation in surface-exposed molecules, including adhesins that promote host colonization. Subsequent multinomial logistic regression, MultiPhen and Random Forest analyses of known/suspected adhesins from 580 independent Typhimurium isolates identify distinct host-specific nsSNP signatures. Moreover, population and functional analyses of host-associated nsSNPs for FimH, the type 1 fimbrial adhesin, highlight the role of key allelic residues in host-specific adherence in vitro. Together, our data provide the first concrete evidence that functional differences between allelic variants of bacterial proteins likely contribute to pathoadaptation to diverse hosts.
