DOE PAGES · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Multi-Attribute Subset Selection enables prediction of representative phenotypes across microbial populations

Journal Article · Communications Biology

The interpretation of complex biological datasets requires the identification of representative variables that describe the data without critical information loss. This is particularly important in the analysis of large phenotypic datasets (phenomics). Here we introduce Multi-Attribute Subset Selection (MASS), an algorithm that separates a matrix of phenotypes (e.g., yield across microbial species and environmental conditions) into predictor and response sets of conditions. Using mixed integer linear programming, MASS expresses the response conditions as a linear combination of the predictor conditions, while simultaneously searching for the optimally descriptive set of predictors. We apply the algorithm to three microbial datasets and identify environmental conditions that predict phenotypes under other conditions, providing biologically interpretable axes for strain discrimination. MASS could be used to reduce the number of experiments needed to identify species or to map their metabolic capabilities. The generality of the algorithm allows it to address subset selection problems in areas beyond biology.
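The predictor/response split described in the abstract can be illustrated on a toy matrix. The sketch below is not the MASS implementation: it swaps the mixed integer linear program for an exhaustive search over predictor subsets and assumes a squared-error objective with no intercept, which is only practical for a handful of conditions. The function name `select_predictors`, the toy data, and the fixed subset size `k` are all hypothetical choices made for illustration.

```python
from itertools import combinations

import numpy as np


def select_predictors(phenotypes, k):
    """Brute-force stand-in for subset selection (assumption: squared error).

    phenotypes: array of shape (n_strains, n_conditions).
    k: number of conditions to use as predictors.
    Returns (predictor_indices, response_indices, coefficients, error).
    """
    X = np.asarray(phenotypes, dtype=float)
    n_conditions = X.shape[1]
    best = None
    for subset in combinations(range(n_conditions), k):
        predictors = list(subset)
        responses = [j for j in range(n_conditions) if j not in subset]
        P = X[:, predictors]  # predictor conditions
        R = X[:, responses]   # response conditions
        # Least-squares fit of every response as a linear
        # combination of the chosen predictors.
        coef, *_ = np.linalg.lstsq(P, R, rcond=None)
        error = float(np.sum((P @ coef - R) ** 2))
        if best is None or error < best[3]:
            best = (predictors, responses, coef, error)
    return best


# Toy phenotype matrix: 6 strains x 4 conditions, where conditions 2 and 3
# are exact linear combinations of conditions 0 and 1.
a = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 0.0])
b = np.array([0.0, 1.0, 1.0, 2.0, 0.0, 3.0])
X = np.column_stack([a, b, a + 0.5 * b, 2.0 * b - 0.5 * a])

predictors, responses, coef, error = select_predictors(X, k=2)
print("predictors:", predictors, "responses:", responses)
print("reconstruction error:", error)
```

Because the toy matrix has rank two, any two linearly independent conditions reconstruct the other two almost exactly, so the selected pair is not unique here. On real data the number of candidate subsets grows combinatorially with the number of conditions, which is the kind of search the abstract addresses with mixed integer linear programming.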

Research Organization:
Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)
Sponsoring Organization:
Human Frontiers Science Program; National Institutes of Health (NIH); National Science Foundation (NSF); US Department of the Navy, Office of Naval Research (ONR); USDOE Office of Science (SC), Biological and Environmental Research (BER)
Grant/Contract Number:
AC02-05CH11231
OSTI ID:
2469647
Journal Information:
Communications Biology, Vol. 7, Issue 1; ISSN 2399-3642
Publisher:
Springer Nature
Country of Publication:
United States
Language:
English
