Title: An ensemble model of QSAR tools for regulatory risk assessment

Journal Article · Journal of Cheminformatics
[1]; [2]; [3]; [2]
  1. National Center for Computational Toxicology (ORISE Fellow), Research Triangle Park, NC (United States)
  2. Marquette Univ., Milwaukee, WI (United States)
  3. Georgetown Univ. Medical Center, Washington, D.C. (United States)

Quantitative structure-activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used in chemical risk assessment to protect human and environmental health, which makes them of interest to regulators, especially when experimental data are lacking. For regulatory use, QSAR models should be transparent, reproducible, and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical, and their predictive performance may vary across chemical datasets. In a regulatory context, conflicting predictions raise concerns about interpretation, validation, and adequacy. To address these concerns, ensemble learning techniques from machine learning can be used to integrate predictions from multiple tools. Because it draws on several underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model based on Bayesian classification. The model exposes a cut-off parameter that can be varied to select the desired trade-off between sensitivity and specificity. The predictive performance of the ensemble model is compared with that of four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) in predicting carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the Gold Carcinogenic Potency Database (480 chemicals).
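The idea of combining tool outputs with a Bayesian classifier and a tunable cut-off can be illustrated with a minimal sketch. It assumes a naive Bayes combination of binary tool votes with Laplace smoothing; the abstract does not specify the model's internals, so this is not the authors' implementation, and all function names, the smoothing constant, and the toy data are hypothetical.

```python
from math import exp, log


def fit_naive_bayes(votes, labels, alpha=1.0):
    """Estimate a class prior and per-tool vote likelihoods.

    votes  : list of tuples of 0/1 predictions, one entry per tool
    labels : list of 0/1 experimental outcomes (1 = carcinogen)
    alpha  : Laplace smoothing constant (an assumption of this sketch)
    """
    n_tools = len(votes[0])
    class_counts = {0: 0, 1: 0}
    positive_votes = {0: [0] * n_tools, 1: [0] * n_tools}
    for vote, label in zip(votes, labels):
        class_counts[label] += 1
        for i, v in enumerate(vote):
            positive_votes[label][i] += v
    total = class_counts[0] + class_counts[1]
    prior = {c: (class_counts[c] + alpha) / (total + 2 * alpha) for c in (0, 1)}
    # likelihood[c][i] = P(tool i predicts positive | true class is c)
    likelihood = {
        c: [
            (positive_votes[c][i] + alpha) / (class_counts[c] + 2 * alpha)
            for i in range(n_tools)
        ]
        for c in (0, 1)
    }
    return prior, likelihood


def posterior_positive(model, vote):
    """Posterior probability of the positive class given one vote tuple."""
    prior, likelihood = model
    log_score = {}
    for c in (0, 1):
        score = log(prior[c])
        for i, v in enumerate(vote):
            p = likelihood[c][i]
            score += log(p if v == 1 else 1.0 - p)
        log_score[c] = score
    shift = max(log_score.values())  # subtract the max for numerical stability
    e0 = exp(log_score[0] - shift)
    e1 = exp(log_score[1] - shift)
    return e1 / (e0 + e1)


def classify(model, vote, cutoff=0.5):
    """Label a chemical positive when its posterior reaches the cut-off."""
    return 1 if posterior_positive(model, vote) >= cutoff else 0


# Hypothetical toy data: votes from four tools for eight chemicals.
toy_votes = [
    (1, 1, 1, 0), (1, 1, 0, 1), (1, 0, 1, 1), (0, 1, 1, 1),
    (0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 0),
]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_model = fit_naive_bayes(toy_votes, toy_labels)
```

In this sketch, lowering `cutoff` flags more chemicals as positive, which raises sensitivity at the cost of specificity; raising it does the opposite.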
Leave-one-out cross-validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8 % and 80.4 %; balanced accuracy: 80.6 % and 80.8 %) and the highest inter-rater agreement (kappa, κ: 0.63 and 0.62) for both datasets. The ROC curves demonstrate the utility of the cut-off feature for the predictive ability of the ensemble model. In conclusion, this feature gives regulators additional control in grading a chemical according to the severity of the toxic endpoint under study.
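The performance measures named above (accuracy, balanced accuracy, Cohen's kappa, and the points of a ROC curve) follow from standard definitions. The sketch below computes them from binary labels and predictions. It is illustrative only: the function names and example data are hypothetical, it does not reproduce the reported figures, and it assumes both classes are present in the true labels.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, balanced accuracy, Cohen's kappa.

    Assumes y_true contains at least one positive and one negative.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    balanced_accuracy = (sensitivity + specificity) / 2
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
    }


def roc_points(y_true, scores, cutoffs):
    """(false positive rate, true positive rate) for each cut-off value."""
    points = []
    for cutoff in cutoffs:
        y_pred = [1 if s >= cutoff else 0 for s in scores]
        m = summary_metrics(y_true, y_pred)
        points.append((1 - m["specificity"], m["sensitivity"]))
    return points


# Hypothetical example: ten chemicals, four of them true positives.
example_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
example_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
example_metrics = summary_metrics(example_true, example_pred)
```

Sweeping `cutoffs` over the scores of a probabilistic classifier traces the ROC curve, which is one way to visualize the sensitivity and specificity trade-off that a cut-off parameter controls.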

Research Organization:
Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
OSTI ID:
1375955
Journal Information:
Journal of Cheminformatics, Vol. 8, Issue 1; ISSN 1758-2946
Publisher:
Chemistry Central Ltd.
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 29 works
Citation information provided by
Web of Science

Cited By (5)

Mixtures of QSAR models: Learning application domains of pKa predictors journal April 2020
QSAR classification and regression models for β-secretase inhibitors using relative distance matrices journal March 2018
QSAR/QSPR models based on quantum chemistry for risk assessment of pesticides according to current European legislation journal November 2019
Decision tree models to classify nanomaterials according to the DF4nanoGrouping scheme journal December 2017
Comprehensive ensemble in QSAR prediction for drug discovery journal October 2019

