An ensemble model of QSAR tools for regulatory risk assessment

Pradeep, Prachi; Povinelli, Richard J.; White, Shannon; Merrill, Stephen J.

doi:10.1186/s13321-016-0164-0

Journal Article · Thu Sep 22 00:00:00 EDT 2016 · Journal of Cheminformatics

DOI: https://doi.org/10.1186/s13321-016-0164-0 · OSTI ID: 1375955

Pradeep, Prachi ^[1]; Povinelli, Richard J. ^[2]; White, Shannon ^[3]; Merrill, Stephen J. ^[2]

[1] National Center for Computational Toxicology (ORISE Fellow), Research Triangle Park, NC (United States)
[2] Marquette Univ., Milwaukee, WI (United States)
[3] Georgetown Univ. Medical Center, Washington, D.C. (United States)

Quantitative structure-activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used in chemical risk assessment to protect human and environmental health, which makes them of interest to regulators, especially when experimental data are absent. For regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical and may also vary in their predictive performance across chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques from machine learning can be used to integrate predictions from multiple tools. By leveraging the various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model includes a variable cut-off parameter that allows the user to select the desired trade-off between model sensitivity and specificity. The predictive performance of the ensemble model is compared with that of four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) in predicting carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the gold carcinogenic potency database (480 chemicals). Leave-one-out cross-validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8% and 80.4%; balanced accuracy: 80.6% and 80.8%) and the highest inter-rater agreement (kappa (κ): 0.63 and 0.62) for both datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. In conclusion, this feature gives regulators additional control in grading a chemical based on the severity of the toxic endpoint under study.

Research Organization: Oak Ridge Institute for Science and Education (ORISE), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE

OSTI ID: 1375955

Journal Information: Journal of Cheminformatics, Vol. 8, Issue 1; ISSN 1758-2946

Publisher: Chemistry Central Ltd.

Country of Publication: United States

Language: English

Citation Metrics:

Cited by: 29 works

Citation information provided by Web of Science

Cited By (5)

Mixtures of QSAR models: Learning application domains of pKa predictors Dörgő, Gyula; Péter Hamadi, Omar; Varga, Tamás Journal of Chemometrics, Vol. 34, Issue 4 https://doi.org/10.1002/cem.3223	journal	April 2020
QSAR classification and regression models for β-secretase inhibitors using relative distance matrices Luque Ruiz, I.; Gómez-Nieto, M. Á. SAR and QSAR in Environmental Research, Vol. 29, Issue 5 https://doi.org/10.1080/1062936x.2018.1442879	journal	March 2018
QSAR/QSPR models based on quantum chemistry for risk assessment of pesticides according to current European legislation Villaverde, J. J.; Sevilla-Morán, B.; López-Goti, C. SAR and QSAR in Environmental Research, Vol. 31, Issue 1 https://doi.org/10.1080/1062936x.2019.1692368	journal	November 2019
Decision tree models to classify nanomaterials according to the DF4nanoGrouping scheme Gajewicz, Agnieszka; Puzyn, Tomasz; Odziomek, Katarzyna Nanotoxicology, Vol. 12, Issue 1 https://doi.org/10.1080/17435390.2017.1415388	journal	December 2017
Comprehensive ensemble in QSAR prediction for drug discovery Kwon, Sunyoung; Bae, Ho; Jo, Jeonghee BMC Bioinformatics, Vol. 20, Issue 1 https://doi.org/10.1186/s12859-019-3135-4	journal	October 2019

Linked Research (6)

Similar Records

Prediction of rodent carcinogenic potential of naturally occurring chemicals in the human diet using high-throughput QSAR predictive modeling

Journal Article · Sun Jul 01 00:00:00 EDT 2007 · Toxicology and Applied Pharmacology

Valerio, Luis G; Arvidson, Kirk B; Chanderbhan, Ronald F; +1 more

A novel QSAR model of Salmonella mutagenicity and its application in the safety assessment of drug impurities

Journal Article · Sun Dec 15 00:00:00 EST 2013 · Toxicology and Applied Pharmacology

Valencia, Antoni; Prous, Josep; Mora, Oscar; +1 more

Identification of putative estrogen receptor-mediated endocrine disrupting chemicals using QSAR- and structure-based virtual screening approaches

Journal Article · Tue Oct 01 00:00:00 EDT 2013 · Toxicology and Applied Pharmacology

Zhang, Liying; Sedykh, Alexander; Tripathi, Ashutosh; +6 more

Related Subjects

37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY
97 MATHEMATICS AND COMPUTING
Computational toxicology
In silico QSAR tools
Hybrid QSAR models
Ensemble models
Risk assessment

An ensemble model of QSAR tools for regulatory risk assessment [Supplemental Data] Pradeep, Prachi; Povinelli, Richard; White, Shannon https://doi.org/10.6084/m9.figshare.c.3698344 The current record is supplemented by this dataset	dataset	September 2016
An ensemble model of QSAR tools for regulatory risk assessment [Supplemental Data] Pradeep, Prachi; Povinelli, Richard; White, Shannon https://doi.org/10.6084/m9.figshare.c.3698344.v1 The current record is supplemented by this dataset	dataset	September 2016
MOESM2 of An ensemble model of QSAR tools for regulatory risk assessment Pradeep, Prachi; Povinelli, Richard; White, Shannon https://doi.org/10.6084/m9.figshare.c.3698344_d1.v1 The current record is supplemented by this dataset	dataset	September 2016
MOESM1 of An ensemble model of QSAR tools for regulatory risk assessment Pradeep, Prachi; Povinelli, Richard; White, Shannon https://doi.org/10.6084/m9.figshare.c.3698344_d2.v1 The current record is supplemented by this dataset	dataset	September 2016
MOESM2 of An ensemble model of QSAR tools for regulatory risk assessment Pradeep, Prachi; Povinelli, Richard; White, Shannon https://doi.org/10.6084/m9.figshare.c.3698344_d1 The current record is supplemented by this text	text	January 2016
MOESM1 of An ensemble model of QSAR tools for regulatory risk assessment Pradeep, Prachi; Povinelli, Richard; White, Shannon https://doi.org/10.6084/m9.figshare.c.3698344_d2 The current record is supplemented by this text	text	January 2016