OSTI.GOV | U.S. Department of Energy
Office of Scientific and Technical Information

Title: An ensemble model of QSAR tools for regulatory risk assessment

Abstract

Quantitative structure activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used for chemical risk assessment for protection of human and environmental health, which makes them interesting to regulators, especially in the absence of experimental data. For compatibility with regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to otherwise time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical and may also vary in their predictive performance across different chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques in the machine learning paradigm can be used to integrate predictions from multiple tools. By leveraging various underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model using Bayesian classification. The model provides a cut-off parameter that can be varied to select the desired trade-off between model sensitivity and specificity. The predictive performance of the ensemble model is compared with four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) to predict carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the gold carcinogenic potency database (480 chemicals).
Leave-one-out cross validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8% and 80.4%, and balanced accuracy: 80.6% and 80.8%) and the highest inter-rater agreement [kappa (κ): 0.63 and 0.62] for both datasets. The ROC curves demonstrate the utility of the cut-off feature in the predictive ability of the ensemble model. In conclusion, this feature gives regulators an additional control for grading a chemical based on the severity of the toxic endpoint under study.
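
The abstract does not spell out how the Bayesian classifier is built, so the following is only a minimal sketch under stated assumptions: a naive Bayes combiner with Laplace smoothing over binary tool outputs (1 = predicted carcinogen), a posterior cut-off that shifts the sensitivity/specificity trade-off, leave-one-out cross validation, and the metrics named above. All function names and the toy data are hypothetical and are not taken from the paper.

```python
from math import exp, log


def fit_naive_bayes(X, y, alpha=1.0):
    """Fit class priors and per-tool P(tool says positive | class), Laplace-smoothed."""
    n_tools = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + alpha) / (len(X) + 2 * alpha)
        p_pos = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_tools)
        ]
        model[c] = (prior, p_pos)
    return model


def posterior_positive(model, x):
    """Return P(class = 1 | tool predictions x), assuming tools are independent given the class."""
    log_score = {}
    for c, (prior, p_pos) in model.items():
        s = log(prior)
        for xj, pj in zip(x, p_pos):
            s += log(pj if xj == 1 else 1.0 - pj)
        log_score[c] = s
    top = max(log_score.values())  # subtract the max for numerical stability
    e0 = exp(log_score[0] - top)
    e1 = exp(log_score[1] - top)
    return e1 / (e0 + e1)


def leave_one_out(X, y, cutoff=0.5):
    """Predict each chemical with a model trained on all the others."""
    preds = []
    for i in range(len(X)):
        model = fit_naive_bayes(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(1 if posterior_positive(model, X[i]) >= cutoff else 0)
    return preds


def metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy, balanced accuracy and Cohen's kappa."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / n
    # Chance agreement expected from the marginal totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (acc - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": acc,
        "balanced_accuracy": (sens + spec) / 2,
        "kappa": kappa,
    }


# Synthetic toy data (NOT from the paper): rows are chemicals, columns are
# the binary calls of four hypothetical tools; toy_y is the known class.
toy_X = [
    [1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1], [1, 1, 1, 1],
    [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0],
]
toy_y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for cutoff in (0.3, 0.5, 0.7):
    print(cutoff, metrics(toy_y, leave_one_out(toy_X, toy_y, cutoff)))
```

Each leave-one-out fold yields the same posterior whatever the cut-off is, so lowering the cut-off can only turn negative calls into positive ones. Sensitivity therefore cannot decrease and specificity cannot increase, which is how a regulator could favor fewer false negatives for severe endpoints.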

Authors:
Pradeep, Prachi [1]; Povinelli, Richard J. [2]; White, Shannon [3]; Merrill, Stephen J. [2]
  1. National Center for Computational Toxicology (ORISE Fellow), Research Triangle Park, NC (United States)
  2. Marquette Univ., Milwaukee, WI (United States)
  3. Georgetown Univ. Medical Center, Washington, D.C. (United States)
Publication Date:
Research Org.:
Oak Ridge Inst. for Science and Education (ORISE), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1375955
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Cheminformatics
Additional Journal Information:
Journal Volume: 8; Journal Issue: 1; Journal ID: ISSN 1758-2946
Publisher:
Chemistry Central Ltd.
Country of Publication:
United States
Language:
English
Subject:
37 INORGANIC, ORGANIC, PHYSICAL, AND ANALYTICAL CHEMISTRY; 97 MATHEMATICS AND COMPUTING; Computational toxicology; In silico QSAR tools; Hybrid QSAR models; Ensemble models; Risk assessment

Citation Formats

Pradeep, Prachi, Povinelli, Richard J., White, Shannon, and Merrill, Stephen J. An ensemble model of QSAR tools for regulatory risk assessment. United States: N. p., 2016. Web. doi:10.1186/s13321-016-0164-0.
Pradeep, Prachi, Povinelli, Richard J., White, Shannon, & Merrill, Stephen J. An ensemble model of QSAR tools for regulatory risk assessment. United States. doi:10.1186/s13321-016-0164-0.
Pradeep, Prachi, Povinelli, Richard J., White, Shannon, and Merrill, Stephen J. 2016. "An ensemble model of QSAR tools for regulatory risk assessment". United States. doi:10.1186/s13321-016-0164-0. https://www.osti.gov/servlets/purl/1375955.
@article{osti_1375955,
title = {An ensemble model of QSAR tools for regulatory risk assessment},
author = {Pradeep, Prachi and Povinelli, Richard J. and White, Shannon and Merrill, Stephen J.},
doi = {10.1186/s13321-016-0164-0},
journal = {Journal of Cheminformatics},
number = 1,
volume = 8,
place = {United States},
year = 2016,
month = 9
}

  • As indicated in ICH M7 draft guidance, in silico predictive tools including statistically-based QSARs and expert analysis may be used as a computational assessment for bacterial mutagenicity for the qualification of impurities in pharmaceuticals. To address this need, we developed and validated a QSAR model to predict Salmonella t. mutagenicity (Ames assay outcome) of pharmaceutical impurities using Prous Institute's Symmetry℠, a new in silico solution for drug discovery and toxicity screening, and the Mold2 molecular descriptor package (FDA/NCTR). Data were sourced from public benchmark databases with known Ames assay mutagenicity outcomes for 7300 chemicals (57% mutagens). Of these data, 90% were used to train the model and the remaining 10% were set aside as a holdout set for validation. The model's applicability to drug impurities was tested using an FDA/CDER database of 951 structures, of which 94% were found within the model's applicability domain. The predictive performance of the model is acceptable for supporting regulatory decision-making with 84 ± 1% sensitivity, 81 ± 1% specificity, 83 ± 1% concordance and 79 ± 1% negative predictivity based on internal cross-validation, while the holdout dataset yielded 83% sensitivity, 77% specificity, 80% concordance and 78% negative predictivity. Given the importance of having confidence in negative predictions, an additional external validation of the model was also carried out using marketed drugs known to be Ames-negative, which yielded 98% coverage and 81% specificity. Additionally, Ames mutagenicity data from FDA/CFSAN were used to create another data set of 1535 chemicals for external validation of the model, yielding 98% coverage, 73% sensitivity, 86% specificity, 81% concordance and 84% negative predictivity.
    Highlights:
      • A new in silico QSAR model to predict Ames mutagenicity is described.
      • The model is extensively validated with chemicals from the FDA and the public domain.
      • Validation tests show desirable high sensitivity and high negative predictivity.
      • The model predicted 14 reportedly difficult to predict drug impurities with accuracy.
      • The model is suitable to support risk evaluation of potentially mutagenic compounds.
  • Scientists are continually trying to improve cancer risk assessment by incorporating new information on the cancer process and on how different carcinogens affect the process. Preferably, cancer risk assessments would be based on epidemiological studies, studies that link actual human cancer cases with human exposure to specific agents. More often than not, however, such information is not available, and risk assessments are made by extrapolating from experimental results on laboratory animals to the human situation. Many uncertainties are inherent in both approaches. Epidemiology studies depend heavily on accurate assessments of human exposure. Generally, these assessments rely on external exposure measurements: this doesn't account for what happens to a carcinogen once it enters the body, a factor that can greatly influence the quantitative exposure-to-tumor relationship. Extrapolations from laboratory data entail even greater uncertainties. Recent scientific discoveries are helping the authors to refine their estimates of human exposure to carcinogens and to better understand the mechanisms of action of carcinogens both in experimental animals and in humans.
  • Implementation of the Toxic Substances Control Act of 1977 creates the need to reliably establish testing priorities because laboratory resources are limited and the number of industrial chemicals requiring evaluation is overwhelming. The use of quantitative structure activity relationship (QSAR) models as rapid and predictive screening tools to select more potentially hazardous chemicals for in-depth laboratory evaluation has been proposed. Further implementation and refinement of quantitative structure-toxicity relationships in aqueous toxicology and hazard assessment requires the development of a mode-of-action database. With such a database, a qualitative structure-activity relationship can be formulated to assign the proper mode of action, and respective QSAR, to a given chemical structure. In this review, the development of fish acute toxicity syndromes (FATS), which are toxic-response sets based on various behavioral and physiological-biochemical measurements, and their projected use in the mode-of-action database are outlined. Using behavioral parameters monitored in the fathead minnow during acute toxicity testing, FATS associated with acetylcholinesterase (AChE) inhibitors and narcotics could be reliably predicted. However, compounds classified as oxidative phosphorylation uncouplers or stimulants could not be resolved. Refinement of this approach by using respiratory-cardiovascular responses in the rainbow trout enabled FATS associated with AChE inhibitors, convulsants, narcotics, respiratory blockers, respiratory membrane irritants, and uncouplers to be correctly predicted.