
Title: Machine Learning Framework for Conotoxin Class and Molecular Target Prediction

Journal Article · Toxins

Conotoxins are small, highly potent neurotoxic peptides derived from the venom of marine cone snails, and they have captured the interest of the scientific community because of their pharmacological potential. These toxins display significant sequence and structural diversity, which results in a wide range of specificities for several different ion channels and receptors. Despite the recognized importance of these compounds, determining their binding targets and toxicities remains a significant challenge. Predicting the target receptors of conotoxins from their amino acid sequence alone is difficult because of the intricate relationships between structure, function, and target specificity, and because of the significant conformational heterogeneity observed in conotoxins with the same primary sequence. We have previously demonstrated that adding post-translational modifications, collisional cross section (CCS) values, and other structural features to the standard primary sequence features improves the accuracy of distinguishing conotoxins from non-toxic and other toxic peptides across varied datasets and several commonly used machine learning classifiers. Here, we present the effects of these features on conotoxin class and molecular target predictions, in particular on predicting conotoxins that bind to nicotinic acetylcholine receptors (nAChRs). We also demonstrate the use of the Synthetic Minority Oversampling Technique (SMOTE)-Tomek to balance the datasets while simultaneously making the classes more distinct by reducing the number of ambiguous samples that nearly overlap between classes. In predicting the alpha, mu, and omega conotoxin classes, the SMOTE-Tomek PCA PLR model, using the combination of the SS and P feature sets, achieves the best performance, with an overall accuracy (OA) of 95.95%, an average accuracy (AA) of 93.04%, and an F1 score of 0.959. Using this model, we obtained sensitivities of 98.98%, 89.66%, and 90.48% when predicting the alpha, mu, and omega conotoxin classes, respectively. Similarly, in predicting conotoxins that bind to nAChRs, the SMOTE-Tomek PCA SVM model, which used the CCS and P feature sets, demonstrated the highest performance, with 91.3% OA, 91.32% AA, and an F1 score of 0.9131. The sensitivity is 91.46% for conotoxins that bind to nAChRs and 91.18% for conotoxins that do not.
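The reported metrics are related in a way that a short worked example may make clearer. The sketch below assumes that sensitivity means per-class recall and that AA is the unweighted mean of the per-class sensitivities; the abstract does not define either term, but the reported figures are consistent with that reading. The confusion-matrix counts are hypothetical: they were chosen only so that they reproduce the reported percentages for the three-class model, and the abstract gives neither the actual test-set counts nor the off-diagonal errors. The function names are illustrative as well.

```python
from statistics import mean


def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def sensitivities(confusion):
    """Per-class sensitivity (recall); row i holds the samples whose true class is i."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


def average_accuracy(confusion):
    """Unweighted mean of the per-class sensitivities (assumed definition of AA)."""
    return mean(sensitivities(confusion))


# HYPOTHETICAL counts, not taken from the paper.
# Rows are the true class, columns the predicted class: alpha, mu, omega.
hypothetical_confusion = [
    [97, 1, 0],
    [2, 26, 1],
    [1, 1, 19],
]

oa = round(100 * overall_accuracy(hypothetical_confusion), 2)
aa = round(100 * average_accuracy(hypothetical_confusion), 2)
per_class = [round(100 * s, 2) for s in sensitivities(hypothetical_confusion)]

# Cross-check using only the percentages stated in the abstract.
aa_classes = round(mean([98.98, 89.66, 90.48]), 2)  # alpha, mu, omega
aa_nachr = round(mean([91.46, 91.18]), 2)           # binds, does not bind

print(oa, aa, per_class)     # 95.95 93.04 [98.98, 89.66, 90.48]
print(aa_classes, aa_nachr)  # 93.04 91.32
```

Because AA weights every class equally, it falls below OA when the smaller classes are predicted less reliably than the dominant one, which is the pattern in the three-class results.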

Research Organization:
Los Alamos National Laboratory (LANL), Los Alamos, NM (United States)
Sponsoring Organization:
USDOE Laboratory Directed Research and Development (LDRD) Program; USDOE National Nuclear Security Administration (NNSA)
Grant/Contract Number:
89233218CNA000001
OSTI ID:
2477164
Report Number(s):
LA-UR--24-28721
Journal Information:
Toxins, Vol. 16, Issue 11; ISSN 2072-6651
Publisher:
MDPI
Country of Publication:
United States
Language:
English
