DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Prediction of peptide binding to MHC using machine learning with sequence and structure-based feature sets

Journal Article · Biochimica et Biophysica Acta - General Subjects
[1]; [2]; [3]; [2]; [1]; [3]
  1. Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Fayetteville State Univ., Fayetteville, NC (United States)
  3. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

Selecting peptides that bind strongly to the major histocompatibility complex (MHC) for inclusion in a vaccine has therapeutic potential for infections and tumors. Machine learning models trained on sequence data already exist for predicting peptide:MHC (p:MHC) binding. Here, we train support vector machine classifier (SVMC) models on physicochemical sequence-based and structure-based descriptor sets to predict peptide binding to a well-studied model mouse MHC class I allele, H-2Db. We also performed recursive feature elimination and two-way forward feature selection. Although their sensitivity is lower than that of current state-of-the-art algorithms, models based on physicochemical descriptor sets achieve specificity and precision comparable to those of the most popular sequence-based algorithms. The best-performing model uses a hybrid descriptor set containing both sequence-based and structure-based descriptors. Interestingly, close to half of the physicochemical sequence-based descriptors remaining in the hybrid model describe properties of the anchor positions, residues 5 and 9 of the peptide sequence. In contrast, residues flanking position 5 make little to no residue-specific contribution to the binding affinity prediction. The results suggest that machine-learned models incorporating both sequence-based descriptors and structural data may provide information on the specific physicochemical properties that determine binding affinities.
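A minimal sketch, in Python, of two ingredients the abstract relies on: named per-position physicochemical descriptors for a peptide, and the sensitivity, specificity and precision metrics used to compare classifiers. It assumes 9-mer peptides and uses the Kyte-Doolittle hydropathy scale purely as an illustrative descriptor; the paper's actual descriptor sets, SVMC settings and feature-selection code are not given in this record, and the helper names `encode_peptide` and `classification_metrics` are hypothetical.

```python
# Kyte-Doolittle hydropathy index, used here only as an example
# physicochemical property (an assumption; the paper's descriptor
# sets are not specified in this record).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_peptide(peptide: str, length: int = 9) -> dict[str, float]:
    """Map a peptide to named per-position descriptors (hypothetical helper).

    Naming each feature by position (P1..P9) lets a feature-selection step,
    such as recursive feature elimination, report which positions survive.
    """
    if len(peptide) != length:
        raise ValueError(f"expected a {length}-mer, got {len(peptide)} residues")
    return {
        f"P{i}_hydropathy": KYTE_DOOLITTLE[residue]
        for i, residue in enumerate(peptide.upper(), start=1)
    }


def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sensitivity, specificity and precision for binary labels (1 = binder)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # fraction of true binders recovered
        "specificity": ratio(tn, tn + fp),  # fraction of true non-binders rejected
        "precision": ratio(tp, tp + fp),    # fraction of predicted binders that bind
    }


if __name__ == "__main__":
    features = encode_peptide("ASNENMETM")  # Asn at P5, Met at P9
    print(features["P5_hydropathy"], features["P9_hydropathy"])  # -3.5 1.9
    # Toy labels (not data from the paper): 3 binders, 5 non-binders.
    print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                 [1, 0, 0, 0, 0, 0, 0, 1]))
```

With these toy labels, one of three binders is recovered and one of five non-binders is misclassified, giving sensitivity 1/3, specificity 0.8 and precision 0.5. This illustrates how a classifier can score low on sensitivity while remaining competitive on specificity and precision. Keeping position in each feature name is a design choice that makes it straightforward to see whether the descriptors kept after selection cluster at anchor positions such as P5 and P9.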

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1606825
Journal Information:
Biochimica et Biophysica Acta - General Subjects, Vol. 1864, Issue 4; ISSN 0304-4165
Publisher:
Elsevier
Country of Publication:
United States
Language:
English