DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: HLA-Clus: HLA class I clustering based on 3D structure

Journal Article · BMC Bioinformatics
  1. Univ. of Tennessee, Knoxville, TN (United States)
  2. Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
  3. Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)

In a previous paper, we classified populated HLA class I alleles into supertypes and subtypes based on the similarity of the 3D landscape of their peptide binding grooves, using a newly defined structure distance metric and a hierarchical clustering approach. Compared to other approaches, our method achieves higher correlation with peptide binding specificity, higher intra-cluster similarity (cohesion), and greater robustness. Here we introduce HLA-Clus, a Python package for clustering HLA class I alleles using that method, and describe additional features, including a new nearest-neighbor clustering method that facilitates clustering based on user-defined criteria. The HLA-Clus pipeline has three stages. First, HLA class I structural models are coarse-grained and transformed into clouds of labeled points. Second, similarities between alleles are determined using the structure distance metric, which accounts for both spatial and physicochemical similarity. Finally, alleles are clustered via hierarchical or nearest-neighbor approaches. We also interfaced HLA-Clus with the peptide:HLA affinity predictor MHCnuggets. Using the nearest-neighbor clustering method to select optimal allele-specific deep learning models in MHCnuggets improved the average accuracy of peptide binding prediction for rare alleles. The HLA-Clus package offers a solution for characterizing the peptide binding specificities of a large number of HLA alleles. The method can be applied in HLA functional studies, such as the development of peptide affinity predictors, disease association studies, and HLA matching for grafting. HLA-Clus is freely available at our GitHub repository (https://github.com/yshen25/HLA-Clus).
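
The three stages described in the abstract can be illustrated with a minimal, self-contained sketch. This is not the HLA-Clus API and not the published structure distance metric. The function names, the Gaussian kernel, the average-linkage rule, the 0.5 threshold, and the toy coordinates are all assumptions chosen only to show the general shape of the idea: labeled point clouds, a distance that combines spatial and label agreement, then hierarchical or nearest-neighbor grouping.

```python
import math
from itertools import combinations

def cloud_similarity(a, b, sigma=1.0):
    """Sum of Gaussian spatial kernels over point pairs whose labels match.
    (Illustrative stand-in for a spatial + physicochemical similarity.)"""
    total = 0.0
    for (x1, y1, z1, lab1) in a:
        for (x2, y2, z2, lab2) in b:
            if lab1 == lab2:
                d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                total += math.exp(-d2 / (2 * sigma ** 2))
    return total

def structure_distance(a, b):
    """1 minus the normalized similarity; 0 for identical clouds."""
    norm = math.sqrt(cloud_similarity(a, a) * cloud_similarity(b, b))
    return max(0.0, 1.0 - cloud_similarity(a, b) / norm)

def hierarchical_clusters(clouds, threshold):
    """Average-linkage agglomerative clustering, stopped at a distance threshold."""
    dist = {frozenset((m, n)): structure_distance(clouds[m], clouds[n])
            for m, n in combinations(clouds, 2)}

    def linkage(c1, c2):
        return sum(dist[frozenset((m, n))] for m in c1 for n in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in clouds]
    while len(clusters) > 1:
        # Find the closest pair of clusters (i < j by construction).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters.pop(j)
        clusters[i].extend(merged)
    return [sorted(c) for c in clusters]

def nearest_neighbor(query, references):
    """Name of the reference cloud closest to the query cloud."""
    return min(references, key=lambda name: structure_distance(query, references[name]))

# Hypothetical toy "alleles": each is a list of (x, y, z, label) points.
toy = {
    "X1": [(0.0, 0.0, 0.0, "hydrophobic"), (1.0, 0.0, 0.0, "positive")],
    "X2": [(0.1, 0.0, 0.0, "hydrophobic"), (1.0, 0.1, 0.0, "positive")],
    "Y1": [(0.0, 0.0, 0.0, "negative"), (3.0, 0.0, 0.0, "hydrophobic")],
}
query = [(0.2, 0.0, 0.0, "hydrophobic"), (1.1, 0.0, 0.0, "positive")]

print(hierarchical_clusters(toy, threshold=0.5))
print(nearest_neighbor(query, {"X1": toy["X1"], "Y1": toy["Y1"]}))
```

In this toy setup, the nearest-neighbor step plays the role the abstract describes for rare alleles: a query with no model of its own is matched to the most similar reference, whose allele-specific model could then be used in its place.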

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1975334
Journal Information:
BMC Bioinformatics, Vol. 24, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
