
Title: Structure-aware annotation of leucine-rich repeat domains

Journal Article · PLoS Computational Biology (Online)

Protein domain annotation is typically done by predictive models such as HMMs trained on sequence motifs. However, sequence-based annotation methods are prone to error, particularly in calling domain boundaries and the motifs within them, because the models have no access to structural information. With the advent of deep learning-based protein structure prediction, existing sequence-based domain annotation methods can be improved by taking into account the geometry of protein structures. We develop dimensionality reduction methods to annotate repeat units of the leucine-rich repeat (LRR) solenoid domain. The methods correct mistakes made by existing machine learning-based annotation tools and enable the automated detection of hairpin loops and structural anomalies in the solenoid. The methods are applied to 127 predicted structures of LRR-containing intracellular innate immune proteins in the model plant Arabidopsis thaliana and validated against a benchmark dataset of 172 manually annotated LRR domains.
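The abstract does not say which dimensionality reduction is used, so the following is only a generic illustration of the idea, not the authors' method. It is a minimal sketch that assumes a roughly straight solenoid axis: it projects a coiled backbone trace onto the plane perpendicular to its main axis (found by PCA) and reads off a cumulative winding number, with each full turn taken as one repeat unit. The function name `winding_profile` and the synthetic helix parameters are hypothetical.

```python
import numpy as np

def winding_profile(coords):
    """Cumulative number of turns of a coiled 3D trace around its main axis.

    Assumption: the coil axis is roughly straight, so the first principal
    component approximates it. Curved solenoids would need a local axis.
    """
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    # PCA via SVD: rows of vt are principal directions, largest variance first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Reduce 3D -> 2D by dropping the axis direction (first component).
    plane = centered @ vt[1:].T
    # Angle of each point around the centroid, made continuous across +/- pi.
    theta = np.unwrap(np.arctan2(plane[:, 1], plane[:, 0]))
    turns = (theta - theta[0]) / (2.0 * np.pi)
    # PCA directions have arbitrary sign; orient so winding increases.
    if turns[-1] < 0:
        turns = -turns
    return turns

# Synthetic stand-in for a solenoid: 24 points per turn, about 9.5 turns.
# Radius and rise per turn are illustrative values, not taken from the paper.
idx = np.arange(228)
angle = 2.0 * np.pi * idx / 24.0
trace = np.column_stack([10.0 * np.cos(angle),
                         10.0 * np.sin(angle),
                         5.0 * idx / 24.0])

w = winding_profile(trace)
# A new repeat unit starts wherever the winding number passes an integer.
starts = np.flatnonzero(np.diff(np.floor(w)) > 0) + 1
print(len(starts), "repeat boundaries; total turns:", round(float(w[-1]), 2))
```

On a predicted structure, `trace` would be replaced by the C-alpha coordinates of the candidate LRR region. Irregular spacing between successive entries of `starts` is one simple signal that could flag loops or other structural anomalies, though real domains are curved and noisier than this synthetic helix.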

Sponsoring Organization:
USDOE
Grant/Contract Number:
SC0020347
OSTI ID:
2477469
Journal Information:
PLoS Computational Biology (Online), Vol. 20, Issue 11; ISSN 1553-7358
Publisher:
Public Library of Science (PLoS)
Country of Publication:
United States
Language:
English
