Physicochemical property distributions for accurate and rapid pairwise protein homology detection
Journal Article · BMC Bioinformatics
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States). Computational Biology and Bioinformatics; DOE/OSTI
- Gonzaga Univ., Spokane, WA (United States). Dept. of Chemistry
- Pacific Northwest National Lab. (PNNL), Richland, WA (United States). Computational Biology and Bioinformatics
Background: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to be more accurate than traditional approaches for remote homology detection.
Results: We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from the 4-mers of the sequence and normalized against the null distribution of the property over all possible 4-mers. With this approach, transforming a protein into feature space carries little computational cost, and overall remote homology detection performance is comparable with current state-of-the-art methods. We demonstrate that the features can be used for pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost.
Conclusions: A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.
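The feature construction summarized under Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which properties, scoring rule, bin count, or normalization were used, so the Kyte-Doolittle hydropathy scale, the mean-over-4-mer score, the 10 equal-width bins, and the bin-wise ratio to the null distribution are all assumptions made for illustration, as are the names `kmer_score`, `binned_fractions`, `null_fractions`, and `feature_vector`.

```python
from itertools import product

# Kyte-Doolittle hydropathy values; an example property chosen for this sketch,
# not necessarily one of the properties used in the paper.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
K = 4        # k-mer length stated in the abstract
N_BINS = 10  # hypothetical histogram resolution
LO, HI = min(HYDROPATHY.values()), max(HYDROPATHY.values())


def kmer_score(kmer):
    """Mean property value of a k-mer (assumed scoring rule)."""
    return sum(HYDROPATHY[aa] for aa in kmer) / len(kmer)


def binned_fractions(scores):
    """Fraction of scores falling in each of N_BINS equal-width bins."""
    counts = [0] * N_BINS
    for s in scores:
        idx = int((s - LO) / (HI - LO) * N_BINS)
        counts[min(idx, N_BINS - 1)] += 1  # the maximum score goes in the last bin
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * N_BINS


def null_fractions():
    """Score distribution over all 20**4 possible 4-mers."""
    return binned_fractions(
        kmer_score(kmer) for kmer in product(HYDROPATHY, repeat=K)
    )


def feature_vector(sequence, null):
    """Observed 4-mer score distribution divided bin-wise by the null (assumed)."""
    kmers = (sequence[i:i + K] for i in range(len(sequence) - K + 1))
    # 4-mers containing non-standard residue codes are skipped.
    scores = [kmer_score(k) for k in kmers if all(aa in HYDROPATHY for aa in k)]
    observed = binned_fractions(scores)
    return [o / n if n > 0 else 0.0 for o, n in zip(observed, null)]


null = null_fractions()  # computed once, reused for every sequence
features = feature_vector("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", null)
print([round(f, 2) for f in features])
```

Because the null distribution depends only on the property scale, it is computed once and reused, so the per-sequence cost is a single pass over the sequence's 4-mers. This is in line with the low transformation cost the abstract describes. Vectors for several properties could be concatenated before being passed to an SVM.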
- Research Organization:
- Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
- Grant/Contract Number:
- AC05-76RL01830; AC06-76RL01830
- OSTI ID:
- 1626267
- Journal Information:
- BMC Bioinformatics, Vol. 11, Issue 1; ISSN 1471-2105
- Publisher:
- BioMed Central
- Country of Publication:
- United States
- Language:
- English
Similar Records
Integrating Subcellular Location for Improving Machine Learning Models of Remote Homology Detection in Eukaryotic Organisms
Journal Article · Thu Feb 22 23:00:00 EST 2007 · Computational Biology and Chemistry, 31(2):138-142 · OSTI ID:903458
SVM-Hustle - An iterative semi-supervised machine learning approach for pairwise protein remote homology detection
Journal Article · Sat Mar 15 00:00:00 EDT 2008 · Bioinformatics, 24(6):783-790 · OSTI ID:985035
SVM-BALSA: Remote Homology Detection based on Bayesian Sequence Alignment
Journal Article · Wed Nov 09 23:00:00 EST 2005 · Computational Biology and Chemistry, 29(6):440-3 · OSTI ID:878675