U.S. Department of Energy
Office of Scientific and Technical Information

Physicochemical property distributions for accurate and rapid pairwise protein homology detection

Journal Article · · BMC Bioinformatics
 [1];  [2];  [3]
  1. Pacific Northwest National Lab. (PNNL), Richland, WA (United States). Computational Biology and Bioinformatics; DOE/OSTI
  2. Gonzaga Univ., Spokane, WA (United States). Dept. of Chemistry
  3. Pacific Northwest National Lab. (PNNL), Richland, WA (United States). Computational Biology and Bioinformatics
Background: The challenge of remote homology detection is that many evolutionarily related sequences have very little similarity at the amino acid level. Kernel-based discriminative methods, such as support vector machines (SVMs), that use vector representations of sequences derived from sequence properties have been shown to be more accurate than traditional approaches for remote homology detection.

Results: We introduce a new method for feature vector representation based on the physicochemical properties of the primary protein sequence. A distribution of physicochemical property scores is assembled from the 4-mers of the sequence and normalized against the null distribution of the property over all possible 4-mers. With this approach there is little computational cost associated with transforming the protein into feature space, and overall performance in remote homology detection is comparable with current state-of-the-art methods. We demonstrate that the features can be used for pairwise remote homology detection with improved accuracy versus sequence-based methods such as BLAST and other feature-based methods of similar computational cost.

Conclusions: A protein feature method based on physicochemical properties is a viable approach for extracting features in a computationally inexpensive manner while retaining the sensitivity of SVM protein homology detection. Furthermore, identifying features that can be used for generic pairwise homology detection in lieu of family-based homology detection is important for applications such as large database searches and comparative genomics.
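The feature construction summarized in the Results can be illustrated with a minimal Python sketch. The abstract does not say which physicochemical properties are used, how a 4-mer is scored, how the distribution is binned, or the exact form of the normalization, so everything here is an assumption made for illustration: the Kyte-Doolittle hydropathy scale as the property, the mean residue value as the 4-mer score, ten equal-width bins, and normalization as the ratio of observed to null bin frequency. The names `kmer_score`, `histogram`, and `feature_vector` are hypothetical and are not taken from the paper.

```python
from itertools import product

# Kyte-Doolittle hydropathy values (assumed property; the paper may use others).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
K = 4               # k-mer length, as stated in the abstract
N_BINS = 10         # assumed number of histogram bins
LO, HI = -4.5, 4.5  # range of a mean hydropathy score


def kmer_score(kmer):
    """Score a k-mer as the mean property value of its residues (assumption)."""
    return sum(KD[a] for a in kmer) / len(kmer)


def bin_index(score):
    """Map a score to one of N_BINS equal-width bins over [LO, HI]."""
    i = int((score - LO) / (HI - LO) * N_BINS)
    return min(max(i, 0), N_BINS - 1)


def histogram(scores):
    """Return the relative frequency of scores in each bin."""
    counts = [0] * N_BINS
    total = 0
    for s in scores:
        counts[bin_index(s)] += 1
        total += 1
    return [c / total for c in counts] if total else [0.0] * N_BINS


# Null distribution: the property score over all 20**4 possible 4-mers.
NULL = histogram(kmer_score(k) for k in product(KD, repeat=K))


def feature_vector(seq):
    """Observed 4-mer score distribution divided bin-wise by the null distribution."""
    kmers = (seq[i:i + K] for i in range(len(seq) - K + 1))
    valid = [k for k in kmers if all(a in KD for a in k)]  # skip non-standard residues
    observed = histogram(kmer_score(k) for k in valid)
    return [o / n if n > 0 else 0.0 for o, n in zip(observed, NULL)]
```

Under these assumptions, `feature_vector("MKTAYIAKQRQISFVKSHFSRQ")` returns a fixed-length vector of ten values regardless of sequence length, which is what lets such features be fed to an SVM. The null distribution is computed once and reused, so the per-protein cost is a single pass over the sequence's 4-mers.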
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC05-76RL01830; AC06-76RL01830
OSTI ID:
1626267
Journal Information:
BMC Bioinformatics, Vol. 11, Issue 1; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
