Summary: RESEARCH ARTICLE Open Access
Learning sparse models for a dynamic Bayesian network classifier of protein secondary structure
Zafer Aydin, Ajit Singh, Jeff Bilmes and William S Noble
Background: Protein secondary structure prediction provides insight into protein function and is a valuable
preliminary step for predicting the 3D structure of a protein. Dynamic Bayesian networks (DBNs) and support
vector machines (SVMs) have been shown to provide state-of-the-art performance in secondary structure
prediction. As the size of the protein database grows, it becomes feasible to use a richer model in an effort to
capture subtle correlations among the amino acids and the predicted labels. In this context, it is beneficial to
derive sparse models that discourage over-fitting and provide biological insight.
Results: In this paper, we first show that we are able to obtain accurate secondary structure predictions. Our
per-residue accuracy on a well-established and difficult benchmark (CB513) is 80.3%, which is comparable to the
state-of-the-art evaluated on this dataset. We then introduce an algorithm for sparsifying the parameters of a DBN.
Using this algorithm, we can automatically remove 70-95% of the parameters of a DBN while maintaining the same
level of predictive accuracy on the SD576 set. At 90% sparsity, we are able to compute predictions three times
faster than a fully dense model evaluated on the SD576 set. We also demonstrate, using simulated data, that the
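
The per-residue accuracy quoted in the Results is, as the term is commonly used, the fraction of residues whose predicted secondary structure label matches the reference label. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `per_residue_accuracy`, the three-letter alphabet (H for helix, E for strand, C for coil), and the toy sequences are all assumptions made for illustration; none of them come from the paper.

```python
def per_residue_accuracy(true_labels: str, predicted_labels: str) -> float:
    """Return the fraction of positions where the predicted label equals the true label.

    Both arguments are strings with one label character per residue.
    The H/E/C alphabet used in the example below is an assumption.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label strings must have the same length")
    if not true_labels:
        raise ValueError("label strings must be non-empty")
    # Count positions where prediction and reference agree.
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


# Hypothetical toy sequences, not data from the paper.
true_ss = "HHHHEEEECC"
pred_ss = "HHHEEEEECC"
accuracy = per_residue_accuracy(true_ss, pred_ss)
print(f"{accuracy:.1%}")  # prints 90.0% (9 of 10 residues agree)
```

On a benchmark such as CB513, a figure like this would typically be aggregated over all residues of all proteins in the set; the exact aggregation used by the authors is not stated in this summary.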


Source: Aydin, Zafer - Department of Genome Sciences, University of Washington at Seattle


Collections: Biology and Medicine