Summary: Bayesian Models and Algorithms
for Protein β-Sheet Prediction
Zafer Aydin, Yucel Altunbasak, and Hakan Erdogan
Abstract--Prediction of the 3D structure greatly benefits from the information related to secondary structure, solvent accessibility, and
nonlocal contacts that stabilize a protein's structure. We address the problem of β-sheet prediction, defined as the prediction of
β-strand pairings, interaction types (parallel or antiparallel), and β-residue interactions (or contact maps). We introduce a Bayesian
approach for proteins with six or fewer β-strands in which we model the conformational features in a probabilistic framework by
combining the amino acid pairing potentials with a priori knowledge of β-strand organizations. To select the optimum β-sheet
architecture, we significantly reduce the search space by heuristics that enforce the amino acid pairs with strong interaction potentials.
In addition, we find the optimum pairwise alignment between β-strands using dynamic programming in which we allow any number of
gaps in an alignment to model β-bulges more effectively. For proteins with more than six β-strands, we first compute β-strand pairings
using the BetaPro method. Then, we compute gapped alignments of the paired β-strands and choose the interaction types and
β-residue pairings with maximum alignment scores. We performed a 10-fold cross-validation experiment on the BetaSheet916 set and
obtained significant improvements in the prediction accuracy.
Index Terms--Protein β-sheets, open β-sheets, β-sheet prediction, contact map prediction, Bayesian modeling.
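The gapped strand alignment mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a standard global (Needleman-Wunsch style) dynamic program with a linear gap penalty, and the names `align_strands`, `best_interaction`, and `toy_pair_score`, as well as the gap value, are hypothetical. The toy score stands in for the amino acid pairing potentials, whose actual form is not given in this section.

```python
from typing import Callable, List, Tuple

Pairs = List[Tuple[int, int]]


def toy_pair_score(x: str, y: str) -> float:
    # Assumption: a placeholder potential, NOT the paper's pairing potentials.
    return 1.0 if x == y else -1.0


def align_strands(a: str, b: str,
                  pair_score: Callable[[str, str], float],
                  gap: float = -1.0) -> Tuple[float, Pairs]:
    """Global gapped alignment of two strands; returns (score, residue pairs)."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # pair residues
                dp[i - 1][j] + gap,  # residue of a left unpaired (bulge-like gap)
                dp[i][j - 1] + gap,  # residue of b left unpaired
            )
    # Trace back from the end to recover the paired residue indices.
    pairs: Pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


def best_interaction(a: str, b: str,
                     pair_score: Callable[[str, str], float],
                     gap: float = -1.0) -> Tuple[str, float, Pairs]:
    """Pick the interaction type with the higher alignment score.

    Antiparallel pairing is modeled by aligning a against reversed b and
    mapping indices back to b's original numbering. Ties go to parallel.
    """
    par_score, par_pairs = align_strands(a, b, pair_score, gap)
    anti_score, anti_rev = align_strands(a, b[::-1], pair_score, gap)
    anti_pairs = [(i, len(b) - 1 - j) for i, j in anti_rev]
    if anti_score > par_score:
        return "antiparallel", anti_score, anti_pairs
    return "parallel", par_score, par_pairs
```

With the toy score, `best_interaction("VKI", "IKV", toy_pair_score)` selects the antiparallel arrangement, because reversing the second strand lets all three residues pair. A real scoring function would replace `toy_pair_score`, and the gap penalty would need to be tuned or learned.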
A β-sheet is a set of β-strand segments that are
involved in hydrogen bonding interactions. The
association of β-sheets has been implicated in the formation