Summary:
Identity  20%      25%        30%        40%        50%        60%        70%        80%        90%
# Prot    4,343    5,458      6,824      9,047      10,384     11,324     12,031     12,582     13,333
# AA      870,961  1,140,596  1,457,834  1,969,045  2,277,416  2,493,278  2,650,209  2,764,111  2,918,025
Table 1: Data sets. The table lists, for each percent identity threshold, the number of proteins and the
number of amino acids in the corresponding data set.
Label  Class                        Percent
H      Helix                        33.23%
E      Interior strand (2 bridges)  8.31%
e      Edge strand (1 bridge)       14.38%
b      Bulge (0 bridges)            1.02%
L      Loop                         43.06%
Table 2: Secondary structure labels. The table lists each of the secondary structure labels used in this
study, as well as the percentage of amino acids that are assigned that label in the 90% identity data set.
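The percentages in Table 2 are per-residue label frequencies. A minimal sketch of that tally follows, assuming each protein's labels are available as a string over the five-letter alphabet; the function name and input format are hypothetical and not taken from the study.

```python
from collections import Counter

# The five secondary structure labels listed in Table 2.
LABELS = ("H", "E", "e", "b", "L")


def label_percentages(label_strings):
    """Return the percentage of residues assigned to each label.

    label_strings is an iterable of strings, one per protein chain,
    with one label character per residue (an assumed input format).
    """
    counts = Counter()
    for labels in label_strings:
        counts.update(labels)

    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")

    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to count")

    return {label: 100.0 * counts[label] / total for label in LABELS}


# Toy input (made up for illustration, not real data).
example = label_percentages(["HHLL", "EeLb"])
```

Because every residue receives exactly one label, the returned percentages sum to 100, as the Table 2 values do.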
1 Datasets
In this project, we generated culled PDB lists from the PISCES server [6, 7] for use in training and validating
our models. We considered nine percentage identity cutoffs: 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, and
90%. Using PISCES, we set the resolution cutoff at 2.5 Å and the R-value cutoff at 1.0, and we
eliminated non-X-ray and Cα-only structures. PISCES also enables us to remove short and long chains. We
removed chains that have fewer than 40 or more than 500 amino acids. We replaced the chemically modified
residues (i.e., the ones annotated as "X" by the DSSP algorithm [2]) with their unmodified versions in the
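The chain-selection criteria stated in this section can be summarized as a single predicate. The sketch below is illustrative only: PISCES applies these filters server-side, and the `Chain` record, its field names, and the choice of inclusive comparisons are assumptions made here. The `ca_only` flag assumes the excluded structures are those containing only alpha-carbon coordinates.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MAX_RESOLUTION = 2.5  # angstroms
MAX_R_VALUE = 1.0
MIN_LENGTH = 40       # amino acids
MAX_LENGTH = 500      # amino acids


@dataclass
class Chain:
    """Hypothetical per-chain metadata record (not a PISCES data type)."""
    chain_id: str
    length: int
    resolution: float
    r_value: float
    is_xray: bool
    ca_only: bool


def passes_filters(chain: Chain) -> bool:
    """Return True if the chain meets every criterion described above.

    Treating the cutoffs as inclusive is an assumption.
    """
    return (
        chain.is_xray
        and not chain.ca_only
        and chain.resolution <= MAX_RESOLUTION
        and chain.r_value <= MAX_R_VALUE
        and MIN_LENGTH <= chain.length <= MAX_LENGTH
    )


# Made-up records for illustration only.
candidates = [
    Chain("chainA", 120, 1.8, 0.21, True, False),
    Chain("chainB", 35, 1.8, 0.21, True, False),   # too short
    Chain("chainC", 120, 3.0, 0.21, True, False),  # resolution too coarse
]
kept = [c.chain_id for c in candidates if passes_filters(c)]
```

The percent identity cutoff is not part of this predicate, since it is a property of the culled set as a whole rather than of any single chain.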

  

Source: Aydin, Zafer - Department of Genome Sciences, University of Washington at Seattle

 

Collections: Biology and Medicine