Prediction of protein tertiary structure from sequences using a very large back-propagation neural network
- Univ. of Minnesota, Minneapolis, MN (United States)
We have implemented large-scale back-propagation neural networks on a 544-node Connection Machine, CM-5, using the C language in MIMD mode. The program running on 512 processors performs back-propagation learning at 0.53 Gflops, which provides 76 million connection updates per second. We have applied the network to the prediction of protein tertiary structure from sequence information alone. A neural network with one hidden layer and 40 million connections is trained to learn the relationship between sequence and tertiary structure. Given only their sequences, the trained network yields predicted structures for some proteins on which it has not been trained. Presenting the Fourier transform of the sequences to the network accentuates periodicity in the sequence and yields good generalization with greatly increased training efficiency. Training simulations with a large, heterologous set of protein structures (111 proteins) converge on the CM-5 to solutions with under 2% RMS residual error within the training set (random responses give an RMS error of about 20%). Presenting 15 sequences of related proteins in a testing set of 24 proteins yields predicted structures with less than 8% RMS residual error, indicating good apparent generalization.
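The two throughput figures in the abstract can be cross-checked with simple arithmetic. The Python sketch below assumes that the 0.53 Gflops and the 76 million connection updates per second describe the same back-propagation workload, and that one presentation of a training pattern updates each of the 40 million connections once; neither assumption is stated in the abstract.

```python
# Figures quoted in the abstract for the 512-processor run.
flops_per_second = 0.53e9      # 0.53 Gflops
updates_per_second = 76e6      # 76 million connection updates per second
connections = 40e6             # 40 million connections in the network

# Assumption: both rates describe the same back-propagation workload,
# so their ratio is the floating-point work per connection update.
flops_per_update = flops_per_second / updates_per_second

# Assumption: one pattern presentation updates every connection once.
seconds_per_pattern = connections / updates_per_second

print(f"{flops_per_update:.1f} flops per connection update")
print(f"{seconds_per_pattern:.2f} s to update all {connections / 1e6:.0f} million connections")
# 7.0 flops per connection update
# 0.53 s to update all 40 million connections
```

Under those assumptions the figures imply roughly 7 floating-point operations per connection update and about half a second per full pass over the network's weights for one pattern.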
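The abstract describes a network with a single hidden layer, trained by back-propagation and judged by RMS residual error. The following minimal Python sketch shows that training loop at toy scale. The sigmoid activation, per-pattern (online) weight updates, learning rate, layer sizes, and random data are all assumptions for illustration, and the class and function names are hypothetical; the abstract does not say how tertiary structure is encoded at the outputs.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class OneHiddenLayerNet:
    """Hypothetical input -> hidden -> output network with sigmoid units."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # Each weight row ends with a bias weight (paired with a constant 1.0 input).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def train_pattern(self, x, target, lr):
        """One back-propagation step on E = 0.5 * sum((y - t)^2) for one pattern."""
        xb, hb, y = self.forward(x)
        # Output deltas: dE/dnet for sigmoid output units.
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # Hidden deltas, propagated back through the not-yet-updated output weights.
        d_hid = [
            sum(d_out[k] * self.w2[k][j] for k in range(len(d_out))) * hb[j] * (1.0 - hb[j])
            for j in range(len(hb) - 1)
        ]
        # Gradient-descent weight updates (biases included via the 1.0 inputs).
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] -= lr * d_out[k] * hb[j]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= lr * d_hid[j] * xb[i]

def rms_error(net, data):
    """Root-mean-square residual over every output of every pattern."""
    sq, count = 0.0, 0
    for x, t in data:
        y = net.forward(x)[2]
        sq += sum((yk - tk) ** 2 for yk, tk in zip(y, t))
        count += len(t)
    return math.sqrt(sq / count)

# Toy training set: random inputs and targets in [0, 1] (illustrative only).
rng = random.Random(1)
data = [([rng.random() for _ in range(4)], [rng.random() for _ in range(2)]) for _ in range(6)]

net = OneHiddenLayerNet(4, 8, 2)
before = rms_error(net, data)
for epoch in range(2000):
    for x, t in data:
        net.train_pattern(x, t, lr=0.5)
after = rms_error(net, data)
print(f"RMS before: {before:.3f}, after: {after:.3f}")
```

If the targets are scaled to [0, 1], multiplying the RMS value by 100 gives a percentage, which is one plausible reading of the percentages quoted in the abstract; the abstract itself does not define the normalization.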
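The abstract credits the Fourier transform of the input sequences with accentuating periodicity. It does not say how residues were converted to numbers before the transform, so the Python sketch below assumes a hypothetical binary hydrophobic/polar encoding (the residue set in `HYDROPHOBIC` is illustrative) and computes a direct discrete Fourier transform with the standard library. A residue pattern that repeats every four positions then shows up as a single dominant frequency.

```python
import cmath

# Hypothetical residue encoding: 1.0 for an illustrative set of hydrophobic
# residues, 0.0 otherwise. The abstract does not specify the encoding.
HYDROPHOBIC = set("AVILMFWC")

def encode(sequence):
    """Map a one-letter amino-acid sequence to a numeric signal."""
    return [1.0 if residue in HYDROPHOBIC else 0.0 for residue in sequence]

def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform, computed directly in O(N^2)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # removes the k = 0 (composition) term
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(centered)))
        for k in range(n)
    ]

def dominant_period(sequence):
    """Return (k, period) of the strongest nonzero frequency up to Nyquist."""
    mags = dft_magnitudes(encode(sequence))
    n = len(mags)
    k = max(range(1, n // 2 + 1), key=lambda i: mags[i])
    return k, n / k

# Synthetic example: hydrophobic pairs alternating with charged pairs.
seq = "LLEE" * 8
k, period = dominant_period(seq)
print(k, period)  # 8 4.0
```

A practical version would use an FFT (for example `numpy.fft.rfft`) instead of the O(N²) sum; the direct form is kept here so each term of the transform is visible.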
- OSTI ID: 54405
- Report Number(s): DOE/ER/25151--1-Vol.1; CONF-930331--Vol.1
- Country of Publication: United States
- Language: English
Similar Records
A General Framework to Learn Tertiary Structure for Protein Sequence Characterization
Classification of acoustic-emission waveforms for nondestructive evaluation using neural networks