Title: SPRED: A machine learning approach for the identification of classical and non-classical secretory proteins in mammalian genomes

Journal Article · Biochemical and Biophysical Research Communications
 [1];  [2]; ;  [3];  [1];  [2];  [1]
  1. Institute for Neuro- and Bioinformatics, University of Luebeck, 23538 Luebeck (Germany)
  2. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (Singapore)
  3. Centre for Structural and Cell Biology in Medicine, Institute of Biology, University of Luebeck, 23538 Luebeck (Germany)

Eukaryotic protein secretion generally occurs via the classical secretory pathway, which traverses the ER and Golgi apparatus. Secreted proteins usually contain a signal sequence that carries all the information required to target them for secretion. However, some proteins, such as fibroblast growth factors (FGF-1, FGF-2), interleukins (IL-1 alpha, IL-1 beta), galectins and thioredoxin, are exported by an alternative pathway. This pathway, known as leaderless or non-classical secretion, works without a signal sequence. Most computational methods for identifying secretory proteins use the signal peptide as the indicator and therefore cannot identify substrates of non-classical secretion. In this work, we report a random forest method, SPRED, that identifies secretory proteins from protein sequences irrespective of N-terminal signal peptides, thus also allowing correct classification of non-classical secretory proteins. Training was performed on a dataset containing 600 extracellular proteins and 600 cytoplasmic and/or nuclear proteins. The algorithm was tested on 180 extracellular proteins and 1380 cytoplasmic and/or nuclear proteins. Accuracy was 85.92% on the training set and 82.18% on the test set. Because SPRED does not use N-terminal signals, non-classical secreted proteins can be detected by taking the proteins it predicts as secreted and using SignalP to filter out those that carry an N-terminal signal. SPRED correctly predicted 15 of 19 experimentally verified non-classical secretory proteins. By scanning the entire human proteome, we identified 566 protein sequences that potentially undergo non-classical secretion. The dataset and a standalone version of the SPRED software are available at http://www.inb.uni-luebeck.de/tools-demos/spred/spred.
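The two-stage filtering described in the abstract (a sequence-based secretion call, followed by removal of proteins with a predicted N-terminal signal) can be illustrated with a minimal sketch. This is not the SPRED software: the `Prediction` record, its field names, and the helper functions are hypothetical, and the two boolean inputs are assumed to come from running a secretion classifier and a signal-peptide predictor such as SignalP beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-protein record; not part of the SPRED distribution."""
    protein_id: str
    predicted_secreted: bool    # assumed output of a sequence-based secretion classifier
    has_signal_peptide: bool    # assumed output of a signal-peptide predictor (e.g. SignalP)


def classify(p: Prediction) -> str:
    """Combine the two calls into one label, following the logic in the abstract."""
    if not p.predicted_secreted:
        return "non-secretory"
    # Predicted secreted: the signal-peptide call separates the two pathways.
    return "classical" if p.has_signal_peptide else "non-classical candidate"


def non_classical_candidates(preds):
    """Return IDs of proteins predicted secreted but lacking an N-terminal signal."""
    return [p.protein_id for p in preds if classify(p) == "non-classical candidate"]


def fraction_detected(detected: int, total: int) -> float:
    """Fraction of known positives recovered, e.g. 15 of 19 verified proteins."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


if __name__ == "__main__":
    # Toy records with made-up identifiers, for illustration only.
    toy = [
        Prediction("toy_A", True, True),
        Prediction("toy_B", True, False),
        Prediction("toy_C", False, False),
    ]
    print(non_classical_candidates(toy))
    print(f"{100 * fraction_detected(15, 19):.1f}% of verified proteins recovered")
```

Under this reading, the 15 of 19 experimentally verified non-classical proteins reported above correspond to a detection rate of roughly 79%.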

OSTI ID: 22202325
Journal Information: Biochemical and Biophysical Research Communications, Vol. 391, Issue 3; Other Information: Copyright (c) 2009 Elsevier Science B.V., Amsterdam, The Netherlands, All rights reserved.; Country of input: International Atomic Energy Agency (IAEA); ISSN 0006-291X
Country of Publication: United States
Language: English