skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: Dimension Reduction via Unsupervised Learning Yields Significant Computational Improvements for Support Vector Machine Based Protein Family Classification.

Conference ·
OSTI ID:948756

Reducing the dimension of vectors used in training support vector machines (SVMs) results in a proportional speedup in training time. For large-scale problems this can make the difference between tractable and intractable training tasks. However, it is critical that classifiers trained on reduced datasets perform as reliably as their counterparts trained on high-dimensional data. We assessed principal component analysis (PCA) and sequential project pursuit (SPP) as dimension reduction strategies in the biology application of classifying proteins into well-defined functional ‘families’ (SVM-based protein family classification) by their impact on run-time, sensitivity and selectivity. Homology vectors of 4352 elements were reduced to approximately 2% of the original data size without significantly affecting accuracy using PCA and SPP, while leading to approximately a 28-fold speedup in run-time.

Research Organization:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
948756
Report Number(s):
PNNL-SA-60288; TRN: US200907%%227
Resource Relation:
Conference: The Seventh International Conference on Machine Learning and Applications , 457-462
Country of Publication:
United States
Language:
English