Dimension Reduction via Unsupervised Learning Yields Significant Computational Improvements for Support Vector Machine Based Protein Family Classification.

Webb-Robertson, Bobbie-Jo M; Matzke, Melissa M; Oehmen, Christopher S

Conference · February 26, 2009

OSTI ID:948756

Reducing the dimension of the vectors used to train support vector machines (SVMs) yields a proportional speedup in training time. For large-scale problems, this can make the difference between tractable and intractable training tasks. However, it is critical that classifiers trained on reduced datasets perform as reliably as their counterparts trained on high-dimensional data. We assessed principal component analysis (PCA) and sequential projection pursuit (SPP) as dimension reduction strategies for a biological application, the classification of proteins into well-defined functional ‘families’ (SVM-based protein family classification), by their impact on run-time, sensitivity, and selectivity. Using PCA and SPP, homology vectors of 4352 elements were reduced to approximately 2% of the original data size without significantly affecting accuracy, while yielding an approximately 28-fold speedup in run-time.
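The PCA step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses synthetic data rather than protein homology vectors, a plain power-iteration routine in place of a numerical library, and hypothetical function names (`center`, `top_component`, `pca_reduce`). It only shows the general idea of projecting high-dimensional vectors onto a few principal components before they would be handed to a classifier such as an SVM. For scale, if the 2% figure refers to vector length, 2% of 4352 elements would be roughly 87 components; the toy example below reduces 100 dimensions to 2.

```python
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def top_component(rows, iters=100, seed=0):
    """Leading principal direction of centered rows, by power iteration."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        # Apply X^T (X v) without forming the d x d covariance matrix.
        scores = [sum(a * b for a, b in zip(r, v)) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # no variance left in the data
        v = [x / norm for x in w]
    return v


def pca_reduce(rows, k):
    """Project rows onto their first k principal components."""
    centered = center(rows)
    residual = [r[:] for r in centered]
    components = []
    for i in range(k):
        v = top_component(residual, seed=i)
        components.append(v)
        # Deflation: remove the direction just found from every row.
        deflated = []
        for r in residual:
            s = sum(a * b for a, b in zip(r, v))
            deflated.append([a - s * b for a, b in zip(r, v)])
        residual = deflated
    reduced = [[sum(a * b for a, b in zip(r, v)) for v in components]
               for r in centered]
    return reduced, components


# Synthetic stand-in data (an assumption for illustration only):
# 40 samples in 100 dimensions, driven by 2 latent factors plus small noise.
rng = random.Random(42)
n_samples, n_dims, k = 40, 100, 2
loadings = [[rng.gauss(0, 1) for _ in range(n_dims)] for _ in range(k)]
data = []
for _ in range(n_samples):
    z = [rng.gauss(0, 1) for _ in range(k)]
    data.append([sum(z[f] * loadings[f][j] for f in range(k)) + rng.gauss(0, 0.05)
                 for j in range(n_dims)])

reduced, components = pca_reduce(data, k)
print(len(data[0]), "->", len(reduced[0]))
```

The reduced vectors would then replace the original ones as classifier input. Whether accuracy is preserved at a given number of components depends on the data and has to be checked empirically, as the abstract does with sensitivity and selectivity.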

Research Organization: Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

Sponsoring Organization: USDOE

DOE Contract Number: AC05-76RL01830

OSTI ID: 948756

Report Number(s): PNNL-SA-60288; TRN: US200907%%227

Resource Relation: Conference: The Seventh International Conference on Machine Learning and Applications, 457-462

Country of Publication: United States

Language: English

Similar Records

Effective Dimension Reduction Using Sequential Projection Pursuit On Gene Expression Data for Cancer Classification

Conference · June 23, 2004 · OSTI ID:948756

Webb-Robertson, Bobbie-Jo M; Havre, Susan L

Benders Cut Classification via Support Vector Machines for Solving Two-Stage Stochastic Programs

Journal Article · March 18, 2020 · INFORMS Journal on Optimization · OSTI ID:948756

Jia, Huiwen; Shen, Siqian

A new classification scheme of plastic wastes based upon recycling labels

Journal Article · January 15, 2015 · Waste Management · OSTI ID:948756

Özkan, Kemal; Ergin, Semih; Işık, Şahin; +1 more

Related Subjects

59 BASIC BIOLOGICAL SCIENCES
ACCURACY
BIOLOGY
CLASSIFICATION
DIMENSIONS
FUNCTIONALS
LEARNING
PROTEINS
SENSITIVITY
TRAINING
VECTORS
