Title: An improved optimization algorithm and Bayes factor termination criterion for sequential projection pursuit

Abstract

A fundamental problem in analysis of highly multivariate spectral or chromatographic data is reduction of dimensionality. Principal components analysis (PCA), concerned with explaining the variance-covariance structure of the data, is a commonly used approach to dimension reduction. Recently an attractive alternative to PCA, sequential projection pursuit (SPP), has been introduced. Designed to elicit clustering tendencies in the data, SPP may be more appropriate when performing clustering or classification analysis. However, the existing genetic algorithm (GA) implementation of SPP has two shortcomings, computation time and inability to determine the number of factors necessary to explain the majority of the structure in the data. We address both these shortcomings. First, we introduce a new SPP algorithm, a random scan sampling algorithm (RSSA), that significantly reduces computation time. We compare the computational burden of the RSS and GA implementation for SPP on a dataset containing Raman spectra of twelve organic compounds. Second, we propose a Bayes factor criterion, BFC, as an effective measure for selecting the number of factors needed to explain the majority of the structure in the data. We compare SPP to PCA on two datasets varying in type, size, and difficulty; in both cases SPP achieves a higher accuracy with a lower number of latent variables.

Authors:
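Webb-Robertson, Bobbie-Jo M; Jarman, Kristin H; Harvey, Scott D; Posse, Christian; Wright, Bob W
Publication Date:
2005-05-28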
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
15020557
Report Number(s):
PNNL-SA-41907
TRN: US200521%%172
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Journal Article
Journal Name:
Chemometrics and Intelligent Laboratory Systems
Additional Journal Information:
Journal Volume: 77; Journal Issue: 1-2
Country of Publication:
United States
Language:
English
Subject:
59 BASIC BIOLOGICAL SCIENCES; ACCURACY; ALGORITHMS; CLASSIFICATION; DIMENSIONS; GENETICS; IMPLEMENTATION; OPTIMIZATION; ORGANIC COMPOUNDS; RAMAN SPECTRA; SAMPLING; statistics; multivariate; classification; clustering; Bayes

Citation Formats

Webb-Robertson, Bobbie-Jo M, Jarman, Kristin H, Harvey, Scott D, Posse, Christian, and Wright, Bob W. An improved optimization algorithm and Bayes factor termination criterion for sequential projection pursuit. United States: N. p., 2005. Web. doi:10.1016/j.chemolab.2004.09.014.
Webb-Robertson, Bobbie-Jo M, Jarman, Kristin H, Harvey, Scott D, Posse, Christian, & Wright, Bob W. An improved optimization algorithm and Bayes factor termination criterion for sequential projection pursuit. United States. https://doi.org/10.1016/j.chemolab.2004.09.014
Webb-Robertson, Bobbie-Jo M, Jarman, Kristin H, Harvey, Scott D, Posse, Christian, and Wright, Bob W. 2005. "An improved optimization algorithm and Bayes factor termination criterion for sequential projection pursuit". United States. https://doi.org/10.1016/j.chemolab.2004.09.014.
@article{osti_15020557,
title = {An improved optimization algorithm and Bayes factor termination criterion for sequential projection pursuit},
author = {Webb-Robertson, Bobbie-Jo M and Jarman, Kristin H and Harvey, Scott D and Posse, Christian and Wright, Bob W},
abstractNote = {A fundamental problem in analysis of highly multivariate spectral or chromatographic data is reduction of dimensionality. Principal components analysis (PCA), concerned with explaining the variance-covariance structure of the data, is a commonly used approach to dimension reduction. Recently an attractive alternative to PCA, sequential projection pursuit (SPP), has been introduced. Designed to elicit clustering tendencies in the data, SPP may be more appropriate when performing clustering or classification analysis. However, the existing genetic algorithm (GA) implementation of SPP has two shortcomings, computation time and inability to determine the number of factors necessary to explain the majority of the structure in the data. We address both these shortcomings. First, we introduce a new SPP algorithm, a random scan sampling algorithm (RSSA), that significantly reduces computation time. We compare the computational burden of the RSS and GA implementation for SPP on a dataset containing Raman spectra of twelve organic compounds. Second, we propose a Bayes factor criterion, BFC, as an effective measure for selecting the number of factors needed to explain the majority of the structure in the data. We compare SPP to PCA on two datasets varying in type, size, and difficulty; in both cases SPP achieves a higher accuracy with a lower number of latent variables.},
doi = {10.1016/j.chemolab.2004.09.014},
url = {https://www.osti.gov/biblio/15020557},
journal = {Chemometrics and Intelligent Laboratory Systems},
number = {1-2},
volume = 77,
place = {United States},
year = {2005},
month = {5}
}