OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Classification of High Speed Gas Chromatography-Mass Spectrometry Data by Principal Component Analysis Coupled with Piecewise Alignment and Feature Selection

Abstract

A useful procedure is introduced for the analysis of data obtained via gas chromatography with mass spectrometry (GC-MS) utilizing a complete mass spectrum at each retention time interval in which a mass spectrum was collected. Principal component analysis (PCA) with preprocessing by both piecewise retention time alignment and analysis of variance (ANOVA) feature selection is applied to all mass channels collected. The procedure involves concatenating all concurrently measured individual m/z chromatograms from m/z 20 to 120 for each GC-MS separation into a row vector. All of the sample row vectors are incorporated into a matrix where each row is a sample vector. This matrix is piecewise aligned and reduced by ANOVA feature selection. Application of the preprocessing steps (retention time alignment and feature selection) to all mass channels collected during the chromatographic separation allows considerably more selective chemical information to be incorporated in the PCA classification, and is the primary novelty of the report. This procedure is objective and requires no knowledge of the specific analytes of interest, as in selective ion monitoring (SIM), and does not restrict the mass spectral data used, as in both SIM and total ion current (TIC) methods. Significantly, the procedure allows for the classification of data with low resolution in the chromatographic dimension because of the added selectivity from the complete mass spectral dimension. This allows for the successful classification of data over significantly decreased chromatographic separation times, since high-speed separations can be employed. The procedure is demonstrated through the analysis of a set of four differing gasoline samples that serve as model complex samples. For comparison, the gasoline samples are analyzed by GC-MS over both ten-minute and ten-second separation times. The ten-minute GC-MS TIC data served as the benchmark analysis to compare to the ten-second data.
When only alignment and feature selection were applied to the ten-second gasoline separations using GC-MS TIC data, PCA failed. PCA was successful for ten-second gasoline separations when the procedure was applied with all the m/z information. With ANOVA feature selection, chromatographic regions with Fisher ratios greater than 1500 were retained in a new matrix and subjected to PCA, yielding successful classification for the ten-second separations.
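
The unfolding and feature-selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`unfold`, `fisher_ratios`, `select_features`) are hypothetical, the Fisher ratio is assumed to be the standard one-way ANOVA F statistic (between-class variance over within-class variance), and piecewise retention time alignment and the final PCA step are omitted.

```python
from statistics import mean


def unfold(separation):
    """Concatenate the m/z chromatograms of one GC-MS run into one row vector.

    `separation` is a list of chromatograms, one per m/z channel, each a
    list of intensities over retention time (an assumed data layout).
    """
    return [value for chromatogram in separation for value in chromatogram]


def fisher_ratios(rows, labels):
    """Return the one-way ANOVA F ratio for every column of the sample matrix."""
    classes = sorted(set(labels))
    k, n = len(classes), len(rows)
    ratios = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        grand_mean = mean(column)
        between = within = 0.0
        for c in classes:
            group = [v for v, lab in zip(column, labels) if lab == c]
            group_mean = mean(group)
            between += len(group) * (group_mean - grand_mean) ** 2
            within += sum((v - group_mean) ** 2 for v in group)
        between /= k - 1   # between-class mean square
        within /= n - k    # within-class mean square
        if within > 0:
            ratios.append(between / within)
        else:
            # No within-class scatter: informative only if the classes differ.
            ratios.append(float("inf") if between > 0 else 0.0)
    return ratios


def select_features(rows, labels, threshold):
    """Keep only the columns whose Fisher ratio exceeds `threshold`."""
    ratios = fisher_ratios(rows, labels)
    keep = [j for j, f in enumerate(ratios) if f > threshold]
    reduced = [[row[j] for j in keep] for row in rows]
    return reduced, keep
```

Under these assumptions, each replicate run would be passed through `unfold`, the resulting rows stacked into a sample matrix, and `select_features(rows, labels, 1500)` would return the reduced matrix that is then submitted to PCA.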

Authors:
Watson, Nathaniel E; VanWingerden, Matthew M; Pierce, Karisa M; Wright, Bob W; Synovec, Robert E
Publication Date:
2006-09-01
Research Org.:
Pacific Northwest National Lab. (PNNL), Richland, WA (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
893654
Report Number(s):
PNNL-SA-48636
400480000; TRN: US200625%%462
DOE Contract Number:  
AC05-76RL01830
Resource Type:
Journal Article
Journal Name:
Journal of Chromatography A, 1129(1):111-118
Country of Publication:
United States
Language:
English
Subject:
02 PETROLEUM; ALIGNMENT; BENCHMARKS; CLASSIFICATION; DIMENSIONS; GAS CHROMATOGRAPHY; GASOLINE; MASS SPECTROSCOPY; MONITORING; RESOLUTION; RETENTION; SPECTROSCOPY; VECTORS; VELOCITY; alignment, gas chromatography, feature selection, principal component analysis, fuel, chemometrics, GC-MS

Citation Formats

Watson, Nathaniel E, VanWingerden, Matthew M, Pierce, Karisa M, Wright, Bob W, and Synovec, Robert E. Classification of High Speed Gas Chromatography-Mass Spectrometry Data by Principal Component Analysis Coupled with Piecewise Alignment and Feature Selection. United States: N. p., 2006. Web. doi:10.1016/j.chroma.2006.06.087.
Watson, Nathaniel E, VanWingerden, Matthew M, Pierce, Karisa M, Wright, Bob W, & Synovec, Robert E. Classification of High Speed Gas Chromatography-Mass Spectrometry Data by Principal Component Analysis Coupled with Piecewise Alignment and Feature Selection. United States. https://doi.org/10.1016/j.chroma.2006.06.087
Watson, Nathaniel E, VanWingerden, Matthew M, Pierce, Karisa M, Wright, Bob W, and Synovec, Robert E. 2006. "Classification of High Speed Gas Chromatography-Mass Spectrometry Data by Principal Component Analysis Coupled with Piecewise Alignment and Feature Selection". United States. https://doi.org/10.1016/j.chroma.2006.06.087.
@article{osti_893654,
title = {Classification of High Speed Gas Chromatography-Mass Spectrometry Data by Principal Component Analysis Coupled with Piecewise Alignment and Feature Selection},
author = {Watson, Nathaniel E and VanWingerden, Matthew M and Pierce, Karisa M and Wright, Bob W and Synovec, Robert E},
abstractNote = {A useful procedure is introduced for the analysis of data obtained via gas chromatography with mass spectrometry (GC-MS) utilizing a complete mass spectrum at each retention time interval in which a mass spectrum was collected. Principal component analysis (PCA) with preprocessing by both piecewise retention time alignment and analysis of variance (ANOVA) feature selection is applied to all mass channels collected. The procedure involves concatenating all concurrently measured individual m/z chromatograms from m/z 20 to 120 for each GC-MS separation into a row vector. All of the sample row vectors are incorporated into a matrix where each row is a sample vector. This matrix is piecewise aligned and reduced by ANOVA feature selection. Application of the preprocessing steps (retention time alignment and feature selection) to all mass channels collected during the chromatographic separation allows considerably more selective chemical information to be incorporated in the PCA classification, and is the primary novelty of the report. This procedure is objective and requires no knowledge of the specific analytes of interest, as in selective ion monitoring (SIM), and does not restrict the mass spectral data used, as in both SIM and total ion current (TIC) methods. Significantly, the procedure allows for the classification of data with low resolution in the chromatographic dimension because of the added selectivity from the complete mass spectral dimension. This allows for the successful classification of data over significantly decreased chromatographic separation times, since high-speed separations can be employed. The procedure is demonstrated through the analysis of a set of four differing gasoline samples that serve as model complex samples. For comparison, the gasoline samples are analyzed by GC-MS over both ten-minute and ten-second separation times. The ten-minute GC-MS TIC data served as the benchmark analysis to compare to the ten-second data. 
When only alignment and feature selection were applied to the ten-second gasoline separations using GC-MS TIC data, PCA failed. PCA was successful for ten-second gasoline separations when the procedure was applied with all the m/z information. With ANOVA feature selection, chromatographic regions with Fisher ratios greater than 1500 were retained in a new matrix and subjected to PCA, yielding successful classification for the ten-second separations.},
doi = {10.1016/j.chroma.2006.06.087},
url = {https://www.osti.gov/biblio/893654},
journal = {Journal of Chromatography A},
number = {1},
volume = {1129},
pages = {111--118},
place = {United States},
year = {2006},
month = sep
}