
Title: AUTOMATED UNSUPERVISED CLASSIFICATION OF THE SLOAN DIGITAL SKY SURVEY STELLAR SPECTRA USING k-MEANS CLUSTERING

Abstract

Large spectroscopic surveys require automated methods of analysis. This paper explores the use of k-means clustering as a tool for automated unsupervised classification of massive stellar spectral catalogs. The classification criteria are defined by the data and the algorithm, with no prior physical framework. We work with a representative set of stellar spectra associated with the Sloan Digital Sky Survey (SDSS) SEGUE and SEGUE-2 programs, which consists of 173,390 spectra from 3800 to 9200 Å sampled on 3849 wavelengths. We classify the original spectra as well as the spectra with the continuum removed. The second set only contains spectral lines, and it is less dependent on uncertainties of the flux calibration. The classification of the spectra with continuum renders 16 major classes. Roughly speaking, stars are split according to their colors, with enough finesse to distinguish dwarfs from giants of the same effective temperature, but with difficulties to separate stars with different metallicities. There are classes corresponding to particular MK types, intrinsically blue stars, dust-reddened, stellar systems, and also classes collecting faulty spectra. Overall, there is no one-to-one correspondence between the classes we derive and the MK types. The classification of spectra without continuum renders 13 classes, the color separation is not so sharp, but it distinguishes stars of the same effective temperature and different metallicities. Some classes thus obtained present a fairly small range of physical parameters (200 K in effective temperature, 0.25 dex in surface gravity, and 0.35 dex in metallicity), so that the classification can be used to estimate the main physical parameters of some stars at a minimum computational cost. We also analyze the outliers of the classification. Most of them turn out to be failures of the reduction pipeline, but there are also high redshift QSOs, multiple stellar systems, dust-reddened stars, galaxies, and, finally, odd spectra whose nature we have not deciphered. The template spectra representative of the classes are publicly available in the online journal.
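
The two classifications described in the abstract (spectra with continuum, and spectra with the continuum removed) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' procedure: the spectra are synthetic, the continuum is modeled as a straight line removed by a least-squares fit, the k-means routine is a plain Lloyd iteration with a deterministic farthest-point initialization, and the names remove_continuum and kmeans are hypothetical.

```python
import numpy as np

def remove_continuum(wavelengths, spectra, degree=1):
    """Subtract a least-squares polynomial continuum from each spectrum.

    Assumption: a low-order polynomial stands in for the continuum here;
    the abstract does not say how the continuum was actually removed.
    """
    x = (wavelengths - wavelengths.min()) / np.ptp(wavelengths)  # scale to [0, 1]
    coeffs = np.polyfit(x, spectra.T, degree)          # shape (degree + 1, n_spectra)
    continuum = (np.vander(x, degree + 1) @ coeffs).T  # shape (n_spectra, n_wavelengths)
    return spectra - continuum

def kmeans(data, k, n_iter=100):
    """Plain Lloyd k-means. Returns (labels, centroids)."""
    # Farthest-point initialization (an assumption, chosen to be deterministic):
    # start from the first spectrum, then repeatedly add the spectrum that is
    # farthest from all centroids picked so far.
    centroids = [data[0]]
    for _ in range(1, k):
        d = np.min([((data - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(data[d.argmax()])
    centroids = np.array(centroids, dtype=float)

    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every spectrum to every centroid.
        dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = data[labels == j]
            if len(members):  # leave an empty cluster's centroid untouched
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Synthetic stand-in data: two continuum slopes ("colors") crossed with two
# absorption-line depths, 25 noisy spectra per combination.
rng = np.random.default_rng(0)
wavelengths = np.linspace(3800.0, 9200.0, 200)
x = (wavelengths - 3800.0) / 5400.0
line = np.exp(-0.5 * ((wavelengths - 6563.0) / 100.0) ** 2)

n = 25
colour = np.repeat([0, 0, 1, 1], n)    # 0 = falling continuum, 1 = rising continuum
strength = np.repeat([0, 1, 0, 1], n)  # 0 = weak line, 1 = strong line
slope = np.where(colour == 0, -1.0, 1.0)
depth = np.where(strength == 0, 0.1, 0.6)
spectra = 1.0 + slope[:, None] * (x - 0.5) - depth[:, None] * line
spectra += rng.normal(0.0, 0.01, spectra.shape)

# Cluster the spectra as they are, then again after continuum removal.
labels_raw, _ = kmeans(spectra, 2)
labels_lines, _ = kmeans(remove_continuum(wavelengths, spectra), 2)
```

In this toy setup the clustering of the raw spectra is driven by the continuum slope, while the clustering of the continuum-subtracted spectra is driven by the line depth. This mirrors, in a very simplified way, the abstract's remark that removing the continuum weakens the color separation but exposes differences carried by the spectral lines.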

Authors:
Sanchez Almeida, J. (jos@iac.es); Allende Prieto, C. (callende@iac.es) [1]
  1. Instituto de Astrofisica de Canarias, E-38205 La Laguna, Tenerife (Spain)
Publication Date:
January 2013
OSTI Identifier:
22167174
Resource Type:
Journal Article
Journal Name:
Astrophysical Journal
Additional Journal Information:
Journal Volume: 763; Journal Issue: 1; Other Information: Country of input: International Atomic Energy Agency (IAEA); Journal ID: ISSN 0004-637X
Country of Publication:
United States
Language:
English
Subject:
79 ASTROPHYSICS, COSMOLOGY AND ASTRONOMY; ALGORITHMS; ASTRONOMY; ASTROPHYSICS; CALIBRATION; CATALOGS; CLASSIFICATION; COLOR; COSMIC DUST; DATA ANALYSIS; EMISSION SPECTRA; GALAXIES; GRAVITATION; RED SHIFT

Citation Formats

Sanchez Almeida, J., and Allende Prieto, C. AUTOMATED UNSUPERVISED CLASSIFICATION OF THE SLOAN DIGITAL SKY SURVEY STELLAR SPECTRA USING k-MEANS CLUSTERING. United States: N. p., 2013. Web. doi:10.1088/0004-637X/763/1/50.
Sanchez Almeida, J., & Allende Prieto, C. AUTOMATED UNSUPERVISED CLASSIFICATION OF THE SLOAN DIGITAL SKY SURVEY STELLAR SPECTRA USING k-MEANS CLUSTERING. United States. doi:10.1088/0004-637X/763/1/50.
Sanchez Almeida, J., and Allende Prieto, C. 2013. "AUTOMATED UNSUPERVISED CLASSIFICATION OF THE SLOAN DIGITAL SKY SURVEY STELLAR SPECTRA USING k-MEANS CLUSTERING". United States. doi:10.1088/0004-637X/763/1/50.
@article{osti_22167174,
title = {AUTOMATED UNSUPERVISED CLASSIFICATION OF THE SLOAN DIGITAL SKY SURVEY STELLAR SPECTRA USING k-MEANS CLUSTERING},
author = {Sanchez Almeida, J. and Allende Prieto, C.},
note = {E-mail: jos@iac.es, callende@iac.es},
abstractNote = {Large spectroscopic surveys require automated methods of analysis. This paper explores the use of k-means clustering as a tool for automated unsupervised classification of massive stellar spectral catalogs. The classification criteria are defined by the data and the algorithm, with no prior physical framework. We work with a representative set of stellar spectra associated with the Sloan Digital Sky Survey (SDSS) SEGUE and SEGUE-2 programs, which consists of 173,390 spectra from 3800 to 9200 {\AA} sampled on 3849 wavelengths. We classify the original spectra as well as the spectra with the continuum removed. The second set only contains spectral lines, and it is less dependent on uncertainties of the flux calibration. The classification of the spectra with continuum renders 16 major classes. Roughly speaking, stars are split according to their colors, with enough finesse to distinguish dwarfs from giants of the same effective temperature, but with difficulties to separate stars with different metallicities. There are classes corresponding to particular MK types, intrinsically blue stars, dust-reddened, stellar systems, and also classes collecting faulty spectra. Overall, there is no one-to-one correspondence between the classes we derive and the MK types. The classification of spectra without continuum renders 13 classes, the color separation is not so sharp, but it distinguishes stars of the same effective temperature and different metallicities. Some classes thus obtained present a fairly small range of physical parameters (200 K in effective temperature, 0.25 dex in surface gravity, and 0.35 dex in metallicity), so that the classification can be used to estimate the main physical parameters of some stars at a minimum computational cost. We also analyze the outliers of the classification. Most of them turn out to be failures of the reduction pipeline, but there are also high redshift QSOs, multiple stellar systems, dust-reddened stars, galaxies, and, finally, odd spectra whose nature we have not deciphered. The template spectra representative of the classes are publicly available in the online journal.},
doi = {10.1088/0004-637X/763/1/50},
journal = {Astrophysical Journal},
issn = {0004-637X},
number = 1,
volume = 763,
place = {United States},
year = {2013},
month = {1}
}