Supervised detection of anomalous light curves in massive astronomical catalogs

Nun, Isadora; Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won

doi:10.1088/0004-637X/793/1/23

Journal Article · 20 September 2014 · Astrophysical Journal

DOI: https://doi.org/10.1088/0004-637X/793/1/23 · OSTI ID: 22365024

Nun, Isadora; Pichara, Karim [1]; Protopapas, Pavlos [2]; Kim, Dae-Won [3]

[1] Computer Science Department, Pontificia Universidad Católica de Chile, Santiago (Chile)
[2] Institute for Applied Computational Science, Harvard University, Cambridge, MA (United States)
[3] Max-Planck Institute for Astronomy, Königstuhl 17, D-69117 Heidelberg (Germany)

The development of synoptic sky surveys has led to a massive amount of data for which the resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all the information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. By leaving one of the classes out of the training set, we perform a validity test and show that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables, and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up in order to perform a deeper analysis.
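The pipeline described in the abstract (forest votes on the training set, a joint model of those votes, low joint probability as the outlier criterion, and the leave-one-class-out validity test) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic three-feature objects rather than MACHO light-curve features, scikit-learn is used for the random forest, and a Gaussian kernel density estimate stands in for the Bayesian network that the authors use to model the voting distribution. The names `make_class`, `vote_model`, and the 1st-percentile threshold are hypothetical choices, not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)

def make_class(center, n):
    """Draw n synthetic 3-feature objects scattered around `center` (illustrative data)."""
    return rng.normal(loc=center, scale=0.5, size=(n, 3))

# Three known (training) classes, each distinguished by a different feature.
centers = [(4, 0, 0), (0, 4, 0), (0, 0, 4)]
X_train = np.vstack([make_class(c, 200) for c in centers])
y_train = np.repeat(np.arange(len(centers)), 200)

# The class left out of training: it resembles class 0 and class 1 at once.
X_unknown = make_class((4, 4, 0), 50)

# Step 1: train the forest. Out-of-bag estimates give one vote vector per
# training object, built only from trees that did not see that object.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X_train, y_train)
train_votes = forest.oob_decision_function_  # shape (n_train, n_classes)

# Step 2: model the joint distribution of training vote vectors.
# ASSUMPTION: a kernel density estimate replaces the paper's Bayesian network.
vote_model = KernelDensity(kernel="gaussian", bandwidth=0.05).fit(train_votes)
train_scores = vote_model.score_samples(train_votes)  # log-density per object

# Step 3: score the left-out class. predict_proba averages per-tree class
# probabilities, used here as a proxy for the fraction of votes per class.
unknown_votes = forest.predict_proba(X_unknown)
unknown_scores = vote_model.score_samples(unknown_votes)

# Flag objects whose vote vector is less probable than 99% of training objects
# (the 1st-percentile cut is an arbitrary illustrative threshold).
threshold = np.percentile(train_scores, 1)
flagged = unknown_scores < threshold
print(f"flagged {flagged.sum()} of {len(flagged)} left-out objects")
```

In this toy setup the left-out objects tend to split their votes between two known classes, a pattern that is rare among the training vote vectors, so their log-density under the vote model falls well below the training scores. A Bayesian network, as used in the paper, would additionally encode conditional dependencies between the per-class votes.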

OSTI ID: 22365024

Journal Information: Astrophysical Journal, Vol. 793, Issue 1; Other Information: Country of input: International Atomic Energy Agency (IAEA); ISSN 0004-637X

Country of Publication: United States

Language: English

Similar Records

Automatic classification of time-variable X-ray sources

Journal Article · 1 May 2014 · Astrophysical Journal · OSTI ID: 22365024

Lo, Kitty K.; Farrell, Sean; Murphy, Tara; +1 more

AUTOCLASSIFICATION OF THE VARIABLE 3XMM SOURCES USING THE RANDOM FOREST MACHINE LEARNING ALGORITHM

Journal Article · 1 November 2015 · Astrophysical Journal · OSTI ID: 22365024

Farrell, Sean A.; Murphy, Tara

The LSST AGN Data Challenge: Selection Methods

Journal Article · 11 August 2023 · The Astrophysical Journal · OSTI ID: 22365024

Savić, Đorđe V.; Jankov, Isidora; Yu, Weixiang; +12 more

Related Subjects

79 ASTROPHYSICS
COSMOLOGY AND ASTRONOMY
ASTROPHYSICS
CALIBRATION
CATALOGS
CEPHEIDS
COSMIC X-RAY SOURCES
DATA ANALYSIS
DETECTION
DISTRIBUTION
ECLIPSE
ERRORS
MASS
PROBABILITY
RANDOMNESS
RESOURCES
SEASONAL VARIATIONS
SPACE
VISIBLE RADIATION
X-RAY SOURCES
