skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Scalable Gaussian Process Analysis Algorithm for Biomass Monitoring

Journal Article · · Statistical Analysis and Data Mining
DOI:https://doi.org/10.1002/sam.10129· OSTI ID:1024275

Biomass monitoring is vital for studying the carbon cycle of earth's ecosystem and has several significant implications, especially in the context of understanding climate change and its impacts. Recently, several change detection methods have been proposed to identify land cover changes in temporal profiles (time series) of vegetation collected using remote sensing instruments, but do not satisfy one or both of the two requirements of the biomass monitoring problem, i.e., {\em operating in online mode} and {\em handling periodic time series}. In this paper, we adapt Gaussian process regression to detect changes in such time series in an online fashion. While Gaussian process (GP) have been widely used as a kernel based learning method for regression and classification, their applicability to massive spatio-temporal data sets, such as remote sensing data, has been limited owing to the high computational costs involved. We focus on addressing the scalability issues associated with the proposed GP based change detection algorithm. This paper makes several significant contributions. First, we propose a GP based online time series change detection algorithm and demonstrate its effectiveness in detecting different types of changes in {\em Normalized Difference Vegetation Index} (NDVI) data obtained from a study area in Iowa, USA. Second, we propose an efficient Toeplitz matrix based solution which significantly improves the computational complexity and memory requirements of the proposed GP based method. Specifically, the proposed solution can analyze a time series of length $$t$$ in $O(t^2)$ time while maintaining a $O(t)$ memory footprint, compared to the $O(t^3)$ time and $O(t^2)$ memory requirement of standard matrix manipulation based methods. Third, we describe a parallel version of the proposed solution which can be used to simultaneously analyze a large number of time series. We study three different parallel implementations: using threads, MPI, and a hybrid implementation using threads and MPI. Experimental results show that the hybrid implementation scales better than the multi-threaded and MPI based implementations. The application of the proposed scalable algorithm is demonstrated in analyzing massive remote sensing observation data. The hybrid implementation, using 1536 computing cores, can analyze an NDVI data set for the Iowa study area in nearly 5 seconds, while a serial algorithm, using standard Cholesky decomposition based routines, takes several days to process the same data set.

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Laboratory Directed Research and Development (LDRD) Program
DOE Contract Number:
DE-AC05-00OR22725
OSTI ID:
1024275
Journal Information:
Statistical Analysis and Data Mining, Vol. 4, Issue 4; ISSN 1932-1864
Country of Publication:
United States
Language:
English