DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics

Abstract

The sheer volume of data generated by earth observation and remote sensing technologies continues to make a major impact, propelling key geospatial applications into the dual data- and compute-intensive era. As a consequence, this rapid advancement poses new computational and data processing challenges. We implement a novel remote sensing data flow (RESFlow) for advancing machine learning to compute with massive amounts of remotely sensed imagery. The core contribution is partitioning massive amounts of data into homogeneous distributions for fitting simple models. RESFlow takes advantage of Apache Spark and the availability of modern computing hardware to accelerate deep learning inference on expansive remote sensing imagery. The framework incorporates a strategy to optimize resource utilization across multiple executors assigned to a single worker. We showcase its deployment in both compute- and data-intensive workloads for pixel-level labeling tasks. The pipeline invokes deep learning inference at three stages: deep feature extraction, deep metric mapping, and deep semantic segmentation. These tasks impose compute-intensive and GPU resource sharing challenges, motivating a parallelized pipeline for all execution steps. To address the problem of hardware resource contention, our containerized workflow further incorporates a novel GPU checkout routine and a ticketing system across multiple workers. The workflow is demonstrated on NVIDIA DGX accelerated platforms and offers appreciable compute speed-ups for deep learning inference on pixel labeling workloads: processing 21 028 TB of imagery data and delivering output maps at an area rate of 5.245 sq.km/s, amounting to 453 168 sq.km/day, reducing a 28 day workload to 21 h.

Authors:
Lunga, Dalton D. [1]; Gerrand, Jonathan D. [1]; Yang, Lexie [1]; Layton, Christopher [1]; Stewart, Robert [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1607168
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Additional Journal Information:
Journal Volume: 13; Journal ID: ISSN 1939-1404
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
47 OTHER INSTRUMENTATION; Remote sensing; Labeling; Deep learning; Task analysis; Cluster computing; Satellites; Big data applications; High performance computing; Image classification; Inference mechanisms; Machine learning; Supervised learning

Citation Formats

Lunga, Dalton D., Gerrand, Jonathan D., Yang, Lexie, Layton, Christopher, and Stewart, Robert. Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics. United States: N. p., 2020. Web. https://doi.org/10.1109/JSTARS.2019.2959707.
Lunga, Dalton D., Gerrand, Jonathan D., Yang, Lexie, Layton, Christopher, & Stewart, Robert. Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics. United States. https://doi.org/10.1109/JSTARS.2019.2959707
Lunga, Dalton D., Gerrand, Jonathan D., Yang, Lexie, Layton, Christopher, and Stewart, Robert. Thu . "Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics". United States. https://doi.org/10.1109/JSTARS.2019.2959707. https://www.osti.gov/servlets/purl/1607168.
@article{osti_1607168,
title = {Apache Spark Accelerated Deep Learning Inference for Large Scale Satellite Image Analytics},
author = {Lunga, Dalton D. and Gerrand, Jonathan D. and Yang, Lexie and Layton, Christopher and Stewart, Robert},
abstractNote = {The sheer volume of data generated by earth observation and remote sensing technologies continues to make a major impact, propelling key geospatial applications into the dual data- and compute-intensive era. As a consequence, this rapid advancement poses new computational and data processing challenges. We implement a novel remote sensing data flow (RESFlow) for advancing machine learning to compute with massive amounts of remotely sensed imagery. The core contribution is partitioning massive amounts of data into homogeneous distributions for fitting simple models. RESFlow takes advantage of Apache Spark and the availability of modern computing hardware to accelerate deep learning inference on expansive remote sensing imagery. The framework incorporates a strategy to optimize resource utilization across multiple executors assigned to a single worker. We showcase its deployment in both compute- and data-intensive workloads for pixel-level labeling tasks. The pipeline invokes deep learning inference at three stages: deep feature extraction, deep metric mapping, and deep semantic segmentation. These tasks impose compute-intensive and GPU resource sharing challenges, motivating a parallelized pipeline for all execution steps. To address the problem of hardware resource contention, our containerized workflow further incorporates a novel GPU checkout routine and a ticketing system across multiple workers. The workflow is demonstrated on NVIDIA DGX accelerated platforms and offers appreciable compute speed-ups for deep learning inference on pixel labeling workloads: processing 21 028 TB of imagery data and delivering output maps at an area rate of 5.245 sq.km/s, amounting to 453 168 sq.km/day, reducing a 28 day workload to 21 h.},
doi = {10.1109/JSTARS.2019.2959707},
journal = {IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
volume = {13},
place = {United States},
year = {2020},
month = {1}
}

Citation Metrics:
Cited by: 1 work
Citation information provided by
Web of Science
