Title: Performance analysis and optimization for scalable deployment of deep learning models for country-scale settlement mapping on Titan supercomputer

Journal Article · Concurrency and Computation. Practice and Experience
DOI: https://doi.org/10.1002/cpe.5305 · OSTI ID: 1511944

Summary: This paper presents a scalable object detection workflow for detecting objects, such as settlements, in remotely sensed (RS) imagery. We have successfully deployed this workflow on the Titan supercomputer and used it to map human settlements at country scale. The performance of the various stages of the workflow was analyzed before it was made operational. The workflow implements several strategies to address suboptimal resource utilization and long-tail effects caused by an unbalanced image workload, data loss caused by runtime failures, and the maximum wall-time constraints imposed by Titan's job scheduling policy. A mean shift clustering-based static load balancing strategy partitions the image load so that each partition contains similar-sized images. In addition, a checkpoint-restart strategy was added to the workflow as a fault-tolerance mechanism to prevent data loss from unforeseen runtime failures. The behavior of these strategies was observed in several scenarios: node failure, exceeding the wall time, and successful completion. Using this workflow, we processed an RS data set with a spatial resolution of 0.31 m covering 685 675 km² of the Republic of Zambia in under six hours using 5426 nodes of the Titan supercomputer.
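The static load balancing idea in the summary can be illustrated with a minimal sketch. It assumes each image is described by a single scalar size (for example, a pixel count or file size) and groups images whose sizes converge to the same mode under a flat-kernel mean shift. The summary does not state which features or bandwidth the authors used, so the function names `mean_shift_1d` and `partition_by_size`, the `bandwidth` parameter, and the sample sizes are all hypothetical.

```python
from collections import defaultdict


def mean_shift_1d(values, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift on scalars; returns one converged mode per value."""
    modes = []
    for v in values:
        x = float(v)
        for _ in range(max_iter):
            # All data points inside the window centred on the current estimate.
            window = [u for u in values if abs(u - x) <= bandwidth]
            if not window:
                break
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)
    return modes


def partition_by_size(image_sizes, bandwidth):
    """Group image names whose sizes converge to (nearly) the same mode.

    image_sizes: dict mapping a hypothetical image name to a scalar size.
    """
    names = list(image_sizes)
    sizes = [image_sizes[n] for n in names]
    modes = mean_shift_1d(sizes, bandwidth)

    centers = []  # one representative mode per partition
    partitions = defaultdict(list)
    for name, mode in zip(names, modes):
        for i, center in enumerate(centers):
            if abs(mode - center) <= bandwidth / 2:
                partitions[i].append(name)
                break
        else:
            # No existing partition is close enough: start a new one.
            centers.append(mode)
            partitions[len(centers) - 1].append(name)
    return [partitions[i] for i in range(len(centers))]


# Hypothetical image sizes in arbitrary units.
sizes = {"a": 100, "b": 110, "c": 105, "d": 900, "e": 950, "f": 920}
groups = partition_by_size(sizes, bandwidth=100)
```

With these sample values, the small and large images end up in separate partitions, so a batch of workers assigned to one partition would finish at roughly the same time instead of waiting on a few oversized images.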

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility (OLCF)
Sponsoring Organization:
USDOE Office of Science (SC), Basic Energy Sciences (BES). Scientific User Facilities Division
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1511944
Alternate ID(s):
OSTI ID: 1511755
Journal Information:
Concurrency and Computation. Practice and Experience, Vol. 31, Issue 20; ISSN 1532-0626
Publisher:
Wiley
Country of Publication:
United States
Language:
English
Citation Metrics:
Cited by: 6 works
Citation information provided by
Web of Science
