DOE PAGES, U.S. Department of Energy
Office of Scientific and Technical Information

Title: Troubleshooting deep-learner training data problems using an evolutionary algorithm on Summit

Abstract

Architectural and hyper-parameter design choices can influence deep-learner (DL) model fidelity, but fidelity can also be affected by malformed training and validation data. As a result, practitioners may spend significant time refining layers and hyper-parameters before discovering that distorted training data was impeding training progress. We found that an evolutionary algorithm (EA) can be used to diagnose this kind of DL problem. An EA evaluated thousands of DL configurations on Summit, none of which yielded an overall improvement in DL performance, which suggested problems with the training and validation data. We suspected that the Contrast Limited Adaptive Histogram Equalization (CLAHE) enhancement applied to previously generated digital surface models (DSMs), in which we were training DLs to find errors, had damaged the training data. Subsequent runs with an alternative global normalization yielded significantly improved DL performance. However, the DL Intersection Over Union (IOU) remained consistently sub-par, which suggested further problems with the training data and DL approach. Nonetheless, we were able to diagnose this problem within a 12-hour span via Summit runs, which prevented several weeks of unproductive trial-and-error DL configuration refinement and allowed for a more timely convergence on an ultimately viable solution.
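
The abstract refers to two quantities that are easy to state concretely: the Intersection Over Union (IOU) score and a "global normalization" used in place of CLAHE. The abstract does not say how either was computed, so the following is only a minimal sketch under stated assumptions: masks are binary grids, and global normalization is taken to mean a single min-max rescaling shared by every tile in the data set (in contrast to CLAHE, which adapts contrast locally). The function names `iou` and `global_normalize` are illustrative and do not come from the paper.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def iou(pred: Grid, truth: Grid) -> float:
    """Intersection over Union of two same-shaped binary masks (rows of 0/1)."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if union == 0 else intersection / union


def global_normalize(tiles: Sequence[Grid]) -> List[List[List[float]]]:
    """Rescale every tile to [0, 1] using one min/max taken over ALL tiles.

    This is an assumed stand-in for the paper's "global normalization";
    the actual method used by the authors is not given in the abstract.
    """
    values = [v for tile in tiles for row in tile for v in row]
    if not values:
        raise ValueError("no pixel values to normalize")
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant data: map everything to 0.0 rather than divide by zero.
        return [[[0.0 for _ in row] for row in tile] for tile in tiles]
    return [[[(v - lo) / span for v in row] for row in tile] for tile in tiles]
```

Because the same `lo` and `hi` are applied to every tile, relative elevation differences between tiles are preserved, which a locally adaptive enhancement does not guarantee.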

Authors:
Coletti, Mark A. [1]; Fafard, Alex [2]; Page, David [1]
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
  2. Rochester Inst. of Technology, Rochester, NY (United States)
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1615814
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
IBM Journal of Research and Development
Additional Journal Information:
Journal Volume: 64; Journal Issue: 3/4; Journal ID: ISSN 0018-8646
Publisher:
IEEE
Country of Publication:
United States
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Coletti, Mark A., Fafard, Alex, and Page, David. Troubleshooting deep-learner training data problems using an evolutionary algorithm on Summit. United States: N. p., 2019. Web. https://doi.org/10.1147/JRD.2019.2960225.
Coletti, Mark A., Fafard, Alex, & Page, David. Troubleshooting deep-learner training data problems using an evolutionary algorithm on Summit. United States. https://doi.org/10.1147/JRD.2019.2960225
Coletti, Mark A., Fafard, Alex, and Page, David. 2019. "Troubleshooting deep-learner training data problems using an evolutionary algorithm on Summit". United States. https://doi.org/10.1147/JRD.2019.2960225. https://www.osti.gov/servlets/purl/1615814.
@article{osti_1615814,
title = {Troubleshooting deep-learner training data problems using an evolutionary algorithm on Summit},
author = {Coletti, Mark A. and Fafard, Alex and Page, David},
abstractNote = {Architectural and hyper-parameter design choices can influence deep-learner (DL) model fidelity but can also be affected by malformed training and validation data. However, practitioners may spend significant time refining layers and hyper-parameters before discovering that distorted training data was impeding training progress. We found that an evolutionary algorithm (EA) can be used to diagnose this kind of DL problem. An EA evaluated thousands of DL configurations on Summit that yielded no overall improvement in DL performance, which suggested problems with the training and validation data. We suspected that Contrast Limited Adaptive Histogram Equalization (CLAHE) enhancement that was applied to previously generated digital surface models (DSMs), for which we were training DLs to find errors, had damaged the training data. Subsequent runs with an alternative global normalization yielded significantly improved DL performance. However, the DL Intersection Over Union (IOU) still exhibited consistent sub-par performance, which suggested further problems with the training data and DL approach. Nonetheless, we were able to diagnose this problem within a 12-hour span via Summit runs, which prevented several weeks of unproductive trial-and-error DL configuration refinement, and allowed for a more timely convergence on an ultimately viable solution.},
doi = {10.1147/JRD.2019.2960225},
journal = {IBM Journal of Research and Development},
number = {3/4},
volume = 64,
place = {United States},
year = {2019},
month = {12}
}
