Troubleshooting deep-learner training data problems using an evolutionary algorithm on Summit

Coletti, Mark A.; Fafard, Alex; Page, David

doi:10.1147/JRD.2019.2960225

Journal Article · December 17, 2019 · IBM Journal of Research and Development

DOI: https://doi.org/10.1147/JRD.2019.2960225 · OSTI ID: 1615814

Author affiliations: Coletti, Mark A. [1]; Fafard, Alex [2]; Page, David [1]

[1] Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
[2] Rochester Inst. of Technology, Rochester, NY (United States)

Deep-learner (DL) model fidelity is influenced by architectural and hyper-parameter design choices, but it can also be degraded by malformed training and validation data. As a result, practitioners may spend significant time refining layers and hyper-parameters before discovering that distorted training data was impeding training progress. We found that an evolutionary algorithm (EA) can be used to diagnose this kind of DL problem. An EA evaluated thousands of DL configurations on Summit, none of which yielded an overall improvement in DL performance, which suggested problems with the training and validation data. We suspected that the Contrast Limited Adaptive Histogram Equalization (CLAHE) enhancement applied to previously generated digital surface models (DSMs), in which we were training DLs to find errors, had damaged the training data. Subsequent runs with an alternative global normalization yielded significantly improved DL performance. However, the DL Intersection Over Union (IOU) was still consistently sub-par, which suggested further problems with the training data and the DL approach. Nonetheless, we were able to diagnose this problem within a 12-hour span via Summit runs, which prevented several weeks of unproductive trial-and-error refinement of DL configurations and allowed for a more timely convergence on an ultimately viable solution.
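
Two of the terms in the abstract can be illustrated with a minimal sketch. The record does not describe the authors' preprocessing or scoring code, so everything below is an assumption for illustration only: the function names `global_minmax_normalize` and `iou` are hypothetical, min-max scaling stands in for "global normalization" (the exact normalization used is not stated), and masks are plain nested lists of 0/1 values. The sketch rescales all tiles with one shared minimum and maximum, in contrast to a locally adaptive enhancement such as CLAHE, and computes IOU for a pair of binary masks.

```python
def global_minmax_normalize(tiles):
    """Rescale every tile to [0, 1] using ONE min and max shared by all tiles.

    Assumption: min-max scaling is used here as a stand-in for the
    'global normalization' mentioned in the abstract.
    """
    values = [v for tile in tiles for row in tile for v in row]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [[[0.0 for _ in row] for row in tile] for tile in tiles]
    return [[[(v - lo) / span for v in row] for row in tile] for tile in tiles]


def iou(predicted, truth):
    """Intersection over union of two same-shaped binary masks (nested lists)."""
    intersection = 0
    union = 0
    for p_row, t_row in zip(predicted, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Convention chosen for this sketch: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    tiles = [[[0.0, 5.0]], [[10.0, 20.0]]]      # two tiny 1x2 "elevation" tiles
    print(global_minmax_normalize(tiles))
    print(iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
```

Because the minimum and maximum are shared across tiles, relative elevation differences between tiles are preserved, which a per-region contrast enhancement does not guarantee.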

Research Organization: Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

Sponsoring Organization: USDOE

Grant/Contract Number: AC05-00OR22725

OSTI ID: 1615814

Journal Information: IBM Journal of Research and Development, Vol. 64, Issue 3/4; ISSN 0018-8646

Publisher: IEEE

Country of Publication: United States

Language: English

Similar Records

Evolving Larger Convolutional Layer Kernel Sizes for a Settlement Detection Deep-Learner on Summit

Conference · November 1, 2019 · OSTI ID:1615814

Coletti, Mark; Lunga, Dalton; Bassett, Jeffrey K.; +1 more

Strategies to Deploy and Scale Deep Learning on the Summit Supercomputer

Conference · November 1, 2019 · OSTI ID:1615814

Yin, Junqi; Gahlot, Shubhankar; Laanait, Nouamane; +4 more

Fast and Accurate Predictions of Total Energy for Solid Solution Alloys with Graph Convolutional Neural Networks

Conference · March 1, 2022 · OSTI ID:1615814

Lupo Pasini, Massimiliano; Burcul, Marco; Reeve, Sam; +2 more

Related Subjects

97 MATHEMATICS AND COMPUTING
