OSTI.GOV: U.S. Department of Energy, Office of Scientific and Technical Information
  1. Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks

    This work develops new techniques within Horovod, a generic communication library supporting data parallel training across deep learning frameworks. In particular, we improve the Horovod control plane by implementing a new coordination scheme that takes advantage of the characteristics of the typical data parallel training paradigm, namely the repeated execution of collectives on the gradients of a fixed set of tensors. Using a caching strategy, we execute Horovod’s existing coordinator-worker logic only once during a typical training run, replacing it with a more efficient decentralized orchestration strategy using the cached data and a global intersection of a bitvector for the remaining training duration. Next, we introduce a feature for end users to explicitly group collective operations, enabling finer grained control over the communication buffer sizes. To evaluate our proposed strategies, we conduct experiments on a world-class supercomputer, Summit. We compare our proposals to Horovod’s original design and observe 2x performance improvement at a scale of 6000 GPUs; we also compare them against tf.distribute and torch.DDP and achieve 12% better and comparable performance, respectively, using up to 1536 GPUs; we compare our solution against BytePS in typical HPC settings and achieve about 20% better performance on a scale of 768 GPUs. Finally, we test our strategies on a scientific application (STEMDL) using up to 27,600 GPUs (the entire Summit) and show that we achieve a near-linear scaling of 0.93 with a sustained performance of 1.54 exaflops (with standard error ± 0.02) in FP16 precision.
  2. Deep Learning and Structural Imaging of Materials

    Deep learning has had a transformative effect on numerous domains and is actively utilized by many scientists in data-intensive fields such as high-energy physics and cosmology. Materials science, and in particular, the structural imaging of materials with electrons and X-rays are projected to enter the age of scientific data torrents, positioning them as new application spaces for modern artificial intelligence. In this contribution, we provide a synopsis on the foundations and latest progress in deep learning and present an in-depth application of utilizing modern deep artificial neural networks in scanning transmission electron microscopy to extract structural material properties. We use this case study to expose the strengths of deep learning-based models and discuss their current limitations, in the process highlighting their potential use in other data-intensive structural imaging modalities.
  3. Smoky Mountain Data Challenge 2020: An Open Call to Solve Data Problems in the Areas of Neutron Science, Material Science, Urban Modeling and Dynamics, Geophysics, and Biomedical Informatics

    The 2020 Smoky Mountains Computational Sciences and Engineering Conference enlists research scientists from across Oak Ridge National Laboratory (ORNL) to be data sponsors and help create data analytics challenges for eminent data sets at the laboratory. This work describes the significance of each of the seven data sets and their associated challenge questions. The challenge questions for each data set were required to cover multiple difficulty levels. An international call for participation was sent to students and researchers, asking them to form teams of up to four people to apply novel data analytics techniques to these data sets.
  4. Strategies to Deploy and Scale Deep Learning on the Summit Supercomputer

    The rapid growth and wide applicability of Deep Learning (DL) frameworks poses challenges to computing centers which need to deploy and support the software, and also to domain scientists who have to keep up with the system environment and scale up scientific exploration through DL. We offer recommendations for deploying and scaling DL frameworks on the Summit supercomputer, currently atop the Top500 list, at the Oak Ridge National Laboratory Leadership Computing Facility (OLCF). We discuss DL software deployment in the form of containers, and compare performance of native-built frameworks and containerized deployment. Software containers show no noticeable negative performance impact and exhibit faster Python loading times and promise easier maintenance. To explore strategies for scaling up DL model training campaigns, we assess DL compute kernel performance, discuss and recommend I/O data formats and staging, and identify communication needs for scalable message exchange for DL runs at scale. We recommend that users take a step-wise tuning approach beginning with algorithmic kernel choice, node I/O configuration, and communications tuning as best-practice. We present baseline examples of scaling efficiency 87% for a DL run of ResNet50 running on 1024 nodes (6144 V100 GPUs).
  5. Towards Native Execution of Deep Learning on a Leadership-Class HPC System

    Large parallel machines generally offer the best parallel performance with "native execution" that is achieved using codes developed with the optimized compilers, communication libraries, and runtimes offered on the machines. In this paper, we report and analyze performance results from native execution of deep learning on a leadership-class high-performance computing (HPC) system. Using our new code called DeepEx, we present a study of the parallel speed up and convergence rates of learning achieved with native parallel execution. In the trade-off between computational parallelism and synchronized convergence, we first focus on maximizing parallelism while still obtaining convergence. Scaling results are reported from execution on up to 15,000 GPUs using two scientific data sets from atom microscopy and protein folding applications, and also using the popular ImageNet data set. In terms of the traditional measure of parallel speed up, excellent scaling is observed up to 12,000 GPUs. Additionally, accounting for convergence rates of deep learning accuracy or error, a deep learning-specific metric called "learning speed up" is also tracked. The performance results indicate the need to evaluate parallel deep learning execution in terms of learning speed up, and point to additional directions for improved exploitation of high-end HPC systems.
  6. Competing phases in epitaxial vanadium dioxide at nanoscale

    Phase competition in correlated oxides offers tantalizing opportunities as many intriguing physical phenomena occur near the phase transitions. Owing to a sharp metal-insulator transition (MIT) near room temperature, the correlated vanadium dioxide (VO2) exhibits a strong competition between insulating and metallic phases, which is important for practical applications. However, the phase boundary undergoes a strong modification when strain is involved, yielding complex phase transitions. Here, we report the emergence of nanoscale M2 phase domains in VO2 epitaxial films under anisotropic strain relaxation. The competing phases of the films are imaged by multilength-scale probes, detecting the structural and electrical properties in individual local domains. Competing evolution of the M1 and M2 phases indicates the critical role of lattice-strain on both the stability of the M2 Mott phase and the energetics of the MIT in VO2 films. This study demonstrates how strain engineering can be utilized to design phase states, which allow deliberate control of MIT behavior at the nanoscale in epitaxial VO2 films.
  7. A Programming Framework for Neuromorphic Systems with Emerging Technologies

    Neuromorphic computing is a promising post-Moore's law era technology. A wide variety of neuromorphic computer (NC) architectures have emerged in recent years, ranging from traditional fully digital CMOS to nanoscale implementations with novel, beyond CMOS components. There are already major questions associated with how we are going to program and use NCs simply because of how radically different their architecture is as compared with the von Neumann architecture. When coupled with the implementations using emerging device technologies, which add additional issues associated with programming devices, it is clear that we must define a new way to program and develop for NC devices. In this work, we discuss a programming framework for NC devices implemented with emerging technologies. We discuss how we have applied this framework to program a NC system implemented with metal oxide memristors. We utilize the framework to develop two applications for the memristive NC device: a simple multiplexer and a simple control task (the cart-pole problem). Finally, we discuss how this framework can be extended to NC systems implemented with a variety of novel device components and materials.
  8. Optical creation of a supercrystal with three-dimensional nanoscale periodicity

    Stimulation with ultrafast light pulses can realize and manipulate states of matter with emergent structural, electronic and magnetic phenomena. However, these non-equilibrium phases are often transient and the challenge is to stabilize them as persistent states. Here, we show that atomic-scale PbTiO3/SrTiO3 superlattices, counterpoising strain and polarization states in alternate layers, are converted by sub-picosecond optical pulses to a supercrystal phase. This phase persists indefinitely under ambient conditions, has not been created via equilibrium routes, and can be erased by heating. X-ray scattering and microscopy show this unusual phase consists of a coherent three-dimensional structure with polar, strain and charge-ordering periodicities of up to 30 nm. By adjusting only dielectric properties, the phase-field model describes this emergent phase as a photo-induced charge-stabilized supercrystal formed from a two-phase equilibrium state. Furthermore, our results demonstrate opportunities for light-activated pathways to thermally inaccessible and emergent metastable states.
  9. Identifying local structural states in atomic imaging by computer vision

    The availability of atomically resolved imaging modalities enables an unprecedented view into the local structural states of materials, which manifest themselves by deviations from the fundamental assumptions of periodicity and symmetry. Consequently, approaches that aim to extract these local structural states from atomic imaging data with minimal assumptions regarding the average crystallographic configuration of a material are indispensable to advances in structural and chemical investigations of materials. Here, we present an approach to identify and classify local structural states that is rooted in computer vision. This approach introduces a definition of a structural state that is composed of both local and nonlocal information extracted from atomically resolved images, and is wholly untethered from the familiar concepts of symmetry and periodicity. Instead, this approach relies on computer vision techniques such as feature detection, and concepts such as scale invariance. We present the fundamental aspects of local structural state extraction and classification by application to simulated scanning transmission electron microscopy images, and analyze the robustness of this approach in the presence of common instrumental factors such as noise, limited spatial resolution, and weak contrast. Finally, we apply this computer vision-based approach for the unsupervised detection and classification of local structural states in an experimental electron micrograph of a complex oxides interface, and a scanning tunneling micrograph of a defect-engineered multilayer graphene surface.
  10. Imaging nanoscale lattice variations by machine learning of x-ray diffraction microscopy data

    In this paper, we present a novel methodology based on machine learning to extract lattice variations in crystalline materials, at the nanoscale, from an x-ray Bragg diffraction-based imaging technique. By employing a full-field microscopy setup, we capture real space images of materials, with imaging contrast determined solely by the x-ray diffracted signal. The data sets that emanate from this imaging technique are a hybrid of real space information (image spatial support) and reciprocal lattice space information (image contrast), and are intrinsically multidimensional (5D). By a judicious application of established unsupervised machine learning techniques and multivariate analysis to this multidimensional data cube, we show how to extract features that can be ascribed physical interpretations in terms of common structural distortions, such as lattice tilts and dislocation arrays. Finally, we demonstrate this 'big data' approach to x-ray diffraction microscopy by identifying structural defects present in an epitaxial ferroelectric thin-film of lead zirconate titanate.
...
