skip to main content
OSTI.GOV title logo U.S. Department of Energy
Office of Scientific and Technical Information
  1. Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers

    The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization have enabled pushing the scalability ofmore » the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.« less
  2. Saturation of Alfvén modes in tokamaks

    Here, the growth of Alfvén modes driven unstable by a distribution of high energy particles up to saturation is investigated with a guiding center code, using numerical eigenfunctions produced by linear theory and a numerical high energy particle distribution, in order to make detailed comparison with experiment and with models for saturation amplitudes and the modification of beam profiles. Two innovations are introduced. First, a very noise free means of obtaining the mode-particle energy and momentum transfer is introduced, and secondly, a spline representation of the actual beam particle distribution is used.
  3. Extreme Scale Plasma Turbulence Simulations on Top Supercomputers Worldwide

    The goal of the extreme scale plasma turbulence studies described in this paper is to expedite the delivery of reliable predictions on confinement physics in large magnetic fusion systems by using world-class supercomputers to carry out simulations with unprecedented resolution and temporal duration. This has involved architecture-dependent optimizations of performance scaling and addressing code portability and energy issues, with the metrics for multi-platform comparisons being 'time-to-solution' and 'energy-to-solution'. Realistic results addressing how confinement losses caused by plasma turbulence scale from present-day devices to the much larger $25 billion international ITER fusion facility have been enabled by innovative advances in themore » GTC-P code including (i) implementation of one-sided communication from MPI 3.0 standard; (ii) creative optimization techniques on Xeon Phi processors; and (iii) development of a novel performance model for the key kernels of the PIC code. Our results show that modeling data movement is sufficient to predict performance on modern supercomputer platforms.« less
  4. The fusion code XGC: Enabling kinetic study of multi-scale edge turbulent transport in ITER

    The XGC fusion gyrokinetic code combines state-of-the-art, portable computational and algorithmic technologies to enable complicated multiscale simulations of turbulence and transport dynamics in ITER edge plasma on the largest US open-science computer, the CRAY XK7 Titan, at its maximal heterogeneous capability, which have not been possible before due to a factor of over 10 shortage in the time-to-solution for less than 5 days of wall-clock time for one physics case. Frontier techniques such as nested OpenMP parallelism, adaptive parallel I/O, staging I/O and data reduction using dynamic and asynchronous applications interactions, dynamic repartitioning.
  5. Petascale Parallelization of the Gyrokinetic Toroidal Code

    The Gyrokinetic Toroidal Code (GTC) is a global, three-dimensional particle-in-cell application developed to study microturbulence in tokamak fusion devices. The global capability of GTC is unique, allowing researchers to systematically analyze important dynamics such as turbulence spreading. In this work we examine a new radial domain decomposition approach to allow scalability onto the latest generation of petascale systems. Extensive performance evaluation is conducted on three high performance computing systems: the IBM BG/P, the Cray XT4, and an Intel Xeon Cluster. Overall results show that the radial decomposition approach dramatically increases scalability, while reducing the memory footprint - allowing for fusionmore » device simulations at an unprecedented scale. After a decade where high-end computing (HEC) was dominated by the rapid pace of improvements to processor frequencies, the performance of next-generation supercomputers is increasingly differentiated by varying interconnect designs and levels of integration. Understanding the tradeoffs of these system designs is a key step towards making effective petascale computing a reality. In this work, we examine a new parallelization scheme for the Gyrokinetic Toroidal Code (GTC) [?] micro-turbulence fusion application. Extensive scalability results and analysis are presented on three HEC systems: the IBM BlueGene/P (BG/P) at Argonne National Laboratory, the Cray XT4 at Lawrence Berkeley National Laboratory, and an Intel Xeon cluster at Lawrence Livermore National Laboratory. Overall results indicate that the new radial decomposition approach successfully attains unprecedented scalability to 131,072 BG/P cores by overcoming the memory limitations of the previous approach. The new version is well suited to utilize emerging petascale resources to access new regimes of physical phenomena.« less
  6. Fusion Energy Sciences Exascale Requirements Review. An Office of Science review sponsored jointly by Advanced Scientific Computing Research and Fusion Energy Sciences, January 27-29, 2016, Gaithersburg, Maryland

    The additional computing power offered by the planned exascale facilities could be transformational across the spectrum of plasma and fusion research — provided that the new architectures can be efficiently applied to our problem space. The collaboration that will be required to succeed should be viewed as an opportunity to identify and exploit cross-disciplinary synergies. To assess the opportunities and requirements as part of the development of an overall strategy for computing in the exascale era, the Exascale Requirements Review meeting of the Fusion Energy Sciences (FES) community was convened January 27–29, 2016, with participation from a broad range ofmore » fusion and plasma scientists, specialists in applied mathematics and computer science, and representatives from the U.S. Department of Energy (DOE) and its major computing facilities. This report is a summary of that meeting and the preparatory activities for it and includes a wealth of detail to support the findings. Technical opportunities, requirements, and challenges are detailed in this report (and in the recent report on the Workshop on Integrated Simulation). Science applications are described, along with mathematical and computational enabling technologies. Also see for more information.« less
  7. Finding Regions of Interest on Toroidal Meshes

    Fusion promises to provide clean and safe energy, and a considerable amount of research effort is underway to turn this aspiration intoreality. This work focuses on a building block for analyzing data produced from the simulation of microturbulence in magnetic confinementfusion devices: the task of efficiently extracting regions of interest. Like many other simulations where a large amount of data are produced,the careful study of ``interesting'' parts of the data is critical to gain understanding. In this paper, we present an efficient approach forfinding these regions of interest. Our approach takes full advantage of the underlying mesh structure in magneticmore » coordinates to produce acompact representation of the mesh points inside the regions and an efficient connected component labeling algorithm for constructingregions from points. This approach scales linearly with the surface area of the regions of interest instead of the volume as shown with bothcomputational complexity analysis and experimental measurements. Furthermore, this new approach is 100s of times faster than a recentlypublished method based on Cartesian coordinates.« less
  8. Scientific Application Performance on Leading Scalar and VectorSupercomputing Platforms

    The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end computing (HEC) platforms, primarily because of their generality, scalability, and cost effectiveness. However, the growing gap between sustained and peak performance for full-scale scientific applications on conventional supercomputers has become a major concern in high performance computing, requiring significantly larger systems and application scalability than implied by peak performance in order to achieve desired performance. The latest generation of custom-built parallel vector systems have the potential to address this issue for numerical algorithms with sufficient regularity in their computational structure. In this work we exploremore » applications drawn from four areas: magnetic fusion (GTC), plasma physics (LBMHD3D), astrophysics (Cactus), and material science (PARATEC). We compare performance of the vector-based Cray X1, X1E, Earth Simulator, NEC SX-8, with performance of three leading commodity-based superscalar platforms utilizing the IBM Power3, Intel Itanium2, and AMD Opteron processors. Our work makes several significant contributions: a new data-decomposition scheme for GTC that (for the first time) enables a breakthrough of the Teraflop barrier; the introduction of a new three-dimensional Lattice Boltzmann magneto-hydrodynamic implementation used to study the onset evolution of plasma turbulence that achieves over 26Tflop/s on 4800 ES processors; the highest per processor performance (by far) achieved by the full-production version of the Cactus ADM-BSSN; and the largest PARATEC cell size atomistic simulation to date. Overall, results show that the vector architectures attain unprecedented aggregate performance across our application suite, demonstrating the tremendous potential of modern parallel vector systems.« less
  9. S-Preconditioner for Multi-fold Data Reduction with Guaranteed User-Controlled Accuracy

    The growing gap between the massive amounts of data generated by petascale scientific simulation codes and the capability of system hardware and software to effectively analyze this data necessitates data reduction. Yet, the increasing data complexity challenges most, if not all, of the existing data compression methods. In fact, lossless compression techniques offer no more than 10% reduction on scientific data that we have experience with, which is widely regarded as effectively incompressible. To bridge this gap, in this paper, we advocate a transformative strategy that enables fast, accurate, and multi-fold reduction of double-precision floating-point scientific data. The intuition behindmore » our method is inspired by an effective use of preconditioners for linear algebra solvers optimized for a particular class of computational dwarfs (e.g., dense or sparse matrices). Focusing on a commonly used multi-resolution wavelet compression technique as the underlying solver for data reduction we propose the S-preconditioner, which transforms scientific data into a form with high global regularity to ensure a significant decrease in the number of wavelet coefficients stored for a segment of data. Combined with the subsequent EQ-calibrator, our resultant method (called S-Preconditioned EQ-Calibrated Wavelets (SPEQC-WAVELETS)), robustly achieved a 4- to 5- fold data reduction while guaranteeing user-defined accuracy of reconstructed data to be within 1% point-by-point relative error, lower than 0:01 Normalized RMSE, and higher than 0:99 Pearson Correlation. In this paper, we show the results we obtained by testing our method on six petascale simulation codes including fusion, combustion, climate, astrophysics, and subsurface groundwater in addition to 13 publicly available scientific datasets. We also demonstrate that application-driven data mining tasks performed on decompressed variables or their derived quantities produce results of comparable quality with the ones for the original data.« less
  10. Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms

    The Gyrokinetic Toroidal Code (GTC) uses the particle-in-cell method to efficiently simulate plasma microturbulence. This paper presents novel analysis and optimization techniques to enhance the performance of GTC on large-scale machines. We introduce cell access analysis to better manage locality vs. synchronization tradeoffs on CPU and GPU-based architectures. Finally, our optimized hybrid parallel implementation of GTC uses MPI, OpenMP, and NVIDIA CUDA, achieves up to a 2× speedup over the reference Fortran version on multiple parallel systems, and scales efficiently to tens of thousands of cores.

Search for:
All Records
Creator / Author
"Ethier, Stephane"

Refine by:
Resource Type
Publication Date
Creator / Author
Research Organization