DOE PAGES: U.S. Department of Energy
Office of Scientific and Technical Information

Title: Scalable and accurate multi-GPU-based image reconstruction of large-scale ptychography data

Abstract

While the advances in synchrotron light sources, together with the development of focusing optics and detectors, allow nanoscale ptychographic imaging of materials and biological specimens, the corresponding experiments can yield terabyte-scale volumes of data that can impose a heavy burden on the computing platform. Although graphics processing units (GPUs) provide high performance for such large-scale ptychography datasets, a single GPU is typically insufficient for analysis and reconstruction. Several works have considered leveraging multiple GPUs to accelerate the ptychographic reconstruction. However, most of these works utilize only the Message Passing Interface to handle the communications between GPUs. This approach poses inefficiency for a hardware configuration that has multiple GPUs in a single node, especially while reconstructing a single large projection, since it provides no optimizations to handle the heterogeneous GPU interconnections containing both low-speed (e.g., PCIe) and high-speed links (e.g., NVLink). In this paper, we provide an optimized intranode multi-GPU implementation that can efficiently solve large-scale ptychographic reconstruction problems. We focus on the maximum likelihood reconstruction problem using a conjugate gradient (CG) method for the solution and propose a novel hybrid parallelization model to address the performance bottlenecks in the CG solver. Accordingly, we have developed a tool, called PtyGer (Ptychographic GPU(multiple)-based reconstruction), implementing our hybrid parallelization model design. A comprehensive evaluation verifies that PtyGer can fully preserve the original algorithm’s accuracy while achieving outstanding intranode GPU scalability.

Authors:
Yu, Xiaodong; Nikitin, Viktor; Ching, Daniel J.; Aslan, Selin; Gürsoy, Doğa; Biçer, Tekin
Publication Date:
2022-03-29
Research Org.:
Argonne National Laboratory (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); USDOE Office of Science (SC), Basic Energy Sciences (BES); USDOE National Nuclear Security Administration (NNSA); US Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA)
OSTI Identifier:
1860447
Alternate Identifier(s):
OSTI ID: 1901717
Grant/Contract Number:
AC02-06CH11357; 89233218CNA000001; D2019-1903270004
Resource Type:
Published Article
Journal Name:
Scientific Reports
Additional Journal Information:
Journal Name: Scientific Reports; Journal Volume: 12; Journal Issue: 1; Journal ID: ISSN 2045-2322
Publisher:
Nature Publishing Group
Country of Publication:
United Kingdom
Language:
English
Subject:
97 MATHEMATICS AND COMPUTING

Citation Formats

Yu, Xiaodong, Nikitin, Viktor, Ching, Daniel J., Aslan, Selin, Gürsoy, Doğa, and Biçer, Tekin. Scalable and accurate multi-GPU-based image reconstruction of large-scale ptychography data. United Kingdom: N. p., 2022. Web. doi:10.1038/s41598-022-09430-3.
Yu, Xiaodong, Nikitin, Viktor, Ching, Daniel J., Aslan, Selin, Gürsoy, Doğa, & Biçer, Tekin. Scalable and accurate multi-GPU-based image reconstruction of large-scale ptychography data. United Kingdom. https://doi.org/10.1038/s41598-022-09430-3
Yu, Xiaodong, Nikitin, Viktor, Ching, Daniel J., Aslan, Selin, Gürsoy, Doğa, and Biçer, Tekin. 2022. "Scalable and accurate multi-GPU-based image reconstruction of large-scale ptychography data". United Kingdom. https://doi.org/10.1038/s41598-022-09430-3.
@article{osti_1860447,
title = {Scalable and accurate multi-GPU-based image reconstruction of large-scale ptychography data},
author = {Yu, Xiaodong and Nikitin, Viktor and Ching, Daniel J. and Aslan, Selin and Gürsoy, Doğa and Biçer, Tekin},
abstractNote = {While the advances in synchrotron light sources, together with the development of focusing optics and detectors, allow nanoscale ptychographic imaging of materials and biological specimens, the corresponding experiments can yield terabyte-scale volumes of data that can impose a heavy burden on the computing platform. Although graphics processing units (GPUs) provide high performance for such large-scale ptychography datasets, a single GPU is typically insufficient for analysis and reconstruction. Several works have considered leveraging multiple GPUs to accelerate the ptychographic reconstruction. However, most of these works utilize only the Message Passing Interface to handle the communications between GPUs. This approach poses inefficiency for a hardware configuration that has multiple GPUs in a single node, especially while reconstructing a single large projection, since it provides no optimizations to handle the heterogeneous GPU interconnections containing both low-speed (e.g., PCIe) and high-speed links (e.g., NVLink). In this paper, we provide an optimized intranode multi-GPU implementation that can efficiently solve large-scale ptychographic reconstruction problems. We focus on the maximum likelihood reconstruction problem using a conjugate gradient (CG) method for the solution and propose a novel hybrid parallelization model to address the performance bottlenecks in the CG solver. Accordingly, we have developed a tool, called PtyGer (Ptychographic GPU(multiple)-based reconstruction), implementing our hybrid parallelization model design. A comprehensive evaluation verifies that PtyGer can fully preserve the original algorithm’s accuracy while achieving outstanding intranode GPU scalability.},
doi = {10.1038/s41598-022-09430-3},
journal = {Scientific Reports},
number = 1,
volume = 12,
place = {United Kingdom},
year = {2022},
month = {3}
}
