U.S. Department of Energy
Office of Scientific and Technical Information
  1. GALIC: Hybrid Multi-Qubitwise Pauli Grouping for Quantum Computing Measurement

    Observable estimation is a core primitive in NISQ-era algorithms targeting quantum chemistry applications. To reduce the state-preparation overhead required for accurate estimation, recent works have proposed various simultaneous measurement schemes to lower estimator variance. Two primary grouping schemes have been proposed: full commutativity (FC) and qubit-wise commutativity (QWC), with no compelling means of interpolating between them. In this work we propose a generalized framework for designing and analyzing context-aware hybrid FC/QWC commutativity relations. We use our framework to propose a noise- and connectivity-aware grouping strategy: Generalized backend-Aware pauLI Commutation (GALIC). We demonstrate how GALIC interpolates between FC and QWC, maintaining estimator accuracy in Hamiltonian estimation while lowering variance by an average of 20% compared to QWC. We also explore the design space of near-term quantum devices using the GALIC framework, specifically comparing device noise levels and connectivity. We find that error suppression has a more than 10× larger impact on device-aware estimator variance than qubit connectivity, with even larger correlation differences in estimator biases.
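    A minimal sketch (not the GALIC algorithm itself) of the two grouping criteria contrasted above, assuming Pauli strings written as one letter per qubit: two strings qubit-wise commute when, on every qubit, the letters match or one is the identity, while full commutativity only requires the number of anticommuting positions to be even.

```python
# Hypothetical helpers, not taken from the GALIC paper: check the two
# commutation criteria for Pauli strings such as "XIZ" (one letter per qubit).

def qubit_wise_commute(p, q):
    # QWC: on every qubit the letters must agree or one must be the identity.
    return all(a == b or a == "I" or b == "I" for a, b in zip(p, q))

def fully_commute(p, q):
    # FC: the strings commute iff they anticommute on an even number of qubits.
    anti = sum(1 for a, b in zip(p, q) if a != b and a != "I" and b != "I")
    return anti % 2 == 0

print(qubit_wise_commute("XXI", "XIZ"))  # True: every position agrees or is identity
print(fully_commute("XX", "YY"))         # True: two anticommuting positions (even)
print(qubit_wise_commute("XX", "YY"))    # False: X vs Y on both qubits
```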

  2. Early Exploration of a Flexible Framework for Efficient Quantum Linear Solvers in Power Systems

    The rapid integration of renewable energy resources presents formidable challenges in managing power grids. While advanced computing and machine learning techniques offer some solutions for accelerating grid modeling and simulation, there remain complex problems that classical computers cannot effectively address. Quantum computing, a promising technology, has the potential to fundamentally transform how we manage power systems, especially in scenarios with a higher proportion of renewable energy sources. One critical aspect is solving linear systems of equations, crucial for power system applications like power flow analysis, for which the Harrow-Hassidim-Lloyd (HHL) algorithm is a well-known quantum solution. However, HHL quantum circuits often exhibit excessive depth, making them impractical for current Noisy Intermediate-Scale Quantum (NISQ) devices. In this paper, we introduce a versatile framework, powered by NWQSim, that bridges the gap between power system applications and the quantum linear solvers available in Qiskit. This framework empowers researchers to efficiently explore power system applications using quantum linear solvers. Through innovative gate fusion strategies, reduced circuit depth, and GPU acceleration, our simulator significantly enhances resource efficiency. Power flow case studies demonstrate up to an eight-fold speedup compared to Qiskit Aer, while maintaining comparable accuracy.
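    For context only (this is not the paper's NWQSim-based framework), the linear systems in question resemble the DC power-flow equations B·θ = P; the toy classical solve below, with invented 3-bus values, shows the structure a quantum linear solver such as HHL would target by preparing a state proportional to the solution vector.

```python
import numpy as np

# Illustrative 3-bus DC power-flow system; susceptances and injections are made up.
B = np.array([[ 20.0, -10.0, -10.0],
              [-10.0,  25.0, -15.0],
              [-10.0, -15.0,  25.0]])    # bus susceptance matrix (rows sum to zero)
P = np.array([0.5, -0.3, -0.2])           # net power injections (p.u.), summing to zero

# Fix the slack-bus angle to zero and solve the reduced system classically;
# HHL would instead output a quantum state encoding the same solution.
theta = np.zeros(3)
theta[1:] = np.linalg.solve(B[1:, 1:], P[1:])
print(theta)
```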

  3. Bridging the Gap Between LLMs and LNS with Dynamic Data Format and Architecture Codesign

    Deep neural networks (DNNs) have achieved tremendous success in the past few years. However, their training and inference demand exceptional computational and memory resources. Quantization has been shown to be an effective approach to mitigate this cost, with the mainstream data types reduced from FP32 to FP16/BF16 and, recently, FP8 in the latest NVIDIA H100 GPUs. With increasingly aggressive quantization, however, conventional floating-point formats suffer from limited precision in representing numbers around zero. Recently, NVIDIA demonstrated the potential of using a Logarithmic Number System (LNS) for the next generation of tensor cores. While LNS mitigates the hurdles in representing small numbers, in this work we observe a mismatch between LNS and emerging Large Language Models (LLMs), which exhibit significant outliers when directly adopting the LNS format. In this paper, we present a data-format/architecture codesign to bridge this gap. On the format side, we propose a dynamic LNS format that flexibly represents outliers at higher precision by exploiting asymmetry in the LNS representation and identifying outliers on a per-vector basis. On the architecture side, for demonstration, we realize the dynamic LNS format in a systolic array, which can handle the irregularity of the outliers at runtime. We implement our approach on an Alveo U280 FPGA as a prototype. Experimental results show that our design can effectively handle the outliers and resolve the mismatch between LNS and LLMs, contributing to accuracy improvements of 15.4% and 16% over the floating-point and original LNS baselines, respectively, across four state-of-the-art LLM models. Our observations and design lay a solid foundation for large-scale adoption of the LNS format in next-generation deep learning hardware.
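    As a rough illustration of the precision/outlier tension described above (a generic fixed-point LNS quantizer, not the proposed dynamic format), the sketch below encodes values as a sign plus a quantized log2 magnitude and shows how large-magnitude outliers saturate a limited exponent range.

```python
import numpy as np

# Toy LNS quantizer: sign + log2 magnitude rounded to `frac_bits` fractional bits,
# with the exponent clipped to a narrow range. All parameter values are assumptions.
def lns_quantize(x, frac_bits=3, exp_min=-8, exp_max=7):
    sign = np.sign(x)
    mag = np.abs(x) + 1e-30                        # avoid log(0)
    e = np.round(np.log2(mag) * 2**frac_bits) / 2**frac_bits
    e = np.clip(e, exp_min, exp_max)               # limited exponent range
    return sign * 2.0**e

x = np.array([0.02, 0.5, 1.3, 180.0, 4096.0])      # the last entries act as outliers
print(lns_quantize(x))                             # outliers saturate at 2**7 = 128
```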

  4. SNNPG: Using Spiking Neural Networks to Detect Attacks in the Power Grid

    We explore the potential of Spiking Neural Networks (SNNs) to enhance the security of power grid operations by detecting False Data Injection (FDI) attacks. These attacks manipulate Phasor Measurement Unit (PMU) readings, leading to erroneous control decisions and grid disruptions. We develop a method to convert PMU data into spike trains that capture both temporal and spatial dimensions. Using an SNN model, we conduct evaluations with simulated power grid data, demonstrating accurate detection of FDI attacks. SNN models rapidly identify anomalies in real-time PMU data, safeguarding grid operations by alerting operators to irregular readings and preventing incorrect decisions.
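    One plausible way to turn PMU samples into spikes (a generic delta encoder, not necessarily the paper's conversion method) is to emit a +1/-1 event whenever the measured quantity moves by more than a threshold, which preserves the temporal structure an SNN consumes.

```python
import numpy as np

# Hypothetical delta (threshold-crossing) spike encoder for a PMU signal.
def delta_encode(samples, threshold=0.05):
    spikes, ref = [], samples[0]
    for s in samples[1:]:
        if s - ref >= threshold:
            spikes.append(+1); ref = s     # upward crossing
        elif ref - s >= threshold:
            spikes.append(-1); ref = s     # downward crossing
        else:
            spikes.append(0)               # no spike
    return np.array(spikes)

t = np.linspace(0, 1, 120)
voltage = 1.0 + 0.1 * np.sin(2 * np.pi * 3 * t)    # synthetic PMU voltage magnitude
print(delta_encode(voltage)[:20])
```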

  5. Understanding Mixed Precision GEMM with MPGemmFI: Insights into Fault Resilience

    Emerging deep learning workloads urgently need fast general matrix multiplication (GEMM). Thus, one of the critical features of machine-learning-specific accelerators such as NVIDIA Tensor Cores, AMD Matrix Cores, and Google TPUs is support for mixed-precision GEMM. For DNN models, lower-precision FP data formats and computation offer acceptable accuracy while delivering significant improvements in performance, area, and memory footprint. While promising, the effect of mixed-precision computation on error resilience remains unexplored. To this end, we develop a fault injection framework that systematically injects faults into mixed-precision computation results and investigate how these faults affect the accuracy of machine learning applications. Based on the observed error-resilience characteristics, we offer lightweight error detection and correction solutions that improve overall model accuracy by 75% when the models experience hardware faults. The solutions can be efficiently integrated into the accelerator's pipelines.
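    In the spirit of the framework described above (though not its actual implementation), a minimal fault injector can flip one bit of an FP16 GEMM output to study how a single hardware fault propagates into model accuracy; the indices, shapes, and chosen bit below are arbitrary.

```python
import numpy as np

def flip_bit_fp16(value, bit):
    # Reinterpret the FP16 value as a 16-bit integer, flip one bit, and convert back.
    raw = np.frombuffer(np.float16(value).tobytes(), dtype=np.uint16)[0]
    raw ^= np.uint16(1 << bit)
    return np.frombuffer(np.uint16(raw).tobytes(), dtype=np.float16)[0]

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8)).astype(np.float16)
B = rng.standard_normal((8, 4)).astype(np.float16)
C = (A.astype(np.float32) @ B.astype(np.float32)).astype(np.float16)  # mixed-precision GEMM

faulty = C.copy()
faulty[2, 1] = flip_bit_fp16(C[2, 1], bit=14)   # flip the top exponent bit of one element
print(float(C[2, 1]), "->", float(faulty[2, 1]))
```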

  6. DS-GL: Advancing Graph Learning via Harnessing the Power of Nature within Dynamic Systems

    With the rapid digitization of the world, an increasing number of real-world applications are turning to non-Euclidean data modeled as graphs. Due to its intrinsic complexity and irregularity, learning from graph data demands tremendous computational power. Recently, CMOS-compatible Ising machines, i.e., dynamic systems composed of CMOS components, have emerged as a new approach that harnesses the inherent power of natural annealing within dynamic systems to efficiently solve binary optimization problems, and they have been adopted for traditional graph computations such as max-cut. However, when performing complex Graph Learning (GL) tasks, Ising machines face significant hurdles: (i) they are inherently binary and thus ill-suited for real-valued problems; (ii) the expensive all-to-all coupling network that guarantees effective natural annealing poses daunting scalability concerns. To address these challenges, this paper proposes a nature-powered graph learning framework dubbed DS-GL, the first effort to transform the process of solving graph learning problems into the natural annealing process of a parameterized dynamic system embodied as a CMOS chip. To tackle the two hurdles, DS-GL first augments the Ising machine architecture by changing the self-reaction term of its Hamiltonian function from linear to quadratic, effectively serving as an energy regulator. This adjustment maintains the system's original physical interpretation while enabling it to process continuous, real-valued data. Second, to address the scaling issue, DS-GL further upgrades the real-valued dense Ising machine by decomposing it into a mesh-based multi-PE dynamic system that supports efficient distributed spatial-temporal co-annealing across PEs through sparse interconnects. By exploiting the inherent sparsity and component structures of real-world graphs, DS-GL maps complex graph learning tasks onto the scalable dynamic system while maintaining high accuracy. Evaluations with three diverse GL applications across six real-world datasets, including traffic flow and COVID-19 prediction, show that DS-GL delivers 10²× to 10⁶× speedups and 500× energy reduction over Graph Neural Networks on GPUs, with 5%-20% accuracy enhancement.
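    The abstract does not spell out the Hamiltonian, so the sketch below is only a schematic guess at what swapping a linear self-reaction term for a quadratic one looks like in a continuous-valued Ising-style energy function; the coupling matrix, state, and coefficients are all invented.

```python
import numpy as np

def ising_energy(s, J, h):
    # Classic Ising-machine objective with a linear self-reaction (bias) term.
    return -0.5 * s @ J @ s - h @ s

def real_valued_energy(s, J, k):
    # Assumed quadratic self-reaction term k_i * s_i**2, acting as an energy
    # regulator that keeps continuous states bounded during annealing.
    return -0.5 * s @ J @ s + k @ (s * s)

rng = np.random.default_rng(1)
J = rng.standard_normal((6, 6)); J = (J + J.T) / 2; np.fill_diagonal(J, 0)
s = rng.uniform(-1, 1, 6)                       # continuous, real-valued state
print(ising_energy(np.sign(s), J, h=np.full(6, 0.1)))   # binary-spin baseline
print(real_valued_energy(s, J, k=np.full(6, 0.5)))       # continuous variant
```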

  7. OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model

    With the sharply increasing volume of user data, the Deep Learning Recommendation Model (DLRM) has become an indispensable infrastructure in large technology companies. However, large-scale DLRM on multi-GPU platforms is still inefficient due to unbalanced workload partitioning and intensive inter-GPU communication. To this end, we propose OPER, an OPtimality-guided Embedding table placement for large-scale Recommendation model training and inference. OPER explores the potential of mitigating remote memory access latency in DLRM through fine-grained embedding table placement. Specifically, OPER develops a theoretical model that characterizes the relationship between embedding table placement and embedding communication latency in both training and inference, proves the NP-hardness of finding the optimal embedding table placement, and proposes a heuristic algorithm that yields near-optimal placements. OPER implements a SHMEM-based embedding table training system and a unified embedding index mapping to support fine-grained embedding table sharding and placement. Comprehensive experiments reveal that OPER achieves on average 3.4× and 5.1× speedups on training and inference, respectively, over state-of-the-art DLRM frameworks.
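    As a point of reference (a generic greedy baseline, not OPER's optimality-guided heuristic), embedding table placement can be framed as balancing per-table lookup traffic across GPUs; the traffic numbers and table names below are invented.

```python
import heapq

# Greedy longest-processing-time placement: hottest tables first, each assigned
# to the GPU with the least accumulated traffic so far.
def greedy_placement(table_traffic, num_gpus):
    heap = [(0.0, g) for g in range(num_gpus)]           # (accumulated load, gpu id)
    heapq.heapify(heap)
    placement = {}
    for table, traffic in sorted(table_traffic.items(), key=lambda kv: kv[1], reverse=True):
        load, gpu = heapq.heappop(heap)
        placement[table] = gpu
        heapq.heappush(heap, (load + traffic, gpu))
    return placement

traffic = {"user_id": 9.0, "item_id": 7.5, "category": 2.0, "region": 1.0}
print(greedy_placement(traffic, num_gpus=2))             # hot tables split across GPUs
```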

  8. Quapprox: A Framework for Benchmarking the Approximability of Variational Quantum Circuit

    Most existing quantum neural network models, such as variational quantum circuits (VQCs), are limited in their ability to capture non-linear relationships in input data. This has gradually become the main obstacle to tackling realistic applications such as natural language processing, medical image processing, and wireless communications. Recently, research efforts have emerged that enable VQCs to perform non-linear operations. However, the approximability of a given VQC (i.e., the order of non-linearity that a specified design can handle) is still unclear. In response, we developed an automated tool for benchmarking the approximability of a given VQC. The proposed tool generates a set of synthetic datasets with different orders of non-linearity and trains the given VQC on these datasets to estimate its approximability. Our experiments benchmark VQCs with different designs whose theoretical approximability is known. We show that the proposed tool precisely estimates approximability consistent with the theoretical values, indicating that it can be used for benchmarking the approximability of a given quantum circuit for learning tasks.
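    The abstract does not give the exact data generator, but one simple stand-in for datasets with a controllable order of non-linearity is a family of polynomial targets of increasing degree:

```python
import numpy as np

# Hypothetical dataset generator: degree 1 is linear, higher degrees demand
# progressively more non-linear models (or VQC designs) to fit.
def make_dataset(order, n_samples=256, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, n_samples)
    y = x ** order
    return x, y

for order in (1, 2, 3):
    x, y = make_dataset(order)
    print(f"order {order}: y range [{y.min():.2f}, {y.max():.2f}]")
```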

  9. Accurate and Data‐Efficient Micro X‐ray Diffraction Phase Identification Using Multitask Learning: Application to Hydrothermal Fluids

    Traditional analysis of highly distorted micro X-ray diffraction (μ-XRD) patterns from hydrothermal fluid environments is a time-consuming process, often requiring substantial data preprocessing and labeled experimental data. Herein, the potential of deep learning with a multitask learning (MTL) architecture to overcome these limitations is demonstrated. MTL models are trained to identify phase information in μ-XRD patterns, minimizing the need for labeled experimental data and masking preprocessing steps. Notably, MTL models show superior accuracy compared to binary classification convolutional neural networks. Additionally, introducing a tailored cross-entropy loss function improves MTL model performance. Most significantly, MTL models tuned to analyze raw and unmasked XRD patterns achieve close performance to models analyzing preprocessed data, with minimal accuracy differences. This work indicates that advanced deep learning architectures like MTL can automate arduous data handling tasks, streamline the analysis of distorted XRD patterns, and reduce the reliance on labor-intensive experimental datasets.
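    A generic shape for such a multitask model (layer sizes and pattern length are invented; this is not the paper's architecture) is a shared 1-D convolutional trunk over the diffraction pattern feeding one binary "phase present?" head per candidate phase:

```python
import torch
import torch.nn as nn

class MTLXRD(nn.Module):
    """Shared trunk + per-phase binary heads; all hyperparameters are placeholders."""
    def __init__(self, n_phases):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8), nn.Flatten(),
        )
        self.heads = nn.ModuleList(nn.Linear(32 * 8, 1) for _ in range(n_phases))

    def forward(self, x):                                    # x: (batch, 1, pattern_length)
        z = self.trunk(x)
        return torch.cat([head(z) for head in self.heads], dim=1)  # one logit per phase

model = MTLXRD(n_phases=5)
print(model(torch.randn(2, 1, 2048)).shape)                  # torch.Size([2, 5])
```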

  10. Red-QAOA: Efficient Variational Optimization through Circuit Reduction

    The Quantum Approximate Optimization Algorithm (QAOA) provides a quantum solution for combinatorial optimization problems. However, the optimal parameter search in QAOA is greatly affected by noise, leading to non-optimal solutions. This paper introduces a novel approach that optimizes QAOA by exploiting the energy landscape concentration of similar instances via graph reduction, thus addressing the effect of noise. We formalize the notion of similar instances in QAOA and develop a simulated-annealing-based graph reduction algorithm, called Red-QAOA, to identify the most similar subgraph for efficient parameter optimization. Red-QAOA outperforms state-of-the-art Graph Neural Network (GNN) based graph pooling techniques and demonstrates effectiveness on a diverse set of real-world optimization problems encompassing 3200 graphs. Red-QAOA reduces node and edge counts by 28% and 37%, respectively, while maintaining a low mean squared error of 2%. These reductions enable identification of a parameter set closer to the true optimum in the presence of noise. By substantially streamlining the search for QAOA parameters, our approach sets the stage for the practical application of quantum algorithms to complex optimization problems.
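    To make the simulated-annealing reduction concrete (with a deliberately simplified similarity objective that only matches the original graph's mean degree, rather than the paper's energy-landscape criterion), a sketch might look like this:

```python
import math, random
import networkx as nx

def reduce_graph(G, keep, steps=2000, seed=0):
    """Anneal over node subsets of size `keep`, preferring induced subgraphs whose
    mean degree matches the original graph (a stand-in similarity metric)."""
    rng = random.Random(seed)
    target = 2 * G.number_of_edges() / G.number_of_nodes()
    nodes = list(G.nodes)
    current = set(rng.sample(nodes, keep))

    def cost(subset):
        H = G.subgraph(subset)
        return abs(2 * H.number_of_edges() / keep - target)

    c = cost(current)
    for t in range(steps):
        temp = (1 - t / steps) + 1e-3                               # linear cooling schedule
        cand = set(current)
        cand.remove(rng.choice(sorted(cand)))                       # swap one node out...
        cand.add(rng.choice([n for n in nodes if n not in cand]))   # ...and one in
        cc = cost(cand)
        if cc < c or rng.random() < math.exp((c - cc) / temp):
            current, c = cand, cc
    return G.subgraph(current).copy()

G = nx.erdos_renyi_graph(60, 0.1, seed=1)
H = reduce_graph(G, keep=40)
print(G.number_of_nodes(), "->", H.number_of_nodes())
```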


Search: All Records, Author / Contributor: "Li, Ang"
