
Title: Scalable Knowledge Graph Analytics at 136 Petaflop/s

Abstract

We are motivated by newly proposed methods for data mining large-scale corpora of scholarly publications, such as the full biomedical literature, which may consist of tens of millions of papers spanning decades of research. In this setting, analysts seek to discover how concepts relate to one another. They construct graph representations from annotated text databases and then formulate the relationship-mining problem as one of computing all-pairs shortest paths (APSP), which becomes a significant bottleneck. In this context, we present a new high-performance algorithm and implementation of the Floyd-Warshall algorithm for distributed-memory parallel computers accelerated by GPUs, which we call DSNAPSHOT (Distributed Accelerated Semiring All-Pairs Shortest Path). For our largest experiments, we ran DSNAPSHOT on a connected input graph with millions of vertices using 4,096 nodes (24,576 GPUs) of the Oak Ridge National Laboratory's Summit supercomputer system. We find DSNAPSHOT achieves a sustained performance of 136×10^15 floating-point operations per second (136 petaflop/s) at a parallel efficiency of 90% under weak scaling and, in absolute speed, 70% of the best possible performance given our computation (in the single-precision tropical semiring or “min-plus” algebra). Looking forward, we believe this novel capability will enable the mining of scholarly knowledge corpora when embedded and integrated into artificial intelligence-driven natural language processing workflows at scale.
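
As a small illustration of the APSP formulation named in the abstract, the sketch below runs the textbook Floyd-Warshall recurrence in the min-plus (tropical) semiring on a tiny dense weight matrix. It is not DSNAPSHOT: it is sequential and single-node, and the function name `floyd_warshall_min_plus` and the sample graph are assumptions made only for this example.

```python
import math

INF = math.inf  # additive identity of the min-plus semiring ("no edge")


def floyd_warshall_min_plus(weights):
    """Return all-pairs shortest-path distances for a dense weight matrix.

    weights[i][j] is the edge weight from vertex i to vertex j, or INF if
    there is no edge. Hypothetical helper for illustration only; this is the
    sequential textbook algorithm, not the distributed GPU implementation.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is left untouched
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0.0)  # a vertex reaches itself at cost 0
    for k in range(n):  # allow vertex k as an intermediate stop
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == INF:
                continue  # no path i -> k, so k cannot improve row i
            row_i = dist[i]
            for j in range(n):
                # min-plus update: "multiply" is +, "add" is min
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


# Small directed example graph (assumed data, not from the paper).
graph = [
    [0.0, 3.0, INF, 7.0],
    [8.0, 0.0, 2.0, INF],
    [5.0, INF, 0.0, 1.0],
    [2.0, INF, INF, 0.0],
]
distances = floyd_warshall_min_plus(graph)
for row in distances:
    print(row)
```

The triple loop performs on the order of n^3 min-plus updates for n vertices, which is why APSP becomes the bottleneck the abstract describes once graphs reach millions of vertices.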

Authors:
Kannan, Ramakrishnan [1]; Sao, Piyush [1]; Lu, Hao [1]; Herrmannova, Dasha [1]; Thakkar, Vijay [2]; Patton, Robert [1]; Vuduc, Richard [3]; Potok, Thomas [1]
  1. ORNL
  2. Georgia Institute of Technology
  3. Georgia Institute of Technology, Atlanta
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
OSTI Identifier:
1798621
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: The International Conference on High Performance Computing, Networking, Storage and Analysis (SC'20) - Atlanta, Georgia, United States of America - 11/16/2020-11/19/2020
Country of Publication:
United States
Language:
English

Citation Formats

Kannan, Ramakrishnan, Sao, Piyush, Lu, Hao, Herrmannova, Dasha, Thakkar, Vijay, Patton, Robert, Vuduc, Richard, and Potok, Thomas. Scalable Knowledge Graph Analytics at 136 Petaflop/s. United States: N. p., 2020. Web.
Kannan, Ramakrishnan, Sao, Piyush, Lu, Hao, Herrmannova, Dasha, Thakkar, Vijay, Patton, Robert, Vuduc, Richard, & Potok, Thomas. Scalable Knowledge Graph Analytics at 136 Petaflop/s. United States.
Kannan, Ramakrishnan, Sao, Piyush, Lu, Hao, Herrmannova, Dasha, Thakkar, Vijay, Patton, Robert, Vuduc, Richard, and Potok, Thomas. 2020. "Scalable Knowledge Graph Analytics at 136 Petaflop/s". United States. https://www.osti.gov/servlets/purl/1798621.
@article{osti_1798621,
title = {Scalable Knowledge Graph Analytics at 136 Petaflop/s},
author = {Kannan, Ramakrishnan and Sao, Piyush and Lu, Hao and Herrmannova, Dasha and Thakkar, Vijay and Patton, Robert and Vuduc, Richard and Potok, Thomas},
abstractNote = {We are motivated by newly proposed methods for data mining large-scale corpora of scholarly publications, such as the full biomedical literature, which may consist of tens of millions of papers spanning decades of research. In this setting, analysts seek to discover how concepts relate to one another. They construct graph representations from annotated text databases and then formulate the relationship-mining problem as one of computing all-pairs shortest paths (APSP), which becomes a significant bottleneck. In this context, we present a new high-performance algorithm and implementation of the Floyd-Warshall algorithm for distributed-memory parallel computers accelerated by GPUs, which we call DSNAPSHOT (Distributed Accelerated Semiring All-Pairs Shortest Path). For our largest experiments, we ran DSNAPSHOT on a connected input graph with millions of vertices using 4,096 nodes (24,576 GPUs) of the Oak Ridge National Laboratory's Summit supercomputer system. We find DSNAPSHOT achieves a sustained performance of $136 \times 10^{15}$ floating-point operations per second (136 petaflop/s) at a parallel efficiency of 90% under weak scaling and, in absolute speed, 70% of the best possible performance given our computation (in the single-precision tropical semiring or “min-plus” algebra). Looking forward, we believe this novel capability will enable the mining of scholarly knowledge corpora when embedded and integrated into artificial intelligence-driven natural language processing workflows at scale.},
url = {https://www.osti.gov/biblio/1798621},
place = {United States},
year = {2020},
month = {11}
}

