Storage-Intensive Supercomputing Benchmark Study

Cohen, J; Dossa, D; Gokhale, M; Hysom, D; May, J; Pearce, R; Yoo, A

doi:10.2172/924182

Title: Storage-Intensive Supercomputing Benchmark Study

Technical Report · Tue Oct 30 00:00:00 EDT 2007

DOI:https://doi.org/10.2172/924182· OSTI ID:924182

Cohen, J; Dossa, D; Gokhale, M; Hysom, D; May, J; Pearce, R; Yoo, A

Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems, and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063 Storage Intensive Supercomputing during the period 05/07-09/07. The following chapters describe: (1) a new file I/O monitoring tool iotrace developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared performance of software-only to GPU-accelerated. In addition to the work reported here, an additional text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the NAND Flash 40GB parallel disk array, the Fusion-io. The Fusion system specs are as follows: SuperMicro X7DBE Xeon Dual Socket Blackford Server Motherboard; 2 Intel Xeon Dual-Core 2.66 GHz processors; 1 GB DDR2 PC2-5300 RAM (2 x 512); 80GB Hard Drive (Seagate SATA II Barracuda). The Fusion board is presently capable of 4X in a PCIe slot. The image resampling benchmark was run on a dual Xeon workstation with NVIDIA graphics card (see Chapter 5 for full specification). An XtremeData Opteron+FPGA was used for the language classification application. We observed that these benchmarks are not uniformly I/O intensive. The only benchmark that showed greater that 50% of the time in I/O was the graph algorithm when it accessed data files over NFS. When local disk was used, the graph benchmark spent at most 40% of its time in I/O. The other benchmarks were CPU dominated. The image resampling benchmark and language classification showed order of magnitude speedup over software by using co-processor technology to offload the CPU-intensive kernels. Our experiments to date suggest that emerging hardware technologies offer significant benefit to boosting the performance of data-intensive algorithms. Using GPU and FPGA co-processors, we were able to improve performance by more than an order of magnitude on the benchmark algorithms, eliminating the processor bottleneck of CPU-bound tasks. Experiments with a prototype solid state nonvolative memory available today show 10X better throughput on random reads than disk, with a 2X speedup on a graph processing benchmark when compared to the use of local SATA disk.

View Technical Report

Cite

Export

Save

Research Organization:: Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

Sponsoring Organization:: USDOE

DOE Contract Number:: W-7405-ENG-48

OSTI ID:: 924182

Report Number(s):: UCRL-TR-236179; TRN: US200806%%489

Country of Publication:: United States

Language:: English

Similar Records

A next-generation parallel file system for Linux cluster.

Journal Article · Thu Jan 01 00:00:00 EST 2004 · LinuxWorld Mag. · OSTI ID:924182

Latham, R; Miller, N; Ross, R; +1 more

A Compute Capable SSD Architecture for Next-Generation Non-volatile Memories

Thesis/Dissertation · Wed Jan 01 00:00:00 EST 2014 · OSTI ID:924182

De, Arup

Evaluation of Graph Analytics Frameworks Using the GAP Benchmark Suite

Conference · Thu Nov 19 00:00:00 EST 2020 · OSTI ID:924182

Azad, Md Ariful; Aznaveh, Mohsen M.; Beamer, Scott; +22 more

Related Subjects

99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE
ALGORITHMS
BENCHMARKS
CLASSIFICATION
DATA PROCESSING
KERNELS
MONITORING
NATIONAL SECURITY
PERFORMANCE
PIPELINES
PROCESSING
STORAGE

Title: Storage-Intensive Supercomputing Benchmark Study

Citation Formats

Similar Records

Related Subjects