Benchmarking: More Aspects of High Performance Computing

Ravindrudu, Rahul

doi:10.2172/837280


Thesis/Dissertation · 2004

DOI: https://doi.org/10.2172/837280 · OSTI ID: 837280

Ravindrudu, Rahul (Iowa State Univ., Ames, IA (United States))

The original HPL algorithm assumes that all data fits entirely in main memory. This assumption naturally yields good performance, because no disk I/O is involved. However, not all applications can fit their entire data set in memory. Applications that require a fair amount of I/O to move data between main memory and secondary storage are more indicative of how a Massively Parallel Processor (MPP) system is actually used. In this scenario, a well-designed I/O architecture plays a significant part in the performance of the MPP system on regular jobs, yet this is not represented in the current benchmark. The modified HPL algorithm is intended as a step toward filling this void.

The most important factor in the performance of out-of-core algorithms is the I/O operations actually performed and their efficiency in transferring data between main memory and disk. Various methods for performing I/O operations were introduced in the report. The I/O method to use depends on the design of the out-of-core algorithm; conversely, the performance of the out-of-core algorithm is affected by the choice of I/O operations. Good performance is therefore achieved when the I/O method is closely tied to the out-of-core algorithm, which means that out-of-core algorithms must be designed as such from the start. The timing plots show clearly that I/O accounts for a significant part of the overall execution time. This leads to an important conclusion: retrofitting an existing code may not be the best choice.

The right-looking algorithm selected for the LU factorization is a recursive algorithm and performs well when the entire data set is in memory. At each stage of the loop, the entire trailing submatrix is read into memory panel by panel, which gives a polynomial number of I/O reads and writes. If the left-looking algorithm had been selected for the main loop, the number of I/O operations would be linear in the number of columns, because of the data access pattern of the left-looking factorization. The right-looking algorithm performs better for in-core data, but the left-looking algorithm will perform better for out-of-core data because of the reduced I/O. Hence the conclusion that out-of-core algorithms perform better when designed from the start.

The out-of-core and thread-based computations do not interact in this case, since I/O is not done by the threads. The performance of the thread-based computation does not depend on I/O, because the threaded work takes place inside the BLAS routines, which assume that all data is in memory. For this reason the out-of-core results and the OpenMP thread results were presented separately, and no attempt was made to combine them. In general, the modified HPL performs better with larger block sizes, because of less I/O in the out-of-core part and better cache utilization in the thread-based computation.
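
The I/O argument in the abstract can be illustrated with a small counting model. The sketch below is not the thesis code. It assumes a matrix split into `num_panels` column panels on disk, with room for only about two panels in memory, and it simply counts panel-sized reads and writes for each loop order. The function names and the model itself are hypothetical and serve only as illustration.

```python
import math


def right_looking_io(num_panels):
    """Count panel reads/writes for a right-looking sweep (assumed model)."""
    reads = writes = 0
    for k in range(num_panels):
        reads += 1            # load panel k to factor it
        writes += 1           # store the factored panel k
        trailing = num_panels - k - 1
        reads += trailing     # each trailing panel is read to be updated
        writes += trailing    # and written back after the update
    return reads, writes


def left_looking_io(num_panels):
    """Count panel reads/writes for a left-looking sweep (assumed model)."""
    reads = writes = 0
    for k in range(num_panels):
        reads += 1            # load panel k
        reads += k            # re-read the k factored panels to its left
        writes += 1           # panel k is written exactly once
    return reads, writes


def panels_for(n_columns, block_size):
    """Number of column panels for a given block size (hypothetical helper)."""
    return math.ceil(n_columns / block_size)


if __name__ == "__main__":
    for block_size in (64, 256):
        p = panels_for(4096, block_size)
        print(block_size, p, right_looking_io(p), left_looking_io(p))
```

Under these assumptions, the right-looking sweep rewrites every trailing panel at every stage, so its write count grows quadratically with the number of panels, while the left-looking sweep writes each panel once. The model also shows why a larger block size helps: fewer panels means fewer I/O operations in either variant. Actual counts depend on how much of the matrix fits in memory and on the I/O method used.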


Research Organization: Ames Lab., Ames, IA (United States)

Sponsoring Organization: USDOE Office of Science (SC)

DOE Contract Number: W-7405-Eng-82

OSTI ID: 837280

Report Number(s): IS-T 2196; TRN: US200506%%93

Resource Relation: Other Information: TH: Thesis (M.S.); Submitted to Iowa State Univ., Ames, IA (US); PBD: 19 Dec 2004

Country of Publication: United States

Language: English

Similar Records

The design and implementation of the parallel out-of-core ScaLAPACK LU, QR and Cholesky factorization routines

Technical Report · Tue Apr 01 1997 · OSTI ID:837280

D'Azevedo, E F; Dongarra, J J

Data Locality Enhancement of Dynamic Simulations for Exascale Computing (Final Report)

Technical Report · Fri Nov 29 2019 · OSTI ID:837280

Shen, Xipeng

Multi-Resolution Indexing for Hierarchical Out-of-Core Traversal of Rectilinear Grids

Conference · Mon Jul 10 2000 · OSTI ID:837280

Pascucci, V

Related Subjects

97 MATHEMATICS AND COMPUTING
ALGORITHMS
ARCHITECTURE
DESIGN
EFFICIENCY
FACTORIZATION
PERFORMANCE
POLYNOMIALS
STORAGE
