
Title: Nonuniformly communicating noncontiguous data: a case study with PETSc and MPI.

Abstract

Due to the complexity associated with developing parallel applications, scientists and engineers rely on high-level software libraries such as PETSc, ScaLAPACK and PESSL to ease this task. Such libraries assist developers by providing abstractions for mathematical operations, data representation and management of parallel layouts of the data, while internally using communication libraries such as MPI and PVM. With high-level libraries managing data layout and communication internally, it can be expected that they organize application data suitably for performing the library operations optimally. However, this places additional overhead on the underlying communication library by making the data layout noncontiguous in memory and communication volumes (data transferred by a process to each of the other processes) nonuniform. In this paper, we analyze the overheads associated with these two aspects (noncontiguous data layouts and nonuniform communication volumes) in the context of the PETSc software toolkit over the MPI communication library. We describe the issues with the current approaches used by MPICH2 (an implementation of MPI), propose different approaches to handle these issues and evaluate these approaches with micro-benchmarks as well as an application over the PETSc software library. Our experimental results demonstrate close to an order of magnitude improvement in the performance of a 3-D Laplacian multi-grid solver application when evaluated on a 128 processor cluster.

Authors:
Balaji, P.; Buntinas, D.; Balay, S.; Smith, B.; Thakur, R.; Gropp, W.; Mathematics and Computer Science
Publication Date:
2007
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC)
OSTI Identifier:
971458
Report Number(s):
ANL/MCS/CP-58614
TRN: US201004%%18
DOE Contract Number:
DE-AC02-06CH11357
Resource Type:
Conference
Resource Relation:
Conference: 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007); Mar. 26, 2007 - Mar. 30, 2007; Long Beach, CA
Country of Publication:
United States
Language:
ENGLISH
Subject:
97 MATHEMATICAL METHODS AND COMPUTING; 99 GENERAL AND MISCELLANEOUS//MATHEMATICS, COMPUTING, AND INFORMATION SCIENCE; DATA TRANSMISSION; PARALLEL PROCESSING; MEMORY MANAGEMENT; EVALUATION; PERFORMANCE

Citation Formats

Balaji, P., Buntinas, D., Balay, S., Smith, B., Thakur, R., Gropp, W., and Mathematics and Computer Science. Nonuniformly communicating noncontiguous data: a case study with PETSc and MPI. United States: N. p., 2007. Web. doi:10.1109/IPDPS.2007.370223.
Balaji, P., Buntinas, D., Balay, S., Smith, B., Thakur, R., Gropp, W., & Mathematics and Computer Science. Nonuniformly communicating noncontiguous data: a case study with PETSc and MPI. United States. doi:10.1109/IPDPS.2007.370223.
Balaji, P., Buntinas, D., Balay, S., Smith, B., Thakur, R., Gropp, W., and Mathematics and Computer Science. 2007. "Nonuniformly communicating noncontiguous data: a case study with PETSc and MPI." United States. doi:10.1109/IPDPS.2007.370223.
@inproceedings{osti_971458,
title = {Nonuniformly communicating noncontiguous data: a case study with {PETSc} and {MPI}},
author = {Balaji, P. and Buntinas, D. and Balay, S. and Smith, B. and Thakur, R. and Gropp, W. and {Mathematics and Computer Science}},
abstractNote = {Due to the complexity associated with developing parallel applications, scientists and engineers rely on high-level software libraries such as PETSc, ScaLAPACK and PESSL to ease this task. Such libraries assist developers by providing abstractions for mathematical operations, data representation and management of parallel layouts of the data, while internally using communication libraries such as MPI and PVM. With high-level libraries managing data layout and communication internally, it can be expected that they organize application data suitably for performing the library operations optimally. However, this places additional overhead on the underlying communication library by making the data layout noncontiguous in memory and communication volumes (data transferred by a process to each of the other processes) nonuniform. In this paper, we analyze the overheads associated with these two aspects (noncontiguous data layouts and nonuniform communication volumes) in the context of the PETSc software toolkit over the MPI communication library. We describe the issues with the current approaches used by MPICH2 (an implementation of MPI), propose different approaches to handle these issues and evaluate these approaches with micro-benchmarks as well as an application over the PETSc software library. Our experimental results demonstrate close to an order of magnitude improvement in the performance of a 3-D Laplacian multi-grid solver application when evaluated on a 128 processor cluster.},
doi = {10.1109/IPDPS.2007.370223},
booktitle = {21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007)},
place = {United States},
year = {2007}
}


Similar Records:
  • This report documents a field study of 78 small commercial customers in the Sacramento Municipal Utility District service territory who volunteered for an integrated energy-efficiency/demand-response (EE-DR) program in the summer of 2008. The original objective for the pilot was to provide a better understanding of demand response issues in the small commercial sector. Early findings justified a focus on offering small businesses (1) help with the energy efficiency of their buildings in exchange for occasional load shed, and (2) a portfolio of options to meet the needs of a diverse customer sector. To meet these expressed needs, the research pilot provided on-site energy efficiency advice and offered participants several program options, including the choice of either a dynamic rate or monthly payment for air-conditioning setpoint control. Overall results show that pilot participants had energy savings of 20%, and the potential for an additional 14% to 20% load drop during a 100 °F demand response event. In addition to the efficiency-related bill savings, participants on the dynamic rate saved an estimated 5% on their energy costs compared to the standard rate. About 80% of participants said that the program met or surpassed their expectations, and three-quarters said they would probably or definitely participate again without the $120 participation incentive. These results provide evidence that energy efficiency programs, dynamic rates and load control programs can be used concurrently and effectively in the small business sector, and that communicating thermostats are a reliable tool for providing air-conditioning load shed and enhancing the ability of customers on dynamic rates to respond to intermittent price events.
  • In this paper, we discuss the performance achieved by several implementations of the recently defined Message Passing Interface (MPI) standard. In particular, performance results for different implementations of the broadcast operation are analyzed and compared on the Delta, Paragon, SP1 and CM5.
  • We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely-used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to 1.6TB particle physics, 2.2TB and 16TB climate modeling and 1.1TB bioimaging data. The data matrices are tall and skinny, which enables the algorithms to map conveniently onto Spark’s data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
  • Data-parallel languages such as High Performance Fortran (HPF) present a simple execution model in which a single thread of control performs high-level operations on distributed arrays. These languages can greatly ease the development of parallel programs. Yet there are large classes of applications for which a mixture of task and data parallelism is most appropriate. Such applications can be structured as collections of data-parallel tasks that communicate by using explicit message passing. Because the Message Passing Interface (MPI) defines standardized, familiar mechanisms for this communication model, the authors propose that HPF tasks communicate by making calls to a coordination library that provides an HPF binding for MPI. The semantics of a communication interface for sequential languages can be ambiguous when the interface is invoked from a parallel language; they show how these ambiguities can be resolved by describing one possible HPF binding for MPI. They then present the design of a library that implements this binding, discuss issues that influenced the design decisions, and evaluate the performance of a prototype HPF/MPI library using a communications microbenchmark and application kernel. Finally, they discuss how MPI features might be incorporated into the design framework.
  • Europe has a large chemical industry and a high population density. The differing legislative contexts of risk analysis in the countries of Europe provide a rich source for comparing different approaches. In this paper, the authors draw on the European experience to consider the advantages and disadvantages of a worst case scenario approach. The escape of the total contents of a large storage vessel of hazardous materials can cause effects over a large distance. Typically, a large flammable release could cause deaths over a distance of hundreds of meters and damage over a number of kilometers. For toxic releases the situation is even worse, since whereas flammable releases may be considered non-hazardous when they have diluted below their lower flammable limit, toxic gases can disable or kill at concentrations of less than 1%. Generally, the release from the complete failure of a large storage vessel of ammonia, chlorine or hydrogen fluoride in stable atmospheric conditions could travel many kilometers before being diluted sufficiently not to be a health risk.