Title: A Review of Lightweight Thread Approaches for High Performance Computing

Abstract

High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads work perfectly with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massively parallel architectures, and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared, offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.

Authors:
Castello, Adrian; Pena, Antonio J.; Seo, Sangmin; Mayo, Rafael; Balaji, Pavan; Quintana-Orti, Enrique S.
Publication Date:
September 2016
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science - Office of Advanced Scientific Computing Research
OSTI Identifier:
1365837
DOE Contract Number:
AC02-06CH11357
Resource Type:
Conference
Resource Relation:
Conference: 2016 Institute of Electrical and Electronics Engineers Cluster, 09/12/16 - 09/16/16, Taipei, TW
Country of Publication:
United States
Language:
English

Citation Formats

Castello, Adrian, Pena, Antonio J., Seo, Sangmin, Mayo, Rafael, Balaji, Pavan, and Quintana-Orti, Enrique S. A Review of Lightweight Thread Approaches for High Performance Computing. United States: N. p., 2016. Web. doi:10.1109/CLUSTER.2016.12.
Castello, Adrian, Pena, Antonio J., Seo, Sangmin, Mayo, Rafael, Balaji, Pavan, & Quintana-Orti, Enrique S. A Review of Lightweight Thread Approaches for High Performance Computing. United States. doi:10.1109/CLUSTER.2016.12.
Castello, Adrian, Pena, Antonio J., Seo, Sangmin, Mayo, Rafael, Balaji, Pavan, and Quintana-Orti, Enrique S. 2016. "A Review of Lightweight Thread Approaches for High Performance Computing". United States. doi:10.1109/CLUSTER.2016.12. https://www.osti.gov/servlets/purl/1365837.
@article{osti_1365837,
title = {A Review of Lightweight Thread Approaches for High Performance Computing},
author = {Castello, Adrian and Pena, Antonio J. and Seo, Sangmin and Mayo, Rafael and Balaji, Pavan and Quintana-Orti, Enrique S.},
abstractNote = {High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads work perfectly with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massively parallel architectures, and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared, offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.},
doi = {10.1109/CLUSTER.2016.12},
journal = {},
number = {},
volume = {},
place = {United States},
year = 2016,
month = 9
}

Similar records:
  • Provenance describes detailed information about the history of a piece of data, containing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of data. Provenance is key to supporting many data management functionalities that are increasingly important in operations such as identifying data sources, parameters, or assumptions behind a given result; auditing data usage; or understanding details about how inputs are transformed into outputs. Despite its importance, however, provenance support is largely underdeveloped in highly parallel architectures and systems. One major challenge is the demanding requirements of providing provenance service in situ. The need to remain lightweight and to be always on often conflicts with the need to be transparent and offer an accurate catalog of details regarding the applications and systems. To tackle this challenge, we introduce a lightweight provenance service, called LPS, for high-performance computing (HPC) systems. LPS leverages a kernel instrument mechanism to achieve transparency and introduces representative execution and flexible granularity to capture comprehensive provenance with controllable overhead. Extensive evaluations and use cases have confirmed its efficiency and usability. We believe that LPS can be integrated into current and future HPC systems to support a variety of data management needs.
  • Data-intensive and high-performance computing are poised to significantly impact the future of biological research, which is increasingly driven by the prevalence of high-throughput experimental methodologies for genome sequencing, transcriptomics, proteomics, and other areas. Large centers such as NIH’s National Center for Biotechnology Information (NCBI), The Institute for Genomic Research (TIGR), and the DOE’s Joint Genome Institute (JGI) Integrated Microbial Genome (IMG) have made extensive use of multiprocessor architectures to deal with some of the challenges of processing, storing, and curating exponentially growing genomic and proteomic datasets, enabling end users to rapidly access a growing public data source as well as utilize analysis tools transparently on high-performance computing resources. Applying this computational power to single-investigator analysis, however, often relies on users to provide their own computational resources, forcing them to endure the learning curve of porting, building, and running software on multiprocessor architectures. Solving the next generation of large-scale biology challenges using multiprocessor machines, from small clusters to emerging petascale machines, can most practically be realized if this learning curve can be minimized through a combination of workflow management, data management, and resource allocation, as well as intuitive interfaces and compatibility with existing common data formats.
  • File storage systems are playing an increasingly important role in high-performance computing as the performance gap between CPU and disk increases. It could take a long time to develop an entire system from scratch, so solutions will have to be built as extensions to existing systems. If new portable, customized software components are plugged into these systems, better sustained high I/O performance and higher scalability will be achieved, and the development cycle of next-generation parallel file systems will be shortened. The overall research objective of this ECPI development plan is to develop a lightweight, customized, high-performance I/O management package named LightI/O to extend and leverage current parallel file systems used by DOE. During this period, we have developed a novel component in LightI/O, prototyped it in PVFS2, and evaluated the resulting prototype, the extended PVFS2 system, on data-intensive applications. The preliminary results indicate that the extended PVFS2 delivers better performance and reliability to users. A strong collaborative effort between the PI at the University of Nebraska-Lincoln and the DOE collaborators, Drs. Rob Ross and Rajeev Thakur at Argonne National Laboratory, who lead the PVFS2 group, makes the project more promising.
  • Our group has been working with ANL collaborators on the topic of bridging the gap between the parallel file system and the local file system during the course of this project period. We visited Dr. Robert Ross's group at Argonne National Lab for one week in summer 2007. We reviewed our current project progress and planned the activities for the coming years 2008-09. The PI met Dr. Robert Ross several times, for example at the HEC FSIO workshop 08, SC08, and SC10. We explored the opportunities to develop a production system by leveraging our current prototype (SOGP+PVFS) in a new PVFS version. We delivered the SOGP+PVFS codes to the ANL PVFS2 group in 2008. We also talked about exploring a potential project on developing new parallel programming models and runtime systems for data-intensive scalable computing (DISC). The methodology is to evolve MPI towards DISC by incorporating some functions of the Google MapReduce parallel programming model. More recently, we have been exploring together how to leverage existing work to perform (1) coordination/aggregation of local I/O operations prior to movement over the WAN, (2) efficient bulk data movement over the WAN, and (3) latency-hiding techniques for latency-intensive operations. Since 2009, we have been applying Hadoop/MapReduce to some HEC applications with LANL scientists John Bent and Salman Habib. Another ongoing effort is to improve checkpoint performance at the I/O forwarding layer for the Roadrunner supercomputer with James Nunez and Gary Grider at LANL. Two senior undergraduates from our research group completed summer internships on high-performance file and storage system projects at LANL for three consecutive years starting in 2008. Both are now pursuing Ph.D. degrees in our group, will be in the fourth year of the Ph.D. program in Fall 2011, and will go to LANL to advance the two above-mentioned efforts during this winter break.
Since 2009, we have been collaborating with several computer scientists (Gary Grider, John Bent, Parks Fields, James Nunez, Hsing-Bung Chen, and others) from HPC5 and with James Ahrens from the Advanced Computing Laboratory at Los Alamos National Laboratory. We hold a weekly conference and/or video meeting on advancing work on two fronts: the hardware/software infrastructure for building large-scale data-intensive clusters, and research publications. Our group members assist in constructing several onsite LANL data-intensive clusters. The two parties have been developing software codes and research papers together using both sides' resources.
  • This paper describes the design, development, and performance of a lightweight precision gimbal with dual-axis slew capability to be used in a closed-loop optical tracking system at Lawrence Livermore National Laboratory (LLNL). The motivation for the development of this gimbal originates from the need to acquire and accurately localize warm objects (T ≈ 500 K) in a cluttered background. The design of the gimbal is centered around meeting the following performance requirements: pointing accuracy with control < 35 μrad (1-ω); slew capability > 0.2 rad/sec; mechanical weight < 5 kg. These performance requirements are derived by attempting to track a single target from multiple satellites in low Earth orbit using a mid-wave infrared camera. Key components in the gimbal hardware that are essential to meeting the performance objectives include a nickel-plated beryllium mirror, an accurate lightweight capacitive pickoff device for angular measurement about the elevation axis, a 16-bit coarse/fine resolver for angular measurement about the azimuth axis, a toroidally wound motor with low hysteresis for providing torque about the azimuth axis, and the selection of beryllium parts to ensure high stiffness-to-weight ratios and more efficient thermal conductivity. Each of these elements is discussed in detail to illustrate the design trades performed to meet the tracking and slewing requirements demanded. Preliminary experimental results are also given for various commanded tracking maneuvers.