OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Input/Output Scalability of Genomic Alignment: How to Configure a Computational Biology Cluster

Conference · OSTI ID: 15006307

Many scientific applications are I/O-intensive, which makes optimization and scaling difficult, especially on parallel architectures. The I/O requirements of computational biology applications differ from those of other scientific applications: many computational biology applications are embarrassingly parallel and require repeated read-only access to a large global database. In this paper we examine the scalability of one such embarrassingly parallel application, psLayout, which played a crucial role in the mapping of the human genome. The study was carried out on three architectures: the native UCSC Linux cluster, a Linux cluster at Lawrence Livermore National Laboratory with a faster interconnect and NFS server, and the ASCI Blue-Pacific supercomputer. We show that a cluster equipped with a fast network and either a parallel file system or a scalable NFS server has reasonable I/O scalability. We believe that data replication becomes an important issue when scaling to larger numbers of processors, and we introduce the design of a library for automatic data replication to address it.
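The replication idea described in the abstract can be illustrated with a minimal Python sketch. It assumes the global database is a single read-only file on shared storage (for example, an NFS mount) and that each node has a private scratch directory. The names `replicate`, `count_hits`, and `run_queries` are hypothetical, a substring count stands in for psLayout's alignment, and this is not the replication library the paper introduces.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def replicate(shared_path: str, local_dir: str) -> str:
    """Return a node-local copy of a read-only file, copying from shared storage only if needed."""
    os.makedirs(local_dir, exist_ok=True)
    local_path = os.path.join(local_dir, os.path.basename(shared_path))
    # Simplifying assumption: a local file of the same size counts as an up-to-date replica.
    if os.path.exists(local_path) and os.path.getsize(local_path) == os.path.getsize(shared_path):
        return local_path
    # Copy into a temporary file, then rename it atomically, so other workers on this node
    # never open a partially written replica (they may still copy redundantly).
    fd, tmp_path = tempfile.mkstemp(dir=local_dir)
    os.close(fd)
    shutil.copyfile(shared_path, tmp_path)
    os.replace(tmp_path, local_path)
    return local_path


def count_hits(db_path: str, query: str) -> int:
    """Toy stand-in for an alignment: count non-overlapping occurrences of query in each line."""
    with open(db_path) as f:
        return sum(line.count(query) for line in f)


def run_queries(shared_db: str, local_dir: str, queries: list[str]) -> dict[str, int]:
    """Replicate the database once, then run independent queries in parallel on the local copy."""
    local_db = replicate(shared_db, local_dir)
    with ThreadPoolExecutor() as pool:
        hits = pool.map(lambda q: count_hits(local_db, q), queries)
        return dict(zip(queries, hits))


if __name__ == "__main__":
    # Simulate shared storage and a node-local scratch directory with temporary directories.
    with tempfile.TemporaryDirectory() as shared, tempfile.TemporaryDirectory() as node:
        db = os.path.join(shared, "genome_db.txt")
        with open(db, "w") as f:
            f.write("ACGTACGT\nTTACGA\n")
        print(run_queries(db, node, ["ACG", "TTA"]))
```

Because every query reads the same immutable file, the shared server is contacted roughly once per node rather than once per query, which is the load reduction replication aims at. A production version would need a stronger staleness check (for example, a checksum or version stamp) than comparing file sizes.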

Research Organization:
Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
Sponsoring Organization:
US Department of Energy (US)
DOE Contract Number:
W-7405-ENG-48
OSTI ID:
15006307
Report Number(s):
UCRL-JC-145770; TRN: US200407%%203
Resource Relation:
Conference: International Parallel and Distributed Processing Symposium, Fort Lauderdale, FL (US), 04/15/2002--04/19/2002; Other Information: PBD: 3 Oct 2001
Country of Publication:
United States
Language:
English

Similar Records

PETASCALE DATA STORAGE INSTITUTE (PDSI) Final Report
Technical Report · Nov 26, 2012

Design and Implementation of Ceph: A Scalable Distributed File System
Conference · Apr 19, 2006

A next-generation parallel file system for Linux cluster.
Journal Article · Jan 1, 2004 · LinuxWorld Mag.