U.S. Department of Energy
Office of Scientific and Technical Information

A next-generation parallel file system for Linux clusters.

Journal Article · LinuxWorld Mag.
OSTI ID: 961510
Computational scientists use large parallel computers to simulate real-world events. These large-scale applications are needed to better understand scientific phenomena or to predict behavior. Computing resources often limit the accuracy and scope of these simulations. The limiting resources include not only CPU and memory but also the I/O subsystem, because many such applications generate or process enormous volumes of data. For a simulation to keep running quickly, the I/O system must be able to store many hundreds of megabytes of data per second, which requires many disks working in concert.
The software that organizes these disks into a coherent file system is called a 'parallel file system.' Parallel file systems are designed specifically to provide very high I/O rates when accessed by many processes at once. These processes are distributed across the many computers, or nodes, that make up the parallel computer. Figure 1 shows a high-level view of a parallel computer with a parallel file system attached. Compute nodes are connected to each other and to I/O server nodes through the cluster network, and they store data on disks attached to the server nodes.
You don't have to work at a national laboratory, own a 1,000-node cluster, or study global warming to take advantage of a parallel file system. For many years the Parallel Virtual File System (PVFS) has been available for Linux clusters, allowing anyone to set up and use the same parallel file system that runs on many large clusters around the world. More recently a completely new parallel file system, PVFS2, has been released. It is more flexible, takes better advantage of the hardware in today's clusters, scales to larger clusters, and is simpler to manage than its predecessor.
No single file system is the perfect solution for every I/O workload, and PVFS2 is no exception. High-performance applications rely on a different set of features to access data than those provided by typical networked file systems, and PVFS2 is best suited to I/O-intensive applications. PVFS2 wasn't meant for home directories, but as a separate, fast, scalable file system, it's tough to beat. If you have a large amount of data and need fast access to it from many machines, it's worth looking into PVFS2.
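To make the idea of many disks working in concert more concrete, the sketch below shows one common way a parallel file system can spread a file across I/O servers: round-robin striping. This is an illustration only and is not taken from the article or from the PVFS2 source; the names `STRIPE_SIZE`, `NUM_SERVERS`, `locate`, and `split_request` are hypothetical, and the stripe size and server count are assumed values.

```python
# Illustrative sketch of round-robin striping across I/O servers.
# STRIPE_SIZE and NUM_SERVERS are assumed values, not PVFS2 settings.

STRIPE_SIZE = 64 * 1024   # bytes per stripe unit (assumption)
NUM_SERVERS = 4           # number of I/O server nodes (assumption)


def locate(offset, stripe_size=STRIPE_SIZE, num_servers=NUM_SERVERS):
    """Map a logical file offset to (server index, offset in that server's local data)."""
    stripe_index = offset // stripe_size          # which stripe unit of the file
    server = stripe_index % num_servers           # stripe units rotate across servers
    local_stripe = stripe_index // num_servers    # how many units this server already holds
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return server, local_offset


def split_request(offset, length, stripe_size=STRIPE_SIZE, num_servers=NUM_SERVERS):
    """Break a contiguous request into (server, local_offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local_offset = locate(offset, stripe_size, num_servers)
        # A piece never crosses a stripe-unit boundary.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((server, local_offset, chunk))
        offset += chunk
    return pieces
```

Under these assumed values, `split_request(0, 4 * STRIPE_SIZE)` yields one piece per server, so a single 256 KiB request can be serviced by four disks at once. That parallelism is what lets aggregate I/O rates grow with the number of server nodes.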
Research Organization:
Argonne National Laboratory (ANL)
Sponsoring Organization:
USDOE Office of Science (SC)
DOE Contract Number:
AC02-06CH11357
Report Number(s):
ANL/MCS/JA-48544
Journal Information:
Journal Name: LinuxWorld Mag.; Journal Volume: 2; Journal Issue: 1; Jan. 2004
Country of Publication:
United States
Language:
ENGLISH