U.S. Department of Energy
Office of Scientific and Technical Information

Flexible storage services for parallel data mining

Conference
OSTI ID: 465727
The demands of mining and analyzing vast amounts of data often lead scientists to supercomputer centers, with their high-performance parallel processors and large-scale hierarchical storage. Once there, however, clients quickly come face to face with a number of harsh realities. Common constraints are: (1) disk space, while impressive in aggregate on machines with more than 100 nodes, generally amounts to only a couple of gigabytes per node; (2) local disk space is scratch space: every query starts and ends with no data on compute nodes' local disks; (3) mass storage is generally a (widely) shared resource, and is not user-configurable; (4) machine use is scheduled, so no daemon processes may be left running; (5) while some nodes may be "closer" than others (e.g., HIPPI-connected) to mass storage, current schedulers tend nonetheless to allow users to specify only the number of nodes desired, not their I/O topology; (6) mass storage access from multiple nodes may in fact be routed through a single node (e.g., a distinguished I/O node per rack).
Research Organization:
Argonne National Lab., IL (United States)
Sponsoring Organization:
USDOE Office of Energy Research, Washington, DC (United States)
DOE Contract Number:
W-31109-ENG-38
OSTI ID:
465727
Report Number(s):
ANL-HEP-CP--96-40; CONF-961209--1; ON: DE97003872
Country of Publication:
United States
Language:
English