U.S. Department of Energy
Office of Scientific and Technical Information

Efficient Databases for MPC Microdata (Final Report)

Technical Report · DOI: https://doi.org/10.2172/1048538 · OSTI ID: 1048538

The purpose of this grant was to develop the theory and practice of high-performance databases for massive streamed datasets. Over the last three years, we have developed fast indexing technology, that is, technology for rapidly ingesting data and storing it so that it can be efficiently queried and analyzed. During this project we advanced the technology so that high-bandwidth data streams can be indexed and queried efficiently. Our technology has been proven to work on data sets composed of tens of billions of rows when the data stream arrives at over 40,000 rows per second. We achieved these numbers even on a single disk driven by two cores. Our work comprised (1) new write-optimized data structures with better asymptotic complexity than traditional structures, (2) implementation, and (3) benchmarking. We furthermore developed a prototype of TokuFS, a middleware layer that can handle microdata I/O packaged up in an MPI-IO abstraction.

Research Organization:
Tokutek Inc., Lexington, MA (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
DOE Contract Number:
FG02-08ER25853
OSTI ID:
1048538
Report Number(s):
DOE/ER25853--1
Country of Publication:
United States
Language:
English

Similar Records

Efficient Analysis of Live and Historical Streaming Data and its Application to Cybersecurity
Conference · Fri Apr 06 00:00:00 EDT 2007 · OSTI ID:920351

...And Eat it Too: High Read Performance in Write-Optimized HPC I/O Middleware File Formats
Conference · Wed Dec 31 23:00:00 EST 2008 · OSTI ID:982187

FastQuery: A Parallel Indexing System for Scientific Data
Conference · Fri Jul 29 00:00:00 EDT 2011 · OSTI ID:1056551
