Efficient Databases for MPC Microdata (Final Report)
- Tokutek Inc., Lexington, MA (United States)
The purpose of this grant was to develop the theory and practice of high-performance databases for massive streamed datasets. Over the last three years, we have developed fast indexing technology, that is, technology for rapidly ingesting data and storing it so that it can be efficiently queried and analyzed. During this project we developed the technology so that high-bandwidth data streams can be indexed and queried efficiently. Our technology has been proven to work on data sets composed of tens of billions of rows when the data stream arrives at over 40,000 rows per second. We achieved these numbers even on a single disk driven by two cores. Our work comprised (1) new write-optimized data structures with better asymptotic complexity than traditional structures, (2) implementation, and (3) benchmarking. We furthermore developed a prototype of TokuFS, a middleware layer that can handle microdata I/O packaged up in an MPI-IO abstraction.
- Research Organization:
- Tokutek Inc., Lexington, MA (United States)
- Sponsoring Organization:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
- DOE Contract Number:
- FG02-08ER25853
- OSTI ID:
- 1048538
- Report Number(s):
- DOE/ER25853--1
- Country of Publication:
- United States
- Language:
- English