OSTI.GOV · U.S. Department of Energy
Office of Scientific and Technical Information

Title: Real-Time Discovery Services over Large, Heterogeneous and Complex Healthcare Datasets Using Schema-Less, Column-Oriented Methods

Conference · OSTI ID: 1295155

We present a service platform for schema-less exploration of data and discovery of patient-related statistics from healthcare data sets. The architecture of this platform is motivated by the need for fast, schema-less, and flexible approaches to SQL-based exploration and discovery of information embedded in common, heterogeneously structured healthcare data sets and supporting components (electronic health records, practice management systems, etc.). The motivating use cases described in the paper are clinical trial candidate discovery and treatment effectiveness analysis. Following the use cases, we discuss the key features and software architecture of the platform, the underlying core components (Apache Parquet, Apache Drill, and the web services server), and the runtime profiles and performance characteristics of the platform. We conclude by showing dramatic speedups with some approaches, and the performance tradeoffs and limitations of others.
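As a rough illustration of the kind of schema-less, SQL-based query the abstract describes, the sketch below builds a request for Apache Drill's REST query endpoint against a Parquet file. It is not taken from the paper: the host and port (Drill's default web port on a local machine), the file path, the column names, and the diagnosis filter are all hypothetical, and the request is only constructed, not sent.

```python
import json
import urllib.request

# Assumption: a Drill instance on localhost using the default REST port.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql, url=DRILL_URL):
    """Build (but do not send) a POST request carrying one SQL query."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical Parquet path, columns, and filter values. No table schema is
# declared beforehand; Drill reads column metadata from the Parquet file
# when the query runs.
CANDIDATE_SQL = (
    "SELECT patient_id, age, diagnosis_code "
    "FROM dfs.`/data/ehr/encounters.parquet` "
    "WHERE diagnosis_code = 'E11' AND age BETWEEN 40 AND 65"
)

request = build_drill_request(CANDIDATE_SQL)

# To run against a live Drill instance:
#   with urllib.request.urlopen(request) as resp:
#       rows = json.load(resp)["rows"]
```

Because Parquet stores data by column, a query like this one only needs to read the three referenced columns, which is one reason a column-oriented layout can suit exploratory filtering of wide patient records.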

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1295155
Resource Relation:
Conference: Big Data Services, Oxford, United Kingdom, March 29 to April 1, 2016
Country of Publication:
United States
Language:
English

Similar Records

Striped Data Analysis Framework
Conference · Jan 1, 2020 · EPJ Web Conf. · OSTI ID:1295155

Striped Data Server for Scalable Parallel Data Analysis
Journal Article · Sep 1, 2018 · Journal of Physics. Conference Series · OSTI ID:1295155

National Geothermal Data System: Transforming the Discovery, Access, and Analytics of Data for Geothermal Exploration
Conference · May 1, 2013 · OSTI ID:1295155