U.S. Department of Energy
Office of Scientific and Technical Information

Open Research Challenges with Big Data - A Data-Scientist's Perspective

Conference
OSTI ID:1224759
In this paper, we discuss data-driven discovery challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical data mining and machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context, we pose and debate the question - Are data mining algorithms scaling with the ability to store and compute? If yes, how? If not, why not? We survey recent developments in the state of the art to discuss emerging and outstanding challenges in the design and implementation of machine learning algorithms at scale. We leverage experience from real-world Big Data knowledge discovery projects across domains of national security, healthcare and manufacturing to suggest our efforts be focused along the following axes: (i) the data science challenge - designing scalable and flexible computational architectures for machine learning (beyond just data-retrieval); (ii) the science of data challenge - the ability to understand characteristics of data before applying machine learning algorithms and tools; and (iii) the scalable predictive functions challenge - the ability to construct, learn and infer with increasing sample size, dimensionality, and categories of labels. We conclude with a discussion of opportunities and directions for future research.
Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
ORNL LDRD Director's R&D
DOE Contract Number:
AC05-00OR22725
OSTI ID:
1224759
Country of Publication:
United States
Language:
English

Similar Records

Machine Learning in the Big Data Era: Are We There Yet?
Conference · Tue Dec 31 23:00:00 EST 2013 · OSTI ID:1265308

Machine Learning for Big Data: A Study to Understand Limits at Scale
Technical Report · Sun Dec 20 23:00:00 EST 2015 · OSTI ID:1234336

Mining Large Heterogeneous Graphs using Cray's Urika
Conference · Mon Dec 31 23:00:00 EST 2012 · OSTI ID:1096972