U.S. Department of Energy
Office of Scientific and Technical Information

Active Learning in the Era of Big Data

Technical Report · DOI: https://doi.org/10.2172/1225849 · OSTI ID: 1225849
 [1];  [1]
  1. Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Active learning methods automatically adapt data collection by selecting the most informative samples in order to accelerate machine learning. Because of this, testing and comparing active learning algorithms in the real world requires collecting new datasets adaptively, rather than simply applying algorithms to benchmark datasets, as is the norm in (passive) machine learning research. To facilitate the development, testing, and deployment of active learning for real applications, we have built an open-source software system for large-scale active learning research and experimentation. The system, called NEXT, provides a unique platform for real-world, reproducible active learning research. This paper details the challenges of building the system and demonstrates its capabilities with several experiments. The results show how experimentation can help expose strengths and weaknesses of active learning algorithms, sometimes in unexpected and enlightening ways.
Research Organization:
Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
Sponsoring Organization:
USDOE National Nuclear Security Administration (NNSA)
DOE Contract Number:
AC04-94AL85000
OSTI ID:
1225849
Report Number(s):
SAND2015--9475R; 607873
Country of Publication:
United States
Language:
English

Similar Records

Machine Learning in the Big Data Era: Are We There Yet?
Conference · Tue Dec 31 23:00:00 EST 2013 · OSTI ID:1265308

Scientific machine learning benchmarks
Journal Article · Tue Apr 05 20:00:00 EDT 2022 · Nature Reviews Physics · OSTI ID:1877481

Performance Prediction of Big Data Transfer Through Experimental Analysis and Machine Learning
Conference · Mon Jun 01 00:00:00 EDT 2020 · OSTI ID:1648995

Related Subjects