Active Learning in the Era of Big Data
- Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)
Active learning methods automatically adapt data collection by selecting the most informative samples in order to accelerate machine learning. Because of this, testing and comparing active learning algorithms in the real world requires collecting new datasets adaptively, rather than simply applying algorithms to benchmark datasets, as is the norm in (passive) machine learning research. To facilitate the development, testing, and deployment of active learning for real applications, we have built an open-source software system for large-scale active learning research and experimentation. The system, called NEXT, provides a unique platform for real-world, reproducible active learning research. This paper details the challenges of building the system and demonstrates its capabilities with several experiments. The results show how experimentation can help expose strengths and weaknesses of active learning algorithms, in sometimes unexpected and enlightening ways.
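To make "selecting the most informative samples" concrete, here is a minimal sketch of pool-based uncertainty sampling on a one-dimensional threshold problem. It is an illustration only and is not taken from NEXT or the report: the `oracle` function, the hidden threshold of 0.6, the pool of 101 points, and the budget of 8 labels are all assumptions chosen for the example.

```python
def oracle(x, true_threshold=0.6):
    """Hypothetical labeler: 1 if x is at or above a hidden threshold, else 0."""
    return 1 if x >= true_threshold else 0


def estimate_threshold(labeled):
    """Midpoint between the largest labeled negative and smallest labeled positive."""
    lo = max((x for x, y in labeled if y == 0), default=0.0)
    hi = min((x for x, y in labeled if y == 1), default=1.0)
    return (lo + hi) / 2


def active_learn(pool, budget):
    """At each step, label the unlabeled point closest to the current boundary
    estimate, i.e. the point whose label the learner is least sure about."""
    unlabeled = list(pool)
    labeled = []
    for _ in range(budget):
        if not unlabeled:
            break
        guess = estimate_threshold(labeled)
        x = min(unlabeled, key=lambda p: abs(p - guess))
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))  # the dataset is collected adaptively
    return estimate_threshold(labeled), labeled


# Assumed setup: 101 evenly spaced candidate points in [0, 1], 8 label queries.
pool = [i / 100 for i in range(101)]
estimate, queries = active_learn(pool, budget=8)
print(f"estimated threshold: {estimate:.3f} after {len(queries)} labels")
```

Because each query depends on the labels gathered so far, the labeled set cannot be fixed in advance, which is the reason the abstract gives for collecting new data rather than replaying a benchmark dataset.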
- Research Organization:
- Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States)
- Sponsoring Organization:
- USDOE National Nuclear Security Administration (NNSA)
- DOE Contract Number:
- AC04-94AL85000
- OSTI ID:
- 1225849
- Report Number(s):
- SAND2015--9475R; 607873
- Country of Publication:
- United States
- Language:
- English
Similar Records
Machine Learning in the Big Data Era: Are We There Yet?
- Conference · Tue Dec 31 23:00:00 EST 2013 · OSTI ID:1265308
Scientific machine learning benchmarks
- Journal Article · Tue Apr 05 20:00:00 EDT 2022 · Nature Reviews Physics · OSTI ID:1877481
Performance Prediction of Big Data Transfer Through Experimental Analysis and Machine Learning
- Conference · Mon Jun 01 00:00:00 EDT 2020 · OSTI ID:1648995