DOE PAGES: U.S. Department of Energy
Office of Scientific and Technical Information

Title: CANDLE/Supervisor: a workflow framework for machine learning applied to cancer research

Abstract

Background: Current multi-petaflop supercomputers are powerful systems, but they present challenges when faced with problems requiring large machine learning workflows. Complex algorithms running at system scale, often with different patterns that require disparate software packages and complex data flows, cause difficulties in assembling and managing large experiments on these machines.

Results: This paper presents a workflow system that makes progress on scaling machine learning ensembles, specifically, in this first release, ensembles of deep neural networks that address problems in cancer research across the atomistic, molecular, and population scales. The initial release of the application framework, which we call CANDLE/Supervisor, addresses the problem of hyper-parameter exploration of deep neural networks.

Conclusions: Initial results from running CANDLE on DOE systems at ORNL, ANL, and NERSC (Titan, Theta, and Cori, respectively) demonstrate both scaling and multi-platform execution.
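The abstract above describes coordinating ensembles of deep neural network training runs to explore hyper-parameters. As a rough, self-contained illustration of that pattern only (this is not the CANDLE/Supervisor interface described in the paper; the search space, parameter names, and the train_and_evaluate placeholder are hypothetical), the Python sketch below runs a small parallel random search and keeps the best configuration:

import math
import random
from concurrent.futures import ProcessPoolExecutor

# Hypothetical search space; a real CANDLE-style study would define its own
# hyper-parameters (network depth, optimizer settings, etc.).
SEARCH_SPACE = {
    "learning_rate": (1e-5, 1e-2),   # sampled log-uniformly below
    "batch_size": [16, 32, 64, 128],
    "dropout": (0.0, 0.5),
}

def sample_params(rng):
    """Draw one hyper-parameter configuration from the search space."""
    lr_lo, lr_hi = SEARCH_SPACE["learning_rate"]
    return {
        "learning_rate": 10 ** rng.uniform(math.log10(lr_lo), math.log10(lr_hi)),
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "dropout": rng.uniform(*SEARCH_SPACE["dropout"]),
    }

def train_and_evaluate(params):
    """Placeholder for a real deep neural network training run.

    In a real workflow this would launch a training job on a compute node and
    return its validation loss; here a synthetic objective keeps the sketch
    runnable anywhere, with no GPUs or data required.
    """
    return (math.log10(params["learning_rate"]) + 3.0) ** 2 + params["dropout"]

def random_search(num_trials=16, workers=4, seed=0):
    """Drive an ensemble of independent evaluations and keep the best result."""
    rng = random.Random(seed)
    configs = [sample_params(rng) for _ in range(num_trials)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        losses = list(pool.map(train_and_evaluate, configs))
    return min(zip(losses, configs), key=lambda item: item[0])

if __name__ == "__main__":
    loss, params = random_search()
    print(f"best validation loss {loss:.4f} with {params}")

In CANDLE/Supervisor itself, the driver role sketched here is played by workflow scripts (the works referenced below include the Swift/T dataflow language by the same authors) that dispatch training tasks across the nodes of systems such as Titan, Theta, and Cori.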

Authors:
 Wozniak, Justin M. [1];  Jain, Rajeev [1];  Balaprakash, Prasanna [1];  Ozik, Jonathan [1];  Collier, Nicholson T. [1];  Bauer, John [1];  Xia, Fangfang [1];  Brettin, Thomas [1];  Stevens, Rick [1];  Mohd-Yusof, Jamaludin [2];  Cardona, Cristina Garcia [2];  Van Essen, Brian [3];  Baughman, Matthew [4]
  1. Argonne National Lab. (ANL), Argonne, IL (United States)
  2. Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
  3. Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
  4. Minerva, San Francisco, CA (United States)
Publication Date:
December 2018
Research Org.:
Argonne National Lab. (ANL), Argonne, IL (United States)
Sponsoring Org.:
USDOE Office of Science (SC); National Institutes of Health (NIH)
OSTI Identifier:
1510031
Grant/Contract Number:  
AC02-06CH11357
Resource Type:
Accepted Manuscript
Journal Name:
BMC Bioinformatics
Additional Journal Information:
Journal Volume: 19; Journal Issue: S18; Journal ID: ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES

Citation Formats

Wozniak, Justin M., Jain, Rajeev, Balaprakash, Prasanna, Ozik, Jonathan, Collier, Nicholson T., Bauer, John, Xia, Fangfang, Brettin, Thomas, Stevens, Rick, Mohd-Yusof, Jamaludin, Cardona, Cristina Garcia, Essen, Brian Van, and Baughman, Matthew. CANDLE/Supervisor: a workflow framework for machine learning applied to cancer research. United States: N. p., 2018. Web. https://doi.org/10.1186/s12859-018-2508-4.
Wozniak, Justin M., Jain, Rajeev, Balaprakash, Prasanna, Ozik, Jonathan, Collier, Nicholson T., Bauer, John, Xia, Fangfang, Brettin, Thomas, Stevens, Rick, Mohd-Yusof, Jamaludin, Cardona, Cristina Garcia, Essen, Brian Van, & Baughman, Matthew. CANDLE/Supervisor: a workflow framework for machine learning applied to cancer research. United States. https://doi.org/10.1186/s12859-018-2508-4
Wozniak, Justin M., Jain, Rajeev, Balaprakash, Prasanna, Ozik, Jonathan, Collier, Nicholson T., Bauer, John, Xia, Fangfang, Brettin, Thomas, Stevens, Rick, Mohd-Yusof, Jamaludin, Cardona, Cristina Garcia, Essen, Brian Van, and Baughman, Matthew. 2018. "CANDLE/Supervisor: a workflow framework for machine learning applied to cancer research". United States. https://doi.org/10.1186/s12859-018-2508-4. https://www.osti.gov/servlets/purl/1510031.
@article{osti_1510031,
title = {CANDLE/Supervisor: a workflow framework for machine learning applied to cancer research},
author = {Wozniak, Justin M. and Jain, Rajeev and Balaprakash, Prasanna and Ozik, Jonathan and Collier, Nicholson T. and Bauer, John and Xia, Fangfang and Brettin, Thomas and Stevens, Rick and Mohd-Yusof, Jamaludin and Cardona, Cristina Garcia and Essen, Brian Van and Baughman, Matthew},
abstractNote = {Background: Current multi-petaflop supercomputers are powerful systems, but they present challenges when faced with problems requiring large machine learning workflows. Complex algorithms running at system scale, often with different patterns that require disparate software packages and complex data flows, cause difficulties in assembling and managing large experiments on these machines. Results: This paper presents a workflow system that makes progress on scaling machine learning ensembles, specifically, in this first release, ensembles of deep neural networks that address problems in cancer research across the atomistic, molecular, and population scales. The initial release of the application framework, which we call CANDLE/Supervisor, addresses the problem of hyper-parameter exploration of deep neural networks. Conclusions: Initial results from running CANDLE on DOE systems at ORNL, ANL, and NERSC (Titan, Theta, and Cori, respectively) demonstrate both scaling and multi-platform execution.},
doi = {10.1186/s12859-018-2508-4},
journal = {BMC Bioinformatics},
number = {S18},
volume = {19},
place = {United States},
year = {2018},
month = {12}
}

Journal Article:
Free Publicly Available Full Text
Publisher's Version of Record

Citation Metrics:
Cited by: 5 works
Citation information provided by
Web of Science

Figures / Tables:

Figure 1: CANDLE/Supervisor overall architecture.


Works referenced in this record:

A community effort to assess and improve drug sensitivity prediction algorithms
journal, June 2014

  • Costello, James C.; Heiser, Laura M.; Georgii, Elisabeth
  • Nature Biotechnology, Vol. 32, Issue 12
  • DOI: 10.1038/nbt.2877

Hyperopt: a Python library for model selection and hyperparameter optimization
journal, January 2015


Compiler Techniques for Massively Scalable Implicit Task Parallelism
conference, November 2014

  • Armstrong, Timothy G.; Wozniak, Justin M.; Wilde, Michael
  • SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
  • DOI: 10.1109/SC.2014.30

LBANN: livermore big artificial neural network HPC toolkit
conference, January 2015

  • Van Essen, Brian; Kim, Hyojin; Pearce, Roger
  • Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments - MLHPC '15
  • DOI: 10.1145/2834892.2834897

Deep learning
journal, May 2015

  • LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey
  • Nature, Vol. 521, Issue 7553
  • DOI: 10.1038/nature14539

Random Forests
journal, January 2001


Evolving Neural Networks through Augmenting Topologies
journal, June 2002


Caffe: Convolutional Architecture for Fast Feature Embedding
conference, January 2014

  • Jia, Yangqing; Shelhamer, Evan; Donahue, Jeff
  • Proceedings of the ACM International Conference on Multimedia - MM '14
  • DOI: 10.1145/2647868.2654889

From desktop to Large-Scale Model Exploration with Swift/T
conference, December 2016

  • Ozik, Jonathan; Collier, Nicholson T.; Wozniak, Justin M.
  • 2016 Winter Simulation Conference (WSC)
  • DOI: 10.1109/WSC.2016.7822090

Swift/T: scalable data flow programming for many-task applications
journal, August 2013

  • Wozniak, Justin M.; Armstrong, Timothy G.; Wilde, Michael
  • ACM SIGPLAN Notices, Vol. 48, Issue 8
  • DOI: 10.1145/2517327.2442559

Works referenced / citing this record:

Development of training environment for deep learning with medical images on supercomputer system based on asynchronous parallel Bayesian optimization
journal, January 2020

  • Nomura, Yukihiro; Sato, Issei; Hanawa, Toshihiro
  • The Journal of Supercomputing, Vol. 76, Issue 9
  • DOI: 10.1007/s11227-020-03164-7

Figures/Tables have been extracted from DOE-funded journal article accepted manuscripts.