AI Meets Exascale Computing: Advancing Cancer Research With Large-Scale High Performance Computing
Abstract
The application of data science in cancer research has been boosted by major advances in three primary areas: (1) Data: diversity, amount, and availability of biomedical data; (2) Advances in Artificial Intelligence (AI) and Machine Learning (ML) algorithms that enable learning from complex, large-scale data; and (3) Advances in computer architectures allowing unprecedented acceleration of simulation and machine learning algorithms. These advances help build in silico ML models that can provide transformative insights from data including: molecular dynamics simulations, next-generation sequencing, omics, imaging, and unstructured clinical text documents. Unique challenges persist, however, in building ML models related to cancer, including: (1) access, sharing, labeling, and integration of multimodal and multi-institutional data across different cancer types; (2) developing AI models for cancer research capable of scaling on next generation high performance computers; and (3) assessing robustness and reliability in the AI models. In this paper, we review the National Cancer Institute (NCI)-Department of Energy (DOE) collaboration, Joint Design of Advanced Computing Solutions for Cancer (JDACS4C), a multi-institution collaborative effort focused on advancing computing and data technologies to accelerate cancer research on three levels: molecular, cellular, and population. This collaboration integrates various types of generated data, pre-exascale compute resources, and advances in ML models to increase understanding of basic cancer biology, identify promising new treatment options, predict outcomes, and eventually prescribe specialized treatments for patients with cancer.
- Authors:
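- Bhattacharya, Tanmoy; Brettin, Thomas; Doroshow, James H.; Evrard, Yvonne A.; Greenspan, Emily J.; Gryshuk, Amy L.; Hoang, Thuc T.; Lauzon, Carolyn B. Vea; Nissley, Dwight; Penberthy, Lynne; Stahlberg, Eric; Stevens, Rick; Streitz, Fred; Tourassi, Georgia; Xia, Fangfang; Zaki, George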
- Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
- Argonne National Lab. (ANL), Lemont, IL (United States)
- National Cancer Inst., Bethesda, MD (United States)
- Frederick National Lab. of Cancer Research, Frederick, MD (United States)
- Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
- National Nuclear Security Administration (NNSA), Washington, DC (United States)
- Dept. of Energy (DOE), Washington, DC (United States)
- Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States)
- Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
- Publication Date:
- 2019-10-02
- Research Org.:
- Argonne National Laboratory (ANL), Argonne, IL (United States); Los Alamos National Laboratory (LANL), Los Alamos, NM (United States); Lawrence Livermore National Laboratory (LLNL), Livermore, CA (United States)
- Sponsoring Org.:
- USDOE National Nuclear Security Administration (NNSA); National Institutes of Health (NIH) - National Cancer Institute; USDOE Exascale Computing Project; USDOE Office of Science (SC)
- OSTI Identifier:
- 1637271
- Alternate Identifier(s):
- OSTI ID: 1657127; OSTI ID: 1821812
- Report Number(s):
- LA-UR-19-24131; LLNL-JRNL-773355
- Journal ID: ISSN 2234-943X; 159254
- Grant/Contract Number:
- AC02-06CH11357; AC52-07NA27344; 89233218CNA000001
- Resource Type:
- Accepted Manuscript
- Journal Name:
- Frontiers in Oncology
- Additional Journal Information:
- Journal Volume: 9; Journal ID: ISSN 2234-943X
- Publisher:
- Frontiers Research Foundation
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; 60 APPLIED LIFE SCIENCES; artificial intelligence; cancer research; deep learning; high performance computing; multi-scale modeling; natural language processing; precision medicine; uncertainty quantification
Citation Formats
Bhattacharya, Tanmoy, Brettin, Thomas, Doroshow, James H., Evrard, Yvonne A., Greenspan, Emily J., Gryshuk, Amy L., Hoang, Thuc T., Lauzon, Carolyn B. Vea, Nissley, Dwight, Penberthy, Lynne, Stahlberg, Eric, Stevens, Rick, Streitz, Fred, Tourassi, Georgia, Xia, Fangfang, and Zaki, George. AI Meets Exascale Computing: Advancing Cancer Research With Large-Scale High Performance Computing. United States: N. p., 2019. Web. doi:10.3389/fonc.2019.00984.
Bhattacharya, Tanmoy, Brettin, Thomas, Doroshow, James H., Evrard, Yvonne A., Greenspan, Emily J., Gryshuk, Amy L., Hoang, Thuc T., Lauzon, Carolyn B. Vea, Nissley, Dwight, Penberthy, Lynne, Stahlberg, Eric, Stevens, Rick, Streitz, Fred, Tourassi, Georgia, Xia, Fangfang, & Zaki, George. AI Meets Exascale Computing: Advancing Cancer Research With Large-Scale High Performance Computing. United States. https://doi.org/10.3389/fonc.2019.00984
Bhattacharya, Tanmoy, Brettin, Thomas, Doroshow, James H., Evrard, Yvonne A., Greenspan, Emily J., Gryshuk, Amy L., Hoang, Thuc T., Lauzon, Carolyn B. Vea, Nissley, Dwight, Penberthy, Lynne, Stahlberg, Eric, Stevens, Rick, Streitz, Fred, Tourassi, Georgia, Xia, Fangfang, and Zaki, George. 2019. "AI Meets Exascale Computing: Advancing Cancer Research With Large-Scale High Performance Computing". United States. https://doi.org/10.3389/fonc.2019.00984. https://www.osti.gov/servlets/purl/1637271.
@article{osti_1637271,
title = {AI Meets Exascale Computing: Advancing Cancer Research With Large-Scale High Performance Computing},
author = {Bhattacharya, Tanmoy and Brettin, Thomas and Doroshow, James H. and Evrard, Yvonne A. and Greenspan, Emily J. and Gryshuk, Amy L. and Hoang, Thuc T. and Lauzon, Carolyn B. Vea and Nissley, Dwight and Penberthy, Lynne and Stahlberg, Eric and Stevens, Rick and Streitz, Fred and Tourassi, Georgia and Xia, Fangfang and Zaki, George},
abstractNote = {The application of data science in cancer research has been boosted by major advances in three primary areas: (1) Data: diversity, amount, and availability of biomedical data; (2) Advances in Artificial Intelligence (AI) and Machine Learning (ML) algorithms that enable learning from complex, large-scale data; and (3) Advances in computer architectures allowing unprecedented acceleration of simulation and machine learning algorithms. These advances help build in silico ML models that can provide transformative insights from data including: molecular dynamics simulations, next-generation sequencing, omics, imaging, and unstructured clinical text documents. Unique challenges persist, however, in building ML models related to cancer, including: (1) access, sharing, labeling, and integration of multimodal and multi-institutional data across different cancer types; (2) developing AI models for cancer research capable of scaling on next generation high performance computers; and (3) assessing robustness and reliability in the AI models. In this paper, we review the National Cancer Institute (NCI)-Department of Energy (DOE) collaboration, Joint Design of Advanced Computing Solutions for Cancer (JDACS4C), a multi-institution collaborative effort focused on advancing computing and data technologies to accelerate cancer research on three levels: molecular, cellular, and population. This collaboration integrates various types of generated data, pre-exascale compute resources, and advances in ML models to increase understanding of basic cancer biology, identify promising new treatment options, predict outcomes, and eventually prescribe specialized treatments for patients with cancer.},
doi = {10.3389/fonc.2019.00984},
journal = {Frontiers in Oncology},
volume = 9,
place = {United States},
year = {2019},
month = {10}
}
Works referenced in this record:
Scalable deep text comprehension for Cancer surveillance on high-performance computing
journal, December 2018
- Qiu, John X.; Yoon, Hong-Jun; Srivastava, Kshitij
- BMC Bioinformatics, Vol. 19, Issue S18
The need for uncertainty quantification in machine-assisted medical decision making
journal, January 2019
- Begoli, Edmon; Bhattacharya, Tanmoy; Kusnezov, Dimitri
- Nature Machine Intelligence, Vol. 1, Issue 1
A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles
journal, November 2017
- Subramanian, Aravind; Narayan, Rajiv; Corsello, Steven M.
- Cell, Vol. 171, Issue 6
A massively parallel infrastructure for adaptive multiscale simulations: modeling RAS initiation pathway for cancer
conference, November 2019
- Di Natale, Francesco; Bhatia, Harsh; Carpenter, Timothy S.
- SC '19: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity
journal, March 2012
- Barretina, Jordi; Caponigro, Giordano; Stransky, Nicolas
- Nature, Vol. 483, Issue 7391
An Interactive Resource to Identify Cancer Genetic and Lineage Dependencies Targeted by Small Molecules
journal, August 2013
- Basu, Amrita; Bodycombe, Nicole E.; Cheah, Jaime H.
- Cell, Vol. 154, Issue 5
Retrofitting Word Embeddings with the UMLS Metathesaurus for Clinical Information Extraction
conference, December 2018
- Alawad, Mohammed; Hasan, S. M. Shamimul; Blair Christian, J.
- 2018 IEEE International Conference on Big Data (Big Data)
Methionine 170 is an Environmentally Sensitive Membrane Anchor in the Disordered HVR of K-Ras4B
journal, October 2018
- Neale, Chris; García, Angel E.
- The Journal of Physical Chemistry B, Vol. 122, Issue 44
Filter pruning of Convolutional Neural Networks for text classification: A case study of cancer pathology report comprehension
conference, March 2018
- Yoon, Hong-Jun; Robinson, Sarah; Christian, J. Blair
- 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI)
Genomics of Drug Sensitivity in Cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells
journal, November 2012
- Yang, Wanjuan; Soares, Jorge; Greninger, Patricia
- Nucleic Acids Research, Vol. 41, Issue D1
Computational Lipidomics of the Neuronal Plasma Membrane
journal, November 2017
- Ingólfsson, Helgi I.; Carpenter, Timothy S.; Bhatia, Harsh
- Biophysical Journal, Vol. 113, Issue 10
Gene expression inference with deep learning
journal, February 2016
- Chen, Yifei; Li, Yi; Narayan, Rajiv
- Bioinformatics, Vol. 32, Issue 12
Capturing Phase Behavior of Ternary Lipid Mixtures with a Refined Martini Coarse-Grained Force Field
journal, September 2018
- Carpenter, Timothy S.; López, Cesar A.; Neale, Chris
- Journal of Chemical Theory and Computation, Vol. 14, Issue 11
A comprehensive transcriptional portrait of human cancer cell lines
journal, December 2014
- Klijn, Christiaan; Durinck, Steffen; Stawiski, Eric W.
- Nature Biotechnology, Vol. 33, Issue 3
Predicting tumor cell line response to drug pairs with deep learning
journal, December 2018
- Xia, Fangfang; Shukla, Maulik; Brettin, Thomas
- BMC Bioinformatics, Vol. 19, Issue S18
CANDLE/Supervisor: a workflow framework for machine learning applied to cancer research
journal, December 2018
- Wozniak, Justin M.; Jain, Rajeev; Balaprakash, Prasanna
- BMC Bioinformatics, Vol. 19, Issue S18
CAT: computer aided triage improving upon the Bayes risk through ε-refusal triage rules
journal, December 2018
- Hengartner, Nicolas; Cuellar, Leticia; Wu, Xiao-Cheng
- BMC Bioinformatics, Vol. 19, Issue S18
Hierarchical attention networks for information extraction from cancer pathology reports
journal, November 2017
- Gao, Shang; Young, Michael T.; Qiu, John X.
- Journal of the American Medical Informatics Association, Vol. 25, Issue 3
Introducing Heuristic Information into Ant Colony Optimization Algorithm for Identifying Epistasis
journal, January 2019
- Sun, Yingxia; Wang, Xuan; Shang, Junliang
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
The COXEN Principle: Translating Signatures of In vitro Chemosensitivity into Tools for Clinical Outcome Prediction and Drug Discovery in Cancer
journal, February 2010
- Smith, Steven C.; Baras, Alexander S.; Lee, Jae K.
- Cancer Research, Vol. 70, Issue 5
Deep learning-based transcriptome data classification for drug-target interaction prediction
journal, September 2018
- Xie, Lingwei; He, Song; Song, Xinyu
- BMC Genomics, Vol. 19, Issue S7
RAS Proteins and Their Regulators in Human Disease
journal, June 2017
- Simanshu, Dhirendra K.; Nissley, Dwight V.; McCormick, Frank
- Cell, Vol. 170, Issue 1
Molecular recognition of RAS/RAF complex at the membrane: Role of RAF cysteine-rich domain
journal, May 2018
- Travers, Timothy; López, Cesar A.; Van, Que N.
- Scientific Reports, Vol. 8, Issue 1
Coarse-to-fine multi-task training of convolutional neural networks for automated information extraction from cancer pathology reports
conference, March 2018
- Alawad, Mohammed; Yoon, Hong-Jun; Tourassi, Georgia D.
- 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI)
Deep Learning for Automated Extraction of Primary Sites From Cancer Pathology Reports
journal, January 2018
- Qiu, John X.; Yoon, Hong-Jun; Fearn, Paul A.
- IEEE Journal of Biomedical and Health Informatics, Vol. 22, Issue 1
The National Cancer Institute ALMANAC: A Comprehensive Screening Resource for the Detection of Anticancer Drug Pairs with Enhanced Therapeutic Activity
journal, April 2017
- Holbeck, Susan L.; Camalier, Richard; Crowell, James A.
- Cancer Research, Vol. 77, Issue 13
Combating Label Noise in Deep Learning Using Abstention
preprint, January 2019
- Thulasidasan, Sunil; Bhattacharya, Tanmoy; Bilmes, Jeff
- arXiv