Scalable deep text comprehension for Cancer surveillance on high-performance computing

Conference
Background: Deep learning (DL) has advanced the state of the art in bioinformatics, driving a trend toward increasingly sophisticated, computationally demanding models trained on ever-larger data sets. This vastly increased computational demand challenges the feasibility of conducting cutting-edge research. One solution is to distribute the workload across multiple computing cluster nodes with data-parallel algorithms. In this study, we used a high-performance computing environment and implemented the Downpour Stochastic Gradient Descent algorithm for data parallelism to train a convolutional neural network (CNN) on the natural language processing task of information extraction from a massive dataset of cancer pathology reports. We evaluated the scalability of data-parallel training on the Titan supercomputer at the Oak Ridge Leadership Computing Facility, using different numbers of worker nodes and comparing the effects of different training batch sizes and optimizer functions.

Results: Adadelta consistently converged to the lowest validation loss, though it required over twice as many training epochs as the fastest-converging optimizer, RMSProp. The Adam optimizer consistently achieved a close second-place minimum validation loss significantly faster; with batch sizes of 16 and 32, the network converged in only 4.5 training epochs.

Conclusions: We demonstrated that the training process scales across multiple compute nodes communicating via the Message Passing Interface (MPI) while achieving higher classification accuracy than a traditional machine learning algorithm.
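The core technique the abstract describes, Downpour-style asynchronous data parallelism over MPI, can be illustrated with a short sketch. The fragment below is not the authors' implementation: it assumes mpi4py is available and substitutes a toy logistic-regression model for the paper's CNN, with rank 0 acting as a parameter server and all other ranks as workers. All names and constants are illustrative.

```python
# Minimal sketch of Downpour-style asynchronous SGD over MPI (mpi4py).
# Rank 0 is the parameter server; every other rank is a worker that
# computes gradients on its local data shard, pushes them to the server,
# and pulls back fresh parameters. The server applies each gradient as it
# arrives, so a worker's gradient may be stale relative to updates made
# by other workers in the meantime -- the trade-off Downpour SGD accepts.
# Toy model and hyperparameters are illustrative, not from the paper.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

DIM, STEPS, BATCH, LR = 64, 200, 32, 0.05
TAG_GRAD, TAG_PARAM = 1, 2

def gradient(w, X, y):
    """Logistic-regression gradient for one mini-batch."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

if rank == 0:
    # Parameter server: apply each incoming gradient immediately,
    # then return the updated weights to whichever worker sent it.
    w = np.zeros(DIM)
    for _ in range(STEPS * (size - 1)):
        status = MPI.Status()
        g = comm.recv(source=MPI.ANY_SOURCE, tag=TAG_GRAD, status=status)
        w -= LR * g  # plain SGD; Adadelta/RMSProp/Adam would update here
        comm.send(w, dest=status.Get_source(), tag=TAG_PARAM)
else:
    # Worker: each rank owns its own synthetic shard of training data.
    rng = np.random.default_rng(rank)
    X = rng.normal(size=(1000, DIM))
    y = (X[:, 0] > 0).astype(float)  # synthetic labels
    w = np.zeros(DIM)
    for _ in range(STEPS):
        idx = rng.integers(0, len(X), size=BATCH)
        comm.send(gradient(w, X[idx], y[idx]), dest=0, tag=TAG_GRAD)
        w = comm.recv(source=0, tag=TAG_PARAM)  # pull fresh parameters
```

Run with, e.g., `mpiexec -n 4 python downpour_sketch.py` (one server, three workers). A fuller Downpour implementation would let workers push gradients and fetch parameters only every few steps rather than on every mini-batch.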
Research Organization: Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization: USDOE
DOE Contract Number: AC05-00OR22725
OSTI ID: 1491345
Country of Publication: United States
Language: English

Similar Records

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
Journal Article · December 2020 · IEEE Transactions on Parallel and Distributed Systems · OSTI ID: 1959404

Convergence of Hyperbolic Neural Networks Under Riemannian Stochastic Gradient Descent
Journal Article · October 2023 · Communications on Applied Mathematics and Computation · OSTI ID: 2007675
