Title: A Knowledge Graph Approach for the Secondary Use of Cancer Registry Data

Abstract

Population-based central cancer registries collect valuable structured and unstructured data primarily for cancer surveillance and research, enhancing insights into clinical features associated with cancer occurrence, cancer treatment, and cancer outcomes to guide interventions which reduce the cancer burden. Cancer registries primarily collect data on (1) cancer type (case or tumor); (2) patient demographics such as age, gender, and residential address at time of diagnosis; (3) planned first course of treatment; and (4) date of last contact, vital status, and cause of death. Cancer registry data is dynamic, structured data, which is extracted from many unstructured sources such as electronic healthcare records, and consolidated for reporting and other purposes. While available advanced analytic tools such as SEER*Stat have the ability to build SAS queries, we, however, explore an innovative knowledge graph approach to organizing cancer registry data for advanced analytics and visualization, which has unique advantages over approaches of existing tools. This innovative knowledge graph approach semantically enriches the data and easily enables linkage with third-party data, which can better explain variation in outcomes. We have developed a prototype knowledge graph based on data from the Louisiana Tumor Registry and other publicly available datasets including Behavioral Risk Factor Surveillance System, Clinical Trials, DBpedia, GeoNames, Rural-Urban Continuum Codes, and Semantic MEDLINE. The resource description framework (RDF) data model was selected to represent our knowledge graph, which contains more than 25 billion triples and is ~4TB in storage size. To exhibit the benefits of the knowledge graph approach, we used scenario specific queries, which find the relationships between cancer treatment sequences and outcomes.
To illustrate its ease of use in iterative analysis, the knowledge graph was linked to external datasets for performing complex queries across multiple datasets. In addition, we used knowledge graphs to identify data discrepancies and to handle schema changes. Finally, we visualized the knowledge graph to discover data patterns. Our results demonstrate this graph-based solution enables cancer researchers to execute complex queries and more easily perform iterative analyses to improve understanding of cancer registry data. In the future, we would like to use high-performance computing (HPC) resources for faster-generating hypotheses with clinical potential from our knowledge graph.
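
The triple-based model and cross-dataset linkage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain Python tuples in place of an RDF store, and every identifier, predicate name (`hasTreatmentSequence`, `vitalStatus`, `residesInCounty`, `ruralUrbanCode`), and data value below is hypothetical and invented purely for illustration.

```python
from typing import Iterator, Optional

# A triple is (subject, predicate, object), mirroring the RDF data model.
Triple = tuple[str, str, str]

# Hypothetical registry triples; all names and values are made up.
REGISTRY: list[Triple] = [
    ("case:001", "hasTreatmentSequence", "surgery>chemotherapy"),
    ("case:001", "vitalStatus", "alive"),
    ("case:001", "residesInCounty", "county:A"),
    ("case:002", "hasTreatmentSequence", "surgery>chemotherapy"),
    ("case:002", "vitalStatus", "deceased"),
    ("case:002", "residesInCounty", "county:B"),
    ("case:003", "hasTreatmentSequence", "radiation"),
    ("case:003", "vitalStatus", "alive"),
    ("case:003", "residesInCounty", "county:B"),
]

# Hypothetical external dataset, linked through shared county identifiers.
EXTERNAL: list[Triple] = [
    ("county:A", "ruralUrbanCode", "1"),
    ("county:B", "ruralUrbanCode", "8"),
]


def match(
    graph: list[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for triple in graph:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def outcomes_by_sequence(graph: list[Triple]) -> dict[str, dict[str, int]]:
    """Count vital-status outcomes for each treatment sequence."""
    counts: dict[str, dict[str, int]] = {}
    for case, _, sequence in match(graph, p="hasTreatmentSequence"):
        for _, _, status in match(graph, s=case, p="vitalStatus"):
            by_status = counts.setdefault(sequence, {})
            by_status[status] = by_status.get(status, 0) + 1
    return counts


def cases_with_code(graph: list[Triple], code: str) -> list[str]:
    """Find cases whose county carries the given (external) code."""
    counties = {s for s, _, _ in match(graph, p="ruralUrbanCode", o=code)}
    return sorted(
        s for s, _, o in match(graph, p="residesInCounty") if o in counties
    )


# Linking datasets amounts to combining their triples into one graph.
graph = REGISTRY + EXTERNAL
print(outcomes_by_sequence(graph))
print(cases_with_code(graph, "8"))
```

Because both datasets share the same triple shape, linking them needs no schema migration: the external triples are simply appended, and a query can then traverse from a case to its county to the county's code. A production system would more plausibly use an RDF store queried with SPARQL.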

Authors:
Hasan, S M Shamimul [1]; Rivera, Donna R. [2]; Wu, Xiao-Cheng [3]; Christian, Blair [1]; Tourassi, Georgia [1]
  1. ORNL
  2. National Cancer Institute, Bethesda, MD
  3. LSUHSC-Louisiana Tumor Registry
Publication Date:
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1558464
DOE Contract Number:  
AC05-00OR22725
Resource Type:
Conference
Resource Relation:
Conference: 2019 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2019) - Chicago, Illinois, United States of America - 5/19/2019-5/22/2019
Country of Publication:
United States
Language:
English

Citation Formats

Hasan, S M Shamimul, Rivera, Donna R., Wu, Xiao-Cheng, Christian, Blair, and Tourassi, Georgia. A Knowledge Graph Approach for the Secondary Use of Cancer Registry Data. United States: N. p., 2019. Web.
Hasan, S M Shamimul, Rivera, Donna R., Wu, Xiao-Cheng, Christian, Blair, & Tourassi, Georgia. A Knowledge Graph Approach for the Secondary Use of Cancer Registry Data. United States.
Hasan, S M Shamimul, Rivera, Donna R., Wu, Xiao-Cheng, Christian, Blair, and Tourassi, Georgia. 2019. "A Knowledge Graph Approach for the Secondary Use of Cancer Registry Data". United States. https://www.osti.gov/servlets/purl/1558464.
@article{osti_1558464,
title = {A Knowledge Graph Approach for the Secondary Use of Cancer Registry Data},
author = {Hasan, S M Shamimul and Rivera, Donna R. and Wu, Xiao-Cheng and Christian, Blair and Tourassi, Georgia},
abstractNote = {Population-based central cancer registries collect valuable structured and unstructured data primarily for cancer surveillance and research, enhancing insights into clinical features associated with cancer occurrence, cancer treatment, and cancer outcomes to guide interventions which reduce the cancer burden. Cancer registries primarily collect data on (1) cancer type (case or tumor); (2) patient demographics such as age, gender, and residential address at time of diagnosis; (3) planned first course of treatment; and (4) date of last contact, vital status, and cause of death. Cancer registry data is dynamic, structured data, which is extracted from many unstructured sources such as electronic healthcare records, and consolidated for reporting and other purposes. While available advanced analytic tools such as SEER*Stat have the ability to build SAS queries, we, however, explore an innovative knowledge graph approach to organizing cancer registry data for advanced analytics and visualization, which has unique advantages over approaches of existing tools. This innovative knowledge graph approach semantically enriches the data and easily enables linkage with third-party data, which can better explain variation in outcomes. We have developed a prototype knowledge graph based on data from the Louisiana Tumor Registry and other publicly available datasets including Behavioral Risk Factor Surveillance System, Clinical Trials, DBpedia, GeoNames, Rural-Urban Continuum Codes, and Semantic MEDLINE. The resource description framework (RDF) data model was selected to represent our knowledge graph, which contains more than 25 billion triples and is ~4TB in storage size. To exhibit the benefits of the knowledge graph approach, we used scenario specific queries, which find the relationships between cancer treatment sequences and outcomes. 
To illustrate its ease of use in iterative analysis, the knowledge graph was linked to external datasets for performing complex queries across multiple datasets. In addition, we used knowledge graphs to identify data discrepancies and to handle schema changes. Finally, we visualized the knowledge graph to discover data patterns. Our results demonstrate this graph-based solution enables cancer researchers to execute complex queries and more easily perform iterative analyses to improve understanding of cancer registry data. In the future, we would like to use high-performance computing (HPC) resources for faster-generating hypotheses with clinical potential from our knowledge graph.},
url = {https://www.osti.gov/biblio/1558464},
place = {United States},
year = {2019},
month = {5}
}
