OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: A Knowledge Graph Approach for the Secondary Use of Cancer Registry Data

Conference
OSTI ID:1558464

Population-based central cancer registries collect valuable structured and unstructured data, primarily for cancer surveillance and research. These data improve insight into the clinical features associated with cancer occurrence, treatment, and outcomes, and guide interventions that reduce the cancer burden. Cancer registries primarily collect data on (1) cancer type (case or tumor); (2) patient demographics such as age, gender, and residential address at time of diagnosis; (3) planned first course of treatment; and (4) date of last contact, vital status, and cause of death. Cancer registry data are dynamic, structured data that are extracted from many unstructured sources, such as electronic healthcare records, and consolidated for reporting and other purposes. Although advanced analytic tools such as SEER*Stat can build SAS queries, we explore a knowledge graph approach to organizing cancer registry data for advanced analytics and visualization, which has unique advantages over the approaches of existing tools. This approach semantically enriches the data and makes it easy to link with third-party data, which can better explain variation in outcomes. We have developed a prototype knowledge graph based on data from the Louisiana Tumor Registry and other publicly available datasets, including the Behavioral Risk Factor Surveillance System, Clinical Trials, DBpedia, GeoNames, Rural-Urban Continuum Codes, and Semantic MEDLINE. We selected the Resource Description Framework (RDF) data model to represent the knowledge graph, which contains more than 25 billion triples and occupies ~4 TB of storage. To demonstrate the benefits of the knowledge graph approach, we used scenario-specific queries that find relationships between cancer treatment sequences and outcomes. To illustrate its ease of use in iterative analysis, we linked the knowledge graph to external datasets and ran complex queries across multiple datasets. In addition, we used the knowledge graph to identify data discrepancies and to handle schema changes. Finally, we visualized the knowledge graph to discover data patterns. Our results demonstrate that this graph-based solution enables cancer researchers to execute complex queries and to perform iterative analyses more easily, improving their understanding of cancer registry data. In the future, we would like to use high-performance computing (HPC) resources to generate hypotheses with clinical potential from our knowledge graph more quickly.

Research Organization:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
DOE Contract Number:
Resource Relation:
Conference: 2019 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI 2019) - Chicago, Illinois, United States of America - 5/19/2019 8:00:00 AM-5/22/2019 8:00:00 AM
Country of Publication:
United States

Similar Records

Knowledge Graph-Enabled Cancer Data Analytics
Journal Article · 2020 · IEEE Journal of Biomedical and Health Informatics · OSTI ID:1558464

A Reasoning And Hypothesis-Generation Framework Based On Scalable Graph Analytics
Conference · 2016 · OSTI ID:1558464

Privacy-Preserving Deep Learning NLP Models for Cancer Registries
Journal Article · 2021 · IEEE Transactions on Emerging Topics in Computing · OSTI ID:1558464

Related Subjects