OSTI.GOV, U.S. Department of Energy
Office of Scientific and Technical Information

Title: EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases

Abstract

Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models, as well as the data produced by these models, present unique challenges owing to their size, heterogeneity, and diversity. These datasets form the basis of effective and easy-to-use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage, and integrate these datasets. In this paper, we develop EpiK, a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input and output datasets to facilitate the effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks; aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of the Resource Description Framework (RDF) as a representation language is motivated by the diversity and growth of the datasets that need to be integrated.
A query bank is developed; the queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using SPARQL Protocol and RDF Query Language (SPARQL) over EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline, from model construction to simulation output. We also show that the performance of benchmark queries varies significantly with the choice of hardware underlying the database and with the RDF engine.
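
As a rough illustration of the idea in the abstract (linking model inputs and simulation outputs as subject-predicate-object triples, then answering a question that spans both), the following is a minimal sketch in plain Python. It is not EpiK's implementation: the `epik:` prefix, every resource and predicate name, and the storage locations are hypothetical, and a simple in-memory pattern matcher stands in for a real RDF engine and SPARQL.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples: all names below are invented for illustration only.
TRIPLES: List[Triple] = [
    ("epik:contactNet_A", "epik:coversRegion", "epik:RegionX"),
    ("epik:run1", "epik:usesNetwork", "epik:contactNet_A"),
    ("epik:run1", "epik:storedIn", "file:///data/run1.csv"),
    ("epik:run2", "epik:usesNetwork", "epik:contactNet_A"),
    ("epik:run2", "epik:storedIn", "db://results/run2"),
    ("epik:run3", "epik:usesNetwork", "epik:contactNet_B"),
    ("epik:run3", "epik:storedIn", "file:///data/run3.csv"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def outputs_for_region(triples: List[Triple], region: str) -> List[Tuple[str, str]]:
    """Find simulation runs (and where their output is stored) for a region.

    Roughly the join a SPARQL query of this shape would express:
        SELECT ?run ?loc WHERE {
            ?net epik:coversRegion epik:RegionX .
            ?run epik:usesNetwork  ?net .
            ?run epik:storedIn     ?loc .
        }
    """
    results = []
    for net, _, _ in match(triples, p="epik:coversRegion", o=region):
        for run, _, _ in match(triples, p="epik:usesNetwork", o=net):
            for _, _, loc in match(triples, s=run, p="epik:storedIn"):
                results.append((run, loc))
    return sorted(results)


if __name__ == "__main__":
    for run, loc in outputs_for_region(TRIPLES, "epik:RegionX"):
        print(run, loc)
```

In this sketch the caller asks about a region and never needs to know that one run's output lives in a file and another's in a database, which is one way to read the abstract's claim about hiding schema and location heterogeneity.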

Authors:
Hasan, S. M. Shamimul [1]; Fox, Edward A. [1]; Bisset, Keith [1]; Marathe, Madhav V. [1]
  1. Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)
Publication Date:
2017-11-06
Research Org.:
Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1454390
Grant/Contract Number:
AC05-00OR22725
Resource Type:
Journal Article: Accepted Manuscript
Journal Name:
Journal of Healthcare Informatics Research
Additional Journal Information:
Journal Volume: 1; Journal Issue: 2; Journal ID: ISSN 2509-4971
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES; 96 KNOWLEDGE MANAGEMENT AND PRESERVATION; Computational epidemiology; Knowledge base; Social contact networks; Mapping; RDF; SPARQL

Citation Formats

Hasan, S. M. Shamimul, Fox, Edward A., Bisset, Keith, and Marathe, Madhav V. EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases. United States: N. p., 2017. Web. doi:10.1007/s41666-017-0010-9.
Hasan, S. M. Shamimul, Fox, Edward A., Bisset, Keith, & Marathe, Madhav V. (2017). EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases. United States. doi:10.1007/s41666-017-0010-9.
Hasan, S. M. Shamimul, Fox, Edward A., Bisset, Keith, and Marathe, Madhav V. 2017. "EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases". United States. doi:10.1007/s41666-017-0010-9.
@article{osti_1454390,
title = {EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases},
author = {Hasan, S. M. Shamimul and Fox, Edward A. and Bisset, Keith and Marathe, Madhav V.},
doi = {10.1007/s41666-017-0010-9},
journal = {Journal of Healthcare Informatics Research},
number = 2,
volume = 1,
place = {United States},
year = {2017},
month = {11}
}

Journal Article:
Free Publicly Available Full Text
This content will become publicly available on November 6, 2018
Publisher's Version of Record
