DOE PAGES
U.S. Department of Energy
Office of Scientific and Technical Information

Title: EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases

Abstract

Computational epidemiology seeks to develop computational methods to study the distribution and determinants of health-related states or events (including disease), and the application of this study to the control of diseases and other health problems. Recent advances in computing and data sciences have led to the development of innovative modeling environments to support this important goal. The datasets used to drive the dynamic models, as well as the data produced by these models, present unique challenges owing to their size, heterogeneity and diversity. These datasets form the basis of effective and easy-to-use decision support and analytical environments. As a result, it is important to develop scalable data management systems to store, manage and integrate these datasets. In this paper, we develop EpiK, a knowledge base that facilitates the development of decision support and analytical environments to support epidemic science. An important goal is to develop a framework that links the input as well as output datasets to facilitate effective spatio-temporal and social reasoning that is critical in planning and intervention analysis before and during an epidemic. The data management framework links modeling workflow data and its metadata using a controlled vocabulary. The metadata captures information about storage, the mapping between the linked model and the physical layout, and relationships to support services. EpiK is designed to support agent-based modeling and analytics frameworks; aggregate models can be seen as special cases and are thus supported. We use semantic web technologies to create a representation of the datasets that encapsulates both the location and the schema heterogeneity. The choice of the Resource Description Framework (RDF) as a representation language is motivated by the diversity and growth of the datasets that need to be integrated.
A query bank is developed; its queries capture a broad range of questions that can be posed and answered during a typical case study pertaining to disease outbreaks. The queries are constructed using the SPARQL Protocol and RDF Query Language (SPARQL) over EpiK. EpiK can hide schema and location heterogeneity while efficiently supporting queries that span the computational epidemiology modeling pipeline, from model construction to simulation output. We show that the performance of benchmark queries varies significantly with the choice of hardware underlying the database and the RDF engine.

Authors:
Hasan, S. M. Shamimul [1]; Fox, Edward A. [1]; Bisset, Keith [1]; Marathe, Madhav V. [1]
  1. Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)
Publication Date:
2017-11-06
Research Org.:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Org.:
USDOE
OSTI Identifier:
1454390
Grant/Contract Number:  
AC05-00OR22725
Resource Type:
Accepted Manuscript
Journal Name:
Journal of Healthcare Informatics Research
Additional Journal Information:
Journal Volume: 1; Journal Issue: 2; Journal ID: ISSN 2509-4971
Publisher:
Springer
Country of Publication:
United States
Language:
English
Subject:
60 APPLIED LIFE SCIENCES; 96 KNOWLEDGE MANAGEMENT AND PRESERVATION; Computational epidemiology; Knowledge base; Social contact networks; Mapping; RDF; SPARQL

Citation Formats

Hasan, S. M. Shamimul, Fox, Edward A., Bisset, Keith, and Marathe, Madhav V. EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases. United States: N. p., 2017. Web. doi:10.1007/s41666-017-0010-9.
Hasan, S. M. Shamimul, Fox, Edward A., Bisset, Keith, & Marathe, Madhav V. (2017). EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases. United States. https://doi.org/10.1007/s41666-017-0010-9
Hasan, S. M. Shamimul, Fox, Edward A., Bisset, Keith, and Marathe, Madhav V. 2017. "EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases". United States. https://doi.org/10.1007/s41666-017-0010-9. https://www.osti.gov/servlets/purl/1454390.
@article{osti_1454390,
title = {EpiK: A Knowledge Base for Epidemiological Modeling and Analytics of Infectious Diseases},
author = {Hasan, S. M. Shamimul and Fox, Edward A. and Bisset, Keith and Marathe, Madhav V.},
doi = {10.1007/s41666-017-0010-9},
journal = {Journal of Healthcare Informatics Research},
number = 2,
volume = 1,
place = {United States},
year = {2017},
month = {11}
}
