Title: Scenario driven data modelling: a method for integrating diverse sources of data and data streams

Journal Article · BMC Bioinformatics
  1. Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Biosciences Division

Background: Biology is rapidly becoming a data-intensive, data-driven science. It is essential that data is represented and connected in ways that capture its full conceptual content and allow both automated integration and data-driven decision-making. Recent advancements in distributed multi-relational directed graphs, implemented in the form of the Semantic Web, make it possible to deal with complicated heterogeneous data in new and interesting ways.

Results: This paper presents a new approach, scenario driven data modelling (SDDM), that integrates multi-relational directed graphs with data streams. SDDM can be applied to virtually any data integration challenge with widely divergent types of data and data streams. In this work, we explored integrating genetics data with reports from traditional media. SDDM was applied to the New Delhi metallo-beta-lactamase gene (NDM-1), an emerging global health threat. The SDDM process constructed a scenario, created an RDF multi-relational directed graph that linked diverse types of data to the Semantic Web, implemented RDF conversion tools (RDFizers) to bring content into the Semantic Web, identified data streams and analytical routines to analyse those streams, and identified user requirements and graph traversals to meet end-user requirements.

Conclusions: We provided an example where SDDM was applied to a complex data integration challenge. The process created a model of the emerging NDM-1 health threat, identified and filled gaps in that model, and constructed reliable software that monitored data streams based on the scenario-derived multi-relational directed graph. The SDDM process significantly reduced the software requirements phase by letting the scenario and resulting multi-relational directed graph define what is possible and then set the scope of the user requirements. Approaches like SDDM will be critical to the future of data-intensive, data-driven science because they automate the process of converting massive data streams into usable knowledge.
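
The abstract mentions three building blocks: RDF conversion tools (RDFizers), a multi-relational directed graph, and graph traversals that answer end-user questions. The following is a minimal sketch of how those pieces could fit together, not the authors' implementation. It uses plain Python tuples rather than a real RDF library, and every name in it (the `EX` namespace, `rdfize`, `TripleGraph`, the predicates `mentions` and `carriedBy`, and the sample records) is a hypothetical chosen for illustration.

```python
# Hypothetical namespace prefix for illustration; not taken from the paper.
EX = "http://example.org/sddm/"


def rdfize(record):
    """Toy 'RDFizer': turn a flat dict into (subject, predicate, object) triples.

    The value under 'id' becomes the subject; every other key becomes a predicate.
    """
    subject = EX + record["id"]
    return [(subject, EX + key, value)
            for key, value in record.items() if key != "id"]


class TripleGraph:
    """A multi-relational directed graph: edges are labelled by a predicate."""

    def __init__(self):
        self._out = {}  # subject -> list of (predicate, object)

    def add(self, triples):
        for s, p, o in triples:
            self._out.setdefault(s, []).append((p, o))

    def follow(self, start, *predicates):
        """Traverse from `start` along a path of predicates; return reached nodes."""
        frontier = [start]
        for pred in predicates:
            frontier = [o
                        for node in frontier
                        for p, o in self._out.get(node, [])
                        if p == pred]
        return frontier


# Invented sample records: one media report and one gene entry.
graph = TripleGraph()
graph.add(rdfize({"id": "report-1", "mentions": EX + "blaNDM-1"}))
graph.add(rdfize({"id": "blaNDM-1", "carriedBy": EX + "Klebsiella_pneumoniae"}))

# Traversal: which organisms carry a gene mentioned in report-1?
organisms = graph.follow(EX + "report-1", EX + "mentions", EX + "carriedBy")
print(organisms)
```

The traversal chains two differently labelled edges (report to gene, gene to organism). This is the sense in which a multi-relational graph can link media reports to genetics data. A production system would more likely use an RDF store and a query language such as SPARQL.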

Research Organization:
Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States)
Sponsoring Organization:
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
Grant/Contract Number:
AC05-00OR22725
OSTI ID:
1626286
Journal Information:
BMC Bioinformatics, Vol. 12, Issue Suppl 10; ISSN 1471-2105
Publisher:
BioMed Central
Country of Publication:
United States
Language:
English
