Summary: Future Generation Computer Systems 23 (2007) 485–496
From bioinformatic web portals to semantically integrated Data Grid
Adriana Budura, Philippe Cudré-Mauroux, Karl Aberer
School of Computer and Communication Sciences, EPFL, Switzerland
Received 31 January 2006; accepted 26 March 2006
Available online 4 May 2006
We propose a semi-automated method for redeploying bioinformatic databases indexed in a Web portal as a decentralized, semantically
integrated and service-oriented Data Grid. We generate peer-to-peer schema mappings by leveraging cross-referenced instances and
instance-based schema matching algorithms. Analyzing real-world data extracted from an existing portal, we show how a rather trivial combination of
lexicographical measures with set distance measures yields surprisingly good results in practice. Finally, we propose data models for redeploying
all instances, schemas and schema mappings in the Data Grid, relying on standard Semantic Web technologies.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Data sharing; Data mapping; Distributed databases; Semantics; Biology
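The abstract's idea of combining a lexicographical measure with a set distance measure can be illustrated with a minimal sketch. The specific measures, the equal weighting, and all names below (lexical_similarity, jaccard_similarity, match_score, and the sample attributes) are assumptions chosen for illustration, not the measures used in the paper: the lexicographical part is approximated here by a string similarity ratio on attribute names, and the set part by Jaccard overlap of instance values.

```python
from difflib import SequenceMatcher


def lexical_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two attribute names (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def jaccard_similarity(xs: set, ys: set) -> float:
    """Jaccard similarity, i.e. 1 minus the Jaccard set distance."""
    if not xs and not ys:
        return 0.0  # assumption: two empty value sets give no evidence of a match
    return len(xs & ys) / len(xs | ys)


def match_score(name_a, values_a, name_b, values_b, weight=0.5):
    """Weighted mix of name similarity and instance-value overlap.

    The weight of 0.5 is an arbitrary illustrative choice.
    """
    lexical = lexical_similarity(name_a, name_b)
    overlap = jaccard_similarity(set(values_a), set(values_b))
    return weight * lexical + (1 - weight) * overlap


# Hypothetical attributes from two databases that share cross-referenced instances.
score = match_score(
    "organism", ["Homo sapiens", "Mus musculus"],
    "Organism_Name", ["Homo sapiens", "Mus musculus", "Danio rerio"],
)
```

Under these assumptions, a pair of attributes would be proposed as a candidate mapping when its score exceeds some threshold, with a human confirming the result, in line with the semi-automated approach the abstract describes.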
In the past, biologists used to collect and analyze data
in isolation, creating proprietary schemas to annotate and
store their information in various ways. Today, with the