Summary: Jedi: Extracting and Synthesizing Information from the Web
Gerald Huck, Peter Fankhauser, Karl Aberer, Erich Neuhold
GMD - German National Research Center for Information Technology
Integrated Publication and Information Systems Institute IPSI
Dolivostr. 15, 64293 Darmstadt, Germany
{huck, fankhaus, aberer, neuhold}@darmstadt.gmd.de
Abstract
Jedi (Java based Extraction and Dissemination of Information)
is a lightweight tool for the creation of wrappers and
mediators to extract, combine, and reconcile information
from several independent information sources. For wrappers
it uses attributed grammars, which are evaluated with
a fault-tolerant parsing strategy to cope with ambiguous
grammars and irregular sources. For mediation it uses a
simple generic object model that can be extended with
Java libraries for specific models such as HTML, XML, or
the relational model. This paper describes the architecture
of Jedi and then focuses on Jedi's wrapper generator.
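To make the idea of fault-tolerant extraction concrete, the following minimal sketch shows one way a wrapper could skip over irregular input instead of failing. It is not Jedi's grammar formalism or parsing algorithm: the rule list, the attribute names, and the function extract are hypothetical, and plain regular expressions stand in for attributed grammar rules.

```python
import re

# Hypothetical extraction rules (not from the paper): each rule pairs
# an attribute name with a pattern whose first group is the value.
RULES = [
    ("title", re.compile(r"<h1>(.*?)</h1>")),
    ("price", re.compile(r"Price:\s*(\d+(?:\.\d+)?)")),
]


def extract(source):
    """Scan the source left to right and collect matched attributes.

    Where no rule matches at the current position, one character is
    skipped, so unexpected text between matches does not abort the scan.
    """
    records = []
    pos = 0
    while pos < len(source):
        for name, pattern in RULES:
            match = pattern.match(source, pos)
            if match:
                records.append({"attribute": name, "value": match.group(1)})
                pos = match.end()
                break
        else:
            pos += 1  # no rule applies here: tolerate and move on
    return records


page = "<h1>Jedi</h1> junk <b>Price: 12.50</b> more junk"
print(extract(page))
```

The result is a list of generic attribute/value records, loosely in the spirit of a simple generic object model; a real wrapper would need nested structure and a policy for choosing among competing rules.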
1. Introduction
The World Wide Web has evolved into a general pur-