Structured-Content Extraction from the Web for Bibliographic Reference Generation
 

Summary: Structured-Content Extraction from the Web
for Bibliographic Reference Generation
Ramon Xuriguera, Marta Arias
rxuriguera@lsi.upc.edu, marias@lsi.upc.edu
UPC Barcelona Tech
Abstract. In this paper we present a system that automatically creates
bibliographic indexes from a collection of PDF files by using the file contents to
search the Web and later extract the information from the resulting pages. We
pay special attention to the techniques used for extracting this data as well as the
automatic generation of extraction rules and their evaluation.
1 Introduction
Working on a research project means spending vast amounts of time reading related
publications and the corresponding files, mostly in PDF format. Once the research is done,
researchers have to generate bibliographic indexes from these articles, which can be a very tedious
and time-consuming task, even when using existing tools. We believe that this task could be
automated to a large extent. In fact, the subject of this paper is a prototype application that

  

Source: Arias, Marta - Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya

 

Collections: Computer Technologies and Information Sciences