Textual Article Clustering in Newspaper Pages Marco Aiello & Andrea Pegoretti
 

Summary: Textual Article Clustering in Newspaper Pages
Marco Aiello & Andrea Pegoretti
Dep. of Information and Communication Technologies
Università di Trento
Via Sommarive, 14
38100 Trento, Italy
aiellom@dit.unitn.it andpego@supereva.it
Abstract
In the analysis of a newspaper page an important step is the clustering of various text
blocks into logical units, i.e., into articles. We propose three algorithms based on text
processing techniques to cluster articles in newspaper pages. Based on the complexity of
the three algorithms and on experiments with actual pages from the Italian newspaper L'Adige,
we select one of the algorithms as the preferred choice to solve the textual clustering
problem.
1 Introduction
One of the first and most evident consequences of the revolution brought about by the diffusion
of computers and by the great success of the Internet is a gradual decrease of the quantity of
paper documents in our everyday life: we can read our favorite newspaper on the Internet;
we can be informed minute by minute on the latest news by blogs and newsletters; we can
read a book in electronic format. Nevertheless, there is a body of paper documents still

  

Source: Aiello, Marco - Institute for Mathematics and Computing Science, Rijksuniversiteit Groningen

 

Collections: Computer Technologies and Information Sciences