Textual Article Clustering in Newspaper Pages Marco Aiello & Andrea Pegoretti

Dep. of Information and Communication Technologies
Università di Trento
Via Sommarive, 14
38100 Trento, Italy
aiellom@dit.unitn.it andpego@supereva.it
In the analysis of a newspaper page, an important step is the clustering of various text
blocks into logical units, i.e., into articles. We propose three algorithms based on text
processing techniques to cluster articles in newspaper pages. Based on the complexity of
the three algorithms and experiments on actual pages from the Italian newspaper L'Adige,
we select one of the algorithms as the preferred choice to solve the textual clustering

1 Introduction
One of the first and most evident consequences of the revolution brought about by the diffusion
of computers and by the great success of the Internet is a gradual decrease in the quantity of
paper documents in our everyday life: we can read our favorite newspaper on the Internet;
we can be kept informed minute by minute of the latest news through blogs and newsletters; we can
read a book in electronic format. Nevertheless, there is a body of paper documents still


Source: Aiello, Marco - Institute for Mathematics and Computing Science, Rijksuniversiteit Groningen


Collections: Computer Technologies and Information Sciences