A Tool for Computing the Visual Similarity of Web Pages

María Alpuente and Daniel Romero
DSIC-ELP, Universidad Politécnica de Valencia
Camino de Vera s/n, 46022 Valencia, Spain
Abstract: Recently, we proposed a functional technique for
identifying similar Web pages that is based on measuring tree
similarity. The key idea behind the method is to transform each
Web page into a compressed, normalized tree that effectively
represents its visual structure. In this work, we develop an
optimization of this technique that is based on memoization
and that achieves significant improvements in efficiency in
both time and space. This work also presents a tool that
implements the proposed technique as well as two case studies
for two real scenarios. Experiments on real documents show
that the optimized algorithm performs significantly better than
the original technique and demonstrate the practicality of our
Keywords: Web page comparison; visual similarity; tree edit
distance; Web document clustering.
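
To illustrate the general idea of a memoized tree-similarity measure, here is a minimal sketch in Python. It is not the authors' algorithm or tool: it assumes a simplified top-down edit distance in which whole subtrees are inserted or deleted at a cost equal to their size, it skips the compression and normalization steps mentioned in the abstract, and it normalizes by the combined size of both trees. The names tree_size, distance, and similarity, as well as the nested-tuple page representation, are hypothetical and chosen only for this example.

```python
from functools import lru_cache

# Assumption: a page is modelled as a hashable pair (label, tuple_of_children),
# so identical subtrees can serve as cache keys.

@lru_cache(maxsize=None)
def tree_size(tree):
    """Number of nodes in the tree."""
    _, children = tree
    return 1 + sum(tree_size(child) for child in children)

@lru_cache(maxsize=None)
def distance(a, b):
    """Simplified top-down edit distance between two ordered trees.

    Relabelling a node costs 1; inserting or deleting a subtree costs
    its size. Results are memoized, so repeated subtree pairs (common
    in templated pages) are compared only once.
    """
    (label_a, kids_a), (label_b, kids_b) = a, b
    relabel = 0 if label_a == label_b else 1
    m, n = len(kids_a), len(kids_b)
    # d[i][j]: cost of aligning the first i children of a with the first j of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + tree_size(kids_a[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + tree_size(kids_b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + tree_size(kids_a[i - 1]),               # delete subtree
                d[i][j - 1] + tree_size(kids_b[j - 1]),               # insert subtree
                d[i - 1][j - 1] + distance(kids_a[i - 1], kids_b[j - 1]),  # match
            )
    return relabel + d[m][n]

def similarity(a, b):
    """Map the distance to a score in (0, 1]; 1.0 means identical trees."""
    return 1 - distance(a, b) / (tree_size(a) + tree_size(b))

# Two toy page skeletons that differ in a single leaf.
page_a = ("html", (("body", (("div", ()), ("div", ()))),))
page_b = ("html", (("body", (("div", ()), ("table", ()))),))

print(distance(page_a, page_b))
print(similarity(page_a, page_b))
```

Because the trees are immutable tuples, functools.lru_cache can key directly on subtree pairs. This trades memory for time, so a real tool would need to bound or manage the cache.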


Source: Alpuente, María - Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València


Collections: Computer Technologies and Information Sciences