Ranking XPaths for extracting search result records Dolf Trieschnigg, Kien Tjin-Kam-Jet and Djoerd Hiemstra
 

University of Twente
Enschede, The Netherlands
{trieschn,tjinkamj,hiemstra}@cs.utwente.nl
ABSTRACT
Extracting search result records (SRRs) from webpages is
useful for building an aggregated search engine which combines
search results from a variety of search engines. Most
automatic approaches to search result extraction are not
portable: the complete process has to be rerun on a new
search result page. In this paper we describe an algorithm to
automatically determine XPath expressions to extract SRRs
from webpages. Based on a single search result page, an
XPath expression is determined which can be reused to extract
SRRs from pages based on the same template. The
algorithm is evaluated on six datasets, including two new
datasets containing a variety of web, image, video, shopping
and news search results. The evaluation shows that for 85%
of the tested search result pages, a useful XPath is deter-
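
As an illustration of the reuse idea described in the abstract, the sketch below applies one XPath expression to two pages that share a template. It is not the paper's algorithm: the XPath is written by hand rather than determined automatically, and the names SRR_XPATH, PAGE_A, PAGE_B and extract_srrs, the markup, and the example.org URLs are all hypothetical. It uses Python's standard xml.etree.ElementTree module, which supports only a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written XPath for one result template.
# (The paper determines such an expression automatically; this sketch does not.)
SRR_XPATH = ".//div[@class='result']"

# Two hypothetical, well-formed pages built from the same template.
PAGE_A = """<html><body>
<div class='nav'>Home</div>
<div class='result'><a href='http://example.org/1'>First hit</a></div>
<div class='result'><a href='http://example.org/2'>Second hit</a></div>
</body></html>"""

PAGE_B = """<html><body>
<div class='result'><a href='http://example.org/3'>Third hit</a></div>
</body></html>"""


def extract_srrs(page, xpath=SRR_XPATH):
    """Return (title, url) pairs for every node matched by the XPath."""
    root = ET.fromstring(page)
    records = []
    for node in root.findall(xpath):
        link = node.find(".//a")  # first link inside the matched record
        records.append((link.text, link.get("href")))
    return records


# The same expression is reused on both pages without rerunning any analysis.
records_a = extract_srrs(PAGE_A)
records_b = extract_srrs(PAGE_B)
```

Real search result pages are rarely well-formed XML, so a practical version would likely need an HTML-tolerant parser with fuller XPath support; that choice is left open here.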

  

Source: Al Hanbali, Ahmad - Department of Applied Mathematics, Universiteit Twente

 

Collections: Engineering