E-print Network


Google's Deep-Web Crawl

Jayant Madhavan (Google Inc.), jayant@google.com
David Ko (Google Inc.), dko@google.com
Lucja Kot (Cornell University), lucja@cs.cornell.edu
Vignesh Ganapathy (Google Inc.), vignesh@google.com
Alex Rasmussen (University of California, San Diego), arasmuss@cs.ucsd.edu
Alon Halevy (Google Inc.), halevy@google.com

Summary:
The Deep Web, i.e., content hidden behind HTML forms,
has long been acknowledged as a significant gap in search
engine coverage. Since it represents a large portion of the
structured data on the Web, accessing Deep-Web content
has been a long-standing challenge for the database community.
This paper describes a system for surfacing Deep-Web
content, i.e., pre-computing submissions for each HTML
form and adding the resulting HTML pages into a search
engine index. The results of our surfacing have been incor-

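As a rough illustration of what "pre-computing submissions" for a form could look like, the sketch below enumerates GET URLs for one form. The summary does not say how submissions are chosen, so the exhaustive combination of values used here is an assumption, and the function name `surface_urls`, the example URL, and the input names are all hypothetical.

```python
from itertools import product
from urllib.parse import urlencode


def surface_urls(action_url, candidate_values):
    """Build one GET URL per combination of candidate input values.

    action_url: the form's action URL (hypothetical example below).
    candidate_values: dict mapping each form input name to a list of
        values to try. How such values would be picked is not covered
        by the summary; here they are simply supplied by the caller.
    """
    names = sorted(candidate_values)  # fixed order gives stable URLs
    urls = []
    for combo in product(*(candidate_values[name] for name in names)):
        query = urlencode(list(zip(names, combo)))
        urls.append(f"{action_url}?{query}")
    return urls


if __name__ == "__main__":
    # Each generated URL could then be fetched and its page indexed.
    for url in surface_urls(
        "http://example.com/search",
        {"make": ["ford", "honda"], "zip": ["10001"]},
    ):
        print(url)
```

Exhaustive enumeration grows multiplicatively with the number of inputs and values, so a practical system would need some way to limit which combinations it generates.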

Source: Anderson, Richard - Department of Computer Science and Engineering, University of Washington at Seattle
Love-Geffen, Tracy E. - Department of Psychology, University of California at San Diego
Maccabe, Barney - Department of Computer Science, University of New Mexico
Pregibon, Daryl - Google Labs


Collections: Biology and Medicine; Computer Technologies and Information Sciences; Mathematics