Summary: Google's Deep-Web Crawl
Jayant Madhavan (Google Inc.), jayant@google.com
David Ko (Google Inc.), dko@google.com
Lucja Kot (Cornell University), lucja@cs.cornell.edu
Vignesh Ganapathy (Google Inc.), vignesh@google.com
Alex Rasmussen (University of California, San Diego), arasmuss@cs.ucsd.edu
Alon Halevy (Google Inc.), halevy@google.com
ABSTRACT
The Deep Web, i.e., content hidden behind HTML forms,
has long been acknowledged as a significant gap in search
engine coverage. Since it represents a large portion of the
structured data on the Web, accessing Deep-Web content
has been a long-standing challenge for the database
community. This paper describes a system for surfacing Deep-Web
content, i.e., pre-computing submissions for each HTML
form and adding the resulting HTML pages into a search
engine index. The results of our surfacing have been incor-