Summary: Architecture of a grid-enabled Web search engine
B. Barla Cambazoglu, Evren Karaca, Tayfun Kucukyilmaz, Ata Turk,
Cevdet Aykanat *
Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
Received 1 January 2006; received in revised form 10 October 2006; accepted 13 October 2006
Available online 11 December 2006
Search Engine for South-East Europe (SE4SEE) is a socio-cultural search engine running on the grid infrastructure. It
offers a personalized, on-demand, country-specific, category-based Web search facility. The main goal of SE4SEE is to
attack the page freshness problem by performing the search on the original pages residing on the Web, rather than on
previously fetched copies, as traditional search engines do. SE4SEE also aims to obtain high download rates
in Web crawling by making use of the geographically distributed nature of the grid. In this work, we present the architectural
design issues and implementation details of this search engine. We conduct various experiments to illustrate performance
results obtained on a grid infrastructure and justify the use of the search strategy employed in SE4SEE.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Search engine; Web crawling; Text classification; Grid computing
In this age of information, search engines act as important services, providing the community with the
information hidden in the Web and, due to their frequent use, stand as an integral part of our lives. The last
decade has witnessed the design and implementation of several state-of-the-art search engines (Page & Brin, 1998).