Data-Parallel Web Crawling Models
Berkant Barla Cambazoglu, Ata Turk, and Cevdet Aykanat
Department of Computer Engineering, Bilkent University
06800, Ankara, Turkey
{berkant,atat,aykanat}@cs.bilkent.edu.tr
Abstract. The need to quickly locate, gather, and store the vast amount of material on the Web necessitates parallel computing. In this paper, we propose two models, based on multi-constraint graph partitioning, for efficient data-parallel Web crawling. The models aim to balance both the amount of data downloaded and stored by each processor and the number of page requests made by the processors. The models also minimize the total volume of communication during the link exchange between the processors. To evaluate the performance of the models, experimental results are presented on a sample Web repository containing around 915,000 pages.
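To make the objectives in the abstract concrete, the following is a minimal Python sketch, not the paper's actual graph-partitioning models: a toy greedy heuristic that assigns pages to crawling processors while tracking the two balance constraints (bytes downloaded/stored and number of page requests) and the communication cost (inter-processor links whose discovery must be exchanged). All names and the example data are illustrative assumptions.

```python
def greedy_partition(pages, links, k=2):
    """Toy heuristic (not the paper's method): assign each page to one
    of k crawler processors, greedily keeping both total bytes and
    total request counts balanced, then count cut links.

    pages: dict mapping page id -> page size in bytes
    links: list of (src, dst) hyperlinks between pages
    """
    parts = [{"bytes": 0, "requests": 0, "pages": set()} for _ in range(k)]
    # Place heaviest pages first so the byte imbalance stays small.
    for page, size in sorted(pages.items(), key=lambda kv: -kv[1]):
        # Pick the processor with the fewest bytes so far
        # (ties broken by fewer requests).
        tgt = min(range(k),
                  key=lambda i: (parts[i]["bytes"], parts[i]["requests"]))
        parts[tgt]["bytes"] += size
        parts[tgt]["requests"] += 1   # one HTTP request per page
        parts[tgt]["pages"].add(page)
    owner = {p: i for i, part in enumerate(parts) for p in part["pages"]}
    # Communication volume: links whose endpoints live on different
    # processors must be exchanged between the crawlers.
    cut = sum(1 for u, v in links if owner[u] != owner[v])
    return parts, cut
```

The real models in the paper encode both constraints as vertex weights in a multi-constraint graph-partitioning formulation, which optimizes the cut size globally rather than greedily as above.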
1 Introduction
During the last decade, an exponential increase has been observed in the amount of textual material on the Web. Locating, fetching, and caching this constantly evolving content is, in general, known as the crawling problem. Currently, crawling the whole Web by means of sequential computing systems is infeasible

Source: Aykanat, Cevdet - Department of Computer Engineering, Bilkent University

Collections: Computer Technologies and Information Sciences