U.S. Department of Energy
Office of Scientific and Technical Information

Web document clustering using hyperlink structures

Technical Report · DOI: https://doi.org/10.2172/815474 · OSTI ID: 815474
With the exponential growth of information on the World Wide Web, there is great demand for efficient and effective methods of organizing and retrieving the information available. Document clustering plays an important role in information retrieval and taxonomy management for the World Wide Web, and it remains an interesting and challenging problem in the field of web computing. In this paper we consider document clustering methods that explore textual information, hyperlink structure, and co-citation relations. In particular, we apply the normalized-cut clustering method developed in computer vision to the task of hyperdocument clustering. We also explore some theoretical connections between the normalized-cut method and the K-means method. We then experiment with the normalized-cut method in the context of clustering query result sets for web search engines.
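As a rough illustration of the normalized-cut criterion named in the abstract, the sketch below scores a two-way split (A, B) of a weighted graph as cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V) and finds the best split of a tiny graph by exhaustive search. It is not taken from the report: the function names ncut_value and best_bipartition, the six-page toy graph, and its edge weights are all hypothetical, and the brute-force search stands in for whatever optimization procedure the report actually uses.

```python
from itertools import combinations


def ncut_value(weights, part_a):
    """Normalized cut of the bipartition (part_a, rest) of an undirected graph.

    weights is a symmetric matrix of non-negative similarities
    (hypothetical here; e.g. a mix of text, link, and co-citation scores).
    """
    n = len(weights)
    a = set(part_a)
    b = set(range(n)) - a
    # Total weight of edges crossing between the two sides.
    cut = sum(weights[i][j] for i in a for j in b)
    # Total weight attached to each side (sum of node degrees).
    assoc_a = sum(weights[i][j] for i in a for j in range(n))
    assoc_b = sum(weights[i][j] for i in b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")  # a side with no edges is not a useful cluster
    return cut / assoc_a + cut / assoc_b


def best_bipartition(weights):
    """Exhaustively search all two-way splits; only feasible for tiny graphs."""
    n = len(weights)
    best = None
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            value = ncut_value(weights, subset)
            if best is None or value < best[0]:
                best = (value, set(subset))
    return best


# Toy graph (assumed data): pages 0-2 and pages 3-5 each form a triangle,
# joined by a single link between page 2 and page 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
weights = [[0] * 6 for _ in range(6)]
for i, j in edges:
    weights[i][j] = weights[j][i] = 1

value, part = best_bipartition(weights)
print(value, sorted(part))
```

On this toy graph the search separates the two triangles: the single crossing link gives a cut of 1 and each side has total degree 7, so the score is 2/7. Exhaustive search grows exponentially with the number of documents, so it serves only to make the criterion concrete.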
Research Organization:
Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, CA (US)
Sponsoring Organization:
USDOE Director, Office of Science. Office of Advanced Scientific Computing Research. Mathematical, Information, and Computational Sciences Division (US)
DOE Contract Number:
AC03-76SF00098
OSTI ID:
815474
Report Number(s):
LBNL--47971
Country of Publication:
United States
Language:
English

Similar Records

Querying the World Wide Web
Conference · Mon Dec 30 23:00:00 EST 1996 · OSTI ID:535544

Thematic World Wide Web Visualization System
Software · Thu Oct 10 00:00:00 EDT 1996 · OSTI ID:1230362

Web document engineering
Conference · Wed May 01 00:00:00 EDT 1996 · OSTI ID:489679