Scalable Information Organization Javed Aslam, Fred Reiss, and Daniela Rus

Javed Aslam, Fred Reiss, and Daniela Rus
Department of Computer Science
Dartmouth College
Hanover, NH 03755 USA
{jaa, frr, rus}@cs.dartmouth.edu
We present three scalable, sampling-based extensions of the star algorithm for information organization. The star algorithm organizes a document collection into clusters that are naturally induced
by the topic structure of the collection, via a computationally efficient cover by dense subgraphs. We
also provide supporting data from extensive experiments.
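To make "a cover by dense subgraphs" concrete, here is a minimal sketch of the basic offline star algorithm as we understand it from the authors' published description. The sampling-based extensions presented in this paper are not shown. Assumptions for illustration only: documents are whitespace-tokenized into term-frequency vectors, similarity is cosine similarity, and `sigma` is a similarity threshold; the function names `cosine` and `star_clusters` are hypothetical. Two documents are joined by an edge when their similarity is at least `sigma`. Vertices are then visited in decreasing degree order, and each unmarked vertex becomes a star center whose neighbors become its satellites.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def star_clusters(docs, sigma):
    """Sketch of offline star clustering; returns {center_index: [satellite_indices]}."""
    # Hypothetical preprocessing: lowercase, split on whitespace, count terms.
    vecs = [Counter(d.lower().split()) for d in docs]
    n = len(vecs)

    # Thresholded similarity graph: edge (i, j) iff cosine(i, j) >= sigma.
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vecs[i], vecs[j]) >= sigma:
                adj[i].add(j)
                adj[j].add(i)

    marked = [False] * n
    clusters = {}
    # Visit vertices by decreasing degree; ties broken by index.
    for v in sorted(range(n), key=lambda u: (-len(adj[u]), u)):
        if not marked[v]:
            # v becomes a star center; every neighbor joins its star as a
            # satellite, even one already in another star, so clusters may overlap.
            marked[v] = True
            clusters[v] = sorted(adj[v])
            for u in adj[v]:
                marked[u] = True
    return clusters


docs = [
    "apple banana fruit",
    "banana fruit smoothie",
    "stock market price",
    "market price index",
]
print(star_clusters(docs, 0.5))
```

With these four toy documents and `sigma = 0.5`, the sketch prints `{0: [1], 2: [3]}`: one star for the fruit documents and one for the finance documents, found without specifying the number of clusters in advance. Note that the all-pairs similarity step in this sketch is quadratic in the number of documents.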
1 Introduction
Our goal is to develop a fully automated information organization system for digital libraries: tools that help librarians classify the information in a collection,
tools that create reference pointers into such collections, and tools that allow users to locate information effectively.
We focus on digital collections of unstructured text. We consider the problem of
determining the topic structure of text data without a priori knowledge of the number of topics in the
data or of any other information about their composition. The collections may be static
(for example, legacy digital collections) or dynamic (for example, newswires). We look to discover


Source: Aslam, Javed - College of Computer Science, Northeastern University
Dartmouth College, Transportable Agents for Wireless Networks Group