TF-ICF: A New Term Weighting Scheme for Clustering Dynamic Data Streams
Conference · OSTI ID:930754
- ORNL
In this paper, we propose a new term weighting scheme called Term Frequency - Inverse Corpus Frequency (TF-ICF). It does not require term frequency information from other documents within the document collection, and thus it enables us to generate the document vectors of N streaming documents in linear time. In the context of a machine learning application, unsupervised document clustering, we evaluated the effectiveness of the proposed approach against five widely used term weighting schemes through extensive experimentation. Our results show that TF-ICF can produce document clusters of quality comparable to those generated by the widely recognized term weighting schemes, and that it is significantly faster than those methods.
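The idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the weighting formula, so the expression used here, log(1 + tf) multiplied by log((N + 1) / (n + 1)), is an assumption, as are the whitespace tokenizer and the hypothetical helper names `build_corpus_table` and `tf_icf_vector`. The point it shows is that the corpus statistics come from a fixed reference table built once in advance, so each streaming document is weighted from its own term counts alone.

```python
import math
from collections import Counter


def build_corpus_table(reference_docs):
    """Count, for each term, how many reference documents contain it.

    Assumption: the table is built once from a static reference corpus
    and is never updated while the stream is processed.
    """
    doc_freq = Counter()
    for doc in reference_docs:
        doc_freq.update(set(doc.lower().split()))
    return doc_freq, len(reference_docs)


def tf_icf_vector(doc, doc_freq, n_ref):
    """Weight one document using only its own counts and the fixed table.

    Assumed formula (not taken from the abstract):
        weight = log(1 + tf) * log((n_ref + 1) / (df + 1))
    """
    counts = Counter(doc.lower().split())
    return {
        term: math.log(1 + tf)
        * math.log((n_ref + 1) / (doc_freq.get(term, 0) + 1))
        for term, tf in counts.items()
    }


# Toy reference corpus and one incoming document (illustrative data only).
reference = ["the cat sat", "the dog ran", "the bird flew"]
doc_freq, n_ref = build_corpus_table(reference)
vec = tf_icf_vector("the cat and the cat", doc_freq, n_ref)
```

Because `tf_icf_vector` never looks at other documents in the stream, the cost per document depends only on that document's length, which is consistent with the linear-time claim for N streaming documents.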
- Research Organization: Oak Ridge National Laboratory (ORNL)
- Sponsoring Organization: ORNL work for others
- DOE Contract Number: AC05-00OR22725
- OSTI ID: 930754
- Country of Publication: United States
- Language: English