Summary: An Algorithm for In-Core Frequent Itemset Mining on Streaming Data
Ruoming Jin, Gagan Agrawal
Department of Computer and Information Sciences
Ohio State University, Columbus OH 43210
{jinr,agrawal}@cis.ohio-state.edu
Abstract
Frequent itemset mining is a core data mining operation and
has been extensively studied over the last decade. This paper
takes a new approach to this problem and makes two major
contributions. First, we present a one-pass algorithm for
frequent itemset mining that has deterministic bounds on
its accuracy and does not require any out-of-core summary
structure. Second, because our one-pass algorithm does not
produce any false negatives, it can be easily extended to an
accurate two-pass algorithm. Our two-pass algorithm is very
memory efficient and allows mining of datasets with a large
number of distinct items and/or very low support levels.
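
The link between "no false negatives" and an accurate two-pass method can be illustrated with a small sketch. The code below is not the algorithm of this paper: it uses the classic Misra-Gries counter scheme for single items (not itemsets) as a stand-in, and the function names, the strict "count > support * n" definition of frequent, and the sample stream are all assumptions made for illustration. The first pass keeps a bounded number of in-memory counters and returns a superset of the frequent items; the second pass counts only those candidates exactly and discards the false positives.

```python
import math
from collections import Counter


def one_pass_candidates(stream, support):
    """First pass (Misra-Gries, used here as an illustrative stand-in).

    Keeps at most k - 1 counters, where k = ceil(1 / support). Every item
    whose true count exceeds support * n is guaranteed to remain in the
    counters, so the result has no false negatives. It may contain
    false positives.
    """
    k = math.ceil(1 / support)
    counters = {}
    n = 0
    for item in stream:
        n += 1
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter, drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters), n


def two_pass_frequent(data, support):
    """Second pass: count only the candidates exactly, then filter.

    `data` must be re-iterable (for example a list), because it is
    scanned twice.
    """
    candidates, n = one_pass_candidates(data, support)
    exact = Counter(item for item in data if item in candidates)
    return {item: c for item, c in exact.items() if c > support * n}


# Hypothetical sample stream: a occurs 4 times, b 3 times, c, d, e once each.
sample = ["a", "b", "a", "c", "a", "b", "d", "a", "e", "b"]
candidates, total = one_pass_candidates(sample, 0.25)
frequent = two_pass_frequent(sample, 0.25)
```

Because the first pass can only err by including extra items, the second pass needs memory proportional to the candidate set rather than to the number of distinct items in the data.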
Our detailed experimental evaluation on synthetic and real
datasets shows the following. First, our one-pass algorithm is
very accurate in practice. Second, our algorithm requires sig-