U.S. Department of Energy
Office of Scientific and Technical Information

DisClose: Discovering Colossal Closed Itemsets via a Memory Efficient Compact Row-Tree

Conference
Itemset mining has recently focused on discovering frequent itemsets in high-dimensional datasets, which have relatively few rows and a much larger number of items. Because running time increases exponentially with average row length, most conventional algorithms are impractical for mining such datasets. Unfortunately, in this type of dataset large-cardinality closed itemsets are likely to be more informative than small-cardinality ones. This paper proposes an approach, called DisClose, to extract large-cardinality (colossal) closed itemsets from high-dimensional datasets. The approach relies on a memory-efficient Compact Row-Tree data structure to represent itemsets during the search. The search strategy explores the transposed representation of the dataset, enumerating large-cardinality itemsets first, followed by smaller ones. In addition, a minimum cardinality threshold is used to further reduce the search space. Experimental results show that DisClose can complete the extraction of colossal closed itemsets from the considered dataset, even at low support thresholds. The algorithm discovers closed itemsets immediately, without needing to check whether each new closed itemset has been found previously.
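The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the DisClose algorithm and does not implement the Compact Row-Tree; it is a brute-force row-enumeration search written under stated assumptions. Every name in it (`transpose`, `mine_colossal_closed`, `min_support`, `min_cardinality`) is hypothetical. It relies on two facts: the intersection of any set of rows is a closed itemset, and intersecting more rows can only shrink the itemset, so the largest itemsets appear first and a branch can be abandoned once it falls below the minimum cardinality. Unlike DisClose as described above, this sketch removes duplicates with a dictionary.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def transpose(rows: List[FrozenSet[str]]) -> Dict[str, Set[int]]:
    """Build the transposed table: item -> ids of the rows that contain it."""
    tidsets: Dict[str, Set[int]] = {}
    for rid, row in enumerate(rows):
        for item in row:
            tidsets.setdefault(item, set()).add(rid)
    return tidsets


def mine_colossal_closed(
    rows: Iterable[Iterable[str]], min_support: int, min_cardinality: int
) -> List[Tuple[FrozenSet[str], int]]:
    """Return (closed itemset, support) pairs, largest itemsets first.

    Illustrative brute-force search, exponential in the number of rows.
    """
    if min_cardinality < 1:
        raise ValueError("min_cardinality must be at least 1")
    frozen = [frozenset(r) for r in rows]
    tidsets = transpose(frozen)
    found: Dict[FrozenSet[str], int] = {}

    def support(itemset: FrozenSet[str]) -> int:
        # Support = number of rows containing every item of the itemset.
        return len(set.intersection(*(tidsets[i] for i in itemset)))

    def extend(itemset: FrozenSet[str], next_row: int) -> None:
        # A dictionary removes duplicates here (an assumption of this sketch).
        found[itemset] = support(itemset)
        for rid in range(next_row, len(frozen)):
            smaller = itemset & frozen[rid]
            # Adding rows never grows the itemset, so prune small branches.
            if len(smaller) >= min_cardinality:
                extend(smaller, rid + 1)

    for rid, row in enumerate(frozen):
        if len(row) >= min_cardinality:
            extend(row, rid + 1)

    kept = [(s, n) for s, n in found.items() if n >= min_support]
    # Largest cardinality first; ties broken alphabetically for determinism.
    kept.sort(key=lambda pair: (-len(pair[0]), sorted(pair[0])))
    return kept


if __name__ == "__main__":
    toy_rows = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}, {"c", "d"}]
    for itemset, count in mine_colossal_closed(toy_rows, 2, 2):
        print(sorted(itemset), count)
```

On the toy dataset, raising `min_cardinality` cuts off more branches early, which is the role the abstract assigns to the minimum cardinality threshold.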
Research Organization:
Pacific Northwest National Laboratory (PNNL), Richland, WA (US)
Sponsoring Organization:
USDOE
DOE Contract Number:
AC05-76RL01830
OSTI ID:
1076707
Report Number(s):
PNNL-SA-85968; 400470000
Country of Publication:
United States
Language:
English

Similar Records

An efficient approach to discovering knowledge from large databases
Conference · Mon Dec 30 23:00:00 EST 1996 · OSTI ID:535538

A fast distributed algorithm for mining association rules
Conference · Mon Dec 30 23:00:00 EST 1996 · OSTI ID:535540

Hash based parallel algorithms for mining association rules
Conference · Mon Dec 30 23:00:00 EST 1996 · OSTI ID:535539
