A Distributed Algorithm of Density-Based Subspace Frequent Closed Itemset Mining
10th IEEE International Conference on High Performance Computing and Communications (HPCC 2008)
IEEE Computer Society
Mining frequent closed itemsets is an efficient way to reduce the complexity of frequent itemset mining for association analysis, but large, densely packed, high-dimensional data remains a challenge. This paper proposes a distributed algorithm for discovering frequent closed itemsets in such data. The algorithm partitions the search space of frequent closed itemsets into nonoverlapping subspaces that can be mined independently of one another. It generates frequent closed itemsets in density-priority order: denser or more frequent closed itemsets are generated first. Experimental results show that the algorithm extracts frequent closed itemsets from large data efficiently.
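To make the idea of independent, nonoverlapping subspaces concrete, the following Python sketch may help. It is not the algorithm of this paper, whose details the abstract does not give. It assumes one common partitioning scheme: items are ranked by descending support, and each closed itemset is assigned to the subspace of its highest-ranked item. Processing subspaces in rank order is only one possible reading of "density priority". The names rank_items, mine_subspace, and mine_closed, and the toy database db, are hypothetical.

```python
from collections import Counter


def rank_items(transactions):
    """Rank items by descending support (ties broken by item name)."""
    support = Counter(item for t in transactions for item in t)
    ordered = sorted(support, key=lambda i: (-support[i], i))
    return {item: pos for pos, item in enumerate(ordered)}


def mine_subspace(seed, rank, transactions, min_support):
    """Return {closed itemset: support} for the subspace owned by `seed`.

    Assumption: a closed itemset belongs to the subspace of its
    highest-ranked item, so subspaces cannot overlap.
    Requires min_support >= 1 so that every kept cover is non-empty.
    """
    later = [i for i in rank if rank[i] > rank[seed]]
    found = {}
    stack = [frozenset([seed])]
    while stack:
        candidate = stack.pop()
        cover = [t for t in transactions if candidate <= t]
        if len(cover) < min_support:
            continue  # infrequent, and so is every superset
        # Closure: the items shared by all transactions containing candidate.
        closed = frozenset.intersection(*cover)
        if min(closed, key=rank.get) != seed:
            continue  # closure belongs to an earlier subspace; prune branch
        if closed in found:
            continue  # already reached through another extension
        found[closed] = len(cover)
        stack.extend(closed | {i} for i in later if i not in closed)
    return found


def mine_closed(transactions, min_support):
    """Mine each subspace in descending order of seed support."""
    transactions = [frozenset(t) for t in transactions]
    rank = rank_items(transactions)
    return {
        seed: mine_subspace(seed, rank, transactions, min_support)
        for seed in sorted(rank, key=rank.get)
    }


# Toy database (hypothetical data, for illustration only).
db = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b"}, {"a", "d"}, {"b", "c"}]
subspaces = mine_closed(db, min_support=2)
```

Each call to mine_subspace reads only the shared transaction list and its own seed, so in a distributed setting the calls could be dispatched to separate workers without coordination. The loop here runs them sequentially to keep the sketch short.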