Krimp: mining itemsets that compress

ISSN: 1384-5810

Source: Data Mining and Knowledge Discovery, Vol. 23, Iss. 1, 2011-07, pp. 169-214

Abstract

One of the major problems in pattern mining is the explosion of the number of results. Tight constraints reveal only common knowledge, while loose constraints lead to an explosion in the number of returned patterns. This is caused by large groups of patterns essentially describing the same set of transactions. In this paper we approach this problem using the MDL principle: the best set of patterns is the set that compresses the database best. For this task we introduce the Krimp algorithm. Experimental evaluation shows that typically only hundreds of itemsets are returned, a dramatic reduction of up to seven orders of magnitude in the number of frequent itemsets. These selections, called code tables, are of high quality. This is shown with compression ratios, swap-randomisation, and the accuracies of the code table-based Krimp classifier, all obtained on a wide range of datasets. Further, we extensively evaluate the heuristic choices made in the design of the algorithm.
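To make the idea of "patterns that compress" concrete, the following is a minimal sketch and not the Krimp algorithm itself. It assumes a code table is an ordered list of itemsets that includes every singleton, that each transaction is covered greedily in that order, and that each itemset gets a Shannon-optimal code length derived from how often it is used. The names `cover`, `encoded_db_size`, and the toy database are hypothetical. Only the encoded size of the database is computed; a full two-part MDL score would also count the cost of the code table.

```python
from math import log2
from collections import Counter

def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping itemsets, in table order.

    Assumption: code_table contains all singletons, so every item gets covered.
    """
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used

def encoded_db_size(db, code_table):
    """Bits needed to encode db, with code length -log2(usage / total usage)."""
    usage = Counter()
    for transaction in db:
        usage.update(cover(transaction, code_table))
    total = sum(usage.values())
    return sum(-u * log2(u / total) for u in usage.values())

# Toy database (hypothetical): items a and b almost always occur together.
db = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("c")]
singletons = [frozenset(x) for x in "abc"]
with_pattern = [frozenset("ab")] + singletons

baseline_bits = encoded_db_size(db, singletons)
pattern_bits = encoded_db_size(db, with_pattern)
print(f"singletons only: {baseline_bits:.2f} bits")
print(f"with pattern ab: {pattern_bits:.2f} bits")
```

In this toy case, adding the itemset {a, b} to the code table shortens the encoded database, which is the sense in which a pattern "compresses". Under the MDL view described in the abstract, a pattern is only worth keeping if that saving outweighs the cost of storing it in the code table.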

Related content

Answering constraint-based mining queries on itemsets using previous materialized results

By Esposito Roberto, Meo Rosa, Botta Marco

Journal of Intelligent Information Systems, Vol. 26, Iss. 1, 2006-01, pp. 95-111 (17)

Springer Publishing Company

Mining top-K frequent itemsets from data streams

By Wong Raymond, Fu Ada

Data Mining and Knowledge Discovery, Vol. 13, Iss. 2, 2006-09, pp. 193-217 (25)

Springer Publishing Company

Mining top-K frequent itemsets through progressive sampling

By Pietracaprina Andrea, Riondato Matteo, Upfal Eli, Vandin Fabio

Data Mining and Knowledge Discovery, Vol. 21, Iss. 2, 2010-09, pp. 310-326 (17)

Springer Publishing Company

GenMax: An Efficient Algorithm for Mining Maximal Frequent Itemsets

By Gouda Karam, Zaki Mohammed

Data Mining and Knowledge Discovery, Vol. 11, Iss. 3, 2005-11, pp. 223-242 (20)

Springer Publishing Company

Towards a new approach for mining frequent itemsets on data stream

By Raïssi Chedy, Poncelet Pascal, Teisseire Maguelonne

Journal of Intelligent Information Systems, Vol. 28, Iss. 1, 2007-02, pp. 23-36 (14)

Springer Publishing Company
