Krimp: mining itemsets that compress. One of the major problems in pattern mining is the explosion of the number of results. Tight constraints reveal only common knowledge, while loose constraints lead to an explosion in the number of returned patterns. This is caused by large groups of patterns essentially describing the same set of transactions. In this paper we approach this problem using the MDL principle: the best set of patterns is the set that compresses the database best. For this task we introduce the Krimp algorithm. Experimental evaluation shows that typically only hundreds of itemsets are returned; a dramatic reduction, up to seven orders of magnitude, in the number of frequent itemsets. These selections, called code tables, are of high quality. This is shown with compression ratios, swap-randomisation, and the accuracies of the code table-based Krimp classifier, all obtained on a wide range of datasets. Further, we extensively evaluate the heuristic choices made in the design of the algorithm.
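
To make the MDL idea in the abstract concrete, the following is a minimal Python sketch of how a code table can be scored by the number of bits it needs to describe a database. It is an illustration under stated assumptions, not the authors' implementation: the function names (`make_code_table`, `cover`, `total_encoded_size`), the greedy non-overlapping cover, the ordering of the code table, and the toy data are all choices made for this example.

```python
from collections import Counter
from math import log2


def make_code_table(database, itemsets):
    """Build an ordered code table (hypothetical helper for this sketch).

    All singletons are always included so every transaction can be covered.
    Assumed order: larger itemsets first, then higher support, then lexicographic.
    """
    singletons = {frozenset([i]) for t in database for i in t}
    candidates = {frozenset(x) for x in itemsets} | singletons

    def support(x):
        return sum(1 for t in database if x <= t)

    return sorted(candidates, key=lambda x: (-len(x), -support(x), sorted(x)))


def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping itemsets, in table order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    assert not remaining, "code table must contain every singleton"
    return used


def total_encoded_size(database, code_table):
    """Two-part description length in bits: database cost plus code table cost."""
    usage = Counter()
    for t in database:
        usage.update(cover(t, code_table))
    total_usage = sum(usage.values())
    # Frequently used itemsets get short codes: -log2 of relative usage.
    code_len = {x: -log2(u / total_usage) for x, u in usage.items()}
    db_bits = sum(usage[x] * code_len[x] for x in usage)

    # Code table cost: each used itemset is spelled out item by item with
    # singleton-only codes, followed by its own code.
    item_counts = Counter(i for t in database for i in t)
    n_items = sum(item_counts.values())
    item_len = {i: -log2(c / n_items) for i, c in item_counts.items()}
    ct_bits = sum(sum(item_len[i] for i in x) + code_len[x] for x in usage)
    return db_bits + ct_bits


# Toy data (made up): items "a" and "b" almost always occur together.
database = [{"a", "b"}] * 8 + [{"a"}, {"b"}]

singletons_only = total_encoded_size(database, make_code_table(database, []))
with_pattern = total_encoded_size(database, make_code_table(database, [{"a", "b"}]))
print(singletons_only, with_pattern)
```

On this toy data the code table containing the itemset {a, b} yields a smaller total size than the singleton-only table, which is the sense in which a pattern "compresses". A full miner would additionally need a candidate generation and acceptance loop, which this sketch leaves out.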

References in zbMATH (referenced in 19 articles, 1 standard article)

  1. Bloem, Peter; de Rooij, Steven: Large-scale network motif analysis using compression (2020)
  2. Fürnkranz, Johannes; Kliegr, Tomáš; Paulheim, Heiko: On cognitive preferences and the plausibility of rule-based models (2020)
  3. Grünwald, Peter; Roos, Teemu: Minimum description length revisited (2019)
  4. Macha, Meghanath; Akoglu, Leman: Explaining anomalies in groups with characterizing subspace rules (2018)
  5. Guns, Tias; Dries, Anton; Nijssen, Siegfried; Tack, Guido; De Raedt, Luc: MiningZinc: a declarative framework for constraint-based mining (2017)
  6. Hess, Sibylle; Morik, Katharina; Piatkowski, Nico: The PRIMPING routine -- tiling through proximal alternating linearized minimization (2017)
  7. Li, Yao; Liu, Lingqiao; Shen, Chunhua; van den Hengel, Anton: Mining mid-level visual patterns with deep CNN activations (2017)
  8. Li, Geng; Zaki, Mohammed J.: Sampling frequent and minimal Boolean patterns: theory and application in classification (2016)
  9. Petitjean, François; Li, Tao; Tatti, Nikolaj; Webb, Geoffrey I.: Skopus: mining top-k sequential patterns under leverage (2016)
  10. Koutra, Danai; Kang, U.; Vreeken, Jilles; Faloutsos, Christos: Summarizing and understanding large graphs (2015)
  11. Tomczak, Jakub M.; Ziȩba, Maciej: Probabilistic combination of classification rules and its application to medical diagnosis (2015)
  12. Zimek, Arthur; Vreeken, Jilles: The blind men and the elephant: on meeting the problem of multiple truths in data from clustering and pattern mining perspectives (2015)
  13. Lam, Hoang Thanh; Mörchen, Fabian; Fradkin, Dmitriy; Calders, Toon: Mining compressing sequential patterns (2014)
  14. Lijffijt, Jefrey; Papapetrou, Panagiotis; Puolamäki, Kai: A statistical significance testing approach to mining the most informative set of patterns (2014)
  15. Nguyen, Hoang-Vu; Müller, Emmanuel; Vreeken, Jilles; Böhm, Klemens: Unsupervised interaction-preserving discretization of multivariate data (2014)
  16. Mampaey, Michael; Vreeken, Jilles: Summarizing categorical data by clustering attributes (2013)
  17. Tatti, Nikolaj; Vreeken, Jilles: Comparing apples and oranges: measuring differences between exploratory data mining results (2012)
  18. Van Leeuwen, Matthijs; Knobbe, Arno: Diverse subgroup set discovery (2012)
  19. Vreeken, Jilles; Van Leeuwen, Matthijs; Siebes, Arno: Krimp: mining itemsets that compress (2011)