RainForest - A Framework for Fast Decision Tree Construction of Large Datasets. Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework called RainForest for classification tree construction that separates the scalability aspects of algorithms for constructing a tree from the central features that determine the quality of the tree. The generic algorithm is easy to instantiate with specific split selection methods from the literature (including C4.5, CART, CHAID, FACT, ID3 and extensions, SLIQ, SPRINT and QUEST). In addition to its generality, in that it yields scalable versions of a wide range of classification algorithms, our approach also offers performance improvements of over a factor of three over the SPRINT algorithm, the fastest scalable classification algorithm proposed previously. In contrast to SPRINT, however, our generic algorithm requires a certain minimum amount of main memory, proportional to the number of distinct values in a column of the input relation. Given current main memory costs, this requirement is readily met in most if not all workloads.
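
The separation the abstract describes can be pictured with a small sketch. The RainForest paper aggregates, for each attribute, the class label counts per distinct attribute value (it calls these tables AVC-sets), and the split selection method then works only on those counts. Everything else below is an illustrative assumption rather than the authors' implementation: the function names, the toy records, the restriction to categorical attributes, and the choice of weighted Gini impurity as the plugged-in split criterion.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Record = Tuple[Tuple[Hashable, ...], Hashable]  # (attribute values, class label)
AVCSet = Dict[Hashable, Counter]                # attribute value -> class label counts


def build_avc_sets(records: Iterable[Record], n_attrs: int) -> List[AVCSet]:
    """Scalability part: one sequential scan over the data.

    Memory grows with the number of distinct values per attribute,
    not with the number of records.
    """
    avc: List[AVCSet] = [defaultdict(Counter) for _ in range(n_attrs)]
    for values, label in records:
        for i, v in enumerate(values):
            avc[i][v][label] += 1
    return avc


def gini(counts: Counter) -> float:
    """Gini impurity of a class label distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def gini_split_score(avc_set: AVCSet) -> float:
    """Quality part (hypothetical choice): weighted Gini of a multiway split."""
    total = sum(sum(c.values()) for c in avc_set.values())
    return sum(sum(c.values()) / total * gini(c) for c in avc_set.values())


def choose_split(avc_sets: List[AVCSet], score: Callable[[AVCSet], float]) -> int:
    """Pluggable split selection: it sees only aggregated counts, never raw rows."""
    return min(range(len(avc_sets)), key=lambda i: score(avc_sets[i]))


# Toy data (made up for illustration): attribute 1 separates the classes perfectly.
rows = [
    (("sunny", "high"), "no"),
    (("sunny", "low"), "yes"),
    (("rainy", "high"), "no"),
    (("rainy", "low"), "yes"),
]
avc_sets = build_avc_sets(iter(rows), n_attrs=2)
best = choose_split(avc_sets, gini_split_score)
print(best)  # index of the attribute with the lowest weighted impurity
```

Swapping `gini_split_score` for another scoring function changes which tree is built without touching the scan in `build_avc_sets`, which is the kind of decoupling the abstract claims for the framework.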

References in zbMATH (referenced in 17 articles, 1 standard article)

  1. Tzirakis, Panagiotis; Tjortjis, Christos: T3C: improving a decision tree classification algorithm’s interval splits on continuous attributes (2017)
  2. Rokach, Lior; Maimon, Oded: Data mining with decision trees. Theory and applications. (2015)
  3. Gama, João; Žliobaitė, Indrė; Bifet, Albert; Pechenizkiy, Mykola; Bouchachia, Abdelhamid: A survey on concept drift adaptation (2014)
  4. Khoshgoftaar, Taghi M.; Xiao, Yudong; Gao, Kehan: Software quality assessment using a multi-strategy classifier (2014)
  5. Chen, Bee-Chung; LeFevre, Kristen; Ramakrishnan, Raghu: Adversarial-knowledge dimensions in data privacy (2009)
  6. Guo, Hongyu; Viktor, Herna L.: Multirelational classification: a multiple view approach (2008)
  7. Loukides, Grigorios; Shao, Jian-Hua: An efficient clustering algorithm for k-anonymisation (2008)
  8. Kotsiantis, S. B.; Zaharakis, I. D.; Pintelas, P. E.: Machine learning: a review of classification and combining techniques (2007)
  9. Yen, Ester; Chu, I-Wen Mike: Relaxing instance boundaries for the search of splitting points of numerical attributes in classification trees (2007)
  10. Wu, Xintao: Incorporating large unlabeled data to enhance EM classification (2006)
  11. Wu, Xintao: Incorporating large unlabeled data to enhance EM classification (2006)
  12. Zaki, Mohammed J.; Aggarwal, Charu C.: XRules: an effective algorithm for structural classification of XML data (2006)
  13. Aggarwal, Charu C.; Bradley, Paul S.: On the use of wavelet decomposition for string classification (2005)
  14. Berzal, Fernando; Cubero, Juan-Carlos; Marín, Nicolás; Sánchez, Daniel: Building multi-way decision trees with numerical attributes (2004)
  15. Baek, Jun-Geol; Kim, Chang-Ouk; Kim, Sung Shick: Online learning of the cause-and-effect knowledge of a manufacturing process (2002)
  16. Ganti, Venkatesh; Gehrke, Johannes; Ramakrishnan, Raghu; Loh, Wei-Yin: A framework for measuring differences in data characteristics (2002)
  17. Gehrke, Johannes; Ramakrishnan, Raghu; Ganti, Venkatesh: RainForest - A framework for fast decision tree construction of large datasets (2000)