SPRINT

SPRINT: a scalable parallel classifier for data mining. Classification is an important data mining problem. Although classification is a well-studied problem, most of the current classification algorithms require that all or a portion of the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classification algorithm, called SPRINT, that removes all of the memory restrictions and is fast and scalable. The algorithm has also been designed to be easily parallelized, allowing many processors to work together to build a single consistent model. This parallelization, also presented here, exhibits excellent scalability as well. The combination of these characteristics makes the proposed algorithm an ideal tool for data mining.


References in zbMATH (referenced in 33 articles)

Showing results 1 to 20 of 33.
Sorted by year (citations)

  1. Wang, Ran; He, Yu-Lin; Chow, Chi-Yin; Ou, Fang-Fang; Zhang, Jian: Learning ELM-tree from big data based on uncertainty reduction (2015)
  2. De Stefano, C.; Folino, G.; Fontanella, F.; Scotto di Freca, A.: Using Bayesian networks for selecting classifiers in GP ensembles (2014)
  3. Khoshgoftaar, Taghi M.; Xiao, Yudong; Gao, Kehan: Software quality assessment using a multi-strategy classifier (2014)
  4. Nasridinov, Aziz; Lee, Yangsun; Park, Young-Ho: Decision tree construction on GPU: ubiquitous parallel computing approach (2014)
  5. Bifet, Albert: Adaptive stream mining: Pattern learning and mining from evolving data streams (2010)
  6. Shih, Wen-Chung; Yang, Chao-Tung; Tseng, Shian-Shyong: Performance-based data distribution for data mining applications on grid computing environments (2010)
  7. Chandra, B.; Varghese, P. Paul: Moving towards efficient decision tree construction (2009)
  8. Elnaffar, Said; Martin, Pat; Schiefer, Berni; Lightstone, Sam: Is it DSS or OLTP: Automatically identifying DBMS workloads (2008)
  9. Glimcher, Leonid; Jin, Ruoming; Agrawal, Gagan: Middleware for data mining applications on clusters and grids (2008)
  10. Hu, Hui-Ling; Chen, Yen-Liang: Mining typical patterns from databases (2008)
  11. Castro, José; Secretan, Jimmy; Georgiopoulos, Michael; DeMara, Ronald; Anagnostopoulos, Georgios; Gonzalez, Avelino: Pipelining of Fuzzy ARTMAP without matchtracking: Correctness, performance bound, and Beowulf evaluation (2007)
  12. Osei-Bryson, Kweku-Muata: Post-pruning in decision tree induction using multiple performance measures (2007)
  13. Yen, Ester; Chu, I-Wen Mike: Relaxing instance boundaries for the search of splitting points of numerical attributes in classification trees (2007)
  14. Nguyen, Hung Son: Approximate Boolean reasoning: Foundations and applications in data mining (2006)
  15. Wu, Xintao: Incorporating large unlabeled data to enhance EM classification (2006)
  16. Wu, Xintao: Incorporating large unlabeled data to enhance EM classification (2006)
  17. Zaki, Mohammed J.; Aggarwal, Charu C.: XRules: An effective algorithm for structural classification of XML data (2006)
  18. Agrawal, Rakesh; Gehrke, Johannes; Gunopulos, Dimitrios; Raghavan, Prabhakar: Automatic subspace clustering of high dimensional data (2005)
  19. Castro, José; Georgiopoulos, Michael; DeMara, Ronald; Gonzalez, Avelino: Data-partitioning using the Hilbert space filling curves: effect on the speed of convergence of fuzzy ARTMAP for large database problems (2005)
  20. Castro, José; Georgiopoulos, Michael; Secretan, Jimmy; DeMara, Ronald F.; Anagnostopoulos, Georgios; Gonzalez, Avelino: Parallelization of fuzzy ARTMAP to improve its convergence speed: the network partitioning approach and the data partitioning approach (2005)
