CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code.

Copy-pasted code is very common in large software because programmers prefer reusing code via copy-paste in order to reduce programming effort. Recent studies show that copy-paste is prone to introducing bugs and that a significant portion of operating system bugs concentrate in copy-pasted code. Unfortunately, it is challenging to efficiently identify copy-pasted code in large software. Existing copy-paste detection tools are either not scalable to large software or cannot handle small modifications in copy-pasted code. Furthermore, few tools are available to detect copy-paste related bugs. In this paper we propose a tool, CP-Miner, that uses data mining techniques to efficiently identify copy-pasted code in large software, including operating systems, and detects copy-paste related bugs. Specifically, it takes less than 20 minutes for CP-Miner to identify 190,000 copy-pasted segments in Linux and 150,000 in FreeBSD. Moreover, CP-Miner has detected 28 copy-paste related bugs in the latest version of Linux and 23 in FreeBSD. In addition, we analyze some interesting characteristics of copy-paste in Linux and FreeBSD, including the distribution of copy-pasted code across different lengths, granularities, modules, degrees of modification, and software versions.

References in zbMATH (referenced in 11 articles)

  1. Hallahan, William T.; Zhai, Ennan; Piskac, Ruzica: Automated repair by example for firewalls (2020)
  2. Qu, Wei; Jia, Yuanyuan; Jiang, Michael: Pattern mining of cloned codes in software systems (2014)
  3. Arbuckle, Tom: Studying software evolution using artefacts’ shared information content (2011)
  4. Spinellis, Diomidis: CScout: a refactoring browser for C (2010)
  5. Evans, William S.; Fraser, Christopher W.; Ma, Fei: Clone detection via structural abstraction (2009)
  6. Roy, Chanchal K.; Cordy, James R.; Koschke, Rainer: Comparison and evaluation of code clone detection techniques and tools: A qualitative approach (2009)
  7. Wang, Jianyong; Zhang, Yuzhou; Zhou, Lizhu; Karypis, George; Aggarwal, Charu C.: CONTOUR: an efficient algorithm for discovering discriminating subsequences (2009)
  8. Falke, Raimar; Frenzel, Pierre; Koschke, Rainer: Empirical evaluation of clone detection using syntax suffix trees (2008)
  9. Tairas, Robert; Gray, Jeff: An information retrieval process to aid in the analysis of code clones (2008)
  10. Han, Jiawei; Cheng, Hong; Xin, Dong; Yan, Xifeng: Frequent pattern mining: Current status and future directions (2007)
  11. Li, Zhenmin; Lu, Shan; Myagmar, Suvda; Zhou, Yuanyuan: CP-Miner: Finding Copy-Paste and Related Bugs in Large-Scale Software Code (2006)