CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code

Copy-pasted code is very common in large software because programmers prefer reusing code via copy-paste to reduce programming effort. Recent studies show that copy-paste is prone to introducing bugs and that a significant portion of operating system bugs are concentrated in copy-pasted code. Unfortunately, it is challenging to identify copy-pasted code efficiently in large software. Existing copy-paste detection tools either do not scale to large software or cannot handle small modifications in copy-pasted code. Furthermore, few tools are available to detect copy-paste related bugs. In this paper we propose a tool, CP-Miner, that uses data mining techniques to efficiently identify copy-pasted code in large software, including operating systems, and to detect copy-paste related bugs. Specifically, it takes less than 20 minutes for CP-Miner to identify 190,000 copy-pasted segments in Linux and 150,000 in FreeBSD. Moreover, CP-Miner has detected 28 copy-paste related bugs in the latest version of Linux and 23 in FreeBSD. In addition, we analyze some interesting characteristics of copy-paste in Linux and FreeBSD, including the distribution of copy-pasted code across different lengths, granularities, modules, degrees of modification, and software versions.
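To make the idea of detecting copy-pasted code despite small modifications more concrete, here is a minimal sketch in Python. It is not CP-Miner's algorithm, which the abstract describes only as using data mining techniques. It assumes a much simpler approach: identifiers and numeric literals are replaced with placeholders, and fixed-size windows of consecutive lines are compared after that normalization. The names `normalize`, `find_clones`, `KEYWORDS`, and `SAMPLE` are all hypothetical and chosen for illustration.

```python
import re
from collections import defaultdict

# Small illustrative keyword set (an assumption); a real tool would use
# the full grammar of the language being analyzed.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(line):
    """Map identifiers to ID and numbers to NUM so renamed copies compare equal."""
    out = []
    for tok in TOKEN.findall(line):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return " ".join(out)


def find_clones(source, window=3):
    """Return groups of 1-based start lines whose `window` consecutive
    non-blank lines are identical after normalization."""
    lines = [(i + 1, normalize(text))
             for i, text in enumerate(source.splitlines()) if text.strip()]
    groups = defaultdict(list)
    for i in range(len(lines) - window + 1):
        key = tuple(text for _, text in lines[i:i + window])
        groups[key].append(lines[i][0])
    return [starts for starts in groups.values() if len(starts) > 1]


# Hypothetical input: the second loop is a copy of the first with renamed variables.
SAMPLE = """\
for (i = 0; i < n; i++) {
    total += a[i];
}
print(total);
for (j = 0; j < m; j++) {
    sum += b[j];
}
"""

print(find_clones(SAMPLE))  # prints [[1, 5]]
```

The sketch tolerates renamed variables and changed constants, but not inserted or deleted statements. Handling those kinds of small modification at scale is the difficulty the abstract attributes to existing tools.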
References in zbMATH (referenced in 10 articles)
- Qu, Wei; Jia, Yuanyuan; Jiang, Michael: Pattern mining of cloned codes in software systems (2014)
- Arbuckle, Tom: Studying software evolution using artefacts’ shared information content (2011)
- Spinellis, Diomidis: CScout: a refactoring browser for C (2010)
- Evans, William S.; Fraser, Christopher W.; Ma, Fei: Clone detection via structural abstraction (2009)
- Roy, Chanchal K.; Cordy, James R.; Koschke, Rainer: Comparison and evaluation of code clone detection techniques and tools: A qualitative approach (2009)
- Wang, Jianyong; Zhang, Yuzhou; Zhou, Lizhu; Karypis, George; Aggarwal, Charu C.: CONTOUR: an efficient algorithm for discovering discriminating subsequences (2009)
- Falke, Raimar; Frenzel, Pierre; Koschke, Rainer: Empirical evaluation of clone detection using syntax suffix trees (2008)
- Tairas, Robert; Gray, Jeff: An information retrieval process to aid in the analysis of code clones (2008)
- Han, Jiawei; Cheng, Hong; Xin, Dong; Yan, Xifeng: Frequent pattern mining: Current status and future directions (2007)
- Padioleau, Yoann; Lawall, Julia L.; Muller, Gilles: SmPL: A domain-specific language for specifying collateral evolutions in Linux device drivers (2007)