DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones

Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space $\mathbb{R}^n$ and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar.
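
The idea of characterizing trees by numerical vectors and comparing them by Euclidean distance can be illustrated with a minimal Python sketch. This is not DECKARD's implementation: the node kinds in `NODE_KINDS`, the tuple encoding of trees, the helper names, and the threshold are all assumptions made for illustration, and a naive pairwise comparison stands in for the efficient clustering algorithm the abstract mentions. For brevity, only whole fragments are compared rather than every subtree.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical node kinds; each kind is one dimension of the vector space R^n.
NODE_KINDS = ["assign", "binop", "call", "name", "num"]


def count_kinds(tree):
    """Count how often each node kind occurs in a tree given as (kind, *children)."""
    kind, *children = tree
    counts = Counter({kind: 1})
    for child in children:
        counts.update(count_kinds(child))
    return counts


def characteristic_vector(tree):
    """Map a tree to a point in R^n, one coordinate per entry of NODE_KINDS."""
    counts = count_kinds(tree)
    return [counts[kind] for kind in NODE_KINDS]


def similar_pairs(fragments, threshold):
    """Return name pairs whose vectors lie within `threshold` (Euclidean distance).

    Naive O(m^2) comparison, used here only to keep the sketch short.
    """
    vectors = {name: characteristic_vector(t) for name, t in fragments.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if math.dist(vectors[a], vectors[b]) <= threshold
    ]


# Toy fragments (assumed encoding): f1 is "x = y + 1", f2 is "x = y + z", f3 is "f(1)".
fragments = {
    "f1": ("assign", ("name",), ("binop", ("name",), ("num",))),
    "f2": ("assign", ("name",), ("binop", ("name",), ("name",))),
    "f3": ("call", ("name",), ("num",)),
}

print(similar_pairs(fragments, threshold=1.5))
```

In this toy setting, `f1` and `f2` differ only in one leaf, so their vectors are close and they are reported as a candidate clone pair, while `f3` is not. This mirrors the robustness against minor code modifications that the abstract claims for the vector-based approach.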

This software has also been peer reviewed by the journal TOMS.

References in zbMATH (referenced in 8 articles)

  1. Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu: CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation (2021) arXiv
  2. Sargsyan, S.; Kurmangaleev, Sh.; Belevantsev, A.; Avetisyan, A.: Scalable and accurate detection of code clones (2016) ioport
  3. Bettenburg, Nicolas; Shang, Weiyi; Ibrahim, Walid M.; Adams, Bram; Zou, Ying; Hassan, Ahmed E.: An empirical study on inconsistent changes to code clones at the release level (2012) ioport
  4. Thummalapenta, Suresh; Cerulo, Luigi; Aversano, Lerina; Di Penta, Massimiliano: An empirical study on the maintenance of source code clones (2010) ioport
  5. Evans, William S.; Fraser, Christopher W.; Ma, Fei: Clone detection via structural abstraction (2009) ioport
  6. Roy, Chanchal K.; Cordy, James R.; Koschke, Rainer: Comparison and evaluation of code clone detection techniques and tools: A qualitative approach (2009)
  7. Kapser, Cory J.; Godfrey, Michael W.: “Cloning considered harmful” considered harmful: Patterns of cloning in software (2008) ioport
  8. Tairas, Robert; Gray, Jeff: An information retrieval process to aid in the analysis of code clones (2008) ioport