GPLAG: detection of software plagiarism by program dependence graph analysis.

The growth of open source projects has also made software plagiarism more convenient. A company, if less self-disciplined, may be tempted to plagiarize open source projects for its own products. Although current plagiarism detection tools appear sufficient for academic use, they fall short against serious plagiarists. For example, disguises such as statement reordering and code insertion can effectively confuse these tools. In this paper, we develop a new plagiarism detection tool, called GPLAG, which detects plagiarism by mining program dependence graphs (PDGs). A PDG is a graph representation of the data and control dependencies within a procedure. Because PDGs are nearly invariant under plagiarism disguises, GPLAG is more effective than state-of-the-art tools for plagiarism detection. To make GPLAG scalable to large programs, a statistical lossy filter is proposed to prune the plagiarism search space. An experimental study shows that GPLAG is both effective and efficient: it detects plagiarism that easily slips past existing tools, and it usually takes a few seconds to find (simulated) plagiarism in programs with thousands of lines of code.
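To illustrate why dependence graphs resist the disguises mentioned in the abstract, here is a minimal sketch. It is not GPLAG's algorithm: it covers only data dependencies in straight-line code, and the statement model (`Stmt`) and the function `data_dependence_edges` are hypothetical names introduced for this example. Statements are identified by their text, which is a simplification that would break down if the same text appeared twice.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical statement model: (text, variables defined, variables used).
Stmt = Tuple[str, Set[str], Set[str]]


def data_dependence_edges(stmts: List[Stmt]) -> Set[Tuple[str, str]]:
    """Return edges (defining statement, using statement) for straight-line code."""
    last_def: Dict[str, str] = {}  # variable -> text of its most recent definition
    edges: Set[Tuple[str, str]] = set()
    for text, defs, uses in stmts:
        # A use depends on the latest earlier definition of the same variable.
        for var in uses:
            if var in last_def:
                edges.add((last_def[var], text))
        # Record definitions only after handling uses, so "x = x + 1"
        # depends on the previous definition of x rather than on itself.
        for var in defs:
            last_def[var] = text
    return edges


original: List[Stmt] = [
    ("a = read()", {"a"}, set()),
    ("b = read()", {"b"}, set()),
    ("s = a + b", {"s"}, {"a", "b"}),
    ("print(s)", set(), {"s"}),
]

# Disguise 1: reorder the two independent reads.
reordered = [original[1], original[0], original[2], original[3]]

# Disguise 2: insert a statement unrelated to the original computation.
inserted = original[:2] + [("t = 0", {"t"}, set())] + original[2:]

pdg_original = data_dependence_edges(original)
pdg_reordered = data_dependence_edges(reordered)
pdg_inserted = data_dependence_edges(inserted)

# Both disguises change the statement sequence but not the dependence edges.
print(pdg_original == pdg_reordered, pdg_original == pdg_inserted)
```

A token-based or line-based comparison would see three different statement sequences here, while the dependence edges are identical in all three variants. A real PDG would also include control dependencies and would be compared by graph matching rather than by statement text.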

This software is also peer-reviewed by the journal TOMS.

References in zbMATH (referenced in 7 articles)

  1. Wang, Hongzhi; Li, Ning; Li, Jianzhong; Gao, Hong: Parallel algorithms for flexible pattern matching on big graphs (2018)
  2. Lestringant, Pierre; Guihéry, Frédéric; Fouque, Pierre-Alain: Assisted identification of mode of operation in binary code with dynamic data flow slicing (2016)
  3. Ma, Shuai; Cao, Yang; Fan, Wenfei; Huai, Jinpeng; Wo, Tianyu: Strong simulation: capturing topology in graph pattern matching (2014)
  4. Qu, Wei; Jia, Yuanyuan; Jiang, Michael: Pattern mining of cloned codes in software systems (2014)
  5. Bettenburg, Nicolas; Shang, Weiyi; Ibrahim, Walid M.; Adams, Bram; Zou, Ying; Hassan, Ahmed E.: An empirical study on inconsistent changes to code clones at the release level (2012)
  6. Linstead, Erik; Bajracharya, Sushil; Ngo, Trung; Rigor, Paul; Lopes, Cristina; Baldi, Pierre: Sourcerer: mining and searching internet-scale software repositories (2009)
  7. Roy, Chanchal K.; Cordy, James R.; Koschke, Rainer: Comparison and evaluation of code clone detection techniques and tools: A qualitative approach (2009)