Dotplot: a program for exploring self-similarity in millions of lines of text and code.

An interactive program, dotplot, has been developed for browsing millions of lines of text and source code, using an approach borrowed from biology for studying homology (self-similarity) in DNA sequences. With conventional browsing tools such as a screen editor, it is difficult to identify structures that are too big to fit on the screen. In contrast, with dotplots we find that many of these structures show up as diagonals, squares, textures, and other visually recognizable features, as will be illustrated in examples selected from biology and two new application domains, text (AP news, Canadian Hansards) and source code (5ESS®). In an attempt to isolate the mechanisms that produce these features, we have synthesized similar features in dotplots of artificial sequences. We also introduce an approximation that makes the calculation of dotplots practical for use in an interactive browser.
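
As a rough illustration of the idea described above, the following minimal Python sketch builds a dotplot for a short token sequence by marking every pair of positions that hold the same token. It is not the dotplot program's implementation: the function names (`tokenize`, `dotplot`, `render`) are hypothetical, tokens are assumed to be whitespace-separated lowercase words, and the comparison is exact and quadratic, so it leaves out the approximation that the abstract says makes interactive use practical.

```python
def tokenize(text):
    """Split text into lowercase whitespace-separated tokens (an assumed tokenization)."""
    return text.lower().split()


def dotplot(tokens):
    """Return an n x n boolean matrix; cell (i, j) is True when tokens i and j match."""
    n = len(tokens)
    return [[tokens[i] == tokens[j] for j in range(n)] for i in range(n)]


def render(matrix):
    """Draw the matrix as text, using '#' for a match and '.' for a mismatch."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


tokens = tokenize("to be or not to be")
plot = dotplot(tokens)
print(render(plot))
```

In this small example the repeated phrase "to be" appears as a short run of matches parallel to the main diagonal, which is the kind of diagonal feature the abstract refers to.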
References in zbMATH (referenced in 5 articles)
- Arbuckle, Tom: Studying software evolution using artefacts’ shared information content (2011)
- Evans, William S.; Fraser, Christopher W.; Ma, Fei: Clone detection via structural abstraction (2009)
- Roy, Chanchal K.; Cordy, James R.; Koschke, Rainer: Comparison and evaluation of code clone detection techniques and tools: A qualitative approach (2009)
- March, T.K.; Chapman, S.C.; Dendy, R.O.: Recurrence plot statistics and the effect of embedding (2005)
- Kontogiannis, K.A.; Demori, R.; Merlo, E.; Galler, M.; Bernstein, M.: Pattern matching for clone and concept detection (1996)