Various techniques for detecting similar programs in large classes have been proposed previously, but research in this area is hampered by the lack of a means for evaluating their performance. To address this deficiency, new concepts are introduced that permit the effectiveness of competing systems to be quantified and enable realistic comparisons to be made. Using these criteria, popular approaches to plagiarism detection based on counting program attributes are shown to be inadequate. A two-stage method of identifying similar pairs based on structural features is proposed, and the superior performance of this technique is established.
Geoff Whale studied this question.