A Comparison of Source Code Plagiarism Detection Engines

Author: Lancaster Thomas   Culwin Fintan  

Publisher: Routledge Ltd

ISSN: 0899-3408

Source: Computer Science Education, Vol.14, Iss.2, 2004-06, pp. : 101-112

Disclaimer: Any content in publications that violate the sovereignty, the constitution or regulations of the PRC is not accepted or approved by CNPIEC.

Previous Menu Next

Abstract

Automated techniques for finding plagiarism in student source code submissions have been in use for over 20 years and there are many available engines and services. This paper reviews the literature on the major modern detection engines, providing a comparison of them based upon the metrics and techniques they deploy. Generally the most common and effective techniques are seen to involve tokenising student submissions then searching pairs of submissions for long common substrings, an example of what is defined to be a paired structural metric. Computing academics are recommended to use one of the two Web-based detection engines, MOSS and JPlag. It is shown that whilst detection is well established there are still places where further research would be useful, particularly where visual support of the investigation process is possible.