Detecting ghost-written high-school assignments

Most Danish high schools use MaCom’s system Lectio for handing in written assignments. MaCom already has a well-functioning system for detecting copy-paste plagiarism. However, a growing number of students pay other people to write their assignments. MaCom’s current system does not detect this kind of fraud, so a different solution is needed.

The overall objective is to improve MaCom’s plagiarism detection system so that teachers are notified when an assignment was likely written by someone other than the claimed author. More precisely, we are developing an algorithm that solves the following problem: given a set of assignments written by an author A, decide whether a new assignment was most likely written by someone other than A.

Given the set S of assignments known to be written by an author A, we can extract statistical information that characterizes A’s writing style. We then use different kinds of outlier detection algorithms to check whether an assignment Q of questionable authorship is written in the same style as the assignments in S. Another promising approach that we are currently investigating is to use deep neural networks to learn to detect ghost-written assignments.
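As a rough illustration of the outlier-detection idea, the following Python sketch compares a questioned text Q against a set S of known texts. It is not the project's actual method: the four style features, the function names, the `min_std` floor, and the threshold of 3.0 are all assumptions chosen for illustration only.

```python
import re
import statistics


def style_features(text):
    """Return a small, illustrative vector of style statistics for one text."""
    # Runs of Unicode letters, so Danish characters such as æ, ø, å are kept.
    words = re.findall(r"[^\W\d_]+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    return [
        sum(len(w) for w in words) / len(words),  # average word length
        len(words) / len(sentences),              # average sentence length in words
        len(set(words)) / len(words),             # type-token ratio
        text.count(",") / len(words),             # commas per word
    ]


def outlier_score(known_texts, questioned_text, min_std=0.1):
    """Mean absolute z-score of Q's features relative to the texts in S.

    min_std is an assumed, crude floor on the standard deviation so that
    features that happen to be constant across S do not cause division by zero.
    """
    if len(known_texts) < 2:
        raise ValueError("need at least two known texts to estimate spread")
    known = [style_features(t) for t in known_texts]
    questioned = style_features(questioned_text)
    z_scores = []
    for i, value in enumerate(questioned):
        column = [features[i] for features in known]
        mean = statistics.mean(column)
        std = max(statistics.stdev(column), min_std)
        z_scores.append(abs(value - mean) / std)
    return sum(z_scores) / len(z_scores)


def is_suspicious(known_texts, questioned_text, threshold=3.0):
    """Flag Q when its style deviates strongly from S (threshold is an assumption)."""
    return outlier_score(known_texts, questioned_text) > threshold
```

A call such as `is_suspicious(S, Q)` takes a list of strings `S` and a string `Q`. A realistic system would presumably need far richer features and a threshold calibrated on real data; the sketch only shows the shape of the comparison.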

We expect the project to result in efficient authorship-detection algorithms of scientific value that are applicable not only to detecting fraud in high-school assignments, but also in other areas such as criminal investigation.

Effective and efficient plagiarism detection will help students become better writers and more involved in their own learning process, free up some of the resources that teachers currently spend on detecting plagiarism, and improve the level of learning in high school. Overall, there is high demand for the anticipated new functionality in MaCom’s systems.

This is a joint project between the Department of Computer Science, University of Copenhagen, and MaCom.