Plagiarism
Last update: 17. June 2011
Definitions
- Hapax legomena are words that occur only once in a text. Some programs that generate a word list can select them if you specify a maximum frequency of 1.
- A tri-gram is a sequence of exactly three consecutive words in a text. These three-word combinations are counted and compared with the counts from other texts.
- An n-gram is the general form of a tri-gram; you can set n to any number that makes sense. If n=1, a word list is generated. If n is too high, most n-grams occur only once in a text and are therefore not usable, and the software also needs some time to generate, e.g., 10-grams.
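The three definitions above can be sketched in a few lines of Python. This is only an illustration and not how any of the programs listed below work internally. The tokenizer (lowercase; runs of letters, digits and apostrophes), the function names and the overlap measure (shared tri-grams divided by all distinct tri-grams of both texts) are assumptions made for this example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def hapax_legomena(text):
    """Return the set of words that occur exactly once in the text."""
    counts = Counter(tokenize(text))
    return {word for word, freq in counts.items() if freq == 1}

def ngrams(text, n=3):
    """Count every sequence of n consecutive words; n=1 gives a plain word list."""
    words = tokenize(text)
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def trigram_overlap(text_a, text_b):
    """Share of distinct tri-grams common to both texts (assumed measure)."""
    a, b = set(ngrams(text_a, 3)), set(ngrams(text_b, 3))
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

sample = "the cat sat on the mat and the cat slept"
print(sorted(hapax_legomena(sample)))   # words with frequency 1
print(ngrams(sample, 2).most_common(1)) # most frequent 2-gram
print(trigram_overlap("a b c d", "a b c e"))
```

With a short text, almost every tri-gram already occurs only once, which shows why a large n quickly stops being useful for counting.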
Software
- CopyCatch Gold, developed by David Woolls, compares the hapax legomena of two texts. If the percentage of these words that the two texts share exceeds 70 %, you can assume that the texts have passages in common.
- Glatt deletes every fifth word, and the student must insert the missing words. The detection rate is high, but the procedure is also a lot of work.
- Pl@giarism was developed by Georges Span of the University of Maastricht (Netherlands) and works with tri-grams.
- Plagiarism Finder by Mediaphor Software Entertainment extracts phrases from a given document and searches for them using Google as the search engine.
- Turnitin by iParadigms compares a text with stored documents from the internet and approx. 4500 print media. A so-called fingerprint provides statistical information and is used to decide whether the uploaded document is plagiarized or not. All uploaded documents are stored for future use.
- Scriptum by Vancouver Software Labs is also based on a tri-gram algorithm.
- MyDropBox uses the TF-IDF approach (term frequency-inverse document frequency) and compares the results with results the software has already stored.
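The hapax comparison described for CopyCatch Gold can be illustrated roughly as follows. The list above only states that the shared percentage of hapax legomena is checked against 70 %; how the program actually computes that percentage is not given here, so dividing by the size of the smaller hapax set is an assumption, as are the tokenizer and the function names.

```python
import re
from collections import Counter

def hapaxes(text):
    """Words that occur exactly once in the text (assumed tokenizer)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {word for word, freq in Counter(words).items() if freq == 1}

def shared_hapax_percentage(text_a, text_b):
    """Percentage of hapax legomena the two texts have in common.

    Assumption: the shared count is divided by the smaller hapax set.
    """
    a, b = hapaxes(text_a), hapaxes(text_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0
    return 100.0 * len(a & b) / smaller

THRESHOLD = 70.0  # the 70 % limit mentioned in the CopyCatch Gold entry

original = "Quick brown foxes jump over lazy dogs near quiet rivers"
suspect = "Quick brown foxes leap over lazy dogs near quiet streams"
score = shared_hapax_percentage(original, suspect)
print(f"{score:.1f} % shared, suspicious: {score > THRESHOLD}")
```

In this toy pair, eight of the ten once-only words are shared, so the score lies above the threshold. Real texts are much longer, and a meaningful threshold depends on how the percentage is defined.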
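The Glatt procedure (blank out every fifth word, let the student fill the gaps) can be sketched like this. Only the "every fifth word" rule comes from the list above; the blank marker, the function names and the scoring by exact, case-insensitive matches are assumptions for illustration.

```python
def make_cloze(text, step=5, blank="_____"):
    """Replace every step-th word with a blank; return the test text and the answers."""
    words = text.split()
    answers = []
    for i in range(step - 1, len(words), step):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers

def score_cloze(answers, responses):
    """Fraction of blanks filled with the original word (assumed scoring rule)."""
    if not answers:
        return 0.0
    def normalize(word):
        return word.strip(".,;:!?").lower()
    correct = sum(normalize(a) == normalize(r) for a, r in zip(answers, responses))
    return correct / len(answers)

essay = "Students who write their own essays usually remember which words they chose"
test_text, answers = make_cloze(essay)
print(test_text)
print(answers)
print(score_cloze(answers, ["own", "terms"]))
```

The idea is that an author who really wrote the text restores more of the missing words than someone who copied it.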
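A TF-IDF comparison of a submitted text against stored texts might look roughly like the sketch below. It is not MyDropBox's implementation: the smoothed IDF formula, the cosine similarity used for the comparison, the tokenizer and all names are assumptions chosen to keep the example short and free of external libraries.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def tfidf_vectors(documents):
    """Return one {term: weight} dictionary per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # term frequency * smoothed inverse document frequency (assumed variant)
            term: (count / len(tokens))
                  * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

stored = [
    "the committee approved the annual budget report",
    "rainfall was heavy across the northern valleys",
]
submission = "the committee approved the yearly budget report"

vectors = tfidf_vectors(stored + [submission])
scores = [cosine(vectors[-1], v) for v in vectors[:-1]]
print(scores)  # one similarity per stored document
```

Words that appear in every document, such as "the", receive the lowest weight, so the similarity is driven mainly by the rarer words the texts share.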
Literature
- Weber-Wulff, Debora: Software evaluation of plagiat detection software
- Clough, Paul: Plagiarism in natural and programming languages: an overview of current tools and technologies.
- Kramer, André: Falsche Fuffziger. Textplagiate per Software auf der Spur. In: c't 21/2004, pp. 176-181