Text Analysis Info

Overview of software that analyses texts and other sources of human communication



Last update: 7. September 2014


This page gives an overview of plagiarism detection software and does not claim to be complete. Please bear in mind that software solutions require both the original text and the text suspected of being plagiarized to be available in digital form.

Note: this section is based on an article written by André Kramer and published in c't (for the complete reference, see the end of this page).

In the information age, texts are available digitally, and it is very easy to copy text from the internet and publish it under one's own name without citing the person who wrote it. This is called plagiarism. A well-known case was detected at the beginning of 2003: a student paper published several years earlier was presented as secret information by the British Secret Service, typographical errors included. Because the errors had been copied as well, the plagiarism was easy to detect.

Today many universities face the problem of deciding whether student papers are plagiarized. Up to 30 percent of students admit that they copy other people's work without citing it. Avoiding plagiarism also matters in a commercial setting, because using existing trademarks, for example, can result in compensation claims or other commercial disasters.

Debora Weber-Wulff tested 17 software solutions. The result: no system does a very good job, and only one, Ephorus, was rated good. The last link shows the results of the test and how the software was evaluated.



    • CopyCatch Gold was developed by David Woolls and compares the hapax legomena (words that occur only once) of two texts. If the shared percentage of these words exceeds 70 %, you can assume that the texts have passages in common. It is now distributed by CFL Software.
    • Duplichecker is free online software.
    • Plagiarismsoftware.net is free online software.
    • plagiarismsoftware.org seems to be the same as the above.
    • Glatt deletes every fifth word, and the student must fill in the missing words. The detection rate for plagiarism is high, but the procedure is also a lot of work.
    • Pl@giarism was developed by Georges Span from the University of Maastricht (the Netherlands) and works with tri-grams.
    • Plagiarism Finder by Mediaphor Software Entertainment extracts phrases from a given document and searches for them using Google as the search engine.
    • PlagScan is a plagiarism detector distributed by PlagScan GmbH in Cologne, Germany.
    • Turnitin by iParadigms compares a text with stored documents from the internet and about 4,500 print media. A so-called fingerprint provides statistical information and is used to decide whether the uploaded document is plagiarized. All uploaded documents are stored for future use.
    • Scriptum by Vancouver Software Labs is also based on a tri-gram algorithm.
    • PlagTracker is a free online tool that checks a document against stored texts in several languages.
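
Three of the techniques named in the list (comparing hapax legomena, comparing tri-grams, and deleting every fifth word) are simple enough to sketch in a few lines of Python. The sketch below is only an illustration and not the implementation of any of the listed products. All function names are hypothetical, the tokenizer is deliberately naive, and two details are assumptions because the list does not specify them: the hapax overlap is measured against the smaller of the two hapax sets, and the tri-grams are taken over words rather than characters.

```python
import re
from collections import Counter


def words(text):
    """Lowercase word tokens (a deliberately naive tokenizer, an assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def hapax_legomena(text):
    """Return the set of words that occur exactly once in the text."""
    counts = Counter(words(text))
    return {word for word, n in counts.items() if n == 1}


def hapax_overlap(text_a, text_b):
    """Share of the smaller hapax set that also occurs in the other hapax set.

    Assumption: the ratio actually used by a given product may be defined
    differently; this is just one plausible way to get a percentage.
    """
    a, b = hapax_legomena(text_a), hapax_legomena(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def word_trigrams(text):
    """Return the set of all runs of three consecutive words."""
    tokens = words(text)
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def trigram_similarity(text_a, text_b):
    """Jaccard similarity of the two word tri-gram sets (0.0 to 1.0)."""
    a, b = word_trigrams(text_a), word_trigrams(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def cloze(text, step=5, blank="_____"):
    """Blank out every step-th word; return the gapped text and the removed words."""
    tokens = text.split()
    removed = []
    for i in range(step - 1, len(tokens), step):
        removed.append(tokens[i])
        tokens[i] = blank
    return " ".join(tokens), removed


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    suspect = "the quick brown fox jumps over a sleeping cat"
    print(hapax_overlap(original, suspect))
    print(trigram_similarity(original, suspect))
    print(cloze("one two three four five six seven eight nine ten"))
```

With a threshold such as the 70 % mentioned for CopyCatch Gold, a caller would flag a pair of texts when hapax_overlap returns a value above 0.7. The cloze function only produces the gapped text. Judging how well a student fills in the gaps remains manual work, which matches the remark in the list that the approach is laborious.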