Last update: 20. April 2017



program: Catma 5 (Computer Aided Textual Markup & Analysis)
author: Jan Christoph Meister, University of Hamburg, Germany
distributor: University of Hamburg, Department of Languages, Germany
documentation: manual
download: from GitHub, including the source code
operating system(s): MS-Windows, Mac OS-X 10.6 or newer
description: CATMA is released under the GNU General Public License v3. The newest CATMA version is implemented as a web application. The development of CATMA was inspired by TACT (short for Textual Analysis Computing Tools), a DOS-based tool set for textual analysis created at the University of Toronto.


Langsoft Text Analysis Software

program: Langsoft Text Analysis software
author: Hristo Georgiev
distributor: Langsoft
documentation: none
download: Trial version for Windows and Linux/MS-DOS
operating system(s): MS-Windows, Linux, MS-DOS
description: Langsoft offers software for parsing, spelling, machine translation, questioning and thesauri. The parsing program handles texts in English, French and German; the spelling program also supports Italian. The machine translation program covers English and German (both directions). The thesaurus program supports English, German, French and Spanish.

Profiler Plus 5.8.4

program: Profiler Plus 5.8.4
author: Michael D. Young
distributor: Social Science Automation
documentation: none
download: Trial version with limited data; a free version is available for unfunded academic research. You have to create an account.
operating system(s): MS-Windows
description: A general-purpose content analysis engine designed for leadership analysis. Profiler+ searches a sentence from left to right for ordered sets of tokens (words and/or punctuation) that have been identified as indicators of a trait, of another measure of interest, or perhaps of a particular type of communication. Profiler+ examines each token in turn and queries a database to determine whether the token serves as the anchor for any target sets. If the token does serve as an anchor in one or more target sets, the program determines whether the other tokens in the set are also present in the sentence in the appropriate order. If all the tokens in a set can be matched, the indicated actions are taken; in the simplest case a code is written to a file. Any remaining target sets that have not been eliminated are ignored.
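
The anchor-based matching described above can be illustrated with a minimal Python sketch. It is not Profiler Plus code: the rule table TARGET_SETS, the function names, and the whitespace tokenisation are hypothetical, and the sketch assumes that the anchor is the first token of each target set and that the remaining tokens may follow with gaps between them.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table: anchor token -> list of (ordered token set, code).
# The real Profiler Plus database format is not described in this listing.
TARGET_SETS: Dict[str, List[Tuple[List[str], str]]] = {
    "will": [(["will", "not", "accept"], "REJECT")],
    "we": [(["we", "must"], "OBLIGATION")],
}


def contains_in_order(tokens: List[str], start: int, pattern: List[str]) -> bool:
    """Return True if pattern is an ordered subsequence of tokens[start:]."""
    pos = start
    for wanted in pattern:
        try:
            pos = tokens.index(wanted, pos) + 1
        except ValueError:
            return False
    return True


def code_sentence(sentence: str) -> List[str]:
    """Scan tokens left to right and emit a code for every matched target set."""
    tokens = sentence.lower().split()  # simplification: whitespace tokens only
    codes = []
    for i, token in enumerate(tokens):
        # Look up the target sets anchored on this token (assumed: first token).
        for pattern, code in TARGET_SETS.get(token, []):
            if contains_in_order(tokens, i, pattern):
                codes.append(code)
    return codes
```

Calling code_sentence("We will not ever accept this") returns the codes of all target sets whose tokens appear in order; a real system would write them to a file instead of returning a list.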


program: Semato
author: Pierre Plante, Lucie Dumas and André Plante
distributor: University of Montreal, Canada
documentation: online
download: no longer available
operating system: runs as a web service
description: The whole web site is in French; there is no English version available. (This is Quebec, after all.) Semato is a program that allows the use of quantitative, qualitative, and mixed models.


program: SATO 4.0
author: François Daoust
distributor: University of Montreal, Canada
documentation: manual
download: test
operating system: DOS
description: SATO supports the annotation of multilingual documents. It offers a query language for systematically locating user-defined text segments; the production of an index; word lists sorted alphabetically or by frequency; the categorisation of words, compounds or phrases; the definition of variables for multiple counts and lexicometric analyses; dictionary functions, including facilities for morphological derivation where needed; and a readability index (Gunning).

T-Lab plus 2017

program: T-Lab plus 2017
author: Franco Lancia
distributor: T-lab
documentation: in English, Italian, French and Spanish online.
download: Test version (multilingual) and also the manual.
operating system: MS-Windows
description: T-LAB software is an all-in-one set of linguistic and statistical tools for text analysis which can be used in the following research fields: co-occurrence analysis, thematic analysis, comparative analysis, and lexical tools. Available versions are in English, French, Italian, German and Spanish. Currently the automatic lemmatization is available for the following languages: English, French, Italian, German, Spanish and Portuguese; moreover, without automatic lemmatization, T-LAB allows the analysis of texts in all languages supporting ASCII/ANSI format.
The file size is limited to 30 MB; most analyses will not exceed this limit.


----------------------------------------------- Language - information retrieval

Last update: 20. April 2017
Programs listed here can be divided into finer groups:

  • pure information retrievers: searching and displaying texts, indexers
  • concordancers: programs providing concordances 

The programs AnyText, ATA (Ashton Text Analyzer), Eric Jonson's programs, Kura, Lexa, MicroOCP, and MonoConc were removed because the links are dead and no further information seems to be available.

Analysis 2.94

program: Analysis 2.94
author: Giovanni Lo Conti
distributor: Giovanni Lo Conti (mc4386[at]openaccess[dot]it)
documentation: none
download: free version
operating system: MS-Windows, Digital Unix, Acorn RiscOS
description: Analysis is a program that offers several types of text analysis: concordances, KWIC, KWOC, readability indexes, co-occurrences, lemmatization, sentence statistics, non-intelligent abstracts, summaries, "meaningful and sense", incipit, explicit, and frequency counts. For many procedures it is possible to delimit the range or to compare the text with an electronic dictionary. It is provided with Help, online Help, and Wimp.

AntConc 3.4.4

program: AntConc 3.4.4
author: Laurence Anthony
distributor: Laurence Anthony
documentation: Tutorials including videos, text materials available in English, Japanese, Korean, Arabic, and German.
download: free version
operating system: MS-Windows, Mac OS-X, Linux
description: This is a free concordance program.


program: Ask Sam 7.7
author: Ask Sam Software Development
distributor: Ask Sam Software Development
documentation: overview and quick tour
download: trial version currently unavailable
operating system: MS-Windows, Mac OS-X, iOS
description: AskSam is a fast information retrieval program that can search e-mails and PDF files. The new professional version allows programming (e.g. with Visual Basic).
Note: currently the original website is not available.


program: Collocate
author: Michael Barlow
distributor: Athelstan
documentation: is in the test version file
download: demo. The demo processes data in the same manner as the full version, but the results are limited to the top 5 items.
operating system(s): apparently MS-Windows; no version is specified.
description: Collocate is a new software program that can be used to find collocations or terms in a corpus. There are three main components: 

    • Search for a word (phrase) within a set span (e.g. 4 words). The program lists all the collocations containing the search word and provides frequency and/or statistical information (Log Likelihood, Mutual Information).
    • Produce an n-gram list for the corpus.
    • Extract collocations from the corpus as a whole.
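
The span search and the Mutual Information score mentioned above can be sketched in Python as follows. This is not Collocate's implementation: the function name and parameters are hypothetical, and the score uses one common simplified formulation, MI = log2(O * N / (f(node) * f(word))), where O is the number of co-occurrences within the span and N the corpus size in tokens. Other tools normalise differently (for example by span width).

```python
import math
from collections import Counter
from typing import Dict, List


def collocates(tokens: List[str], node: str, span: int = 4) -> Dict[str, float]:
    """Return a simplified MI score for every word within `span` tokens of `node`."""
    freq = Counter(tokens)          # corpus frequency of each word
    co = Counter()                  # co-occurrence counts with the node word
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:
                co[tokens[j]] += 1
    n = len(tokens)
    # Assumed formulation: log2(observed * N / (f(node) * f(word))).
    return {w: math.log2((c * n) / (freq[node] * freq[w])) for w, c in co.items()}
```

With a span of 1, the call collocates("strong tea and strong coffee but weak tea".split(), "tea", span=1) scores only the immediate neighbours of "tea"; rarer neighbours receive a higher score than frequent ones.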


Concordance 3.3 - not available any more

One can find links to download it, but these links are dead.

Corpus Presenter 14.0

program: Corpus Presenter 14.0
author: Raymond Hickey
distributor: Raymond Hickey
documentation: manual
download: full and free version
operating system(s): WinXP and newer
description: Corpus Presenter is a suite of programs designed to work with both existing corpora and any files which users might wish to examine for linguistically interesting structures. It has all the options of standard corpus software, i.e. it can generate concordances, word lists and perform a whole range of text retrieval tasks and generate reverse dictionaries of words in texts. It does not require that texts are prepared in any way, e.g. by indexing them in advance.

KWIC Concordance 5.0

program: KWIC Concordance 5.0
author: Satoru Tsukamoto, College of Humanities and Sciences, English Department, Nihon University, Japan
distributor: Satoru Tsukamoto, College of Humanities and Sciences, English Department, Nihon University, Japan
documentation: none
download: free
operating system(s): MS-Windows
description: The KWIC Concordance is a corpus analysis tool for making word frequency lists, concordances and collocation tables from electronic files. The program can handle markup schemes such as COCOA, SGML, the Helsinki corpus, the Penn-Helsinki Parsed Corpus of Middle English (Phase 1) (Phase 2) etc. The program is freeware.

Metamorph - not available anymore

Phrase Context 1.02

program: Phrase Context
author/distributor: Hans J. Klarskov Mortensen
download: test version
documentation: none
operating systems: MS-Windows (not confirmed)
description: Phrase Context is a versatile program that counts words and phrases, produces concordances, calculates TTR (type-token ratio) and lexical density values, supports regular expressions as search patterns, and writes XML-formatted output files. The author also provides some free utilities, for example for extracting text from PDF files.

PhraseContext can output wordlists, citations and phrasebooks in XML format for further processing and web display with added XSL stylesheets or JavaScript. Below are some examples. These are not worth much in themselves, and they’re only supplied here to give some idea of the structure of the XML output, and some ideas for XSL formatting. The idea was to create a basic XML-compliant output that can then be manipulated by means of other tools (CSS, XSL) designed for that use. I’ll be happy to host other people’s stylesheets on this site. Drop me a line with a description together with the script.
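
The TTR and lexical density values mentioned in the description above can be approximated with a short Python sketch. It is not PhraseContext code: the tokeniser, the function names, and the tiny FUNCTION_WORDS list are assumptions made for illustration. Lexical density is taken here as the share of tokens that are not function words; other definitions exist.

```python
import re
from typing import List

# Hypothetical, deliberately tiny list of English function words.
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to", "is", "are"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and return runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text: str) -> float:
    """Number of distinct word forms (types) divided by number of tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def lexical_density(text: str) -> float:
    """Share of tokens that are not in the assumed function-word list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)
```

For the phrase "the cat and the dog", five tokens contain four distinct types and two content words. Because TTR falls as a text gets longer, values are only comparable between texts of similar length.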

SCP 4.0.9 - Simple Concordance Program

program: SCP 4.0.9 - Simple Concordance Program
author/distributor: Alan Reed
download: free software
documentation: help file as a PDF-file
operating systems: MS-Windows, Mac OS-X
description: This free program lets you create word lists and search natural language text files for words, phrases, and patterns. SCP is a concordance and word listing program that is able to read texts written in many languages. There are built-in alphabets for English, French, German, Greek, Russian, etc. SCP contains an alphabet editor which you can use to create alphabets for any other language.

Sonar 2003.32 Text Retrieval/Document Management Systems

program: Sonar 2003.32
distributor: Virginiasystems
download: demo
documentation: none
operating systems: MS-Windows, Mac OS-X
description: High-speed program that can process many types of text and word processing files.


program: Textalyser
author: Bernhard Huber
distributor: none
documentation: self-explanatory
download: none
operating system: runs on a web site
description: Textalyser is a free text analysis tool that counts words, sentences, and syllables and computes lexical density. It also computes the Gunning readability index. A small but nice tool that counts syllables correctly, at least for English, French, and German. You can cut and paste text or specify a web page.
Note: the last update of this website was in 2004.
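
The Gunning readability index mentioned above is commonly computed as 0.4 * (words per sentence + 100 * share of complex words), where a complex word has three or more syllables. The Python sketch below is not Textalyser's code: the function names are hypothetical and the syllable counter is a rough heuristic for English only (it counts vowel groups and drops a silent final "e").

```python
import re


def count_syllables(word: str) -> int:
    """Rough English heuristic: count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def gunning_fog(text: str) -> float:
    """Gunning index: 0.4 * (words/sentences + 100 * complex_words/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    # Assumption: a word is "complex" if the heuristic finds 3+ syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))
```

Short sentences made of one-syllable words give a low index, while long words raise it quickly. The heuristic miscounts many words, so results should be treated as approximate.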

Textstat 2.9

program: Textstat 2.9
author: Matthias Hüning
distributor: Matthias Hüning
documentation: manual
download: freeware
operating system: MS-Windows, Mac OS-X, Linux (requires Python)
description: TextSTAT is a simple programme for the analysis of texts. It reads ASCII/ANSI texts and HTML files (directly from the internet) and produces word frequency lists and concordances from these files. The programme is distributed as freeware. Source code in Python is also available for free. The user interface is provided in English, German, Dutch, Portuguese, Spanish, Catalan, French, Italian, Galician, Finnish (Suomi), Polish, or Czech.
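
To show what a word frequency list and a concordance look like in practice, here is a minimal Python sketch. It is not taken from the TextSTAT source code: the function names, the tokeniser, and the bracketed KWIC (keyword in context) output format are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def frequency_list(text: str) -> List[Tuple[str, int]]:
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()


def kwic(text: str, keyword: str, width: int = 3) -> List[str]:
    """Return one line per hit, with `width` tokens of context on each side."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines
```

For the text "The cat sat on the mat. The dog saw the cat.", the frequency list starts with "the" and kwic(text, "cat", width=2) yields one context line per occurrence of "cat". Real concordancers usually align the keyword in a fixed column and preserve the original spelling and punctuation.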

WordSmith 5.0

program: WordSmith 5.0
author: Mike Scott
distributor: Mike Scott, Liverpool University
documentation: manual in English, French, and German
download: test version shows a sample of the results only
operating system: MS-Windows
description: WordSmith is the successor of MicroConcord.