Text Analysis Info

Overview on software that analyses texts and other sources of human communication

Language - information retrieval

Last update: 20. April 2017
Programs listed here fall into two groups:

  • pure information retrievers: searching and displaying texts, indexers
  • concordancers: programs providing concordances 

The programs AnyText, ATA (Ashton Text Analyzer), Eric Jonson's programs, Kura, Lexa, MicroOCP, and MonoConc were removed because their links are dead and no further information seems to be available.

Analysis 2.94

program: Analysis 2.94
author: Giovanni Lo Conti
distributor: Giovanni Lo Conti (mc4386@openaccess.it)
documentation: none
download: free version
operating system: MS-Windows, Digital Unix, Acorn RiscOS
description: Analysis is a program that supports several types of text analysis: concordances, KWIC, KWOC, readability indexes, co-occurrences, lemmatization, sentence statistics, non-intelligent abstract, summary, meaningful and sense, incipit, explicit, and frequency. For many procedures it is possible to delimit the range or to compare the text with an electronic dictionary. It is provided with Help, online Help, and Wimp.

AntConc 3.4.4

program: AntConc 3.4.4
author: Laurence Anthony
distributor: Laurence Anthony
documentation: Tutorials including videos, text materials available in English, Japanese, Korean, Arabic, and German.
download: free version
operating system: MS-Windows, Mac OS-X, Linux
description: This is a free concordance program.

AskSam 7.7

program: AskSam 7.7
author: Ask Sam Software Development
distributor: Ask Sam Software Development
documentation: overview and quick tour
download: trial version currently unavailable
operating system: MS-Windows, Mac OS-X, iOS
description: AskSam is a fast information retrieval program that can search e-mails and PDF files. The new professional version allows programming (e.g. with Visual Basic).
Note: currently the original website www.asksam.com is not available.

Collocate

program: Collocate
author: Michael Barlow
distributor: Athelstan
documentation: is in the test version file
download: demo. The demo processes data in the same manner as the full version, but the results are limited to the top 5 items.
operating system(s): apparently MS-Windows; no version is specified.
description: Collocate is a new software program that can be used to find collocations or terms in a corpus. There are three main components: 

    • Search for a word (phrase) within a set span (e.g. 4 words). The program lists all the collocations containing the searchword and provides frequency and/or statistical information (Log Likelihood, Mutual Information).
    • Produce an n-gram list for the corpus.
    • Extract collocations from the corpus as a whole.
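The three components above can be illustrated with a minimal Python sketch. It is not Collocate's implementation: the tokenizer, the function names, and the way the expected co-occurrence is estimated for Mutual Information are all assumptions made for illustration, and Log Likelihood is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, node, span=4):
    """Count the words that occur within `span` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


def mutual_information(tokens, node, span=4):
    """Pointwise MI: log2(observed / expected) for each collocate of `node`.

    The expected count assumes words are distributed independently over a
    window of 2 * span positions; other tools may estimate it differently.
    """
    n = len(tokens)
    freq = Counter(tokens)
    scores = {}
    for word, observed in collocates(tokens, node, span).items():
        expected = freq[node] * freq[word] * 2 * span / n
        scores[word] = math.log2(observed / expected)
    return scores


def ngrams(tokens, n=2):
    """Frequency list of all n-grams (as tuples) in the token sequence."""
    return Counter(zip(*(tokens[i:] for i in range(n))))
```

For example, `collocates(tokenize(text), "cat", span=4)` returns a frequency table of the neighbours of "cat", and `ngrams(tokenize(text), 2).most_common(10)` lists the ten most frequent bigrams.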

 

Concordance 3.3 - not available anymore

Download links can still be found, but they are dead.

Corpus Presenter 14.0

program: Corpus Presenter 14.0
author: Raymond Hickey
distributor: Raymond Hickey
documentation: manual
download: full and free version
operating system(s): WinXP and newer
description: Corpus Presenter is a suite of programs designed to work with both existing corpora and any files which users might wish to examine for linguistically interesting structures. It has all the options of standard corpus software, i.e. it can generate concordances, word lists and perform a whole range of text retrieval tasks and generate reverse dictionaries of words in texts. It does not require that texts are prepared in any way, e.g. by indexing them in advance.

KWIC Concordance 5.0

program: KWIC Concordance 5.0
author: Satoru Tsukamoto, College of Humanities and Sciences, English Department, Nihon University, Japan
distributor: Satoru Tsukamoto, College of Humanities and Sciences, English Department, Nihon University, Japan
documentation: none
download: free
operating system(s): MS-Windows
description: KWIC Concordance is a corpus analysis tool for making word frequency lists, concordances, and collocation tables from electronic files. The program can handle markup schemes such as COCOA, SGML, the Helsinki corpus, and the Penn-Helsinki Parsed Corpus of Middle English (Phase 1 and Phase 2). The program is freeware.
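As a rough illustration of what a word frequency list and a KWIC (keyword in context) concordance contain, here is a minimal Python sketch. It is not how KWIC Concordance itself works: the tokenizer and function names are assumptions, and markup schemes such as COCOA or SGML are not handled.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a token is any run of word characters, lowercased.
    return re.findall(r"\w+", text.lower())


def word_frequencies(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    sample = tokenize("The cat sat. The dog sat too.")
    for left, key, right in kwic(sample, "sat", width=2):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>12}  {key}  {right}")
```

Aligning the keyword in a fixed column, as the small demo does, is what makes a KWIC display easy to scan.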

Metamorph - not available anymore


Phrase Context 1.02

program: Phrase Context
author/distributor: Hans J. Klarskov Mortensen
download: test version
documentation: none
operating systems: MS-Windows (?)
description: Phrase Context is a versatile program that counts words and phrases, produces concordances, calculates TTR and lexical density values, supports regular expressions as search patterns, and writes XML-formatted output files. The author also provides some free utilities, for example for extracting text from PDF files.
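The two measures mentioned above can be sketched in a few lines of Python. The source does not say which definitions Phrase Context uses, so this is only one common reading: TTR as distinct words divided by total words, and lexical density as the share of tokens that are not function words. The function-word list is a tiny hypothetical stand-in; a real analysis would use a fuller list or a part-of-speech tagger.

```python
import re

# Hypothetical, deliberately small function-word list (assumption).
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at",
    "to", "is", "are", "was", "were", "it", "that", "this", "with", "for",
}


def tokenize(text):
    # Assumption: a token is a lowercase run of ASCII letters.
    return re.findall(r"[a-z]+", text.lower())


def type_token_ratio(tokens):
    """Number of distinct words (types) divided by number of words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens, function_words=FUNCTION_WORDS):
    """Share of tokens that are content words, i.e. not in `function_words`."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in function_words]
    return len(content) / len(tokens)
```

Note that TTR depends strongly on text length, so values are only comparable between samples of similar size.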

PhraseContext can output wordlists, citations and phrasebooks in XML format for further processing and web display with added XSL stylesheets or JavaScript. Below are some examples. These are not worth much in themselves, and they’re only supplied here to give some idea of the structure of the XML output, and some ideas for XSL formatting. The idea was to create a basic XML-compliant output that can then be manipulated by means of other tools (CSS, XSL) designed for that use. I’ll be happy to host other people’s stylesheets on this site. Drop me a line with a description together with the script.

SCP 4.0.9 - Simple Concordance Program

program: SCP 4.0.9 - Simple Concordance Program
author/distributor: Alan Reed
download: free software
documentation: help file as a PDF-file
operating systems: MS-Windows, Mac OS-X
description: This free program lets you create word lists and search natural language text files for words, phrases, and patterns. SCP is a concordance and word listing program that is able to read texts written in many languages. There are built-in alphabets for English, French, German, Greek, Russian, etc. SCP contains an alphabet editor which you can use to create alphabets for any other language.

Sonar 2003.32 Text Retrieval/Document Management Systems

program: Sonar 2003.32
distributor: Virginia Systems
download: demo
documentation: none
operating systems: MS-Windows, Mac OS-X
description: High-speed program that can process many types of text and word processing files.

Textalyser

program: Textalyser
author: Bernhard Huber
distributor: none
documentation: self-explanatory
download: none
operating system: runs on a web site
description: Textalyser is a free text analysis tool that counts words, sentences, and syllables and measures lexical density. It also computes the Gunning readability index. A small but nice tool that counts syllables correctly, at least for English, French, and German. You can cut and paste text or specify a web page.
Note: the last update of this website was in 2004.
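For readers unfamiliar with the Gunning readability index (Gunning fog), the sketch below shows the usual formula: 0.4 × (average sentence length + percentage of words with three or more syllables). It is not Textalyser's code. The syllable counter is a rough English-only heuristic chosen for illustration, and the refinements of the full definition (excluding proper nouns, compounds, and common suffixes) are omitted.

```python
import re


def count_syllables(word):
    # Heuristic (assumption): one syllable per group of consecutive vowels,
    # minus a silent final "e" (but not in words ending in "le").
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def gunning_fog(text):
    """Gunning fog index: 0.4 * (words per sentence + percent complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences)
                  + 100 * len(complex_words) / len(words))
```

Because the result depends heavily on how syllables and sentences are detected, scores from different tools should not be expected to match exactly.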

Textstat 2.9

program: Textstat 2.9
author: Matthias Hüning
distributor: Matthias Hüning
documentation: manual
download: freeware
operating system: MS-Windows, Mac OS-X, Linux; requires Python
description: TextSTAT is a simple programme for the analysis of texts. It reads ASCII/ANSI texts and HTML files (directly from the internet) and produces word frequency lists and concordances from these files. The programme is distributed as freeware. Source code in Python is also available for free. The user interface is provided in English, German, Dutch, Portuguese, Spanish, Catalan, French, Italian, Galician, Finnish (Suomi), Polish, and Czech.

WordSmith 5.0

program: WordSmith 5.0
author: Mike Scott
distributor: Mike Scott, Liverpool University
documentation: manual in English, French, and German
download: test version shows a sample of the results only
operating system: MS-Windows
description: WordSmith is the successor of MicroConcord.