Text Analysis Info

Overview of software that analyses texts and other sources of human communication


Definitions and terms

Last update: 7 September 2014

The following sections provide more information on each software package: its version, author(s), distributor(s), URLs for further information, and the availability of trial, test or demo versions. All data are taken from the web sites of the authors or distributors; sometimes additional information comes from the authors themselves. The classification, however, is my responsibility. Please keep in mind that the categories are not mutually exclusive: some programs can be classified in more than one category.

Program means the name of the software; the link leads to a brief description of it. Author means the person(s) or company that developed the product; the link leads to the author's homepage. If no author is mentioned, the author is not known. The distributor is the person or company that distributes the software; the link leads to the page where the product is described. If no distributor is mentioned, the author is also the distributor.

Trial versions are full versions that run for a limited period (mostly 30 days); after this period you must buy a license key to continue. Test versions are also full versions, but they have restrictions such as disabled printing or limited file sizes.

OS means operating system. All software that requires MS-DOS or an equivalent, or an MS-Windows version released before 2000, has been deleted. Mac OS X is the operating system for the Apple Macintosh. Other operating systems are UNIX derivatives or systems running on mainframe computers such as those from IBM. The descriptions are also mostly taken from the web sites; often I wrote a summary.

Quite a few attempts have been made in the past to classify text analysis software. After the ICA conference in Acapulco in June 2000 I had many talks with colleagues, and a new, hopefully clearer, classification is now available.

The author is the person who designed or wrote the software; the distributor is the person or company that distributes it. Sometimes author and distributor are the same. Documentation and download provide links to these resources; printed documentation is often included if you buy a test, trial, or demo version. The operating system entry tells you which operating system the software needs: Win9x means Windows 95 and Windows 98, and WinNT means Windows NT 4.0; most of the software will also run under Windows 2000, Windows XP and Windows 7. Mac OS is the operating system for the Apple Macintosh, mostly version OS X 10.4 or newer. MS-DOS programs will often run in DOS windows of other operating systems.


Classification of text analysis software


Comments are mine unless otherwise noted. Often I wrote just a short sentence because I had nothing better to say; these will be replaced by more substantial comments, but that will take some time. If you have comments, e-mail them to me for inclusion. Other suggestions are also welcome.

Important notice: pages like these require permanent updating. If you find errors, dead links or the like, please notify me.

language: dealing with the use of language  

  • linguistic: applications like parsing, lemmatising words
  • data bank: information retrieval in texts, indexers, concordances, word lists, KWIC/KWOC (key-word-in-context, key-word-out-of-context)
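
To illustrate what a KWIC (key-word-in-context) listing is, here is a minimal sketch in Python. It assumes very simple word tokenisation; the function name kwic, the width parameter and the sample sentence are my own illustrations and are not taken from any of the programs listed on these pages.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of words shown on each side of the keyword.
    """
    # Naive tokenisation: lower-case the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

sample = "The cat sat on the mat. The dog chased the cat around the garden."
for left, key, right in kwic(sample, "cat", width=2):
    # Right-align the left context so the keywords line up in one column.
    print(f"{left:>15} [{key}] {right}")
```

Real concordance programs usually add sorting by left or right context, lemmatisation and links back to the source text; none of that is attempted here.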

content: dealing with the content of human communication, mainly texts. Often data bank features are part of these programs.

  • qualitative: looking for regularities and differences in text, exploring the whole text (QDA - qualitative data analysis). A few programs also allow the processing of audio and video information. There is no common paradigm of QDA; there are many approaches.
  • event data: analysis of events in textual data
  • quantitative: analysing the text selectively to test hypotheses and draw statistical inferences. The output is a data matrix that represents the numerical results of the coding.
    • category systems: provided by the software developer (instrumental) or by the researcher (representational). Coding is selective: only the defined search patterns are searched for in the text and coded. Software packages with built-in dictionaries are often restricted to particular languages; some have limits on the text unit size and can only process responses to open-ended questions, not analyse mass media texts. The categories can be thematic or semantic, which can have implications for the definition of text units and the use of external variables.
    • no category system: using co-occurrences of words and/or concepts; these are displayed as graphs or dendrograms.
    • for coding responses to open-ended questions only: these programs cannot analyse large amounts of text; they are suitable only for rather homogeneous texts and are often limited in the size of a text unit.
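
The two quantitative approaches above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the method of any listed program: the category system CATEGORIES, the function names code_units and cooccurrences, and the sample text units are all hypothetical. The first function codes text units with a small dictionary and returns a data matrix (one row per text unit, one column per category); the second counts word co-occurrences within text units without any category system.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical category system: category name -> set of search patterns (plain words).
CATEGORIES = {
    "economy": {"tax", "taxes", "budget", "jobs"},
    "health": {"hospital", "doctor", "vaccine"},
}

def tokenize(unit):
    """Naive tokenisation: lower-case words only."""
    return re.findall(r"\w+", unit.lower())

def code_units(units, categories):
    """Count category hits per text unit.

    Returns the column names and a matrix with one row of counts per unit.
    Words that match no search pattern are ignored (selective coding).
    """
    names = sorted(categories)
    matrix = []
    for unit in units:
        counts = Counter()
        for token in tokenize(unit):
            for name in names:
                if token in categories[name]:
                    counts[name] += 1
        matrix.append([counts[name] for name in names])
    return names, matrix

def cooccurrences(units):
    """Count in how many text units each pair of distinct words occurs together."""
    pairs = Counter()
    for unit in units:
        words = sorted(set(tokenize(unit)))
        pairs.update(combinations(words, 2))
    return pairs

units = [
    "The budget raises taxes but creates jobs.",
    "The hospital hired a new doctor.",
    "Nothing relevant here.",
]
names, matrix = code_units(units, CATEGORIES)
print(names)
for row in matrix:
    print(row)
```

In practice the data matrix would be exported to a statistics package together with external variables, and the co-occurrence counts would be fed into a clustering or graph-drawing step to obtain the dendrograms or graphs mentioned above.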