Text Analysis Overview

All on software that analyses texts and other sources of human communication



Definitions and terms

Last update: 28. July 2011

The following sections provide more information on each program: its versions, author(s), distributor(s), URLs for further information, and the availability of trial, test, or demo versions. All data are taken from the web sites of the authors or distributors; occasionally additional information comes from the authors themselves. The classification, however, is my responsibility. Please keep in mind that the categories are not mutually exclusive: some programs can be classified in more than one category.

Program gives the name of the software and links to where it is briefly described. Author means the person(s) or company that developed the product; the links there lead to the author's homepage. If no author is mentioned, the author is not known. The distributor is the person or company that distributes the software; the links lead to the page where the product is described. If no distributor is mentioned, the author is also the distributor.

Trial versions are full versions that run for a limited time period (mostly 30 days); after this period you must buy a license key to continue. Test versions are also full versions, but they have restrictions such as disabled printing or limited file sizes.

OS means operating system. DOS means the first OS for PCs, mostly MS-DOS 3.3 or higher, but also compatible systems such as DR-DOS and PC-DOS. Win3.1 is the old Windows version (Windows 3.0 was released in 1990, Windows 3.1 in 1992), the predecessor of Win9x. Win9x means Windows 95 and Windows 98. Most of these programs will also run under Windows 2000, Windows ME, and Windows XP. MacOS is the operating system for the Apple Macintosh. Other operating systems are UNIX derivatives or systems running on mainframe computers such as those from IBM. The descriptions are also mostly taken from the web sites; I have often summarised them, and comments may follow later.

Quite a few attempts have been made in recent years to classify text analysis software. After the ICA conference in Acapulco in June 2000 I had many talks with colleagues, and a new, hopefully clearer classification is now available.

The following chapters are under construction and will be filled with the appropriate information as soon as it is available. The author is the person who designed or wrote the software; the distributor is the person or company that distributes it. Sometimes author and distributor are the same. Documentation and download provide links to the respective material; printed documentation is often included if you buy a test, trial, or demo version. The operating system entry tells you which operating system the software needs: Win9x means Windows 95 and Windows 98, and WinNT means Windows NT 4.0. Most software will also run under Windows 2000, Windows XP, and Windows 7. Mac OS is the operating system for the Apple Macintosh, mostly version OS X 10.4 or newer. MS-DOS programs will often run in the DOS windows of other operating systems.

 

Classification of text analysis software

 

Comments are made by me unless otherwise noted. Often I wrote just a short sentence because I had no better idea; these will be replaced by more substantial comments, but this will take some time. If you have comments, e-mail them to me for inclusion. Other suggestions are also welcome.

Important notice: pages like these require permanent updating. If you find errors, dead links or the like, please notify me.

language: dealing with the use of language  

  • linguistic: applications like parsing, lemmatising words
  • data bank: information retrieval in texts, indexers, concordances, word lists, KWIC/KWOC (key-word-in-context, key-word-out-of-context)
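
To illustrate what a KWIC (key-word-in-context) listing is, here is a minimal Python sketch. It is not taken from any of the programs listed on these pages; the function name kwic, the width parameter, and the sample sentence are assumptions made only for this illustration. Each occurrence of the keyword is shown with a few words of context on either side.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    Hypothetical helper for illustration; tokenisation is a simple
    lower-cased word split, which real concordance programs refine.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Made-up sample text, for demonstration only.
sample = "The press reported the strike. The strike ended after talks with the union."

for left, key, right in kwic(sample, "strike", width=2):
    # Right-align the left context so the keywords line up in one column.
    print(f"{left:>20} | {key} | {right}")
```

A KWOC listing would use the same hits but print the keyword as a heading outside the context line instead of in the middle of it.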

content: dealing with the content of human communication, mainly texts. Often data bank features are part of these programs.

  • qualitative: looking for regularities and differences in text, exploring the whole text (QDA, qualitative data analysis). A few programs also allow the processing of audio and video information. There is no common paradigm of QDA; there are many approaches.
  • event data: analysis of events in textual data
  • quantitative: analysing the text selectively to test hypotheses and draw statistical inferences. The output is a data matrix that represents the numerical results of the coding.
    • category systems: provided by the software developer (instrumental) or by the researcher (representational). This approach is selective: only the defined search patterns are searched for in the text and coded. Software packages with built-in dictionaries are often restricted to one language; some have limits on the text unit size and are restricted to processing responses to open-ended questions, so they cannot analyse mass media texts. The categories can be thematic or semantic, which can have implications for the definition of text units and the use of external variables.
    • no category system: using co-occurrences of words and/or concepts, which are displayed as graphs or dendrograms.
    • for coding responses to open-ended questions only: these programs cannot analyse large amounts of text; they are suited to rather homogeneous texts only and are often limited in the size of a text unit.
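
The two quantitative approaches above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular program: the category system CATEGORIES, its search patterns, the function names, and the two sample text units are all made up. The first function codes text units against a category system and returns a data matrix (one row per text unit, one column per category); the second counts co-occurrences of words within a text unit without using any category system.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical category system: category name -> search patterns (word stems).
CATEGORIES = {
    "economy": ["tax", "budget", "wage"],
    "conflict": ["strike", "protest"],
}

def tokenize(text):
    """Split a text unit into lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())

def code_units(units, categories):
    """Return (category names, data matrix) of pattern hits per text unit."""
    names = sorted(categories)
    matrix = []
    for unit in units:
        tokens = tokenize(unit)
        row = []
        for name in names:
            # A token counts once per category if it starts with any pattern.
            hits = sum(
                1 for t in tokens
                if any(t.startswith(p) for p in categories[name])
            )
            row.append(hits)
        matrix.append(row)
    return names, matrix

def cooccurrences(units):
    """Count how many text units contain each pair of distinct words."""
    counts = Counter()
    for unit in units:
        words = sorted(set(tokenize(unit)))
        counts.update(combinations(words, 2))
    return counts

# Made-up text units, for demonstration only.
units = [
    "Wages fell and taxes rose.",
    "The strike over wages continued.",
]

names, matrix = code_units(units, CATEGORIES)
print(names)
for row in matrix:
    print(row)

pairs = cooccurrences(units)
print(pairs[("strike", "wages")])
```

The data matrix can be passed on to a statistics package, while the co-occurrence counts are the kind of raw material from which graphs or dendrograms are drawn.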