Text Analysis Info

Overview of software that analyses texts and other sources of human communication

Content - quantitative with category systems

Last update: 21 April 2017

AmCAT 3.4 - Amsterdam Content Analysis Toolkit

authors: members of the section of Communication Science at the Vrije Universiteit Amsterdam
program: AmCAT 3.4
documentation: Introduction
download: none, you work online
operating system: irrelevant, you work online
description: AmCAT is an online tool for content analyses, especially relational content analysis.

CoAn 2.08 - Content Analysis (German only)

author: Matthias Romppel
program: CoAn 2.08
documentation: printed manual in German
download: test
operating system: Win 3.x, Win9x, WinNT, does not run on 64-bit systems
description: word lists, concordances, and category frequencies. CoAn is inspired by a former Intext version. It uses dictionaries to code texts; special features are interactive coding and powerful search patterns such as word co-occurrences. It is available in German only.
Personal comment: this site has not been updated since 2006.

Diction 7.0

program: DICTION 7.0
author: Roderick P. Hart

distributor: Digitext Inc., Austin, TX, USA
download: trial version
manual: manual
operating system: MS-Windows, Mac OS-X
description: Diction uses dictionaries (word-lists) to search a text for these qualities:

  • Certainty: Language indicating resoluteness, inflexibility, and completeness and a tendency to speak ex-cathedra.
  • Activity: Language featuring movement, change, the implementation of ideas and the avoidance of inertia.
  • Optimism: Language endorsing some person, group, concept or event, or highlighting their positive entailments.
  • Commonality: Language highlighting the agreed-upon values of a group and rejecting idiosyncratic modes of engagement.
  • Realism: Language describing tangible, immediate, recognizable matters that affect people's everyday lives.

The results can be statistically analysed and are compared with other texts, so that an under- or overrepresentation of categories can be detected.

General Inquirer

program: General Inquirer
author and distributor: Philip J. Stone
download: yes, but only the category systems
operating system: Java, category systems are Excel-files (XLS)
documentation: description of categories
description: The grandfather of many content analysis programs is now available for computers that run Java and can read the category systems (Excel files).

KH Coder

program: KH Coder 2.00f
author and distributor: Koichi Higuchi
download: free download
operating system: MS-Windows, Mac OS-X, Linux
documentation: Manual
description: KH Coder is free software for quantitative content analysis or text mining. It supports the analysis of texts in Japanese, English, French, German, Italian, Portuguese and Spanish. KH Coder has the following features:

  • Words: Frequency List, Searching, KWIC Concordance, Collocation Stats, Correspondence Analysis, Multi-Dimensional Scaling, Hierarchical Cluster Analysis, Co-Occurrence Network
  • Categories: Developing Your Own Coding Rules, Frequency List, Cross Tabulation, Correspondence Analysis, Multi-Dimensional Scaling, Co-Occurrence Network, Hierarchical Cluster Analysis
  • Documents: Searching, Clustering, Naive Bayes classifier

KH Coder provides these functions using back-end tools such as Stanford POS Tagger, Snowball stemmer, MySQL and R. Just input raw texts and you can utilize these functionalities.
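
A KWIC (keyword-in-context) concordance, one of the word-level features listed above, can be illustrated with a short Python sketch. This is not KH Coder's implementation; the function name kwic, the simple regular-expression tokenizer, and the width parameter are assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on each side of a hit."""
    # Assumption: a naive tokenizer that lowercases and keeps word characters only.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

sample = "The cat sat on the mat. A cat is a small animal."
for line in kwic(sample, "cat", width=2):
    print(line)
```

Real concordancers usually align the keyword in a fixed column and respect sentence boundaries; the sketch leaves both out to keep the idea visible.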

LIWC 2015 - Linguistic Inquiry and Word Count - updated

program: LIWC 2015 - Linguistic Inquiry and Word Count
authors: James B. Pennebaker, Roger J. Booth, and Martha E. Francis
distributor: Pennebaker Conglomerates, Inc.
download: with registration only
operating system: MS-Windows, Mac OS-X
documentation: LIWC 2015 manual
description: The program analyses text files word by word, calculating the percentage of words that match each of several language dimensions. The program has 68 pre-set dimensions (output variables), including linguistic dimensions, word categories tapping psychological constructs, and personal concern categories, and can accommodate user-defined dimensions as well.
In the LIWC 2007 version the dictionary was extended. The Mac OS version has new features such as phrases and parts of words (stems) as search patterns, as well as text highlighting. A lite version for students is also available.
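
The word-by-word matching described above can be sketched in a few lines of Python. This is a minimal illustration, not LIWC's code: the two-category dictionary and its word lists are made up, and treating a trailing asterisk as a stem wildcard is an assumption about how stem patterns might be written.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary; a trailing * matches any word starting with that stem.
DICTIONARY = {
    "posemo": ["happy", "joy*", "good"],
    "negemo": ["sad*", "hate*"],
}

def matches(word, pattern):
    """True if the word equals the pattern or starts with a stem pattern like 'joy*'."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern

def category_percentages(text, dictionary):
    """Percentage of all words in the text that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, patterns in dictionary.items():
            if any(matches(word, p) for p in patterns):
                counts[category] += 1
    total = len(words)
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in dictionary}

print(category_percentages("I was happy and joyful but she was sad", DICTIONARY))
```

A word may count towards several categories at once, which is why each category is reported as its own percentage of the total word count.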

MCCA - Minnesota Contextual Content Analysis

program: Dimap 4.0 with MCCA
operating system: Win95
authors: Ken Litkowski, Donald McTavish
distributor: CL Research
download: test
documentation: no, but many white papers on the website
description: DIMAP/MCCA description

personal comment: the web pages are outdated, one last edited in 2001 (sic!)

PCAD 3 - updated

program: PCAD 3
author and distributor: GB Software

documentation: manual
download: no
operating system(s): Win 3.1 or newer including Windows 10
description: The primary area of interest is measuring psychobiologically interesting states such as anxiety, hostility, and hope using the Gottschalk-Gleser content analysis scales. These scales have been empirically developed and tested, and have been shown to be reliable and valid in a wide range of studies. Louis A. Gottschalk (M.D. Ph.D.) has been the principal developer of these scales, and has applied them in many areas of medicine and beyond.

Protan - Protocol Analyser - deleted
TEXTPACK 7.0 - TextPackage - deleted

TextQuest 4.2

program: TextQuest 4.2
author and distributor: Social Science Consulting

download: test version in English and German
manual: Manual as a PDF-file, also included in the test version's installation folder
operating system(s): MS-Windows, Mac OS-X
description: TextQuest uses dictionaries to code texts; special features are interactive coding, powerful search patterns such as word co-occurrences, and negation detection for English and German. The text exploration features are word lists supporting sort order tables, exclusion lists (stop words), KWIC lines with variable length, and lists of word sequences (phrases) and word permutations. The readability module consists of 80 readability formulas for 8 languages (English, French, German, Spanish, Italian, Dutch, Danish, and Swedish) as well as language-independent ones. TextQuest is available with an English or German user interface. It includes a category manager for creating a category system from a word list, and the full version also includes standard category systems, e.g. the RID (Regressive Imagery Dictionary) for English and German and the HKW (Hamburger kommunikationssoziologisches Wörterbuch).
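
To give an idea of what a language-independent readability formula looks like, here is a Python sketch of LIX, a well-known index computed as the average sentence length plus the percentage of words longer than six letters. Whether and how TextQuest implements LIX is not stated here; the sentence splitter and the word pattern below are simplifying assumptions.

```python
import re

def lix(text):
    """LIX = words / sentences + 100 * long_words / words (long = more than 6 letters)."""
    # Assumption: sentences end at ., ! or ?; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters (digits and underscores are ignored).
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    long_words = [w for w in words if len(w) > 6]
    return len(words) / len(sentences) + 100.0 * len(long_words) / len(words)

print(round(lix("The cat sat. Readability formulas measure difficulty."), 2))
```

Because LIX relies only on word and sentence lengths, it needs no syllable counting and no language-specific word list.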

Whissell's Dictionary of Affect in Language - updated

author: Cynthia Whissell

distributor: unknown
program: Dictionary of Affect in Language (DAL)
operating system: MS-Windows 7 and newer
documentation: manual
download: DAL setup
description: The Dictionary of Affect in Language (DAL) is an attempt to quantify emotion in language. Volunteers viewed many thousands of words and rated them in terms of their Pleasantness, Activation, and Imagery (concreteness). The DAL is embedded in a computer program which is used to score language samples on the basis of these three dimensions. The DAL has been applied to studies of fiction (e.g., Frankenstein, David Copperfield), of poetry (e.g., the work of Frost, Blake), drama (e.g., Shakespeare’s tragedies and comedies), advertisements, group discussions, and lyrics (e.g., the Beatles). It has also been used in the selection of words for memory research.
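
Scoring a language sample on the three dimensions can be pictured as averaging the ratings of the words found in the dictionary. The Python sketch below assumes exactly that; the ratings are invented for illustration and are not real DAL values, and the function name score_sample is hypothetical.

```python
import re

# Hypothetical ratings as (pleasantness, activation, imagery); not real DAL values.
RATINGS = {
    "love": (3.0, 2.5, 1.8),
    "storm": (1.5, 2.8, 2.6),
    "table": (2.0, 1.2, 3.0),
}

def score_sample(text, ratings):
    """Mean rating per dimension over all words of the text that have a rating."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = [ratings[w] for w in words if w in ratings]
    if not hits:
        return None  # no rated word in the sample
    n = len(hits)
    return tuple(sum(h[d] for h in hits) / n for d in range(3))

print(score_sample("Love survived the storm", RATINGS))
```

Words without a rating are skipped, so the share of matched words is worth reporting alongside the means.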

UIMA

program: UIMA
authors: many
distributor: IBM Research and IBM Software Group and Carnegie Mellon University
documentation: documentation of the SDK
download: www.ibm.com/developerworks/data/downloads/uima/downloads.html
operating systems: Java, SDK independent from operating system
description: UIMA stands for the Unstructured Information Management Architecture.
It is an open, industrial-strength, scalable and extensible platform for creating, integrating and deploying unstructured information management solutions from combinations of semantic analysis and search components.
IBM is making UIMA available as free, open source software to provide a common foundation for industry and academia to collaborate and accelerate the world-wide development of technologies critical for discovering the vital knowledge present in the fastest growing sources of information today.
The UIMA Software Development Kit (SDK) is freely available, as is the source code of the UIMA core Java framework. In particular, the UIMA APIs are available for creating customized solutions in WebSphere Information Integrator OmniFind Edition.

Wordscores

program: Wordscores
authors: Michael Laver, Kenneth Benoit, and John Garry
distributor: Trinity College, University of Dublin, Ireland
documentation: user documentation
download: yes, source for the different modules is provided
operating systems: requires Stata version 7 or later
description: Wordscores is a set of Stata programs for performing content analysis. The programs wordfreq, phrasefreq, setref, describetext, wordscore and textscore help you explore the text and assign codes.
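
The scoring idea behind Wordscores, as published by its authors, can be sketched in Python: each word receives a score from reference texts with known positions, and a new text is scored by the mean score of its words. This is an illustration under stated assumptions, not a port of the Stata programs; the function names, the toy texts, and the positions -1 and 1 are made up, and the rescaling step of the published method is left out.

```python
from collections import Counter

def relative_freqs(tokens):
    """Relative frequency of each word within one text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def word_scores(reference_texts, reference_positions):
    """Score each word as the position-weighted probability of its reference texts.

    reference_texts: {name: list of tokens}
    reference_positions: {name: a priori position of that text}
    """
    freqs = {name: relative_freqs(toks) for name, toks in reference_texts.items()}
    vocab = set().union(*(f.keys() for f in freqs.values()))
    scores = {}
    for w in vocab:
        total = sum(f.get(w, 0.0) for f in freqs.values())
        # f[w] / total is the probability of reading text `name` given word w.
        scores[w] = sum(
            (f.get(w, 0.0) / total) * reference_positions[name]
            for name, f in freqs.items()
        )
    return scores

def text_score(tokens, scores):
    """Raw score of a new text: mean word score over the words that have a score."""
    scored = [t for t in tokens if t in scores]
    if not scored:
        return None
    return sum(scores[t] for t in scored) / len(scored)

refs = {"left": "tax spend spend welfare".split(),
        "right": "tax cut cut market".split()}
positions = {"left": -1.0, "right": 1.0}
scores = word_scores(refs, positions)
print(text_score("tax cut spend cut unknown".split(), scores))
```

Words that occur equally often in both reference texts, such as "tax" here, score in the middle and pull a new text's score towards the centre.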

WordStat 7.1

program: WordStat 7.1
author: Normand Peladeau
distributor: Provalis Research or Social Science Consulting (Europe)
documentation: manual as a PDF-file
download: test version expires after 30 days
operating systems: MS-Windows
description: WordStat is an add-on to QDA-Miner or SimStat, a general-purpose statistics program (comparable to SPSS, for example). Both packages are integrated and especially useful for coding answers to open-ended questions. It also includes thesauri and spell-checkers for different languages. It comes with Colin Martindale's RID - Regressive Imagery Dictionary (English, French, Portuguese, Swedish, German, Latin) and a few other dictionaries and thesauri (WordNet, Roget's thesaurus). Version 7.1 offers geospatial processing, and WordStat is also available for Stata.

Yoshikoder 0.6.5 - updated

program: Yoshikoder 0.6.5
author: William (Will) Lowe
distributor: Will Lowe
documentation: user documentation
download: free version
operating systems: MS-Windows, Mac OS-X, Linux, with Java environment
description: Yoshikoder works with text documents, whether in plain ASCII, Unicode (e.g. UTF-8), or a national encoding (e.g. Big5 Chinese). You can construct, view, and save keywords-in-context. You can write content analysis dictionaries using Perl-style regular expressions. Yoshikoder provides summaries of documents, either as word frequency tables or according to a content analysis dictionary. You can also compare documents according to word frequency profile or with respect to a content dictionary. Yoshikoder's native file format is XML, so dictionaries and keyword-in-context files are non-proprietary and human readable. The RID and LIWC are also available.