Text Analysis Info - Content: quantitative without category system
Last update: 12. February 2008
The following programs do not work with a category system. Most of them analyse
the co-occurrences of the words in a text, and some perform multivariate
statistical analyses such as factor analysis, cluster analysis, or
multi-dimensional scaling (MDS). Others use neural networks, while some
companies do not say which technique their software uses.
program: Alceste
author: Max Reinert
distributor: Image, Toulouse, France
documentation: in English, French, and Italian
download: none
operating systems: MS-Windows, MacOS, Unix
description: creates word lists and extracts classes of terms
program: Catpac II
author: Joseph Woelfel
distributor: Galileo Company
operating system: MS-Windows
documentation: manual
download: no
description: none yet
program: Hamlet
author: Alan Brier 
download: free for personal use
operating system: DOS, Win3.1, MS-Windows
documentation: manual (MS-Word format) and tutorial (self-extracting file)
description:
The main idea of HAMLET (c) is to search a text file for words
in a given vocabulary list, and to count joint frequencies within
any specified context unit, or as collocations within a given span
of words.
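The collocation mode can be illustrated with a minimal Python sketch. It assumes the text is already split into words and counts an unordered pair once for every two vocabulary hits that lie at most `span` words apart; the function name `span_collocations` and this counting convention are illustrative assumptions, not HAMLET's documented behaviour.

```python
from collections import Counter


def span_collocations(words, vocabulary, span):
    """Count unordered pairs of vocabulary words occurring within `span` words of each other."""
    vocab = {w.lower() for w in vocabulary}
    # Positions (in text order) of the words that belong to the vocabulary list.
    hits = [(pos, w.lower()) for pos, w in enumerate(words) if w.lower() in vocab]
    counts = Counter()
    for a, (pos_a, word_a) in enumerate(hits):
        for pos_b, word_b in hits[a + 1:]:
            if pos_b - pos_a > span:
                break  # hits are sorted by position, so later ones are even farther away
            if word_a != word_b:
                counts[tuple(sorted((word_a, word_b)))] += 1
    return counts


text = "the cat sat on the mat while the dog watched the cat".split()
print(span_collocations(text, ["cat", "dog", "mat"], span=3))
```

With `span=3`, only the pairs *mat*/*dog* and *dog*/*cat* are counted; the first *cat* is four words away from *mat*.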
Individual word frequencies ($f_i$), joint frequencies ($f_{ij}$)
for pairs of words (i, j), both expressed in terms of the chosen unit
of context, and the corresponding standardised joint frequencies
$s_{ij} = \frac{f_{ij}}{f_i + f_j - f_{ij}}$
are displayed in a similarities matrix, which can be submitted to
a simple cluster analysis and multi-dimensional scaling.
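The standardised joint frequency is the Jaccard coefficient: the denominator counts the context units that contain at least one of the two words, so the value runs from 0 (the words never share a unit) to 1 (wherever one occurs, so does the other). The Python sketch below computes these matrix entries, assuming sentences as the context unit and plain whitespace tokenisation (punctuation is not stripped); `standardised_joint_frequencies` is a hypothetical helper, not part of HAMLET.

```python
from itertools import combinations


def standardised_joint_frequencies(units, vocabulary):
    """Return {(i, j): s_ij} with s_ij = f_ij / (f_i + f_j - f_ij).

    f_i  = number of context units that contain word i
    f_ij = number of context units that contain both i and j
    """
    vocab = sorted({w.lower() for w in vocabulary})
    # Reduce each context unit (here: a sentence) to the set of vocabulary words it contains.
    present = [set(unit.lower().split()) & set(vocab) for unit in units]
    f = {w: sum(w in p for p in present) for w in vocab}
    sims = {}
    for i, j in combinations(vocab, 2):
        f_ij = sum(i in p and j in p for p in present)
        denom = f[i] + f[j] - f_ij  # units containing i, j, or both
        sims[(i, j)] = f_ij / denom if denom else 0.0  # 0.0 if neither word occurs
    return sims


sentences = ["the market rose", "the market rose again", "prices fell"]
for pair, s in standardised_joint_frequencies(sentences, ["market", "prices", "rose"]).items():
    print(pair, round(s, 3))
```

Here *market* and *rose* always appear together, so their similarity is 1.0, while *prices* never shares a sentence with either of them.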
A further option allows comparison of the results of applying multi-dimensional
scaling to matrices of joint frequencies derived from a
number of texts, using Procrustean Individual Differences Scaling (PINDIS).
Further procedures are included to help to determine the broad
characteristics of word usage in a text:
- KWIC offers Key-Word-In-Context listings for any given word-string.
- WORDLIST generates lists of words and frequencies.
- COMPARE lists words common to pairs of texts, and is useful in
generating vocabulary lists, including synonyms, for use in comparing
a number of texts.
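The KWIC and COMPARE procedures can be sketched in a few lines of Python, assuming the texts are already split into words. The function names, the default context width, and the 30-character left column are illustrative choices, not HAMLET's actual output format.

```python
def kwic(words, phrase, width=4):
    """Key-Word-In-Context lines for every occurrence of `phrase` (one or more words)."""
    key = phrase.lower().split()
    n = len(key)
    if n == 0:
        return []
    lowered = [w.lower() for w in words]
    lines = []
    for pos in range(len(words) - n + 1):
        if lowered[pos:pos + n] == key:
            left = " ".join(words[max(0, pos - width):pos])
            match = " ".join(words[pos:pos + n])
            right = " ".join(words[pos + n:pos + n + width])
            # Right-align the left context so the key phrases line up in one column.
            lines.append(f"{left:>30} [{match}] {right}")
    return lines


def compare(words_a, words_b):
    """Words common to two texts (case-insensitive), in alphabetical order."""
    return sorted({w.lower() for w in words_a} & {w.lower() for w in words_b})


text = "To be or not to be that is the question".split()
for line in kwic(text, "to be", width=2):
    print(line)
print(compare(text, "the answer is not clear".split()))
```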
program: Intelligent Miner for Text - Text Analysis Tools 2.3
author and distributor: IBM
download: 60-day trial version
operating system: OS/390
documentation: fact sheet
description:
The text analysis tools can be used to analyse all types of online documentation,
from customer requests and technical reports to newspaper and magazine articles.
- Organize documentation:
By creating a folder directory structure, the Categorization tools can catalog
and sort items according to user-defined categories.
- Navigate documentation:
A search for information on a specific subject may draw on many different
documents. The Clustering tools can provide an overview of all the
documentation that has been used.
- Annotate documentation:
Rather than printing out documentation and highlighting the relevant items of
interest, users can have the Feature Extraction tool highlight them onscreen.
- Summarize documentation:
Because there is less and less time to read lengthy and detailed information,
the Summarization tool can create document summaries automatically, helping the
reader decide whether the whole document should be read.
program: Semio 2.0
author: Claude Vogel
distributor: Entrieva
documentation: none
download: live demos
operating system(s): MS-Windows, Solaris 2.5
description:
Semio Taxonomy combines unique linguistic analysis
technology and statistical clustering with user-defined
vocabulary requirements to create an intuitively
browsable structure of categories that provides intelligent
access to the global information space within a mass of
formerly unstructured text.
Important phrases and keywords are extracted from a
variety of text sources such as intranet/Internet sites,
Lotus Notes, Documentum, ODBC-compliant databases,
XML, etc. This process combines language detection,
proximity analysis and stemming and normalization rules
to produce the cleanest, most informative extraction
technology available.
These extracted concepts are then clustered using
information theory techniques developed as the result of
work over the past twenty years. Once this process has
selected the truly relevant information from the original
unstructured text, any number of top-level classification
structures can be applied to it. These structures extract
lexical derivatives from the network of clusters and place
them into categories. The result is a browsable category
structure that gives the user insight into
the search space without resorting to the 'hunt-and-peck'
method of keyword searches. Since the only requirement
of a classification structure is that it reflects information
that can be found within the source text, the configuration
and customisation of the structure is virtually unlimited.
The client can configure their taxonomies to reflect a corporate thesaurus or
controlled vocabulary. Semio Taxonomy is fully compliant with ISO thesauri,
and can be tailored to any client terminology initiative. The power of applying
multiple classification structures to the same source text becomes clear when
users see for the first time the actual textual evidence that led to those
structures in the first place.
Process Steps:
Semio Taxonomy performs a three-step process to classify text contents.
- Text is collected from different sources; about 500 different
formats can be read.
- Semio's phrase extraction pulls relevant, informative phrases from within the text.
- The phrases are attached to a set of categories
which can come from a thesaurus, pre-built
category set from Semio, or a custom structure of
the user's choosing. The category structures can
then be validated and modified in an easy, iterative
process to ensure quality and consistency.
Semio needs 96 MB of RAM and at least 500 MB of free disk space.
program: SPAD-T
author and distributor: Decisia
documentation: none
download: no
operating systems: not specified
description:
SPAD-T analyses texts automatically by associating them with numerically coded
information, and compares texts using probabilistic methods.
Categorisation can also take external variables (e.g. age, sex, profession)
into account using SPAD-N.
SPAD-T counts words and word sequences (phrases) using sort order tables and
exclusion criteria such as length or frequency. Probabilistic methods are then
used to find characteristic words, word sequences, or sentences. KWIC listings
with a fixed line length of 132 characters can also be produced.
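Counting repeated word sequences under exclusion criteria can be sketched in Python as follows. The parameters `max_len`, `min_freq`, and `min_word_len` are hypothetical stand-ins for SPAD-T's length and frequency criteria, and ordering the result by descending frequency is an assumption; the source does not describe SPAD-T's sort order tables in detail.

```python
from collections import Counter


def repeated_segments(words, max_len=3, min_freq=2, min_word_len=1):
    """Count word sequences of 1..max_len words and keep the repeated ones.

    A segment is kept if it occurs at least `min_freq` times and every word in it
    has at least `min_word_len` characters. Results are ordered by descending
    frequency, then alphabetically.
    """
    words = [w.lower() for w in words]
    counts = Counter()
    for n in range(1, max_len + 1):
        for start in range(len(words) - n + 1):
            segment = tuple(words[start:start + n])
            if all(len(w) >= min_word_len for w in segment):
                counts[segment] += 1
    kept = [(seg, c) for seg, c in counts.items() if c >= min_freq]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


tokens = "the open question and the open answer".split()
print(repeated_segments(tokens, max_len=2))
print(repeated_segments(tokens, max_len=2, min_word_len=4))
```

Raising `min_word_len` to 4 excludes *the*, and with it every segment containing *the*, leaving only *open*.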
The vocabularies of texts are compared using different types of factorial
analysis and correspondence analysis, which can also include external
variables.
Contingency tables of common words or of the segments repeated within the
texts can also be produced. Hierarchical cluster analyses using Ward's method
(computed with reciprocal neighbours) allow, for example, an automatic
classification of responses to open-ended questions.
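Ward's hierarchical clustering can be sketched in plain Python, assuming each response has already been coded as a numeric vector (for example, word counts); `ward_clusters` is a hypothetical helper. The source says SPAD-T uses reciprocal neighbours; this sketch substitutes a naive search for the cheapest merge, which for Ward's criterion yields the same hierarchy apart from ties, and applies the Lance-Williams update to squared Euclidean distances.

```python
def ward_clusters(points, k):
    """Merge points with Ward's criterion until k clusters remain.

    Returns the clusters as sorted lists of point indices.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    def sqdist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    clusters = {i: [i] for i in range(len(points))}
    # d[(a, b)] with a < b is the merging cost of clusters a and b. It starts as the
    # squared Euclidean distance; the update keeps it proportional to the increase
    # in within-cluster sum of squares.
    d = {(a, b): sqdist(points[a], points[b])
         for a in range(len(points)) for b in range(a + 1, len(points))}
    next_id = len(points)
    while len(clusters) > k:
        # Naive search for the cheapest merge (ties go to the first pair found).
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        new = {}
        for c, members in clusters.items():
            nc = len(members)
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # Lance-Williams update for Ward's method.
            new[(c, next_id)] = ((na + nc) * d_ac + (nb + nc) * d_bc - nc * d_ab) / (na + nb + nc)
        # Forget every cost involving the two clusters that were merged.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(m) for m in clusters.values())


# Each row could be, for example, the word counts of one open-ended response.
responses = [(2, 0, 1), (3, 0, 1), (0, 4, 0), (0, 5, 1)]
print(ward_clusters(responses, k=2))  # [[0, 1], [2, 3]]
```

The naive search is cubic in the number of responses, which is acceptable for a sketch; the reciprocal-neighbour approach exists to make this step cheaper on large data sets.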
program: TextAnalyst 2.3 (a German version is also available)
author: Sergej Ananyan
distributor: Megaputer
download: evaluation
operating system: MS-Windows
documentation: tutorial and white papers
description:
TextAnalyst is an intelligent text processing tool capable of automated semantic
analysis, summarisation, and navigation of unstructured natural-language
texts. In addition, TextAnalyst can cluster the documents in your textbase,
perform semantic information retrieval, and focus your text exploration on a
particular subject.
program: T-Lab 5.5 pro
author: Franco Lancia
distributor: T-lab
documentation: in English, Italian, French, and Spanish. A quick introduction is also available in these languages; the tutorial is available only in English.
download: Test version (multilingual)
operating system: MS-Windows
description:
T-LAB software is an all-in-one set of linguistic and statistical tools for text analysis which can be used in the following research fields: semantic analysis, content analysis, perceptual mapping, text mining, and discourse analysis.
Available versions are in English, French, Italian, and Spanish, each with a dictionary and a knowledge base. Texts in any other language that can be written in ASCII format can also be analysed, but without automatic lemmatisation.
T-LAB has three sub-menus: analyses and maps, lexicon, consultation.
All analyses in the ANALYSES AND MAPS sub-menu can be run with either automatic or customised settings.
The file size is limited to 10 MB, which is sufficient for most analyses.