Last update: 20 April 2017
This section is a summary of questions that reach me. I have tried to answer them; if you do not agree with me or if you want to contribute yourself, please don't hesitate to write. All contributions will be considered and may be published here.
- What kind of software do I need to do X, where X is one of:
- scraping stories from a website You will have to download the pages of the website(s) you wish to analyse. A website is organised in files, many of which are graphics, images, or videos that you usually do not want to analyse. Programs that download a whole website or parts of it are called offline readers or web spiders; most of them can restrict the download to files with certain file extensions (e.g. html) or below a certain file size.
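The core of that restriction is a simple extension filter applied to the links a page contains. Here is a minimal sketch in Python using only the standard library; real offline readers (wget, HTTrack, and the like) add recursion, politeness delays, and robots.txt handling on top of this. The example page is made up for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def text_links(html, extensions=(".html", ".htm", ".txt")):
    """Return only the links that point to text files worth analysing."""
    parser = LinkCollector()
    parser.feed(html)
    return [link for link in parser.links
            if link.lower().endswith(extensions)]

# A made-up page: one story, one image, one video.
page = ('<a href="story1.html">Story</a> '
        '<a href="logo.png">Logo</a> '
        '<a href="clip.mp4">Clip</a>')
print(text_links(page))  # the image and video links are filtered out
```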
- simple term-searching A simple form of term-searching is offered by every editor or word processor: an already loaded file is searched, and the hits are shown. Most editors can look for character strings, which may be whole words or any part of one.
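In code, term-searching amounts to a substring test per line. This sketch (illustrative text, not from any real corpus) also shows the catch with substring matching: searching for "cat" matches "catalogue" as well.

```python
def find_term(lines, term):
    """Return (line number, line) pairs for every line containing the term."""
    return [(no, line) for no, line in enumerate(lines, start=1)
            if term in line]

text = ["The cat sat on the mat.",
        "Dogs bark.",
        "A catalogue of cats."]

# "cat" hits line 1 (whole word) and line 3 (part of "catalogue").
print(find_term(text, "cat"))
```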
- keyword searching Keyword searching is very similar to term-searching, but the results may be displayed differently. Concordance programs often offer keyword searching, and the results are displayed in a separate results window, not in the original file. Sometimes the results are also specially formatted; very popular is KWIC (key-word-in-context).
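A KWIC display lines the keyword up in a column with a fixed amount of context on either side. A minimal sketch of the idea (the sample sentence and the 15-character context width are arbitrary choices, not a fixed convention):

```python
def kwic(text, keyword, width=15):
    """Return each occurrence of keyword with `width` characters of context,
    keyword aligned in a column as concordance programs display it."""
    results = []
    lower = text.lower()
    pos = lower.find(keyword.lower())
    while pos != -1:
        left = text[max(0, pos - width):pos]
        right = text[pos + len(keyword):pos + len(keyword) + width]
        results.append(f"{left:>{width}} [{keyword}] {right}")
        pos = lower.find(keyword.lower(), pos + 1)
    return results

sample = "Content analysis treats text as data; text mining extends text analysis."
for line in kwic(sample, "text"):
    print(line)
```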
- clustering analyses Cluster-analysis programs group texts into clusters according to their similarity. These cluster analyses require a certain minimum text size to produce stable results.
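All such programs start from some similarity measure between texts; a very common one is the cosine similarity of word-frequency vectors. The following sketch shows only that building block, with three invented one-line "documents" (no real clustering program works on texts this short, as the answer above notes):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-frequency vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "stock prices fell sharply today"]
vectors = [Counter(d.split()) for d in docs]

# Documents 0 and 1 share vocabulary, document 2 shares none,
# so a clustering program would group 0 and 1 together.
print(round(cosine(vectors[0], vectors[1]), 2))
print(round(cosine(vectors[0], vectors[2]), 2))
```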
- converting files between formats (the 'Word documents' question) There are a lot of file formats for text files, and most of them are proprietary (specific to one product). Content analysis software often requires the text as a plain text file (often called an ASCII file, which is not quite correct in a Windows environment). Nowadays the most popular encoding standard for text is UTF-8 (Unicode Transformation Format, 8-bit). It can encode the characters of any language and makes texts interchangeable between operating systems (Windows, Linux, macOS).
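Re-encoding a legacy text file to UTF-8 takes only a few lines in most languages. A Python sketch, assuming a Windows-1252 source file (the file names and sample text are invented for the demonstration; for real files you must know, or guess, the source encoding first):

```python
# Create a small Windows-1252 demo file, then convert it to UTF-8.
sample = "Überschrift und Résumé"

with open("demo_cp1252.txt", "w", encoding="cp1252") as f:
    f.write(sample)

# The actual conversion: read with the source encoding, write as UTF-8.
with open("demo_cp1252.txt", encoding="cp1252") as f:
    text = f.read()
with open("demo_utf8.txt", "w", encoding="utf-8") as f:
    f.write(text)

with open("demo_utf8.txt", encoding="utf-8") as f:
    print(f.read() == sample)  # the text survives the round trip intact
```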
- converting images/hardcopy to electronic text (OCR - optical character recognition) Content analysis software requires the text to be stored in a file. That means, if you only have printed material, you must digitise it (or in other words: make it readable for a computer). You can type the text, you can dictate it using dictation software (the best known are ViaVoice from IBM and Dragon NaturallySpeaking), or you can scan it using a scanner. The scanner stores the text as an image, and the next step is transforming the image into text data. The software for this is called OCR (optical character recognition) software. One might think that 99 % correctly recognised characters is a good value, but since a typical page holds 1,000 to 2,000 characters, it still means between 10 and 20 errors per page.
- some particular CA-related task (like keywords in context, collocation, etc.) These analyses often help you to detect problems like ambiguity (a word with several meanings) or negation.
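Collocation analysis, in its simplest form, is counting which words occur next to each other. A hedged sketch using adjacent word pairs (bigrams); the sample sentence is invented, and serious collocation measures add statistics such as mutual information on top of these raw counts:

```python
from collections import Counter

def bigrams(tokens):
    """Count adjacent word pairs, the simplest form of collocation analysis."""
    return Counter(zip(tokens, tokens[1:]))

tokens = "new york is not like new jersey but new york is big".split()

# ("new", "york") occurs twice, while ("new", "jersey") occurs once,
# hinting that "new york" is the stronger collocation in this text.
for pair, freq in bigrams(tokens).most_common(3):
    print(pair, freq)
```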
- what kind(s) of statistical tests can I use on CA data?
- what statistical packages work with CA data? Nearly any package that can read CSV data (comma-separated values).
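Getting CA results into such a package therefore means writing them as CSV. A minimal sketch with Python's standard csv module; the document names, texts, and the chosen columns (token and type counts) are invented for illustration:

```python
import csv

# Two tiny made-up documents and their texts.
texts = {"doc1.txt": "the cat sat on the mat",
         "doc2.txt": "the dog barked at the cat"}

# One row per document: a layout any statistics package can import.
with open("frequencies.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["document", "tokens", "types"])
    for name, text in texts.items():
        tokens = text.split()
        writer.writerow([name, len(tokens), len(set(tokens))])
```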
- are there any good books on CA?
- how do I CA X, where X is one of:
- web pages: web crawlers
- media or speech transcripts: dictation software
- focus group or conversational or interview transcripts: dictation software
- historical documents: OCR software
The second step is to prepare the files (editing) for further processing with text analysis software. TextGrab is a program that downloads all text files of a website and prepares them for seamless processing with TextQuest.