Readability tools
Readers should be able to understand a text, and search engines such as Google also analyze texts and use the results for their page ranking. Every author therefore has an interest in writing understandable texts. One way of measuring this is to assess the readability of a text with readability formulas.
Readability formulas are offered as online tools: you copy and paste your text, click a button and get the results. Most of these tools work for English texts only. Please note that some of these tools store your data or, even worse, pass it on to third parties. Some programs limit the amount of text they accept. As in statistics, a long text gives more information than a small sample and is a better basis for drawing conclusions.
Desktop applications are also available; some are free, others require a license. Some of these applications support languages other than English.
Overviews of software using readability formulas
This page contains information about free and commercial software and is based on these lists:
- Search/Atlas lists 15 of the best readability scoring and checker tools
- Search Engine Journal describes 15 tools
- Karen Smiley's page lists 14 tools
- Quanteda project - R package
- 10 free readability calculators
Many programs are mentioned in more than one list, and some links in these overviews are dead. This page is a summary of all the pages mentioned above - thanks to the authors for their efforts.
Some readability formulas have long names and are often abbreviated as follows:
ARI: Automated Readability Index
FRE: Flesch Reading Ease Index
FOG: Gunning Fog Index
SMOG: Simple Measure of Gobbledygook
LW: Linsear Write, often called Fog Count
Applications for readability formulas
Readability formulas have long been used to assess how understandable a text is. Many websites that offer SEO (search engine optimization) services provide readability tools for free; other tools just calculate readability formulas and present the results.
In general, readability formulas were developed from cloze procedure tests: the formulas were derived from the test results by regression analysis. Hundreds of them exist; many, but not all, are used in computer programs. The most important criteria are the length of sentences (counted in words) and the length of words (counted in syllables or letters).
Only syntactic criteria of a text are considered. Better results would be possible if the readers' prior knowledge were known, but such data is difficult to obtain. That the readers' knowledge is not taken into account is therefore a limitation of readability formulas.
Criteria for the choice of appropriate readability formulas
Which readability formula is the best? There is no best formula. Which readability formula fits your needs depends on three criteria:
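The two criteria named above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool: the function name `text_statistics` and the simple regular expressions for splitting sentences and words are assumptions made for this example, and real programs handle abbreviations, numbers and other edge cases more carefully.

```python
import re


def text_statistics(text):
    """Return (average words per sentence, average letters per word)."""
    # Naive sentence split (assumption): a run of '.', '!' or '?' ends a sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word split (assumption): runs of letters, with one inner apostrophe allowed.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    avg_sentence_length = len(words) / len(sentences)
    # Apostrophes are not counted as letters.
    total_letters = sum(len(w.replace("'", "")) for w in words)
    avg_word_length = total_letters / len(words)
    return avg_sentence_length, avg_word_length


if __name__ == "__main__":
    asl, awl = text_statistics("The cat sat. The dog ran away.")
    print(f"words per sentence: {asl:.2f}, letters per word: {awl:.2f}")
```

Most formulas combine these two averages (or close variants of them) with coefficients obtained from the regression analysis.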
- the language, e.g. English, Spanish, French, German, etc
- the text genre, e.g. fiction, manual, prose, school book
- the readers, e.g. children up to 10 years, teenagers, adults, foreign language learners, patients, soldiers
So descriptions like "popular formula" or "best formula" don't make sense: you must know what kind of text you have and who will read it.
Implications of readability formulas
These implications have to be considered; otherwise your results may not be valid. Although many readability formulas can be used for any text, some were developed for special text genres and/or special readers:
- Flesch Reading Ease Index (REI, or FRE) and derivatives (from Farr, Jenkins, Paterson 1951; Powers, Sumner, Kearl 1958; Kincaid, Fishburne, Rogers, Chissom 1975): school texts grade 3 to 12
- Flesch-Kincaid: originally for prose, used by the US Department of Defense, reading grade and reading age
- Dale-Chall: reading grade 3 or below, children 5-10 years
- Spache (1953, 1978): grades 1 to 4, reading grade
- Automated Readability Index (ARI): US grade level, technical texts of the US Department of Defense
- Forcast: technical manuals, reading grade and reading age, grades 5 to 12
Flesch-Kincaid and ARI were developed for military personnel, and one can argue about whether these formulas give valid results for a general audience. In practice, both are used for an unspecified readership.
Linguistic problems that can influence the validity of readability formulas
Further caveats are that readability formulas require counting syllables, rely on word lists or need standardization.
- Syllable count: this is a very difficult problem in English, but also in other languages. In written text, the pronunciation cannot always be derived from the spelling. In short: counting diphthongs and vowels doesn't always work.
- word lists: some readability formulas use word lists of familiar words and count the unfamiliar words, like Dale-Chall or Spache.
- standardization: many formulas only work for a text sample of exactly 100 words and will calculate invalid results if the text is shorter or longer.
Counting syllables
In English this is very difficult because spelling and pronunciation sometimes differ a lot. Just counting vowels does not work for words with a silent e at the end, such as love, ride, leave, were, where, etc.
Some readability formulas need all syllables; others need only words with three or more syllables (so-called polysyllables) or words with only one syllable. If you really want precise results, you must work with a look-up table. Other readability formulas determine the length of a word by counting its characters (letters), which is easier to program.
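A rough sketch of such a counter in Python might look like the following. It is a heuristic under stated assumptions, not a precise algorithm: it counts groups of vowels, subtracts a silent final e, and consults a small look-up table first. The names `count_syllables`, `is_polysyllable` and `SYLLABLE_OVERRIDES` are hypothetical, and the table holds only two illustrative entries; a real look-up table would be far larger.

```python
import re

VOWELS = "aeiouy"

# Hypothetical look-up table for words the heuristic below gets wrong.
SYLLABLE_OVERRIDES = {"recipe": 3, "business": 2}


def count_syllables(word):
    """Estimate the number of syllables in an English word."""
    word = word.lower()
    if word in SYLLABLE_OVERRIDES:
        return SYLLABLE_OVERRIDES[word]
    # Each run of consecutive vowels counts as one syllable (y is treated as a vowel).
    count = len(re.findall(r"[aeiouy]+", word))
    # A final 'e' after a consonant is usually silent (love, ride, where) ...
    silent_e = word.endswith("e") and len(word) > 2 and word[-2] not in VOWELS
    # ... unless the word ends in consonant + "le", which forms its own syllable (table).
    syllabic_le = word.endswith("le") and len(word) > 2 and word[-3] not in VOWELS
    if count > 1 and silent_e and not syllabic_le:
        count -= 1
    # Every word is counted as having at least one syllable.
    return max(count, 1)


def is_polysyllable(word):
    """True if the word is estimated to have three or more syllables."""
    return count_syllables(word) >= 3


if __name__ == "__main__":
    for w in ["love", "ride", "leave", "were", "where", "table", "readability"]:
        print(w, count_syllables(w))
```

The heuristic handles the silent-e examples above, but words such as "recipe" show why a look-up table is still needed for precise results.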
The following readability formulas use the count of syllables: Flesch Reading Ease and derivatives (as mentioned above), Flesch-Kincaid, SMOG, Gunning FOG, Forcast, Linsear Write, Wheeler-Smith. Lix and Rix instead identify long words by their number of letters.
Wordlists
Dale and Chall, and Spache, each published two different versions of their formulas, but most vendors don't specify which version they use. The versions differ in their word lists: both formulas have a list and count the words that do not appear in it.
However, implementing these word lists is more than copying the words from the publication into a file that the software reads. The authors published only the basic forms of the words, e.g. the infinitive of verbs, the singular of nouns and the positive form of adjectives. The original papers describe the rules for how and what to count, but it seems that only one vendor has implemented them. The word list of Dale and Chall consists of 3000 basic forms, but a complete list with all forms of these words has many more entries.
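To illustrate why a list of basic forms is not enough, here is a small Python sketch that tries to map an inflected word back to a basic form before looking it up. It does not reproduce the counting rules from the original papers: the suffix rules are a simplification chosen for this example, `BASIC_FORMS` is a tiny hypothetical stand-in for a published list, and all function names are assumptions.

```python
# Tiny, hypothetical stand-in for a published list of basic forms.
BASIC_FORMS = {"walk", "house", "big", "make", "run"}

# Common English endings that this sketch tries to undo (an assumption,
# not the rule set of any published formula).
SUFFIXES = ("ing", "est", "ed", "er", "es", "s")


def candidate_base_forms(word):
    """Yield possible basic forms of a word by undoing common endings."""
    word = word.lower()
    yield word
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[: -len(suffix)]
            yield stem          # walked -> walk
            yield stem + "e"    # making -> make
            if len(stem) > 2 and stem[-1] == stem[-2]:
                yield stem[:-1]  # biggest -> big, running -> run


def is_familiar(word, basic_forms=BASIC_FORMS):
    """True if the word or one of its candidate basic forms is in the list."""
    return any(form in basic_forms for form in candidate_base_forms(word))


def count_unfamiliar(words, basic_forms=BASIC_FORMS):
    """Count the words that cannot be traced back to the list."""
    return sum(1 for w in words if not is_familiar(w, basic_forms))


if __name__ == "__main__":
    sample = ["walked", "houses", "biggest", "making", "elephant", "ran"]
    print(count_unfamiliar(sample))
```

Irregular forms such as "ran" are not recognized by suffix rules like these, which is one reason a complete list with all forms has many more entries than the list of basic forms.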
Standardization of readability formulas
Some formulas are written in a form that requires a text sample of a certain size, mostly 100 words. Such formulas need either a sample of exactly that size, or the formula has to be standardized mathematically. This is necessary if a variable in the equation is a counter over the whole text but the formula was designed for 100-word samples, e.g. Coleman-Liau, Forcast, Flesch's Reading Ease Index (and all derivatives), SMOG.
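The standardization can be sketched in Python as follows: every whole-text count is first scaled to a rate per 100 words, and only then inserted into the formula. The coefficients are the commonly cited ones for Flesch's Reading Ease and Coleman-Liau; the function names and the choice to pass raw counts as arguments are assumptions made for this example.

```python
def per_100_words(count, total_words):
    """Scale a whole-text count to a rate per 100 words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return 100.0 * count / total_words


def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease for a text of any length."""
    if total_sentences <= 0:
        raise ValueError("total_sentences must be positive")
    wl = per_100_words(total_syllables, total_words)  # syllables per 100 words
    sl = total_words / total_sentences                # average words per sentence
    return 206.835 - 0.846 * wl - 1.015 * sl


def coleman_liau(total_words, total_sentences, total_letters):
    """Coleman-Liau index for a text of any length."""
    letters = per_100_words(total_letters, total_words)      # letters per 100 words
    sentences = per_100_words(total_sentences, total_words)  # sentences per 100 words
    return 0.0588 * letters - 0.296 * sentences - 15.8


if __name__ == "__main__":
    # A 250-word text and a 100-word sample with the same proportions
    # give the same score once the counts are standardized.
    print(flesch_reading_ease(250, 10, 375))
    print(flesch_reading_ease(100, 4, 150))
```

Without the scaling step, inserting the raw syllable count of a 250-word text into the 100-word form of the formula would give a meaningless result.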