The analysis of the RTT scores will provide the basis for your estimate of how well the test subjects, and the people they represent, comprehend the test language. Ensuring that scoring has been applied consistently will make that analysis and estimate more accurate. Once all of the scores from at least one location are available, analysis begins with computing summary figures such as the mean and standard deviation. These measures show you what the typical score was and how much the scores differed from one another.
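As a minimal sketch of this first step, the following Python fragment computes the mean and standard deviation for one location. The scores are invented for illustration, the function name summarize is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the choice between sample and population forms depends on your team's protocol.

```python
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation) for a list of RTT scores."""
    mean_score = statistics.mean(scores)
    # Sample standard deviation (n - 1 denominator). This is an assumption;
    # use statistics.pstdev instead if your protocol calls for the
    # population form.
    sd_score = statistics.stdev(scores)
    return mean_score, sd_score

# Hypothetical percent-correct scores for ten subjects at one location.
scores = [90, 100, 80, 90, 100, 70, 90, 80, 100, 90]
mean_score, sd_score = summarize(scores)
print(f"n = {len(scores)}, mean = {mean_score:.1f}, sd = {sd_score:.1f}")
```

Whichever form of the standard deviation is chosen, it should be used consistently across locations so that the figures remain comparable.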
The standard deviation gives an indication of how the scores vary within each group of subjects. For example, a high standard deviation suggests that the subjects did not understand the text equally well. If comprehension is far from uniform, it will be necessary to investigate the possible reasons for the differences.
The standard deviation of the comprehension scores at a given location, for a given text, is a measure of the variability of the scores. If every subject has the same score, the standard deviation will be zero. The more variability in the scores, the greater the standard deviation will be. If the standard deviation exceeds the threshold set in your team's analysis protocol, it suggests that some subjects are considerably more familiar with the speech variety than others.
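The comparison against a protocol value might be sketched as follows. The threshold of 12.0 is a placeholder and not a recommended figure, the function name exceeds_threshold is hypothetical, and the sample standard deviation is again an assumption; substitute whatever your team's protocol specifies.

```python
import statistics

# Placeholder threshold, NOT a recommended value: replace it with the figure
# given in your team's analysis protocol.
PROTOCOL_SD_THRESHOLD = 12.0

def exceeds_threshold(scores, threshold=PROTOCOL_SD_THRESHOLD):
    """Return True if the sample standard deviation is above the threshold."""
    return statistics.stdev(scores) > threshold

# Identical scores give a standard deviation of zero.
uniform_scores = [80] * 10
print(statistics.stdev(uniform_scores), exceeds_threshold(uniform_scores))

# Hypothetical scores with a few subjects far above the rest.
mixed_scores = [60, 60, 70, 60, 50, 70, 60, 90, 100, 90]
print(round(statistics.stdev(mixed_scores), 1), exceeds_threshold(mixed_scores))
```

A result of True is only a prompt to look more closely at the sample; it does not by itself explain why the scores differ.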
If your intention in testing is to infer inherent intelligibility, your sampling protocol will have included safeguards to screen for language contact. Sometimes, however, these are insufficient.
Generally, people with considerably more prior exposure to another speech variety than their peers will score higher on a comprehension test. Occasionally, though, greater exposure will mean a heightened sensitivity to the features of a speech variety spoken by a despised people. This could produce lower comprehension scores, as the subject feigns incomprehension to reflect an unwillingness to accept the speech variety in which the story is told.
Any time that variability in test scores is observed, whether through a high standard deviation, widely differing first and third quartiles, or a visual inspection of the raw scores, one should look at the biographical data collected in the pre-RTT interview. (See the appendix in Radloff 1993 for sample pre-RTT questions.)
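The first and third quartiles mentioned above can be obtained with Python's standard library, as in the sketch below. The scores are invented, the function name quartile_spread is hypothetical, and statistics.quantiles uses its default "exclusive" method here; other quartile conventions give slightly different cut points on small samples.

```python
import statistics

def quartile_spread(scores):
    """Return (first quartile, third quartile, interquartile range)."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    return q1, q3, q3 - q1

# Hypothetical percent-correct scores for ten subjects.
scores = [60, 70, 70, 80, 80, 90, 90, 100, 100, 100]
q1, q3, iqr = quartile_spread(scores)
print(f"Q1 = {q1}, Q3 = {q3}, interquartile range = {iqr}")
```

A wide gap between the first and third quartiles is another signal, alongside the standard deviation, that the biographical data deserve a closer look.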
Usually there are too few respondents to permit a conclusive statistical analysis of the influence of social variables (sex, age, travel patterns, habits of radio listening, etc.). However, when there are signs of variable comprehension, the researcher should look for striking patterns.
Are there, for example, three scores in your sample that are at least 20 percentage points higher than the others? Were these attained by people who share one or more social characteristics? Did the people with unusual scores also give unusual answers to any of the post-RTT questions?
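One way to carry out this kind of inspection is sketched below. It flags scores at least 20 points above the sample median, which is only one possible reading of "higher than the others", and then lists the characteristics that all flagged subjects share. The subject records, the field names (sex, travels_often), and both function names are hypothetical.

```python
import statistics

def flag_high_scores(subjects, gap=20):
    """Return subjects scoring at least `gap` points above the sample median."""
    median = statistics.median(s["score"] for s in subjects)
    return [s for s in subjects if s["score"] - median >= gap]

def shared_traits(flagged, fields):
    """Return the fields on which every flagged subject has the same value."""
    shared = {}
    for field in fields:
        values = {s[field] for s in flagged}
        if len(values) == 1:
            shared[field] = values.pop()
    return shared

# Hypothetical records combining RTT scores with pre-RTT interview data.
subjects = [
    {"id": 1, "score": 60, "sex": "F", "travels_often": False},
    {"id": 2, "score": 60, "sex": "M", "travels_often": False},
    {"id": 3, "score": 70, "sex": "F", "travels_often": False},
    {"id": 4, "score": 60, "sex": "M", "travels_often": False},
    {"id": 5, "score": 50, "sex": "F", "travels_often": False},
    {"id": 6, "score": 70, "sex": "M", "travels_often": False},
    {"id": 7, "score": 60, "sex": "F", "travels_often": False},
    {"id": 8, "score": 90, "sex": "M", "travels_often": True},
    {"id": 9, "score": 100, "sex": "F", "travels_often": True},
    {"id": 10, "score": 90, "sex": "M", "travels_often": True},
]

flagged = flag_high_scores(subjects)
print([s["id"] for s in flagged])
print(shared_traits(flagged, ["sex", "travels_often"]))
```

With so few subjects, a shared characteristic found this way is a lead to follow up, not a statistical result.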
Does a high standard deviation point instead to a few very low scores? If so, do these scores really indicate that the subjects did not understand the story, or is there a more likely explanation, such as the subject's unwillingness to proceed with the test, a distraction, or some other incapacity? Here too, look at the individual answers to the post-RTT (and pre-RTT) questions.
Sometimes the survey team suspects variability in the scores before the team leaves the area. If time allows, the team should consider testing more than the customary ten subjects, to ensure a sample of sufficient size to suggest the reasons why scores are not more uniform.
In any case, one should not be content merely to report a high standard deviation. In such cases, one may want to report the distribution of scores, without, of course, compromising confidentiality. Whenever possible, there should be appropriate commentary on the sample, and even speculation on the reasons for the variation.
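A score distribution that omits names and other identifying details might be tabulated as in the following sketch. The scores are invented and the function name score_distribution is hypothetical; in a very small sample, even an anonymous tally may need care if a single unusual score could be traced to one person.

```python
from collections import Counter

def score_distribution(scores):
    """Return (score, number of subjects) pairs, lowest score first."""
    return sorted(Counter(scores).items())

# Hypothetical scores only; no names or biographical details are included.
scores = [60, 60, 70, 60, 50, 70, 60, 90, 100, 90]
for score, count in score_distribution(scores):
    print(f"{score:>3}  {'*' * count}  ({count})")
```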