Talk:Interpreting Word List Data

From SurveyWiki
Revision as of 02:18, 21 October 2011 by Marcus Hansley (talk | contribs)

What is the standard cutoff percentage? -- Marcus Hansley 02:16, 21 October 2011 (PDT)

What is the standard cutoff percentage? -- Marcus Hansley 01:46, 21 October 2011 (PDT)


I am writing my first survey report and am now interpreting the word lists that we collected. While looking into how to interpret the lexical similarity percentages calculated in WORDSURV, I found that Joseph Grimes and Gary Simons recommend a 60% cutoff. However, several survey reports I have reviewed, and the ‘Interpreting Word List Data’ page on the Survey Wiki site, which you updated, recommend a 70% cutoff. Hmm.

I checked around and was given a paper that Douglas Boone wrote in 2007 for a presentation at AFLAC. It gave me more background on what this cutoff percentage is based on and how reliable it is. This is the relevant portion of his paper:

‘APPENDIX A. ON THRESHOLDS AND CRITERIA

As far as I am aware, the published basis for the practice of using a threshold figure for inferring either possible comprehension or probable non-comprehension is the work of Gary Simons and of Joseph and Barbara Grimes. Simons (1979/1983) suggested that comprehension could be predicted fairly well based on vocabulary similarity, while J. Grimes (1988/1992) maintained that the correlation is usually too weak to be of value. However, they were in agreement that intelligibility is not expected in cases of less than 60% shared vocabulary.

Some survey teams use 70% as their threshold. This is in keeping with the “Simplified Flow Chart for Decision Making” that can be found at the beginning of the Survey Reference Manual (“General Considerations” section) and with the “Language Assessment Criteria” that came out of the first International Language Assessment Conference in 1989.

Simons observes that for his combined data, “above 60 percent similarity, intelligibility steadily rises”. In his model, expected comprehension at 60% similarity is only about 20–30%. Expected comprehension at 70% similarity is about 45–50%, still too low to suggest the possibility of shared literature. In his review, Grimes considers not expected values, but the highest scores obtained in actual testing; he observes: “Vocabulary similarity percentages of 60 percent and below go consistently with intelligibility measured at 67 percent and below on simple narrative material.” In some of the cases that they studied, the “vocabulary similarity” percentage represents cognates; in others the basis for evaluating similarity is doubtless more impressionistic.’
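To see how much the choice of threshold matters in practice, here is a minimal sketch that applies both the 60% and the 70% cutoff to a few lexical similarity percentages. The variety names and figures are made up for illustration and are not survey data; the function names are hypothetical; and treating a value exactly at the cutoff as "possible comprehension" is an assumption, since the passage above only says "less than 60%". The two labels are borrowed from Boone's wording and do not predict actual comprehension.

```python
# Hypothetical lexical similarity percentages between pairs of speech
# varieties. Names and figures are invented for illustration only.
SIMILARITY = {
    ("A", "B"): 82.0,
    ("A", "C"): 65.0,
    ("B", "C"): 58.0,
}


def classify(similarity_pct, cutoff):
    """Label one similarity percentage against a cutoff.

    Assumption: a value exactly equal to the cutoff counts as
    "possible comprehension"; only values below it are flagged.
    """
    if not 0.0 <= similarity_pct <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if similarity_pct < cutoff:
        return "probable non-comprehension"
    return "possible comprehension"


def compare_cutoffs(similarity, cutoffs=(60.0, 70.0)):
    """Return, for each pair, the label under each cutoff."""
    return {
        pair: {cutoff: classify(pct, cutoff) for cutoff in cutoffs}
        for pair, pct in similarity.items()
    }


if __name__ == "__main__":
    for pair, labels in compare_cutoffs(SIMILARITY).items():
        for cutoff, label in labels.items():
            print(f"{pair[0]}-{pair[1]} at {cutoff:.0f}% cutoff: {label}")
```

In this made-up data, only the pair at 65% changes label between the two cutoffs, which is exactly the range where the 60% and 70% practices disagree.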

So it looks like 60% is a safe cutoff to use, since intelligibility is not expected below it, but 70% is often used as a rule of thumb. What is your take on it?


Marcus Hansley