https://surveywiki.info/index.php?title=Interpreting_Word_List_Data&feed=atom&action=history
Interpreting Word List Data - Revision history
2024-03-29T10:10:23Z
Revision history for this page on the wiki
MediaWiki 1.33.0
https://surveywiki.info/index.php?title=Interpreting_Word_List_Data&diff=1260&oldid=prev
John Carter: /* Precision of a Lexical Similarity Percentage */
2012-08-07T00:18:46Z
<p><span dir="auto"><span class="autocomment">Precision of a Lexical Similarity Percentage</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 00:18, 7 August 2012</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l93" >Line 93:</td>
<td colspan="2" class="diff-lineno">Line 93:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==Precision of a Lexical Similarity Percentage==</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==Precision of a Lexical Similarity Percentage==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>[[WordSurv]] produces a ''variance'' for each lexical similarity percentage. The [[variance]] is the square of the [[standard deviation]]. This gives you a measure of how accurate the percentage is. The method which WordSurv uses takes into account an estimate (that you provide) of how reliable your data is. [[Reliability]] can be affected by many things. For <del class="diffchange diffchange-inline">exmaple</del>, if you are a new surveyor, or investigating a language group you have never tried to transcribe before, or an informant was missing some teeth, then the reliability would be lower.</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[WordSurv]] produces a ''variance'' for each lexical similarity percentage. The [[variance]] is the square of the [[standard deviation]]. This gives you a measure of how accurate the percentage is. The method which WordSurv uses takes into account an estimate (that you provide) of how reliable your data is. [[Reliability]] can be affected by many things. For <ins class="diffchange diffchange-inline">example</ins>, if you are a new surveyor, or investigating a language group you have never tried to transcribe before, or an informant was missing some teeth, then the reliability would be lower.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>The method also, however, uses some statistical theory that implicitly assumes that the word list is a [[random sample]] from all the words of the language, which it is not. A consequence of this assumption is that WordSurv’s formula leads to a smaller variance (higher precision) for a longer word list. As discussed in section 2 of these procedures, what actually happens with a longer word list is that the lexical similarity percentage will tend to decrease because a longer word list includes more words that are more likely to change over time. Thus, the random sample assumption is not valid. Whether the percentage is more accurate for longer lists or not is hard to say. If it is, then it is a more accurate estimate, but of a different quantity than for a shorter list.</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>The method also, however, uses some statistical theory that implicitly assumes that the word list is a [[random sample]] from all the words of the language, which it is not. A consequence of this assumption is that WordSurv’s formula leads to a smaller variance (higher precision) for a longer word list. As discussed in section 2 of these procedures, what actually happens with a longer word list is that the lexical similarity percentage will tend to decrease because a longer word list includes more words that are more likely to change over time <ins class="diffchange diffchange-inline">(words outside of the traditional 'basic lists' which are theoretically more stable)</ins>. Thus, the random sample assumption is not valid. 
Whether the percentage is more accurate for longer lists or not is hard to say. If it is, then it is a more accurate estimate, but of a different quantity than for a shorter list.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>If a lexical similarity percentage is 68%, does that mean that the two varieties are unintelligible? If you use a strict 70% cutoff, the answer is “yes”. But look at other factors such as reported comprehension, contact, and attitudes in order to decide whether or not to consider intelligibility testing. Similarly, if the percentage is not much greater than 70% consider other factors before commencing intelligibility testing rather than base your decision on an arbitrary cutoff value.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>If a lexical similarity percentage is 68%, does that mean that the two varieties are unintelligible? If you use a strict 70% cutoff, the answer is “yes”. But look at other factors such as reported comprehension, contact, and attitudes in order to decide whether or not to consider intelligibility testing. Similarly, if the percentage is not much greater than 70% consider other factors before commencing intelligibility testing rather than base your decision on an arbitrary cutoff value.</div></td></tr>
</table>
John Carter
https://surveywiki.info/index.php?title=Interpreting_Word_List_Data&diff=1259&oldid=prev
John Carter: /* Limits of Lexicostatistics */
2012-08-07T00:12:13Z
<p><span dir="auto"><span class="autocomment">Limits of Lexicostatistics</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 00:12, 7 August 2012</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l17" >Line 17:</td>
<td colspan="2" class="diff-lineno">Line 17:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* lexicostatistics will provide you with percentages of lexical similarity, but these percentages have no value in themselves as exact values of similarity between one language and another. This is due to a variety of reasons, including 1) the fact that varieties are usually not intelligible to each other to the same degree because of social and other reasons, 2) that lexical similarity does not necessarily represent total language similarity, and 3) that most methods of comparison are hybrid methods, meaning they've used several factors to measure similarity. </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* lexicostatistics will provide you with percentages of lexical similarity, but these percentages have no value in themselves as exact values of similarity between one language and another. This is due to a variety of reasons, including 1) the fact that varieties are usually not intelligible to each other to the same degree because of social and other reasons, 2) that lexical similarity does not necessarily represent total language similarity, and 3) that most methods of comparison are hybrid methods, meaning they've used several factors to measure similarity. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>You also have to consider the <del class="diffchange diffchange-inline">varience </del>of your scores (see below). If you're comparing languages A and B with language Z, just because you arrive at a figure of 68% between A and Z, you cannot say that B is more intelligible to Z speakers if it shows 69% lexical similarity. That 1% difference is unlikely to be accurate with the tiny samples of each language that we have to work on; your <del class="diffchange diffchange-inline">varience </del>will be greater than 1%, meaning you cannot say which of these pairs of languages is more intelligible. To calculate [[intelligibility]], more accurate methods of intelligibilty testing should be used.</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>You also have to consider the <ins class="diffchange diffchange-inline">variance </ins>of your scores (see below). If you're comparing languages A and B with language Z, just because you arrive at a figure of 68% between A and Z, you cannot say that B is more intelligible to Z speakers if it shows 69% lexical similarity. That 1% difference is unlikely to be accurate with the tiny samples of each language that we have to work on; your <ins class="diffchange diffchange-inline">variance </ins>will be greater than 1%, meaning you cannot say which of these pairs of languages is more intelligible. To calculate [[intelligibility]], more accurate methods of intelligibilty testing should be used.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Because of these limitations, lexicostatistics should only be used to indicate the lack of intelligibility and nothing more. [[SIL]] recommends that, if lexical similarity is below 70%, you can conclude that there is lack of intelligibility between the two varieties.<ref>Douglas W. Boone. 2007. On the uses of word lists and implications for surveyors.</ref> </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Because of these limitations, lexicostatistics should only be used to indicate the lack of intelligibility and nothing more. [[SIL]] recommends that, if lexical similarity is below 70%, you can conclude that there is lack of intelligibility between the two varieties.<ref>Douglas W. Boone. 2007. On the uses of word lists and implications for surveyors.</ref> </div></td></tr>
</table>
John Carter
https://surveywiki.info/index.php?title=Interpreting_Word_List_Data&diff=1258&oldid=prev
John Carter: /* Limits of Lexicostatistics */
2012-08-07T00:11:14Z
<p><span dir="auto"><span class="autocomment">Limits of Lexicostatistics</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 00:11, 7 August 2012</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l17" >Line 17:</td>
<td colspan="2" class="diff-lineno">Line 17:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* lexicostatistics will provide you with percentages of lexical similarity, but these percentages have no value in themselves as exact values of similarity between one language and another. This is due to a variety of reasons, including 1) the fact that varieties are usually not intelligible to each other to the same degree because of social and other reasons, 2) that lexical similarity does not necessarily represent total language similarity, and 3) that most methods of comparison are hybrid methods, meaning they've used several factors to measure similarity. </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* lexicostatistics will provide you with percentages of lexical similarity, but these percentages have no value in themselves as exact values of similarity between one language and another. This is due to a variety of reasons, including 1) the fact that varieties are usually not intelligible to each other to the same degree because of social and other reasons, 2) that lexical similarity does not necessarily represent total language similarity, and 3) that most methods of comparison are hybrid methods, meaning they've used several factors to measure similarity. </div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>You also have to consider the <del class="diffchange diffchange-inline">range </del>of <del class="diffchange diffchange-inline">error for </del>your <del class="diffchange diffchange-inline">comparison</del>. If you're comparing languages A and B with language Z, just because you arrive at a figure of 68% between A and Z, you cannot say that B is more intelligible to Z speakers if it shows 69% lexical similarity. That 1% difference is unlikely to be accurate with the tiny samples of each language that we have to work on; your <del class="diffchange diffchange-inline">range of error </del>will be greater than 1%, meaning you cannot say which of these pairs of languages is more intelligible. To calculate [[intelligibility]], more accurate methods of intelligibilty testing should be used.</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>You also have to consider the <ins class="diffchange diffchange-inline">varience </ins>of your <ins class="diffchange diffchange-inline">scores (see below)</ins>. If you're comparing languages A and B with language Z, just because you arrive at a figure of 68% between A and Z, you cannot say that B is more intelligible to Z speakers if it shows 69% lexical similarity. That 1% difference is unlikely to be accurate with the tiny samples of each language that we have to work on; your <ins class="diffchange diffchange-inline">varience </ins>will be greater than 1%, meaning you cannot say which of these pairs of languages is more intelligible. 
To calculate [[intelligibility]], more accurate methods of intelligibilty testing should be used.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Because of these limitations, lexicostatistics should only be used to indicate the lack of intelligibility and nothing more. [[SIL]] recommends that, if lexical similarity is below 70%, you can conclude that there is lack of intelligibility between the two varieties.<ref>Douglas W. Boone. 2007. On the uses of word lists and implications for surveyors.</ref> </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Because of these limitations, lexicostatistics should only be used to indicate the lack of intelligibility and nothing more. [[SIL]] recommends that, if lexical similarity is below 70%, you can conclude that there is lack of intelligibility between the two varieties.<ref>Douglas W. Boone. 2007. On the uses of word lists and implications for surveyors.</ref> </div></td></tr>
</table>
John Carter
https://surveywiki.info/index.php?title=Interpreting_Word_List_Data&diff=1257&oldid=prev
John Carter: /* Limits of Lexicostatistics */
2012-08-07T00:05:46Z
<p><span dir="auto"><span class="autocomment">Limits of Lexicostatistics</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 00:05, 7 August 2012</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l12" >Line 12:</td>
<td colspan="2" class="diff-lineno">Line 12:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==Limits of Lexicostatistics==</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==Limits of Lexicostatistics==</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Lexicostatistics is the technique we use to calculate percentages of lexical similarity between two <del class="diffchange diffchange-inline">langauges</del>. It does have some limitations you'll need to be aware of however:</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Lexicostatistics is the technique we use to calculate percentages of lexical similarity between two <ins class="diffchange diffchange-inline">languages</ins>. It does have some limitations you'll need to be aware of however:</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* it should not be used to give specific dates of divergence of languages or dialects. The more rigorous [[Comparative Method]] can be used for that if needed. Tree diagrams of relatedness are often used in representing the findings of the Comparative Method but these should be drawn on the basis of lexicostatistics data because it does not give us enough information to construct linguistic relatedness in such detail.</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* it should not be used to give specific dates of divergence of languages or dialects. The more rigorous [[Comparative Method]] can be used for that if needed. Tree diagrams of relatedness are often used in representing the findings of the Comparative Method but these should <ins class="diffchange diffchange-inline">not </ins>be drawn on the basis of lexicostatistics data because it does not give us enough information to construct linguistic relatedness in such detail.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>* lexicostatistics will provide you with percentages of lexical similarity, but these percentages have no value in themselves as exact values of similarity between one language and another. If you're comparing languages A and B with language Z, just because you arrive at a figure of 68% between A and Z, you cannot say that B is more intelligible to Z speakers if it shows 69% lexical similarity. That 1% difference is unlikely to be accurate with the tiny samples of each language that we have to work on. To calculate [[intelligibility]], more accurate methods of intelligibilty testing should be used.</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>* lexicostatistics will provide you with percentages of lexical similarity, but these percentages have no value in themselves as exact values of similarity between one language and another<ins class="diffchange diffchange-inline">. This is due to a variety of reasons, including 1) the fact that varieties are usually not intelligible to each other to the same degree because of social and other reasons, 2) that lexical similarity does not necessarily represent total language similarity, and 3) that most methods of comparison are hybrid methods, meaning they've used several factors to measure similarity. </ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div> </div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins class="diffchange diffchange-inline">You also have to consider the range of error for your comparison</ins>. If you're comparing languages A and B with language Z, just because you arrive at a figure of 68% between A and Z, you cannot say that B is more intelligible to Z speakers if it shows 69% lexical similarity. That 1% difference is unlikely to be accurate with the tiny samples of each language that we have to work on<ins class="diffchange diffchange-inline">; your range of error will be greater than 1%, meaning you cannot say which of these pairs of languages is more intelligible</ins>. To calculate [[intelligibility]], more accurate methods of intelligibilty testing should be used.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Because of these limitations, lexicostatistics should only be used to indicate the lack of intelligibility and nothing more. [[SIL]] recommends that, if lexical similarity is below 70%, you can conclude that there is lack of intelligibility between the two varieties.<ref>Douglas W. Boone. 2007. On the uses of word lists and implications for surveyors.</ref> </div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Because of these limitations, lexicostatistics should only be used to indicate the lack of intelligibility and nothing more. [[SIL]] recommends that, if lexical similarity is below 70%, you can conclude that there is lack of intelligibility between the two varieties.<ref>Douglas W. Boone. 2007. On the uses of word lists and implications for surveyors.</ref> </div></td></tr>
</table>
John Carter
https://surveywiki.info/index.php?title=Interpreting_Word_List_Data&diff=1194&oldid=prev
Admin: /* Introduction */
2011-10-24T00:31:54Z
<p><span dir="auto"><span class="autocomment">Introduction</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 00:31, 24 October 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l8" >Line 8:</td>
<td colspan="2" class="diff-lineno">Line 8:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Once you have the percentages, how do you interpret them? This section discusses how to use lexical similarity percentages to make inferences about intelligibility and dialect groupings. Remember that there is much more to groupings and intelligibility than just lexical similarity! While lexical similarity does not tell you everything, it does give you a starting point.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Once you have the percentages, how do you interpret them? This section discusses how to use lexical similarity percentages to make inferences about intelligibility and dialect groupings. Remember that there is much more to groupings and intelligibility than just lexical similarity! While lexical similarity does not tell you everything, it does give you a starting point.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>For further <del class="diffchange diffchange-inline">refence </del>as you read these procedures, you can refer to the [[Field Guide Glossary]].</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>For further <ins class="diffchange diffchange-inline">reference </ins>as you read these procedures, you can refer to the [[Field Guide Glossary]] <ins class="diffchange diffchange-inline">and also the [[Talk:Interpreting_Word_List_Data|Discussion Tab]] of this page</ins>.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==Limits of Lexicostatistics==</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==Limits of Lexicostatistics==</div></td></tr>
</table>Adminhttps://surveywiki.info/index.php?title=Interpreting_Word_List_Data&diff=1186&oldid=prevMarcus Hansley: /* Limits of Lexicostatistics */2011-10-20T14:51:57Z<p><span dir="auto"><span class="autocomment">Limits of Lexicostatistics</span></span></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 14:51, 20 October 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l17" >Line 17:</td>
<td colspan="2" class="diff-lineno">Line 17:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* lexicostatistics will provide you with percentages of lexical similarity, but these percentages have no value in themselves as exact values of similarity between one language and another. If you're comparing languages A and B with language Z, just because you arrive at a figure of 68% between A and Z, you cannot say that B is more intelligible to Z speakers if it shows 69% lexical similarity. That 1% difference is unlikely to be accurate with the tiny samples of each language that we have to work on. To calculate [[intelligibility]], more accurate methods of intelligibilty testing should be used.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>* lexicostatistics will provide you with percentages of lexical similarity, but these percentages have no value in themselves as exact values of similarity between one language and another. If you're comparing languages A and B with language Z, just because you arrive at a figure of 68% between A and Z, you cannot say that B is more intelligible to Z speakers if it shows 69% lexical similarity. That 1% difference is unlikely to be accurate with the tiny samples of each language that we have to work on. To calculate [[intelligibility]], more accurate methods of intelligibilty testing should be used.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Because of these limitations, lexicostatistics should only be used to indicate the lack of intelligibility and nothing more. [[SIL]] recommends that, if lexical similarity is below 70%, you can conclude that there is lack of intelligibility between the two varieties.</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Because of these limitations, lexicostatistics should only be used to indicate the lack of intelligibility and nothing more. [[SIL]] recommends that, if lexical similarity is below 70%, you can conclude that there is lack of intelligibility between the two varieties.<ins class="diffchange diffchange-inline"><ref>Douglas W. Boone. 2007. On the uses of word lists and implications for surveyors.</ref> </ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Even though the only strong conclusion you can make based on a lexical similarity percentage is lack of intelligibility, it is okay to use lexical similarity as a basis for a first guess at language groupings or clusters. The first guess is helpful, for example, in giving you some guidance in choosing test points for intelligibilty testing. This can be done by considering various thresholds and seeing which language varieties group together based on their lexical similarity being above the threshold.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Even though the only strong conclusion you can make based on a lexical similarity percentage is lack of intelligibility, it is okay to use lexical similarity as a basis for a first guess at language groupings or clusters. The first guess is helpful, for example, in giving you some guidance in choosing test points for intelligibilty testing. This can be done by considering various thresholds and seeing which language varieties group together based on their lexical similarity being above the threshold.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"></ins></div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;"><references/></ins></div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==Lexical Similarity Matrix==</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>==Lexical Similarity Matrix==</div></td></tr>
</table>Marcus Hansleyhttps://surveywiki.info/index.php?title=Interpreting_Word_List_Data&diff=987&oldid=prevAdmin at 01:27, 13 July 20112011-07-13T01:27:10Z<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 01:27, 13 July 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l126" >Line 126:</td>
<td colspan="2" class="diff-lineno">Line 126:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Word lists are very useful in gaining a preliminary picture of the relationships between language varieties. While lexical similarity percentages computed from a word list are a very imprecise indicator of high intelligibility, they can be used as a reliable screen for low intelligibility. This reliability can be maximized by careful attention to the protocol used in word list elicitation, transcription, and analysis. Additionally, lexical similarity percentages can be used to form a hypothesis for dialect groupings as well as provide information relevant to intelligibility testing site selection.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Word lists are very useful in gaining a preliminary picture of the relationships between language varieties. While lexical similarity percentages computed from a word list are a very imprecise indicator of high intelligibility, they can be used as a reliable screen for low intelligibility. This reliability can be maximized by careful attention to the protocol used in word list elicitation, transcription, and analysis. Additionally, lexical similarity percentages can be used to form a hypothesis for dialect groupings as well as provide information relevant to intelligibility testing site selection.</div></td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>[[Category:<del class="diffchange diffchange-inline">Methodology</del>]]</div></td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>[[Category:<ins class="diffchange diffchange-inline">Word_Lists</ins>]]</div></td></tr>
</table>Adminhttps://surveywiki.info/index.php?title=Interpreting_Word_List_Data&diff=737&oldid=prevAdmin at 03:05, 22 June 20112011-06-22T03:05:13Z<p></p>
<table class="diff diff-contentalign-left" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #222; text-align: center;">Revision as of 03:05, 22 June 2011</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l126" >Line 126:</td>
<td colspan="2" class="diff-lineno">Line 126:</td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Word lists are very useful in gaining a preliminary picture of the relationships between language varieties. While lexical similarity percentages computed from a word list are a very imprecise indicator of high intelligibility, they can be used as a reliable screen for low intelligibility. This reliability can be maximized by careful attention to the protocol used in word list elicitation, transcription, and analysis. Additionally, lexical similarity percentages can be used to form a hypothesis for dialect groupings as well as provide information relevant to intelligibility testing site selection.</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>Word lists are very useful in gaining a preliminary picture of the relationships between language varieties. While lexical similarity percentages computed from a word list are a very imprecise indicator of high intelligibility, they can be used as a reliable screen for low intelligibility. This reliability can be maximized by careful attention to the protocol used in word list elicitation, transcription, and analysis. Additionally, lexical similarity percentages can be used to form a hypothesis for dialect groupings as well as provide information relevant to intelligibility testing site selection.</div></td></tr>
<tr><td colspan="2"> </td><td class='diff-marker'>+</td><td style="color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">[[Category:Methodology]]</ins></div></td></tr>
</table>Adminhttps://surveywiki.info/index.php?title=Interpreting_Word_List_Data&diff=331&oldid=prevAdmin: Created page with '{{data_collection_tools }} {{word_list_steps }} ==Introduction== Once you have the percentages, how do you interpret them? This section discusses how to use lexical similarity …'2010-12-14T06:40:58Z<p>Created page with '{{data_collection_tools }} {{word_list_steps }} ==Introduction== Once you have the percentages, how do you interpret them? This section discusses how to use lexical similarity …'</p>
<p><b>New page</b></p><div>{{data_collection_tools<br />
}}<br />
<br />
{{word_list_steps<br />
}}<br />
<br />
==Introduction==<br />
Once you have the percentages, how do you interpret them? This section discusses how to use lexical similarity percentages to make inferences about intelligibility and dialect groupings. Remember that there is much more to groupings and intelligibility than just lexical similarity! While lexical similarity does not tell you everything, it does give you a starting point.<br />
<br />
For further reference as you read these procedures, see the [[Field Guide Glossary]].<br />
<br />
==Limits of Lexicostatistics==<br />
<br />
Lexicostatistics is the technique we use to calculate percentages of lexical similarity between two languages. It has some limitations, however, that you need to be aware of:<br />
<br />
* It should not be used to give specific dates of divergence of languages or dialects. The more rigorous [[Comparative Method]] can be used for that if needed. Tree diagrams of relatedness are often used to represent the findings of the Comparative Method, but these should not be drawn on the basis of lexicostatistical data, because such data does not give us enough information to reconstruct linguistic relatedness in that much detail.<br />
* Lexicostatistics will provide you with percentages of lexical similarity, but these percentages should not be treated as exact measures of similarity between one language and another. Suppose you are comparing languages A and B with language Z. If you arrive at 68% between A and Z and 69% between B and Z, you cannot say that B is more intelligible to Z speakers than A is. A 1% difference is unlikely to be meaningful given the tiny samples of each language that we have to work with. To measure [[intelligibility]], more accurate methods of intelligibility testing should be used.<br />
<br />
Because of these limitations, lexicostatistics should only be used to indicate the lack of intelligibility and nothing more. [[SIL]] recommends that, if lexical similarity is below 70%, you can conclude that there is lack of intelligibility between the two varieties.<br />
<br />
Even though the only strong conclusion you can make based on a lexical similarity percentage is lack of intelligibility, it is okay to use lexical similarity as a basis for a first guess at language groupings or clusters. The first guess is helpful, for example, in giving you some guidance in choosing test points for intelligibility testing. This can be done by considering various thresholds and seeing which language varieties group together based on their lexical similarity being above the threshold.<br />
<br />
==Lexical Similarity Matrix==<br />
<br />
Often, a survey involves collecting word lists from more than just two varieties. A meaningful way to present the resulting lexical similarity percentages is in a matrix. In the absence of other information about how the varieties are grouped, start by setting up the matrix in some geographical ordering. Then insert the percentages.<br />
<br />
Suppose you are investigating three dialects that are located along a river in the order A, B, C. Based on your data, you find that they have the following lexical similarity percentages:<br />
<br />
{|class=wikitable border=1 cellpadding=5<br />
|-<br />
| A and B<br />
| 85%<br />
|-<br />
| A and C<br />
| 80%<br />
|-<br />
| B and C<br />
| 60%<br />
|}<br />
<br />
The resulting geographically ordered matrix would be:<br />
<br />
{|class=wikitable border=1 cellpadding=5<br />
|-<br />
| A<br />
| <br />
| <br />
|-<br />
| 85%<br />
| B<br />
| <br />
|-<br />
| 80%<br />
| 60%<br />
| C<br />
|}<br />
<br />
<br />
Note that you do not need to enter anything in the top right cells since they would be identical to the bottom left cells. The full matrix is symmetric. That is, the lexical similarity between A and B is the same as that between B and A.<br />
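<br />
A minimal sketch of this symmetry in Python, using the three percentages from the example above. The function names build_matrix and lower_triangle are hypothetical, and filling the diagonal with 100 (a variety compared with itself) is an assumption made only for illustration; the tables above show the variety name on the diagonal instead.<br />
<br />
```python
def build_matrix(order, pairs):
    """Return a full symmetric matrix (list of rows) in the given order."""
    lookup = {}
    for (x, y), percent in pairs.items():
        lookup[(x, y)] = percent
        lookup[(y, x)] = percent  # similarity between x and y equals that between y and x
    # Assumption: the diagonal (a variety compared with itself) is set to 100.
    return [[100 if r == c else lookup[(r, c)] for c in order] for r in order]


def lower_triangle(order, matrix):
    """Format only the cells below the diagonal, with the variety name on the diagonal."""
    lines = []
    for i, name in enumerate(order):
        cells = [f"{matrix[i][j]}%" for j in range(i)] + [name]
        lines.append("\t".join(cells))
    return "\n".join(lines)


# Percentages from the three-dialect example above.
pairs = {("A", "B"): 85, ("A", "C"): 80, ("B", "C"): 60}
order = ["A", "B", "C"]  # geographic order along the river
matrix = build_matrix(order, pairs)
print(lower_triangle(order, matrix))
```
<br />
Printing only the lower triangle loses no information, because every omitted cell has a mirror image below the diagonal.<br />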
<br />
Just because dialects are in some order geographically does not necessarily mean that they actually group closest with their geographic neighbors. The next step is to rearrange the matrix such that more lexically similar varieties are closer to each other in the matrix. When you do this, you have to be careful to keep all the percentages in the right places! One of the most useful features of WordSurv is that it can do this rearranging automatically. In general, an optimally ordered matrix should have:<br />
<br />
* The larger percentages closer to the diagonal.<br />
* The smaller percentages further from the diagonal, closer to the bottom left corner.<br />
<br />
It might not always be possible to have the entire matrix follow these rules. For the simple example above, the rearranged matrix would be:<br />
<br />
<br />
{|class=wikitable border=1 cellpadding=5<br />
|-<br />
| B<br />
| <br />
| <br />
|-<br />
| 85%<br />
| A<br />
| <br />
|-<br />
| 60%<br />
| 80%<br />
| C<br />
|}<br />
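<br />
The two rules above can be sketched as a small search in Python. This is not WordSurv's algorithm, which is not described here. It is an illustration under one assumed scoring rule: each percentage is multiplied by its distance from the diagonal, and the ordering with the lowest total is kept. The names ordering_cost and best_order are hypothetical, and trying every ordering is only practical for a handful of varieties.<br />
<br />
```python
from itertools import permutations


def ordering_cost(order, lookup):
    """Sum of percent * distance-from-diagonal; lower means large values sit nearer the diagonal."""
    cost = 0
    for i in range(len(order)):
        for j in range(i):
            cost += lookup[frozenset((order[i], order[j]))] * (i - j)
    return cost


def best_order(names, pairs):
    """Try every ordering and return the first one with the lowest cost (brute force)."""
    lookup = {frozenset(pair): percent for pair, percent in pairs.items()}
    return min(permutations(names), key=lambda order: ordering_cost(order, lookup))


# Percentages from the three-dialect example above.
pairs = {("A", "B"): 85, ("A", "C"): 80, ("B", "C"): 60}
print(best_order(["A", "B", "C"], pairs))  # ('B', 'A', 'C')
```
<br />
The reversed ordering C, A, B has the same cost, since reading the matrix in the opposite direction does not change any distance from the diagonal.<br />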
<br />
If you were using a 70% cutoff for lack of [[intelligibility]], then you would conclude that B and C are mutually (inherently) unintelligible. You would need to use intelligibility testing to help determine if A and B understand each other and if A and C understand each other.<br />
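<br />
As a sketch of this screening step in Python: pairs below the cutoff are set aside as lacking intelligibility, and the remaining pairs are only flagged for intelligibility testing, never declared intelligible. The function name screen_pairs is hypothetical, and the default of 70 is simply the cutoff used in this example.<br />
<br />
```python
def screen_pairs(pairs, cutoff=70):
    """Split pairs into (below cutoff, at or above cutoff)."""
    below = sorted(pair for pair, percent in pairs.items() if percent < cutoff)
    at_or_above = sorted(pair for pair, percent in pairs.items() if percent >= cutoff)
    return below, at_or_above


# Percentages from the three-dialect example above.
pairs = {("A", "B"): 85, ("A", "C"): 80, ("B", "C"): 60}
lacking, to_test = screen_pairs(pairs)
print("Conclude lack of intelligibility:", lacking)   # [('B', 'C')]
print("Candidates for intelligibility testing:", to_test)  # [('A', 'B'), ('A', 'C')]
```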
<br />
If you did intelligibility testing and found that these two pairs do understand each other, you might conclude that both B and C understand A, but not each other, and that you could possibly just develop literature in A which both B and C could use. There are, however, many other factors, such as sociolinguistic ones, that you must look at before making this conclusion.<br />
<br />
==Precision of a Lexical Similarity Percentage==<br />
<br />
[[WordSurv]] produces a ''variance'' for each lexical similarity percentage. The [[variance]] is the square of the [[standard deviation]]. This gives you a measure of how precise the percentage is. The method which WordSurv uses takes into account an estimate (that you provide) of how reliable your data is. [[Reliability]] can be affected by many things. For example, if you are a new surveyor, or are investigating a language group you have never tried to transcribe before, or an informant was missing some teeth, then the reliability would be lower.<br />
<br />
The method also, however, uses some statistical theory that implicitly assumes that the word list is a [[random sample]] from all the words of the language, which it is not. A consequence of this assumption is that WordSurv’s formula leads to a smaller variance (higher precision) for a longer word list. As discussed in section 2 of these procedures, what actually happens with a longer word list is that the lexical similarity percentage will tend to decrease because a longer word list includes more words that are more likely to change over time. Thus, the random sample assumption is not valid. Whether the percentage is more accurate for longer lists or not is hard to say. If it is, then it is a more accurate estimate, but of a different quantity than for a shorter list.<br />
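<br />
To see why a random-sample assumption makes longer lists look more precise, consider the textbook binomial approximation, in which the variance of a proportion p estimated from n items is p(1 - p)/n. The Python sketch below uses that approximation only. It is not WordSurv's formula, which also takes your reliability estimate into account, and the function names are hypothetical.<br />
<br />
```python
import math


def binomial_variance(percent, list_length):
    """Variance of a proportion under a simple random-sample (binomial) assumption."""
    p = percent / 100.0
    return p * (1.0 - p) / list_length


def standard_deviation_points(percent, list_length):
    """Standard deviation (square root of the variance), in percentage points."""
    return 100.0 * math.sqrt(binomial_variance(percent, list_length))


# Same percentage, increasingly long (hypothetical) word lists.
for n in (100, 200, 400):
    print(n, round(standard_deviation_points(68, n), 2))
```
<br />
Under this approximation, quadrupling the list length halves the standard deviation, which is the "higher precision" the paragraph above warns against taking at face value. Even for a 100-item list the standard deviation at 68% is several percentage points, so a 1% difference between two percentages carries no weight.<br />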
<br />
If a lexical similarity percentage is 68%, does that mean that the two varieties are unintelligible? If you use a strict 70% cutoff, the answer is “yes”. But look at other factors such as reported comprehension, contact, and attitudes in order to decide whether or not to consider intelligibility testing. Similarly, if the percentage is not much greater than 70%, consider other factors before commencing intelligibility testing rather than basing your decision on an arbitrary cutoff value.<br />
<br />
==Lexical Similarity Groupings==<br />
<br />
Besides screening for lack of intelligibility, lexical similarity can also be used to form preliminary dialect groupings. Note that the basis for these groupings is intelligibility and not, as with the Comparative Method, any genetic similarity. Lexical similarity groupings are useful in forming a hypothesis about intelligibility groups which can then be tested using RTT, sociolinguistic investigation, and linguistic analysis.<br />
<br />
Consider the following lexical similarity matrix for some varieties of Chin.<br />
<br />
[[File:Data.jpg|500px|center|Lexical similarity matrix for some varieties of Chin]]<br />
<br />
Using a cutoff of 75%, there are four groups (A, B, C, and D). Within each group, all the percentages are at least 75%. Between groups, all the percentages are below 75%. Given the groupings, you can report the ranges of lexical similarity within each group and between groups. This involves simply looking at the portion of the matrix corresponding to the comparisons of interest and noting the range (the smallest to the largest) of the percentages. Within group B, for example, the lexical similarity ranges from 75% to 88%. Comparing B with A yields a lexical similarity range of 54% to 71%. It is possible that you could end up with a matrix where the groupings are not as clear as this one. For example, suppose you used a 70% cutoff instead of a 75% cutoff. Where would you place C and D? They have at least 70% similarity with at least one of the B varieties, but not with all of them, nor with each other. This matrix is not the final answer to dialect groupings, just a way to get a preliminary picture. Start with a low cutoff and see what happens. Then, as you increase the cutoff, the picture will become steadily clearer. However, you probably do not want to be making dialect distinctions based on a really high cutoff.<br />
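<br />
Reporting ranges and checking whether a grouping is clear at a given cutoff can be sketched in Python as follows. The percentages and variety names here are invented for illustration and are not the Chin data in the figure. The function names are hypothetical as well.<br />
<br />
```python
from itertools import combinations

# Hypothetical percentages; NOT the Chin data from the figure above.
pairs = {
    ("P1", "P2"): 90, ("P1", "P3"): 78, ("P2", "P3"): 82,
    ("P1", "Q1"): 65, ("P2", "Q1"): 50, ("P3", "Q1"): 58,
}
groups = {"P": ["P1", "P2", "P3"], "Q": ["Q1"]}
lookup = {frozenset(pair): percent for pair, percent in pairs.items()}


def within_range(group, lookup):
    """(smallest, largest) percentage inside one group, or None for a single-member group."""
    values = [lookup[frozenset((x, y))] for x, y in combinations(group, 2)]
    return (min(values), max(values)) if values else None


def between_range(group_1, group_2, lookup):
    """(smallest, largest) percentage across two groups."""
    values = [lookup[frozenset((x, y))] for x in group_1 for y in group_2]
    return min(values), max(values)


def grouping_is_clean(groups, lookup, cutoff):
    """True if every within-group value is >= cutoff and every between-group value is < cutoff."""
    for members in groups.values():
        span = within_range(members, lookup)
        if span is not None and span[0] < cutoff:
            return False
    for (_, g1), (_, g2) in combinations(groups.items(), 2):
        if between_range(g1, g2, lookup)[1] >= cutoff:
            return False
    return True


print(within_range(groups["P"], lookup))                # (78, 90)
print(between_range(groups["P"], groups["Q"], lookup))  # (50, 65)
for cutoff in (60, 75, 80):
    print(cutoff, grouping_is_clean(groups, lookup, cutoff))
```
<br />
With these invented numbers the grouping is clean at 75% but not at 60% (P and Q would merge for some pairs only) or at 80% (group P would split), which is the kind of unclear picture described above.<br />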
<br />
Using a map, you can draw lexical similarity contours for various threshold percentages. For example, consider the following fictitious lexical similarity matrix.<br />
<br />
[[File:Data.jpg|500px|center|A fictitious lexical similarity matrix]]<br />
<br />
Using cutoffs of 60%, 70%, 80%, and 90%, the contours would look like the following:<br />
<br />
[[File:Isolects.jpg|500px|center]]<br />
<br />
Make sure to clearly indicate in your report that these are lexical similarity contours. Otherwise, someone might interpret your figure to be indicating intelligibility groupings.<br />
<br />
See the [[Lexical Similarity Grouping Examples]] page for more.<br />
<br />
==Using Lexical Similarity for Intelligibility Testing Site Selection==<br />
<br />
Consider the matrix and contours in the previous section. That matrix is just a beginning. It provides good information for deciding where you should do intelligibility testing. In general, you would want to test any two varieties for which the lexical similarity was at least 70%, but there might be situations where you would want to test for intelligibility at lower levels.<br />
<br />
Based on the lexical similarity contours for varieties A, B, C, and D, you might hypothesize that a possible single reference dialect is C. It is the only location that has at least 60% lexical similarity with all the others. At any higher threshold, there would have to be at least two reference dialects. You would want to pursue intelligibility testing to go further. These lexical similarity contours give you some idea of where to begin. For example, you would definitely want to test to see if A, B, and D can understand C. In any case, you would also want to investigate sociolinguistic factors such as patterns of contact and acquired bidialectalism before deciding on dialect groupings.<br />
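<br />
The search for a possible single reference dialect can be sketched in Python: a variety qualifies at a given threshold if its lexical similarity with every other variety is at least that threshold. The percentages below are invented so that only C qualifies at 60%; they are not the values in the figure, and the function name reference_candidates is hypothetical.<br />
<br />
```python
def reference_candidates(names, pairs, threshold):
    """Return the varieties whose similarity with every other variety is >= threshold."""
    lookup = {frozenset(pair): percent for pair, percent in pairs.items()}
    candidates = []
    for name in names:
        others = [other for other in names if other != name]
        if all(lookup[frozenset((name, other))] >= threshold for other in others):
            candidates.append(name)
    return candidates


# Hypothetical percentages; NOT the values from the fictitious matrix in the figure.
names = ["A", "B", "C", "D"]
pairs = {
    ("A", "B"): 55, ("A", "C"): 65, ("A", "D"): 50,
    ("B", "C"): 75, ("B", "D"): 58, ("C", "D"): 62,
}
print(reference_candidates(names, pairs, 60))  # ['C']
print(reference_candidates(names, pairs, 70))  # []
```
<br />
An empty result at a higher threshold corresponds to the case where at least two reference dialects would be needed. As above, the output is only a hypothesis to be checked by intelligibility testing and sociolinguistic investigation.<br />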
<br />
==Summary==<br />
<br />
Word lists are very useful in gaining a preliminary picture of the relationships between language varieties. While lexical similarity percentages computed from a word list are a very imprecise indicator of high intelligibility, they can be used as a reliable screen for low intelligibility. This reliability can be maximized by careful attention to the protocol used in word list elicitation, transcription, and analysis. Additionally, lexical similarity percentages can be used to form a hypothesis for dialect groupings as well as provide information relevant to intelligibility testing site selection.</div>Admin