Lexicostatistics is the comparison of the lexicons of different sets of language data to assess how similar they are. It is often used to suggest historical relationships between the varieties, particularly in the study of glottochronology.

Lexicostatistics does, however, have some limitations you need to be aware of:

  • it should not be used to give specific dates of divergence of languages or dialects. The more rigorous Comparative Method can be used for that if needed. Tree diagrams of relatedness are often used to represent the findings of the Comparative Method, but these should not be drawn on the basis of lexicostatistical data, because it does not give us enough information to reconstruct linguistic relatedness in such detail.
  • lexicostatistics will provide you with percentages of lexical similarity, but these percentages have no value in themselves as exact measures of similarity between one language and another. If you compare languages A and B with language Z and arrive at 68% between A and Z and 69% between B and Z, you cannot conclude that B is more intelligible to Z speakers than A is. A 1% difference is unlikely to be meaningful given the tiny samples of each language that we have to work with. To assess intelligibility, use the more accurate methods of intelligibility testing.
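
To see why a 1% difference carries so little weight, a rough sampling calculation can help. The sketch below is only an illustration under assumptions that are not part of the guidance above: the function name is hypothetical, the 200-item word list is an invented size, and each item is treated as an independent yes/no trial, which real word lists only approximate.

```python
import math

def similarity_margin(percent_similar, n_items, z=1.96):
    """Approximate margin of error, in percentage points, for a lexical
    similarity percentage computed from n_items word-list entries.

    Assumes each item is an independent similar/not-similar trial
    (a simplification) and uses the normal approximation; z=1.96
    corresponds to roughly 95% confidence.
    """
    p = percent_similar / 100.0
    return z * math.sqrt(p * (1.0 - p) / n_items) * 100.0

# Hypothetical example: 68% similarity measured on a 200-item word list.
margin = similarity_margin(68, 200)
print(f"68% plus or minus about {margin:.1f} percentage points")
```

Under these assumptions the margin is several percentage points wide, so 68% and 69% cannot be told apart on a list of this size.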

Because of these limitations, lexicostatistics should only be used to indicate a lack of intelligibility and nothing more. SIL recommends that, if lexical similarity is below 70%, you can conclude that there is a lack of intelligibility between the two varieties.
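
The 70% rule can be sketched as a small calculation. This is a minimal illustration rather than an official SIL procedure: the function names and the input format (one true/false similarity judgment per compared word-list item) are assumptions made for the example. Note that a result at or above the threshold is reported as inconclusive, not as evidence of intelligibility.

```python
def lexical_similarity(judgments):
    """Return the percentage of compared items judged lexically similar.

    judgments: list of booleans, one per word-list item compared
    (True = judged similar). This input format is an assumption.
    """
    if not judgments:
        raise ValueError("no items were compared")
    return 100.0 * sum(judgments) / len(judgments)

def interpret(percent, threshold=70.0):
    """Apply the 70% guideline: below it, conclude a lack of
    intelligibility; otherwise draw no conclusion."""
    if percent < threshold:
        return "lack of intelligibility indicated"
    return "inconclusive: intelligibility testing needed"

# Hypothetical data: 13 of 20 compared items judged similar.
judgments = [True] * 13 + [False] * 7
percent = lexical_similarity(judgments)
print(percent, interpret(percent))
```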