The information by Nahhas about borrowings
by Ramzi Nahhas
If you are interested in the inherent linguistic similarity between two languages, then when comparing the lexical similarity of two language varieties, all words borrowed from other languages are irrelevant. All borrowings should be dropped (i.e. treated as “no data”) since it is not clear what the native word for that item would be.
If, however, you are going to use a lexical similarity percentage to infer current intelligibility then borrowings might indeed be relevant. In this view, it is the similarity (or lack thereof) of the words that matters regardless of where the words came from originally. If one variety borrows a word that is very different from the native word and the other does not, then the effect of this borrowing should be to lower intelligibility, right? If this were so, then the effect of the borrowing on the lexical similarity percentage would be the same as its effect on intelligibility.
But wait! What if, due to bilingualism in the language of borrowing, the borrowed word is understood by some speakers of the other variety? This implies that the effect of the borrowing on the lexical similarity percentage (it is decreased, since the two varieties now use different words) might not be the same as its effect on intelligibility (which might go up or down, depending on who knows the language of borrowing). In this case it would be invalid to infer intelligibility from lexical similarity.
What are we to do? In order to handle borrowings, I want to first give a more specific definition of “borrowing”. Keep in mind that probably every language that is not totally isolated from contact (are there any such languages left?) will have borrowings, some ancient and some recent. For example, Thai is full of words from Sanskrit and Pali. Should these be ignored when comparing Thai to another language? It depends on your purpose: for historical reconstruction, yes; for tracing contact, no; for screening for intelligibility, no, because bilingualism is not a factor for ancient borrowings, since probably no one speaks those languages anymore.
Suppose that you would like to keep borrowings. It is clear that some borrowings should be kept and others should be dropped. Specifically, you should keep all ancient borrowings and drop all recent borrowings, unless both varieties have borrowed the same word. Recent borrowings should count neither for nor against lexical similarity when the goal is to screen for low intelligibility; including them because they affect intelligibility is problematic, since the direction of their effect depends on bilingualism. The exception is when both varieties have borrowed the same word. In that case, regardless of bilingualism, that word will be understood by speakers of both groups, so the effect on intelligibility is clear. But if only one variety has borrowed, or the two have borrowed different words, then the direction of the effect on intelligibility depends on bilingualism.
This would lead to the following method for handling borrowings when using a lexical similarity percentage to infer current intelligibility:
- Define a “recent borrowing” to be a borrowing from a language that is still in use. Speakers of other varieties might have learned the language borrowed from. (“Recent” here means only that the source language is still in use. If you were to drop ancient borrowings as well, you might lose half the language!)
- Define an “ancient borrowing” to be a borrowing from an extinct language (e.g. Sanskrit). Speakers of other varieties most likely have not learned the language borrowed from.
- Keep all ancient borrowings. Keep them regardless of whether only one variety borrowed, both borrowed the same word, or each borrowed a different word. The effect on intelligibility is not likely to be influenced by knowing the language that is borrowed from. Thus, their effect on lexical similarity will be consistent with their effect on intelligibility. Treat ancient borrowings as if they are native words.
- Whether or not to keep recent borrowings depends on the situation:
  - If Varieties A and B have borrowed the same (recent) word, then KEEP the borrowings. The effect on intelligibility is clear (it is increased).
  - If Varieties A and B have borrowed different recent words OR if only one of the two varieties has a recent borrowing, then DROP the borrowing(s). The effect on intelligibility is NOT clear; it depends on the bilingual proficiency of individuals. It might be increased or decreased.
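The keep/drop rules listed above can be sketched as a small decision function. This is only an illustration, not part of the method as stated: the `Entry` record, its `status` and `donor_word` fields, and the function name `keep_item` are all hypothetical, and the sketch assumes the analyst has already classified each word as native, ancient, or recent, and can tell when two borrowings are "the same word" by comparing the borrowed source word.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One variety's word for a wordlist item (hypothetical record layout)."""
    form: str                         # the word as elicited in this variety
    status: str = "native"            # "native", "ancient", or "recent"
    donor_word: Optional[str] = None  # the borrowed source word, if any


def keep_item(a: Entry, b: Entry) -> bool:
    """Return True if this wordlist item should count toward the percentage."""
    a_recent = a.status == "recent"
    b_recent = b.status == "recent"

    # Native words and ancient borrowings are always kept
    # (ancient borrowings are treated as if they were native).
    if not a_recent and not b_recent:
        return True

    # Both varieties borrowed the same recent word: keep.
    if a_recent and b_recent and a.donor_word is not None \
            and a.donor_word == b.donor_word:
        return True

    # Only one variety has a recent borrowing, or the two borrowed
    # different recent words: drop (treat as "no data").
    return False
```

For instance, `keep_item(Entry("x", "recent", "w1"), Entry("y"))` returns `False` (only one variety has a recent borrowing), while the same call with two recent entries sharing `donor_word="w1"` returns `True`. The forms and donor words here are placeholders, not real data.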
This is a reasonable method if you really only want to keep recent borrowings when their effect on intelligibility is clear. But, since the only time you would keep recent borrowings would be when they are the same word, this method would generally introduce a positive bias into the lexical similarity percentage. This seems rather silly. Trying to keep recent borrowings only when their effect on intelligibility is clear results in a silly, ridiculously complicated method.
In summary, ancient borrowings are not a problem since their effect on the lexical similarity percentage is the same as their effect on intelligibility. However, recent borrowings are problematic since their effect on intelligibility is unclear (it depends on bilingualism). Attempting to keep them only in cases where their effect is clear leads to a silly, complicated procedure. Therefore, I conclude that, if you are interested in inferring current intelligibility, you should drop all recent borrowings and keep all ancient borrowings.
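As a rough illustration of this final rule, the sketch below computes a lexical similarity percentage after dropping every item in which either variety has a recent borrowing. The tuple layout, the status labels, and the function name `lexical_similarity` are assumptions made for the example, and the similar/not-similar judgment for each item is taken as already made by the analyst. The sample data are invented placeholders.

```python
def lexical_similarity(items):
    """Percentage of similar items after dropping recent borrowings.

    `items` is a list of (status_a, status_b, similar) tuples, where each
    status is "native", "ancient", or "recent" and `similar` is the
    analyst's True/False judgment for that wordlist item.
    Returns None if no items remain after dropping.
    """
    # Drop any item where either variety has a recent borrowing;
    # ancient borrowings are kept and treated like native words.
    kept = [similar
            for status_a, status_b, similar in items
            if "recent" not in (status_a, status_b)]
    if not kept:
        return None
    return 100.0 * sum(kept) / len(kept)


# Invented example: five wordlist items, two involving recent borrowings.
sample = [
    ("native", "native", True),
    ("ancient", "native", False),
    ("recent", "native", False),   # dropped
    ("recent", "recent", True),    # dropped, even though both borrowed
    ("native", "native", True),
]
percentage = lexical_similarity(sample)  # 2 of the 3 kept items are similar
```

Note that under this rule the item where both varieties have a recent borrowing is dropped as well, which is what avoids the positive bias of the more complicated procedure.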