In any research that attempts to make generalisations about a community, administering the tool to every member of the community would take vast amounts of time and effort. We therefore need to select participants whose data will form a sound basis for generalising our findings.
To do this, we are looking for two things.
- We need participants who are representative of the population we want to find out about. This may not be the whole population: for example, we might only want to know about women in a particular age range. We therefore need to decide in advance whose language we are assessing.
- We need enough participants to smooth out noise in the data caused by errors on our part or by circumstances beyond our control (e.g. a participant who has been up all night arguing with her husband!). How many participants we need depends on the size of the population we want information about and on how homogeneous that population is: the more variety there is in a population, the more participants we need in order to generalise about the whole community.
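To make the dependence on population size and variety concrete, here is a sketch of one common textbook rule: Cochran's sample-size formula for estimating a proportion, with a finite population correction. It is not taken from Wetherill (1995). It assumes a simple random sample and a yes/no question (e.g. "does this person understand language X?"), and the confidence level, margin of error and expected proportion `p` are illustrative choices. It also covers only sampling error, not the kinds of interference described above.

```python
import math


def sample_size(population: int, p: float = 0.5, margin: float = 0.05,
                z: float = 1.96) -> int:
    """Participants needed to estimate a proportion p to within +/- margin.

    p * (1 - p) stands in for how varied the population is: it is largest
    at p = 0.5 (an even split) and smaller for a more homogeneous population.
    z = 1.96 corresponds to roughly 95% confidence.
    """
    # Sample size for an effectively unlimited population.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction: a small community needs fewer participants.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


# Hypothetical communities, for illustration only.
print(sample_size(1_000))          # 278
print(sample_size(100_000))        # 383
print(sample_size(1_000, p=0.9))   # 122 (more homogeneous, so fewer needed)
```

Note that the required number grows much more slowly than the population: a community a hundred times larger needs only somewhat more participants, while a more varied population (`p` closer to 0.5) pushes the number up.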
Here are some of the main kinds of sampling used in surveying, adapted loosely from Wetherill (1995).<ref>Wetherill, G. Barrie. 1995. Research Design and Analysis. Dallas: Summer Institute of Linguistics.</ref>
- Random sample – every member of the population is equally likely to be chosen (the aim is to get a representative picture of the whole population). This takes a surprising amount of time, effort, and knowledge to do well. Because we may know little about the places we will be surveying, it is often difficult to get enough demographic information to take a random sample.
- Simple random sampling – take a sampling frame (a list of the members of the population) and use a random number generator to pick individuals for the sample, so that every member of the population is equally likely to be chosen.
- Stratification – division of the population into demographic groups (strata) before sampling. For instance, combining gender (male, female) with age (young, middle-aged, old) gives 2 × 3 = six strata.
- Stratified random sample – divide the population into strata and then take a simple random sample within each stratum. This can give us a picture of the portions of the population that we care about. For instance, we may want to know not just whether the general population understands a particular language but also whether women, the elderly, the poor, the uneducated, etc. know the language.
- (This gives you a representative picture of each piece of the population, which you can compare with the others to find differences or similarities. However, because the strata make up different proportions of the population but are not necessarily sampled in those proportions, pooling the results does not give an accurate picture of the population as a whole. If you do know the proportion of each stratum in the population, which is unlikely in our context, you can weight the data to make them more representative of the whole population.)
- Systematic sampling – for example, order the houses in a village along a route and survey every fifth house (then ask for a particular kind of individual in each house). This is still not a random sample: if the houses follow a pattern along the route, it could skew your results. Because it may be difficult to build a good frame when population information is lacking, this can be a practical middle ground.
- Cluster sampling – divide the population into clusters (such as city blocks or groups of houses) and take a random sample of the clusters. Survey each individual in the chosen clusters (or one individual from each house in them). This still uses random selection, but it can introduce bias, because people often live in neighborhoods where individuals share certain characteristics, so the chosen clusters may not reflect the wider population. On the other hand, it requires much less travel and adjustment (such as finding new people to help you in each area).
- Quota sample – the population is divided into strata, and interviewers are given “quotas” to fill within each category. The interviewers then choose the individuals to survey as they see fit, so the sample is not random.
Note: here, the cases (the units being sampled and surveyed) are individual people.
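The random and systematic selection schemes listed above can be sketched in Python as below. This is a minimal illustration under stated assumptions: the villagers, houses and blocks are invented, and the function names are our own, not Wetherill's. `weighted_estimate` shows the weighting mentioned for stratified samples and assumes you know each stratum's share of the population. Quota sampling is left out, because its selection step is the interviewer's own judgement.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, rng):
    # Every member of the sampling frame is equally likely to be chosen.
    return rng.sample(frame, n)


def stratified_random_sample(frame, stratum_of, n_per_stratum, rng):
    # Group the frame into strata, then take a simple random sample in each.
    strata = defaultdict(list)
    for person in frame:
        strata[stratum_of(person)].append(person)
    return {key: rng.sample(members, min(n_per_stratum, len(members)))
            for key, members in strata.items()}


def systematic_sample(ordered_units, step, rng):
    # Random starting point, then every `step`-th unit along the route.
    # Only the start is random, so a pattern in the ordering can still skew results.
    start = rng.randrange(step)
    return ordered_units[start::step]


def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then survey everyone in them.
    chosen = rng.sample(list(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]


def weighted_estimate(stratum_results, population_shares):
    # Combine per-stratum results using each stratum's share of the population
    # (shares are assumed to sum to 1).
    return sum(stratum_results[s] * population_shares[s] for s in stratum_results)


rng = random.Random(42)  # fixed seed so the draws are reproducible
# Hypothetical frame: 60 villagers, each tagged with gender and age group.
frame = [(f"person{i}", "female" if i % 2 else "male",
          ["young", "middle-aged", "old"][i % 3]) for i in range(60)]
houses = [f"house{i}" for i in range(1, 51)]          # houses in route order
blocks = {"A": frame[:20], "B": frame[20:40], "C": frame[40:]}

print(simple_random_sample(frame, 5, rng))
# Six strata (2 genders x 3 age groups), three people drawn from each.
by_stratum = stratified_random_sample(frame, lambda p: (p[1], p[2]), 3, rng)
print(len(by_stratum))
print(systematic_sample(houses, 5, rng))              # every fifth house
print(len(cluster_sample(blocks, 2, rng)))            # everyone in two blocks
# Hypothetical results: 80% of women and 40% of men understand the language.
print(weighted_estimate({"female": 0.8, "male": 0.4},
                        {"female": 0.5, "male": 0.5}))
```

With a fixed seed the same sample is drawn each time, which makes it easier to document and check how participants were selected.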