Developing an SRT
Latest revision as of 17:24, 12 July 2011
Introduction
Radloff (1991:37-38) <ref>Radloff, Carla F. (1991). Sentence Repetition Testing for Studies of Community Bilingualism. Dallas: Summer Institute of Linguistics.</ref> reminds us of how much depends on us taking particular care at this stage of working with SRTs. The process of creating an SRT is time-consuming and may seem over-elaborate. But we are creating a tool which will help to determine the linguistic future of entire speech communities and we should bear this in mind no matter what tools we are developing or administering.
We should pay particular attention at the development stage to finding the right personnel. This includes finding speakers of the test language who have the right level of education to contribute to test development. Taking time to rate participants for proficiency is also worth doing, as is making sure that our transcription of the sentences we include in the test is accurate.
We want the most accurate results possible from our tools and so, we should be willing to be as thorough as we can be with our development of them.
Preliminaries
It's best to develop the test in a location where the test language is a common language of wider communication (LWC) and where you have good access to contacts who can help you develop the test.
Radloff recommends including the following personnel:
- The Researcher: er... that's you! You don't need to have much proficiency at all in the language of the test. But the more you have, the easier the development and administration of the test will be. You don't need to spend weeks living in the community prior to starting development but you could read any materials that other workers in that language have produced such as grammar notes or descriptions of phonology, etc.
- Educated Mother-Tongue Speakers: You'll need at least three of these. They elicit the initial sentences, select those to be used in the test and help develop the scoring system. It's helpful that these people have some education because you'll want to be testing in a standard form of the language and this is usually acquired through education in that language. Education also ensures that these helpers have enough ability in the test language to be able to construct a test in it. It may well be that there is no formal education in the LWC for the people you are working with. In this case, consider education to equal experience and select assistants who obviously display a high level of ability in the language and who aren't challenged by the construction of such a test.
- Second Language Speakers: You'll need a number of these at each proficiency level in the test language so you can calibrate the test. Radloff recommends looking to these social groupings for such a pool of speakers: a local college, a business, a neighbourhood, an organisation, etc.
- Test Administrator: We talk more about training the test administrator on our Administering an SRT page. Of course, the researcher could also administer the test but, to do so, they should be familiar to some extent with the test language. The administrator and the researcher need to have a language in common to carry out the training and communicate the results.
Recording equipment is obviously necessary, along with enough headphones for the participant, the researcher and the administrator (if these are different people). For everyone to hear the recording at the same time, two Y-adaptors are needed. It is vital that, at each stage of data collection and ordering, you back up your recordings and sentence collection and keep these backup files on a separate machine or server from your data.
Eliciting Sample Sentences
Aim: get a wide range of natural well-formed sample sentences (40-50) that range from simple to very difficult
- Get informed consent ⇒ The people you work with at this stage will be providing spoken sentences that will ultimately end up being included in the final SRT. These sentences could very well then be played to hundreds of people, often in a large geographic area that may even include more than one country. It is vital that, right at the start, the people who provide these sentences agree to their speech being used in this way. Make sure you get consent in a way that is culturally appropriate to the people you're working with. See section 5.1.9 in Decker & Grummitt (forthcoming) <ref>Decker, Ken and John Grummitt. (forthcoming). Understanding Language Choices: A guide to sociolinguistic assessment. Dallas: Summer Institute of Linguistics.</ref> for detailed discussion of this issue.
- Collect sample texts ⇒ In order to achieve this, you'll have to elicit or find a sample text that has a wide range of language in it. This could be spoken or written, but if it's written, it needs to be natural language. You could ask your second language speaker to respond to a topic you come up with or give a description of something. Or, you could actually ask them to form sentences on a topic which will include specific grammatical constructions of increasing complexity.
- Extract a range of sentences ⇒ Once you have your text/s, you'll need to go through it and extract a range of sentences (not questions!) from simple through medium to difficult. There should be 60-70 of these. Difficulty could be a matter of grammatical complexity or it could also be other features such as the level of formality. In many languages, these are related. One tip that Radloff gives is to select some sentences that begin with discourse markers as these tend to be more challenging. Make sure you have a wide variety of content/topics. If you don't, participants may be able to provide answers which are based on previous sentences they've heard.
- Get the sentences written down ⇒ Having chosen your sentences, ask the second language speaker to write the sentences down in the local script. You should also transcribe the sentences you've chosen phonetically using IPA so that you (and your research team) can follow these sample sentences in later stages of test development and administration.
- Record the sentences ⇒ Ask the speaker to speak each sentence at natural speed and record them. Asking them to pause for a couple of seconds before and after they speak each sentence will help you separate the sentences when it comes to actually constructing the test later. If you're comfortable using software like Audacity to insert silences and compile audio tests like SRTs, though, this will be less of an issue.
- Get a second opinion ⇒ You should then find two other educated second language speakers (preferably one male and one female) to read through all the sentences collected and judge them for suitability. Sentences which are too political, opinionated or otherwise controversial should be eliminated as well as any that simply 'sound funny.' Try to eliminate any that are too similar to each other too.
- Refine the selection ⇒ Take one of these latter helpers at a time and play the remaining sentence recordings to them one at a time asking them to repeat them after they've heard them once. Use your phonetic transcriptions to check their repetitions carefully. Remove any sentences which neither helper can repeat because they are too long. Sentences they can almost repeat are good ones for helping you judge the upper limits of proficiency. Finally, have the whole research team look over the 40-50 sentences that should remain. If you need more or there are too many sentences focussed on one topic or are repetitious, you'll need to go back to step 1 in this section and elicit some more sentences. Have a cup of tea first though. You deserve it for getting this far!
- Make a preliminary test ⇒ Order all the 40-50 sentences from shortest to longest. Consider the three shortest as practice sentences. Get a mother-tongue speaker of the test language (and dialect!) to record these. A man's voice is usually more widely acceptable than a woman's voice. Either while you record or later, put about a three second gap between each sentence. Each sentence should appear only once in the test but you might want to record a few samples of each sentence until you have a natural, usable sample of each.
- Transcribe the test ⇒ Refine or re-do your transcription of the preliminary test sentences and check this with a speaker of the test language for accuracy. Ask them also to give a word-for-word translation of each of the sentences (we'll be referring to this later). Also do a free translation of each sentence. Optionally, you could add the local script version of the sentence too.
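The ordering in the "Make a preliminary test" step can be sketched in a few lines of Python. This is only an illustration: the function name `build_preliminary_order` and the sample sentences are hypothetical, and measuring length in words is an assumption (syllables or recording duration may suit your test language better).

```python
def build_preliminary_order(sentences, practice_count=3):
    """Sort sentences shortest to longest and split off the practice items.

    Length is counted in whitespace-separated words, which is an
    assumption made for this sketch.
    """
    ordered = sorted(sentences, key=lambda s: len(s.split()))
    return ordered[:practice_count], ordered[practice_count:]


# Hypothetical English stand-ins for elicited sentences.
sample = [
    "The children walked to the market before the rain started",
    "She is eating",
    "My brother sells rice",
    "We went home early yesterday",
    "He came",
]

practice, test_items = build_preliminary_order(sample)
print("Practice sentences:", practice)
print("Test sentences:", test_items)
```

The three shortest sentences come back as practice items and the rest stay in ascending order of length, ready to be recorded in that order.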
Evaluating L2 Speakers
Aim: assess the proficiency levels of the 50 or so speakers who will give you your range of proficiency levels
- Select a proficiency scale ⇒ In order for the scores of the preliminary test subjects to mean anything, you need to have a standard language proficiency scale to compare them to. There are lots of proficiency scales so it's best to choose one that is most practical for the situation you're working in. The ILR Scale is described on the SurveyWiki. Other scales include the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR), DILF/DELF/DALF (French) and IELTS. Wikipedia has an excellent table comparing a number of proficiency scales. However, it is important that with whatever scale you choose, you use the assessment for spoken proficiency. Many of these scales give a final rating which includes all four language skills. For an SRT, it is spoken proficiency that you need to focus on. The IELTS Speaking test band descriptors are probably the most detailed available publicly.
- Select 50 speakers ⇒ It is important that you select the people to trial the preliminary version of the test through the social networks that you have access to where you are working. Subjects need to be able to trust you, and a connection through a mutual contact helps with this. From the start, you should communicate clearly and honestly about why you want their participation. Ultimately, you are not evaluating their language proficiency; they are helping to create a test that will be used to collect data from others.
- Assess each participant ⇒ It is important that you are able to identify an appropriate number of people at each level of the proficiency scale you are using. How many people will depend on how many levels you have and how specific the descriptors are for each level. It can be particularly hard to get enough people scoring at the extreme ends of the scale. If you don't get people for the highest and lowest levels, you won't be able to use the SRT to make any statements either higher or lower than the levels that you have speakers for, and this will reduce the capability of your SRT to give you a full picture of the language ability of the communities you use it with.
Pilot Testing
Aim: start calibrating the test by testing the 50 or so speakers from the previous stage
- Decide who does the testing ⇒ It is easier, particularly for relationships, if whoever has carried out the proficiency evaluation described above, also carries out the pilot testing of the SRT. For consistency, one person should score all the tests at this stage, and this person should be familiar enough with all the 40-50 preliminary test sentences to follow the transcripts at the same speed as the recording.
- Get informed consent ⇒ Each person taking the preliminary test will need to understand the entire test process beforehand. This is the "informed" part of "informed consent." They have to know what they are consenting to. At this stage, they are consenting for their score data to be used only to help us construct a final test. It is recommended that you record the responses of a few people at each proficiency level and for about 10 random other participants. These recordings will be used to train the test administrators. The results and recordings of the preliminary test will not be used in our final reporting and not shared outside the research team. As you will have to explain this for each of the possibly 50 people you're working with, it's good to have this standardised to ensure consistency and that you don't leave anything out.
- Test high-level speakers first ⇒ It's often hardest to design an SRT that adequately discriminates among the higher levels of proficiency. So, if possible, it's best to start doing preliminary testing with the people you identified as having the highest proficiency in the language. As you carry out the test, allow them space after each test sentence to repeat by pausing the recording. Try not to comment on the subject's performance, but do give encouragement that they are participating well. As their scores are calculated, plot them as described in step 5 below. If you can't find a difference between these high-level speakers, your test sentences don't have enough difficult examples in them. Get a night's sleep first, and then go back to elicit sample sentences which are more difficult running through all the steps above once more. If you still don't have a test which distinguishes them, it might be that you've chosen the wrong proficiency scale or assigned these speakers to the wrong levels. By this point, you'll have realised how important it was to be thorough at all the earlier stages of the test. Go away for a long weekend, and then start again.
- Test all the speakers ⇒ If step 3 above shows you are good to go, test all of the 50 or so participants, remembering to record about 10 random participants as well as a few from each proficiency level. As before, allow each participant space after each test sentence to repeat by pausing the recording. Try not to comment on the subject's performance, but do give encouragement that they are participating well.
- Score the test responses ⇒ The first task here is to define what an error is. Scoring an SRT involves rating each sentence from 0-3 points as follows:
Score | Description
---|---
3 points | perfect, no errors in sentence
2 points | one error in sentence
1 point | two errors in sentence
0 points | three or more errors in sentence
- It is also helpful to know what types of errors have been made. (although I have to say I'm not sure why - can anyone clarify by adding a clause in here to that effect) Radloff (1991:54) suggests the following:
Symbol | Description
---|---
o | word omitted from sentence
s | word substituted for another
> or < | any change of word order (counts as one error)
~~ | word garbled so as to lose meaning
+ | word or phrase added to sentence
R | word or phrase repeated (counts as one error)
W | wrong word or word ending (grammatical error)
- As each participant's results are scored, plot them on a graph against the proficiency levels that you previously assigned them.
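The 0-3 point rule above is simple enough to express as a short Python sketch. The function names `sentence_score` and `total_score` are hypothetical, and the error counts are made up for illustration; the sketch assumes you have already counted the errors in each sentence using the symbols above.

```python
def sentence_score(error_count):
    """Return 3, 2, 1 or 0 points for 0, 1, 2 or 3+ errors in a sentence."""
    if error_count < 0:
        raise ValueError("error_count cannot be negative")
    return max(0, 3 - error_count)


def total_score(error_counts):
    """Sum the per-sentence scores for one participant."""
    return sum(sentence_score(n) for n in error_counts)


# Made-up error counts for one participant across six sentences.
errors_per_sentence = [0, 0, 1, 2, 3, 5]
print(total_score(errors_per_sentence))  # 3 + 3 + 2 + 1 + 0 + 0 = 9
```

Keeping the per-sentence scores as well as the total is worthwhile, since the later refinement steps work sentence by sentence.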
Refining the Test
Aim: select the final 15 sentences which will form the SRT and match SRT scores to proficiency levels
Note: This stage of SRT development is probably the most challenging technically.
- Apply the discrimination index ⇒ We need to reduce the 50 or so sentences that we've been using for our preliminary test to the 15 we will use on the final test. To do this, we need some way of deciding which sentences are better than others for discriminating between levels. We use what Radloff (1991:55) calls the discrimination index. Basically, the lower the value for the discrimination index, the better the sentence is for discriminating between levels. For the methodology needed to calculate discrimination indices for the sentences you are using in the preliminary test, see the Calculating Discrimination Index page.
- Calculate difficulty levels ⇒ see the Calculating Discrimination Index page.
- Select your final 15 ⇒ see the Calculating Discrimination Index page.
- Extract the final form score ⇒ see the Calculating Discrimination Index page.
- Record the final test ⇒ Use audio-editing software to extract your 15 sentences from the 50 or so pilot test sentences you started with and create a new master recording for the final test. At the start of this master recording you'll need to include three practice sentences, which should have low discrimination indices (DIs) and low difficulty levels. Make the first practice sentence short, the next longer and the last longest. Make sure your last practice sentence is as long as your first test sentence.
- Calibrate the test ⇒ You now have two sets of scores. From the Evaluating L2 Speakers section above, you have proficiency level scores for your participants. You have also just calculated their scores for the 15 sentences you have selected. What you need to do for this step is to correlate these two sets of scores so that when you have an SRT score, you know what level of proficiency it relates to. The first stage of this is to plot the participants' proficiency levels against their SRT pilot test scores for the 15 sentences in a scatter graph. Use the x-axis for the proficiency level and the y-axis for the SRT scores. As you plot this, you'll be able to see what proficiency levels the scores indicate just by looking at the graph. But to improve the accuracy of the test, we can apply some statistical techniques to refine this even further. These three statistical techniques form the next three steps here.
- Calculate the line of estimation ⇒ See the Statistical Refinement of an SRT page.
- Calculate the standard error ⇒ See the Statistical Refinement of an SRT page.
- Calculate the coefficient of correlation ⇒ See the Statistical Refinement of an SRT page.
- Control test the final test ⇒ To do this, simply administer the test with your 15 sentences to mother-tongue speakers of the test language. The participants chosen for this should come from a wide range of backgrounds and education levels so that variables such as these do not influence the results. As these are native speakers, you should be looking at test results approaching 100%, with participants consistently achieving scores that place them in the highest proficiency level. If you do not see this, you may need to review the development of your test so far.
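The three statistical steps in the list above are detailed on the Statistical Refinement of an SRT page. As a rough orientation only, the sketch below uses the standard textbook versions: an ordinary least-squares line, the standard error of estimate with n - 2 in the denominator, and Pearson's correlation coefficient. These choices, the function names and the sample data are all assumptions made for illustration and may differ from the method described on that page. Proficiency level is treated as x and SRT score as y.

```python
import math


def line_of_estimation(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def standard_error(xs, ys, slope, intercept):
    """Standard error of estimate; n - 2 in the denominator is an assumption."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return math.sqrt(sum(r * r for r in residuals) / (len(xs) - 2))


def correlation(xs, ys):
    """Pearson's coefficient of correlation between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Made-up data: proficiency levels (x) and SRT scores out of 45 (y).
levels = [1, 1, 2, 2, 3, 3, 4, 4]
scores = [8, 12, 17, 21, 27, 29, 36, 40]

slope, intercept = line_of_estimation(levels, scores)
print("slope:", slope, "intercept:", intercept)
print("standard error:", standard_error(levels, scores, slope, intercept))
print("correlation:", correlation(levels, scores))
```

A correlation close to 1 and a small standard error suggest that SRT scores track the proficiency ratings closely; a weak correlation is a signal to revisit the sentence selection or the proficiency ratings.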
The next phase of SRTs is to learn how to administer them.
References
<references/>