Developing an SRT

From SurveyWiki

Revision as of 19:11, 12 April 2011


Introduction

Radloff (1991:37-38) <ref>Radloff, Carla F. (1991). Sentence Repetition Testing for Studies of Community Bilingualism. Dallas: Summer Institute of Linguistics.</ref> reminds us how much depends on taking particular care at this stage of working with SRTs. Creating an SRT is time-consuming and the process may seem over-elaborate, but we are creating a tool that will help determine the linguistic future of entire speech communities. We should bear this in mind whatever tool we are developing or administering.

At the development stage, we should pay particular attention to finding the right personnel. This includes finding speakers of the test language who have the right level of education to contribute to test development. Taking time to rate participants' proficiency is also worthwhile, as is making sure that our transcriptions of the sentences used in test development are accurate.

We want the most accurate results possible from our tools, so we should be willing to be as thorough as we can in developing them.

Preliminaries

It's best to develop the test in a location where the test language is a commonly used language of wider communication (LWC) and where you have good access to contacts who can help you develop the test.

Radloff recommends including the following personnel:

  • The Researcher: er... that's you! You don't need much proficiency at all in the language of the test, but the more you have, the easier the development and administration of the test will be. You don't need to spend weeks living in the community before starting development, but it is worth reading any materials that others working in the language have produced, such as grammar notes or descriptions of the phonology.
  • Educated Mother-Tongue Speakers: You'll need at least three of these. They elicit the initial sentences, select those to be used in the test and help develop the scoring system. It helps if these people have some education, because you'll want to test in a standard form of the language, and this is usually acquired through education in that language. Education also helps ensure that these helpers have enough ability in the test language to construct a test in it. There may be no formal education in the LWC available to the people you are working with. In that case, treat experience as equivalent to education and choose assistants who clearly have a high level of ability in the language and who can handle the construction of such a test.
  • Second Language Speakers: You'll need a number of these at each proficiency level in the test language so you can calibrate the test. Radloff recommends looking to these social groupings for such a pool of speakers: a local college, a business, a neighbourhood, an organisation, etc.
  • Test Administrator: We talk more about training the test administrator on our Administering an SRT page. Of course, the researcher could also administer the test but, to do so, they should be familiar to some extent with the test language. The administrator and the researcher need to have a language in common to carry out the training and communicate the results.

Recording equipment is obviously necessary, along with enough headphones for the participant, the researcher and the administrator (if these are different people). So that everyone can hear the recording at the same time, you will need two Y-adaptors. At every stage of data collection and ordering, it is vital that you back up your recordings and sentence collection, and that you keep the backup files on a separate machine or server from your working data.
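The backup advice above can be partly automated. The Python sketch below copies every file in a working folder (recordings, transcriptions, sentence lists) into a backup folder and checks each copy against the original with a SHA-256 checksum. The `back_up` and `sha256_of` functions and any folder names are hypothetical, and the sketch assumes the backup folder sits on a different machine or drive (for example, a mounted network share), which the code itself cannot verify.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def back_up(source_dir: Path, backup_dir: Path) -> list[Path]:
    """Copy every file under source_dir into backup_dir, keeping the folder
    layout, and verify each copy by checksum.

    Assumption: backup_dir is on a separate machine or drive; this
    function has no way of checking that. Existing copies with the same
    name are overwritten. Returns the paths of the copies made.
    """
    copied = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        dst = backup_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if sha256_of(src) != sha256_of(dst):
            raise OSError(f"backup copy of {src} does not match the original")
        copied.append(dst)
    return copied
```

Because existing copies are overwritten, one simple way to keep earlier versions as well is to point `back_up` at a new, dated backup folder after each session.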

Eliciting Sample Sentences

Aim: get a wide range of natural well-formed sample sentences (40-50) that range from simple to very difficult
  1. Get informed consent ⇒ The people you work with at this stage will be providing spoken sentences that will ultimately be included in the final SRT. These sentences may then be played to hundreds of people, often across a large geographic area that may include more than one country. It is vital that, right at the start, the people who provide these sentences agree to their speech being used in this way. Make sure you get consent in a way that is culturally appropriate to the people you're working with. See section 5.1.9 in Decker & Grummitt (forthcoming) <ref>Decker, Ken and John Grummitt. (forthcoming). Understanding Language Choices: A guide to sociolinguistic assessment. Dallas: Summer Institute of Linguistics.</ref> for a detailed discussion of this issue.
  2. Collect sample texts ⇒ To achieve this aim, you'll need to elicit or find sample texts that contain a wide range of language. These can be spoken or written, but written material must still be natural language. You could ask your educated mother-tongue speaker to respond to a topic you suggest or to describe something. Alternatively, you could ask them to form sentences on a topic that include specific grammatical constructions of increasing complexity.
  3. Extract a range of sentences ⇒ Once you have your text(s), go through them and extract a range of sentences (not questions!) from simple through medium to difficult. There should be 60-70 of these. Difficulty can be a matter of grammatical complexity, but it can also involve other features such as the level of formality; in many languages, the two are related. One tip Radloff gives is to include some sentences that begin with discourse markers, as these tend to be more challenging. Make sure the sentences cover a wide variety of content and topics. If they don't, participants may be able to answer on the basis of earlier sentences they've heard.
  4. Get the sentences written down ⇒ Having chosen your sentences, ask the mother-tongue speaker to write them down in the local script. You should also transcribe the chosen sentences phonetically using the IPA so that you (and your research team) can follow them in later stages of test development and administration.
  5. Record the sentences ⇒ Ask the speaker to say each sentence at natural speed, and record them. Asking them to pause for a couple of seconds before and after each sentence will make it easier to separate the sentences when you construct the test later. If you're comfortable using audio software such as Audacity to insert silences and compile audio tests like SRTs, though, this is less of an issue.
  6. Get a second opinion ⇒ Next, find two other educated mother-tongue speakers (preferably one male and one female) to read through all the collected sentences and judge their suitability. Eliminate sentences that are too political, opinionated or otherwise controversial, as well as any that simply 'sound funny.' Also try to eliminate any that are too similar to each other.
  7. Refine the selection ⇒ Working with one of these two helpers at a time, play each of the remaining sentence recordings once and ask them to repeat it. Use your phonetic transcriptions to check their repetitions carefully. Remove any sentence that neither helper can repeat because it is too long. Sentences they can almost repeat are good ones for helping you judge the upper limits of proficiency. Finally, have the whole research team look over the 40-50 sentences that should remain. If you need more, or if too many sentences focus on one topic or are repetitive, go back to step 2 in this section and elicit some more sentences. Have a cup of tea first, though. You deserve it for getting this far!
  8. Make a preliminary test ⇒ Order the 40-50 sentences from shortest to longest, and treat the three shortest as practice sentences. Get a mother-tongue speaker of the test language (and dialect!) to record them. A man's voice is usually more widely acceptable than a woman's. Either while you record or afterwards, leave a gap of about three seconds between sentences. Each sentence should appear only once in the test, but you may want to record several takes of each sentence until you have a natural, usable one.
  9. Transcribe the test ⇒ Refine or re-do your transcription of the preliminary test sentences and check this with a speaker of the test language for accuracy. Ask them also to give a word-for-word translation of each of the sentences (we'll be referring to this later). Also do a free translation of each sentence. Optionally, you could add the local script version of the sentence too.
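Step 8 can be sketched in Python using only the standard library. The sketch assumes each sentence is held as an (ID, transcription) pair with a separate WAV recording, and it measures length by counting words, which is an assumption; a syllable count may suit some languages better. `order_for_test` sorts the sentences from shortest to longest and splits off the three shortest as practice items, and `splice_with_gaps` joins the recordings in that order with about three seconds of silence between sentences. Both function names are hypothetical.

```python
import io
import wave


def word_count(transcription):
    """Length measure used for ordering: whitespace-separated words.
    (Assumption: a syllable count may suit some languages better.)"""
    return len(transcription.split())


def order_for_test(sentences, n_practice=3, length=word_count):
    """Sort (sentence_id, transcription) pairs from shortest to longest and
    split off the n_practice shortest as practice items. sorted() is stable,
    so sentences of equal length keep their original relative order."""
    ordered = sorted(sentences, key=lambda s: length(s[1]))
    return ordered[:n_practice], ordered[n_practice:]


def splice_with_gaps(recordings, gap_seconds=3.0):
    """Join WAV recordings (each given as the bytes of a .wav file) into one
    WAV, with gap_seconds of silence between consecutive sentences. All
    recordings must share the same channel count, sample width and rate."""
    if not recordings:
        raise ValueError("no recordings to splice")
    out = io.BytesIO()
    fmt = None
    with wave.open(out, "wb") as dst:
        for i, data in enumerate(recordings):
            with wave.open(io.BytesIO(data), "rb") as src:
                this_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if fmt is None:
                    fmt = this_fmt
                    dst.setnchannels(fmt[0])
                    dst.setsampwidth(fmt[1])
                    dst.setframerate(fmt[2])
                elif this_fmt != fmt:
                    raise ValueError(f"recording {i} has format {this_fmt}, expected {fmt}")
                if i > 0:
                    channels, width, rate = fmt
                    # 8-bit WAV samples are unsigned (silence = 0x80);
                    # wider samples are signed (silence = 0).
                    silent = b"\x80" if width == 1 else b"\x00"
                    dst.writeframes(silent * (round(gap_seconds * rate) * channels * width))
                dst.writeframes(src.readframes(src.getnframes()))
    return out.getvalue()
```

An audio editor such as Audacity does the same job by hand; a scripted version mainly makes it easy to rebuild the preliminary test if sentences are later added or removed.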

Evaluating L2 Speakers

Aim: assess the proficiency levels of the 50 or so speakers who will give you your range of proficiency levels
  1. Select a proficiency scale ⇒ For the scores of the preliminary test subjects to mean anything, you need a standard language proficiency scale to compare them against. There are many proficiency scales, so choose the one that is most practical for the situation you're working in. The ILR Scale is described on the SurveyWiki. Other scales include the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (CEFR), DILF/DELF/DALF (French) and IELTS. Wikipedia has an excellent table comparing a number of proficiency scales. Whichever scale you choose, however, make sure you use its assessment of spoken proficiency. Many of these scales give a final rating that combines all four language skills, but for an SRT it is spoken proficiency that you need to focus on. The IELTS Speaking test band descriptors are probably the most detailed publicly available descriptors.
  2. Select 50 speakers ⇒ It is important to recruit the people who will trial the preliminary version of the test through the social networks you have access to where you are working. Participants need to be able to trust you, and a connection through a mutual contact helps. From the start, communicate clearly and honestly about why you want their participation. Ultimately, the aim is not to evaluate their language proficiency for its own sake; they are helping to create a test that will be used to collect data from others.
  3. Assess each participant ⇒ You need to identify an appropriate number of people at each level of the proficiency scale you are using. How many will depend on how many levels the scale has and how specific the descriptors are for each level. It can be particularly hard to find enough people at the extreme ends of the scale. If you don't have people at the highest and lowest levels, you won't be able to use the SRT to make any statements about proficiency above or below the levels you do have speakers for, and this will limit how fully your SRT can describe the language ability of the communities you use it with.
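Checking coverage of the scale is simple bookkeeping once each participant has been assigned a level. The Python sketch below counts participants at each level, flags levels with fewer than a chosen minimum, and reports the lowest and highest levels that have any speakers at all, which bound the range the SRT can later make statements about. The `level_coverage` function, its `min_per_level` threshold and the ILR-style labels in the example are hypothetical; the text above deliberately gives no fixed number, since it depends on the scale.

```python
from collections import Counter


def level_coverage(assigned_levels, scale_levels, min_per_level=3):
    """assigned_levels: the level given to each participant.
    scale_levels: every level of the chosen scale, lowest first.
    min_per_level: hypothetical target number of people per level.

    Returns (counts per level, levels below the target,
    (lowest, highest) level with at least one speaker, or None)."""
    counts = Counter(assigned_levels)
    unknown = [lvl for lvl in counts if lvl not in scale_levels]
    if unknown:
        raise ValueError(f"levels not on the chosen scale: {unknown}")
    per_level = {lvl: counts[lvl] for lvl in scale_levels}
    thin = [lvl for lvl in scale_levels if per_level[lvl] < min_per_level]
    covered = [lvl for lvl in scale_levels if per_level[lvl] > 0]
    span = (covered[0], covered[-1]) if covered else None
    return per_level, thin, span
```

For example, calling `level_coverage(levels, ["1", "1+", "2", "2+", "3"])` part-way through recruitment shows at a glance whether the extreme levels are still empty and where to look for more speakers.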

Pilot Testing

Aim: start calibrating the test by testing the 50 or so speakers from the previous stage
  1. Decide who does the testing ⇒ It is easier, particularly for relationships, if whoever carried out the proficiency evaluation described above also carries out the pilot testing of the SRT. For consistency, one person should score all the tests at this stage, and this person should know all 40-50 preliminary test sentences well enough to follow the transcripts at the same speed as the recording.
  2. Get informed consent ⇒ Each person taking the preliminary test needs to understand the entire test process beforehand. This is the "informed" part of "informed consent": they have to know what they are consenting to. At this stage, they are consenting to their score data being used only to help us construct the final test. It is recommended that you record the responses of a few people at each proficiency level and of about 10 other participants chosen at random. These recordings will be used to train the test administrators. The results and recordings of the preliminary test will not be used in our final reporting and will not be shared outside the research team. As you may have to explain this to as many as 50 people, it's good to standardise the explanation so that it is consistent and you don't leave anything out.
  3. Test high-level speakers first ⇒ It's often hardest to design an SRT that adequately discriminates among the higher levels of proficiency. So, if possible, start the preliminary testing with the people you identified as having the highest proficiency in the language. As you carry out the test, pause the recording after each test sentence to give them time to repeat it. Try not to comment on their performance, but do encourage them by letting them know they are participating well. As their scores are calculated, plot them as described in step 5 below. If you can't find a difference between these high-level speakers, your test does not include enough difficult sentences. Get a night's sleep first, and then go back and elicit more difficult sample sentences, running through all the steps above once more. If you still don't have a test that distinguishes them, you may have chosen the wrong proficiency scale or assigned these speakers to the wrong levels. By this point, you'll have realised how important it was to be thorough at all the earlier stages of the test. Go away for a long weekend, and then start again.
  4. Test all the speakers ⇒ If step 3 shows you are good to go, test all 50 or so participants, remembering to record a few from each proficiency level and about 10 others chosen at random. As before, pause the recording after each test sentence to give the participant time to repeat it. Try not to comment on their performance, but do encourage them by letting them know they are participating well.
  5. Score the test responses ⇒ The first task here is to define what an error is. Scoring an SRT involves rating each sentence from 0-3 points as follows:
Score Description
3 points perfect, no errors in sentence
2 points one error in sentence
1 point two errors in sentence
0 points three or more errors in sentence
It is also helpful to know what types of errors have been made. Radloff (1991:54) suggests the following marks for recording them:
Mark Description
o word omitted from sentence
s word substituted for another
> or < any change of word order (counts as one error)
~~ word garbled so as to lose meaning
+ word or phrase added to sentence
R word or phrase repeated (counts as one error)
W wrong word or word ending (grammatical error)
As each participant's results are scored, plot them on a graph against the proficiency levels that you previously assigned them.
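The scoring scheme above reduces to a simple rule: a sentence scores 3 minus its number of errors, but never less than 0. The Python sketch below applies that rule, tallies Radloff's error marks by type, and groups participants' totals by their assigned proficiency level so you can see whether scores rise with level. It assumes each test sentence is represented by the list of error marks the scorer wrote against it, with each mark counting as one error (so a change of word order or a repetition is written as a single mark), and that practice sentences are left out. The function names are hypothetical.

```python
from collections import Counter, defaultdict

# Radloff's (1991:54) error marks, as listed in the table above.
# Assumption: every mark written against a sentence counts as one error.
ERROR_MARKS = {
    "o": "word omitted",
    "s": "word substituted",
    ">": "change of word order",
    "<": "change of word order",
    "~~": "word garbled",
    "+": "word or phrase added",
    "R": "word or phrase repeated",
    "W": "wrong word or word ending",
}


def sentence_score(marks):
    """Score one sentence: 3 = no errors, 2 = one, 1 = two, 0 = three or more."""
    unknown = [m for m in marks if m not in ERROR_MARKS]
    if unknown:
        raise ValueError(f"unrecognised error marks: {unknown}")
    return max(0, 3 - len(marks))


def score_participant(marked_sentences):
    """marked_sentences holds one list of error marks per test sentence
    (practice sentences excluded). Returns the total score and a tally
    of error types across all sentences."""
    total = sum(sentence_score(marks) for marks in marked_sentences)
    tally = Counter(ERROR_MARKS[m] for marks in marked_sentences for m in marks)
    return total, tally


def scores_by_level(results):
    """results holds (assigned proficiency level, total score) pairs.
    Returns {level: (number of participants, mean score)}."""
    grouped = defaultdict(list)
    for level, total in results:
        grouped[level].append(total)
    return {lvl: (len(s), sum(s) / len(s)) for lvl, s in grouped.items()}
```

The per-level means are only a summary; plotting each participant's individual total against their assigned level, as the text describes, also shows how much scores at neighbouring levels overlap.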

Refining the Test

Aim: select the final 15 sentences which will form the SRT and match SRT scores to proficiency levels

Note: This stage of SRT development is probably the most challenging technically.

  1. Apply the discrimination index
  2. Calculate difficulty levels
  3. Select your final 15
  4. Calculate the final score
  5. Record the final test
  6. Calibrate the test
  7. Calculate the line of regression
  8. Calculate the standard error
  9. Calculate the coefficient of correlation
  10. Control test the final test
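The steps above are only named here, so the Python sketch below illustrates the standard statistics they refer to; it is not Radloff's exact procedure. For each sentence, difficulty is taken as the mean score divided by the maximum of 3, and the discrimination index uses the common upper-lower group method: the difference between the mean sentence scores of the highest- and lowest-scoring participants, again divided by 3. The group size (27% of participants, a textbook convention) is an assumption. For calibration, `regression_stats` fits a least-squares line predicting proficiency level from SRT score and returns the standard error of estimate and Pearson's correlation coefficient r. Proficiency levels must be coded as numbers for this; how to code plus levels (for instance 2+ as 2.5) is a choice the sketch leaves to you. Both function names are hypothetical. With 15 final sentences scored 0-3, a participant's final SRT score ranges from 0 to 45.

```python
import math


def item_statistics(score_matrix, group_fraction=0.27):
    """score_matrix[p][i] is participant p's 0-3 score on sentence i.
    Returns (difficulty, discrimination), one value per sentence.

    difficulty: mean score on the sentence / 3 (1.0 = everyone perfect).
    discrimination: (mean score of top group - mean score of bottom group) / 3,
    where the groups are the top and bottom group_fraction of participants
    ranked by total score (assumed upper-lower group method)."""
    n = len(score_matrix)
    if n == 0:
        raise ValueError("no participants")
    k = max(1, round(n * group_fraction))
    ranked = sorted(score_matrix, key=sum)
    lower, upper = ranked[:k], ranked[-k:]
    difficulty, discrimination = [], []
    for i in range(len(score_matrix[0])):
        difficulty.append(sum(row[i] for row in score_matrix) / (3 * n))
        upper_mean = sum(row[i] for row in upper) / k
        lower_mean = sum(row[i] for row in lower) / k
        discrimination.append((upper_mean - lower_mean) / 3)
    return difficulty, discrimination


def regression_stats(srt_scores, levels):
    """Least-squares line predicting (numeric) proficiency level from SRT score.
    Returns (slope, intercept, standard error of estimate, Pearson r)."""
    n = len(srt_scores)
    if n < 3:
        raise ValueError("need at least three participants")
    mx = sum(srt_scores) / n
    my = sum(levels) / n
    sxx = sum((x - mx) ** 2 for x in srt_scores)
    syy = sum((y - my) ** 2 for y in levels)
    if sxx == 0 or syy == 0:
        raise ValueError("scores or levels do not vary")
    sxy = sum((x - mx) * (y - my) for x, y in zip(srt_scores, levels))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Standard error of estimate: spread of actual levels around the line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(srt_scores, levels))
    std_err = math.sqrt(sse / (n - 2))
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, std_err, r
```

Sentences with a low or negative discrimination index are poor candidates for the final 15, since stronger speakers do no better on them than weaker ones.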

References

<references/>