SUMMARY OF ASSESSING LISTENING (pages 112-139) AND ASSESSING SPEAKING (pages 140-184)
SUMMARY ASSESSING LISTENING
Our focus will shift away from the standardized testing juggernaut to the level at which you will usually work: the day-to-day classroom assessment of listening, speaking, reading, and writing. Since this is the level at which you will most frequently have the opportunity to apply principles of assessment, the next four chapters of this book will provide guidelines and hands-on practice in testing within a curriculum of English as a second or foreign language.
First, two important caveats. The fact that the four language skills are discussed in four separate chapters should in no way predispose you to think that those skills are or should be assessed in isolation. Every TESOL professional (see TBP, Chapter 15) will tell you that the integration of skills is of paramount importance in language learning. Likewise, assessment is more authentic and provides more washback when skills are integrated. Nevertheless, the skills are treated independently here in order to identify principles, test types, tasks, and issues associated with each one.
Second, you may already have scanned through this book to look for a chapter on assessing grammar and vocabulary, or something in the way of a focus on form in assessment. The treatment of form-focused assessment is not relegated to a separate chapter here for a very distinct reason: there is no such thing as a test of grammar or vocabulary that does not invoke one or more of the separate skills of listening, speaking, reading, or writing! It's not uncommon to find little "grammar tests" and "vocabulary tests" in textbooks, and these may be perfectly useful instruments.
OBSERVING THE PERFORMANCE OF THE FOUR SKILLS
Before focusing on listening itself, think about the two interacting concepts of performance and observation. All language users perform the acts of listening, speaking, reading, and writing. When you propose to assess someone's ability in one or a combination of the four skills, you assess that person's competence, but you observe the person's performance.
So, one important principle for assessing a learner's competence is to consider the fallibility of the results of a single performance, such as that produced in a test. As with any attempt at measurement, it is your obligation as a teacher to triangulate your measurements: consider at least two (or more) performances and/or contexts before drawing a conclusion. That could take the form of one or more of the following designs:
• several tests that are combined to form an assessment
• a single test with multiple test tasks to account for learning styles and performance variables
• in-class and extra-class graded work
• alternative forms of assessment (e.g., journal, portfolio, conference, observation,
self-assessment, peer-assessment).
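One way to picture such triangulation is a weighted combination of several measures into one grade. The Python sketch below is purely illustrative; the score names, weights, and 0-100 scale are invented, not taken from the text.

```python
# Hypothetical sketch of triangulating a grade from multiple measures.
# Score names, weights, and the 0-100 scale are invented for illustration.

def combine_measures(scores, weights):
    """Weighted combination of several assessment scores (each 0-100)."""
    if set(scores) != set(weights):
        raise ValueError("every score needs a weight")
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total_weight

semester = combine_measures(
    scores={"unit_tests": 78, "in_class_tasks": 85, "portfolio": 90},
    weights={"unit_tests": 0.5, "in_class_tasks": 0.3, "portfolio": 0.2},
)
```

However the weights are chosen, the point stands that no single number in the `scores` dictionary decides the outcome on its own.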
Multiple measures will always give you a more reliable and valid assessment than a single measure. A second principle is one that we teachers often forget. We must rely as much as possible on observable performance in our assessments of students. Observable means being able to see or hear the performance of the learner (the senses of touch, taste, and smell don't apply very often to language testing!). You observe only the result of the meaningful input in the
form of spoken or written output, just as you observe the result of the wind by noticing trees waving back and forth.
The productive skills of speaking and writing allow us to hear and see the process as it is performed. Writing gives a permanent product in the form of a written piece. But unless you have recorded speech, there is no permanent observable product for speaking performance, because all those words you just heard have vanished from your perception and have been transformed into meaningful intake somewhere in your brain.
Receptive skills, then, are clearly the more enigmatic of the two modes of performance. You cannot observe the actual act of listening or reading, nor can you see or hear an actual product! You can observe learners only while they are listening or reading. The upshot is that all assessment of listening and reading must be made on the basis of observing the test-taker's speaking or writing (or nonverbal response), and not on the listening or reading itself.
THE IMPORTANCE OF LISTENING
Listening has often played second fiddle to its counterpart, speaking. But it is rare to find just a listening test. One reason for this emphasis is that listening is often implied as a component of speaking. How could you speak a language without also listening? In addition, the overtly observable nature of speaking renders it more empirically measurable than listening.
Every teacher of language knows that one's oral production ability (other than monologues, speeches, reading aloud, and the like) is only as good as one's listening comprehension ability. But of even greater impact is the likelihood that input in the aural-oral mode accounts for a large proportion of successful language acquisition. In a typical day, we do measurably more listening than speaking (with the exception of one or two of your friends who may be nonstop chatterboxes!).
We therefore need to pay close attention to listening as a mode of performance for assessment in the classroom. In this chapter, we will begin with basic principles and types of listening, then move to a survey of tasks that can be used to assess listening.
BASIC TYPES OF LISTENING
As with all effective tests, designing appropriate assessment tasks in listening begins with the specification of objectives, or criteria. Those objectives may be classified in terms of several types of listening performance. Think about what you do when you listen. Literally in nanoseconds, the following processes flash through your brain:
1. You recognize speech sounds and hold a temporary "imprint" of them in short-term memory.
2. You simultaneously determine the type of speech event (monologue, interpersonal dialogue, transactional dialogue) that is being processed and attend to its context (who the speaker is, location, purpose) and the content of the message.
3. You use (bottom-up) linguistic decoding skills and/or (top-down) background schemata to bring a plausible interpretation to the message, and assign a literal and intended meaning to the utterance.
4. In most cases (except for repetition tasks, which involve short-term memory only), you delete the exact linguistic form in which the message was originally received in favor of conceptually retaining important or relevant information in long-term memory.
Each of these stages represents a potential assessment objective:
• Comprehending surface structure elements such as phonemes, words, intonation, or a grammatical category
• Understanding of pragmatic context
• Determining meaning of auditory input
• Developing the gist, a global or comprehensive understanding
From these stages we can derive four commonly identified types of listening performance, each of which comprises a category within which to consider assessment tasks and procedures.
1. Intensive. Listening for perception of the components (phonemes, words, intonation, discourse markers, etc.) of a larger stretch of language.
2. Responsive. Listening to a relatively short stretch of language (a greeting, question, command, comprehension check, etc.) in order to make an equally short response.
3. Selective. Processing stretches of discourse such as short monologues for several minutes in order to "scan" for certain information. The purpose of such performance is not necessarily to look for global or general meanings, but to be able to comprehend designated information in a context of longer stretches of spoken language (such as classroom directions from a teacher, TV or radio news items, or stories). Assessment tasks in selective listening could ask students, for example, to listen for names, numbers, a grammatical category, directions (in a map exercise), or certain facts and events.
4. Extensive. Listening to develop a top-down, global understanding of spoken language. Extensive performance ranges from listening to lengthy lectures to listening to a conversation and deriving a comprehensive message or purpose. Listening for the gist, for the main idea, and making inferences are all part of extensive listening.
For full comprehension at the extensive level, test-takers may need to invoke interactive skills (perhaps note-taking, questioning, discussion): listening that includes all four of the above types as test-takers actively participate in discussions, debates, conversations, role plays, and pair and group work. Their listening performance must be intricately integrated with speaking (and perhaps other skills) in the authentic give-and-take of communicative interchange.
MICRO- AND MACRO SKILLS OF LISTENING
A useful way of synthesizing the above two lists is to consider a finite number of micro- and macroskills implied in the performance of listening comprehension. Richards' (1983) list of microskills has proven useful in the domain of specifying objectives for learning and may be even more useful in forcing test makers to carefully identify specific assessment objectives. The micro- and macroskills provide 17 different objectives to assess in listening (adapted from Richards, 1983).
Microskills
• Discriminate among the distinctive sounds of English.
• Retain chunks of language of different lengths in short-term memory.
• Recognize English stress patterns, words in stressed and unstressed positions, rhythmic structure, intonation contours, and their role in signaling information.
• Recognize reduced forms of words.
• Distinguish word boundaries, recognize a core of words, and interpret word order patterns and their significance.
• Process speech at different rates of delivery.
• Process speech containing pauses, errors, corrections, and other performance variables.
• Recognize grammatical word classes (nouns, verbs, etc.).
• Detect sentence constituents and distinguish between major and minor constituents.
• Recognize that a particular meaning may be expressed in different grammatical forms.
• Recognize cohesive devices in spoken discourse.
Macroskills
• Recognize the communicative functions of utterances, according to situations, participants, goals.
• Infer situations, participants, goals using real-world knowledge.
• From events, ideas, and so on, predict outcomes, infer links and connections between events, deduce causes and effects, and detect such relations as main idea, supporting idea, new information, given information, generalization, and exemplification.
• Distinguish between literal and implied meanings.
• Use facial, kinesic, body language, and other nonverbal clues to decipher meanings.
• Develop and use a battery of listening strategies, such as detecting keywords, guessing the meaning of words from context, appealing for help, and signaling comprehension or lack thereof.
Implied in the taxonomy above is a notion of what makes many aspects of listening difficult, or why listening is not simply a linear process of recording strings of language as they are transmitted into our brains. Consider the following list of what makes listening difficult (adapted from Richards, 1983; Ur, 1984; Dunkel, 1991):
• Clustering: attending to appropriate "chunks" of language-phrases, clauses, constituents
• Redundancy: recognizing the kinds of repetitions, rephrasing, elaborations, and insertions that unrehearsed spoken language often contains, and benefiting from that recognition
• Reduced forms: understanding the reduced forms that may not have been a part of an English learner's past learning experiences in classes where only formal "textbook" language has been presented
• Performance variables: being able to "weed out" hesitations, false starts, pauses, and corrections in natural speech.
• Colloquial language: comprehending idioms, slang, reduced forms, shared cultural knowledge
• Rate of delivery: keeping up with the speed of delivery, processing automatically as the speaker continues
• Stress, rhythm, and intonation: correctly understanding prosodic elements of spoken language, which is almost always much more difficult than understanding the smaller phonological bits and pieces
• Interaction: managing the interactive flow of language from listening to speaking to listening, etc.
DESIGNING ASSESSMENT TASKS: INTENSIVE LISTENING
Once you have determined objectives, your next step is to design the tasks, including making decisions about how you will elicit performance and how you will expect the test-taker to respond. The focus in this section is on the microskills of intensive listening.
Recognizing Phonological and Morphological Elements
A typical form of intensive listening at this level is the assessment of recognition of phonological and morphological elements of language. A classic test task gives a spoken stimulus and asks test-takers to identify the stimulus from two or more choices.
Paraphrase Recognition
The next step up on the scale of listening comprehension microskills is the recognition of words, phrases, and sentences, which are frequently assessed by providing a stimulus sentence and asking the test-taker to choose the correct paraphrase from a number of choices.
DESIGNING ASSESSMENT TASKS: RESPONSIVE LISTENING
A question-and-answer format can provide some interactivity in these lower-end listening tasks. The test-taker's response is the appropriate answer to a question. Such an item tests recognition of a wh-question (for example, how much) and its appropriate response.
DESIGNING ASSESSMENT TASKS: SELECTIVE LISTENING
A third type of listening performance is selective listening, in which the test-taker listens to a limited quantity of aural input and must discern within it some specific information. A number of techniques have been used that require selective listening.
Listening Cloze
Listening cloze tasks (sometimes called cloze dictations or partial dictations) require the test-taker to listen to a story, monologue, or conversation and simultaneously read the written text in which selected words or phrases have been deleted. Cloze procedure is most commonly associated with reading only. In its generic form, the test consists of a passage in which every nth word (typically every seventh word) is deleted and the test-taker is asked to supply an appropriate word.
One potential weakness of listening cloze techniques is that they may simply become reading comprehension tasks. Test-takers who are asked to listen to a story with periodic deletions in the written version may not need to listen at all, yet may still be able to respond with the appropriate word or phrase.
Other listening cloze tasks may focus on a grammatical category such as verb tenses, articles, two-word verbs, prepositions, or transition words/phrases. Notice two important structural differences between listening cloze tasks and standard reading cloze.
Listening cloze tasks should normally use an exact word method of scoring, in which you accept as a correct response only the actual word or phrase that was spoken and consider other appropriate words as incorrect.
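The generic every-nth-word deletion and the exact-word scoring method can both be sketched in a few lines of Python. The passage, the choice of n, and the response format here are invented for illustration, not taken from the text.

```python
# Hypothetical sketch of a listening cloze: delete every nth word from the
# transcript, then score responses with the exact-word method.

def make_cloze(text, n=7):
    """Blank out every nth word; return the gapped text and an answer key."""
    words = text.split()
    answers = {}
    for i in range(n - 1, len(words), n):
        answers[i] = words[i]
        words[i] = "_____"
    return " ".join(words), answers

def score_exact(answers, responses):
    """Exact-word scoring: only the word actually spoken counts as correct."""
    return sum(1 for i, w in answers.items()
               if responses.get(i, "").strip().lower() == w.lower())

passage = "The waves beat on the shore while the children built castles in the warm sand"
blanked, key = make_cloze(passage, n=7)
```

With this passage, the seventh and fourteenth words ("while", "warm") become blanks; a response of "While" earns credit, but an appropriate synonym such as "hot" for "warm" does not, by the exact-word rule above.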
Information Transfer
Selective listening can also be assessed through an information transfer technique in which aurally processed information must be transferred to a visual representation, such as labeling a diagram, identifying an element in a picture, completing a form, or showing routes on a map. If the objective of a task is, say, to test prepositions and prepositional phrases of location (at the bottom, on top of, around, along with larger, smaller), then other words and phrases such as back yard, yesterday, last few seeds, and scare away serve only as context and need not be tested.
Assuming that all the items, people, and actions are clearly depicted and understood by the test-taker, assessment may take the form of
• Questions: "Is the tall man near the door talking to a short woman?"
• True/false: "The woman wearing a red skirt is watching TV."
• Identification: "Point to the person who is standing behind the lamp." "Draw a circle around the person to the left of the couch."
Sentence Repetition
The task of simply repeating a sentence or a partial sentence, or sentence repetition, is also used as an assessment of listening comprehension. As in a dictation (discussed below), the test-taker must retain a stretch of language long enough to reproduce it and then must respond with an oral repetition of that stimulus. Incorrect listening comprehension, whether at the phonemic or discourse level, may be manifested in the correctness of the repetition. A miscue in repetition is scored as a miscue in listening. In the case of somewhat longer sentences, one could argue that the ability to recognize and retain chunks of language as well as threads of meaning might be assessed through repetition.
Sentence repetition is far from a flawless listening assessment task. Buck (2001, p.79) noted that such tasks "are not just tests of listening, but tests of general oral skills." Further, this task may test only recognition of sounds, and it can easily be contaminated by lack of short-term memory ability, thus invalidating it as an assessment of comprehension alone.
DESIGNING ASSESSMENT TASKS: EXTENSIVE LISTENING
Drawing a clear distinction between any two of the categories of listening referred to here is problematic, but perhaps the fuzziest division is between selective and extensive listening. As we gradually move along the continuum from smaller to larger stretches of language, and from micro- to macroskills of listening, the tasks draw increasingly on extensive listening. Some important questions about designing assessments at this level emerge.
• Can listening performance be distinguished from cognitive processing factors such as memory, associations, storage, and recall?
• As assessment procedures become more communicative, does the task take into account test takers' ability to use grammatical expectancies, lexical collocations, semantic interpretations, and pragmatic competence?
• Are test tasks themselves correspondingly content valid and authentic-that is, do they mirror real-world language and context?
• As assessment tasks become more and more open-ended, they more closely resemble pedagogical tasks, which leads one to ask what the difference is between assessment and teaching tasks. The answer is scoring: the former imply specified scoring procedures, while the latter do not.
Dictation
Dictation is a widely researched genre of assessing listening comprehension. In a dictation, test-takers hear a passage, typically of 50 to 100 words, recited three times: first, at normal speed; then, with long pauses between phrases or natural word groups, during which time test-takers write down what they have just heard; and finally, at normal speed once more so they can check their work and proofread.
Dictations have been used as assessment tools for decades. Some readers still cringe at the thought of having to render a correctly spelled, verbatim version of a paragraph or story recited by the teacher. Until research on integrative testing was published (see Oller,1971), dictations were thought to be not much more than glorified spelling tests.
The difficulty of a dictation task can be easily manipulated by the length of the word groups (or bursts, as they are technically called), the length of the pauses, the speed at which the text is read, and the complexity of the discourse, grammar, and vocabulary used in the passage. Scoring is another matter. Depending on your context and purpose in administering a dictation, you will need to decide on scoring criteria for several possible kinds of errors:
• Spelling error only, but the word appears to have been heard correctly
• Spelling and/or obvious misrepresentation of a word, illegible word
• Grammatical error (for example, the test-taker hears I can't do it but writes I can do it)
• Skipped word or phrase
• Permutation of words
• Additional words not in the original
• Replacement of a word with an appropriate synonym
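A scorer applying these categories must first align the test-taker's version with the original, word by word. Here is a hedged sketch of that alignment step using Python's standard difflib module; the example sentences are invented, and judgments such as spelling-only versus misheard word would still be made by a human.

```python
# Illustrative sketch: align a dictation attempt with the original text and
# classify word-level differences as skipped, added, or replaced words.
import difflib

def diff_dictation(original, attempt):
    """Classify word-level differences between the original and the attempt."""
    orig, resp = original.lower().split(), attempt.lower().split()
    errors = {"skipped": [], "added": [], "replaced": []}
    matcher = difflib.SequenceMatcher(a=orig, b=resp)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "delete":        # word in the original, missing in the attempt
            errors["skipped"].extend(orig[i1:i2])
        elif op == "insert":      # word added that was not in the original
            errors["added"].extend(resp[j1:j2])
        elif op == "replace":     # word misheard or misrepresented
            errors["replaced"].extend(zip(orig[i1:i2], resp[j1:j2]))
    return errors

result = diff_dictation("I can't do it today", "I can do it")
```

For this invented pair, the alignment reports "today" as skipped and "can't" replaced by "can", matching the grammatical-error example in the list above.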
Dictation seems to provide a reasonably valid method for integrating listening and writing skills and for tapping into the cohesive elements of language implied in short passages.
Communicative Stimulus-Response Tasks
In a communicative stimulus-response task, the test-taker hears a stimulus monologue or conversation and then is asked to respond to a set of comprehension questions. The monologues, lectures, and brief conversations used in such tasks are sometimes a little contrived, and certainly the subsequent multiple-choice questions don't mirror communicative, real-life situations. But with some care and creativity, one can create reasonably authentic stimuli, and in some rare cases the response mode actually approaches complete authenticity.
Authentic Listening Tasks
Ideally, the language assessment field would have a stockpile of listening test types that are cognitively demanding, communicative, and authentic, not to mention interactive by means of an integration with speaking. However, the nature of a test as a sample of performance and a set of tasks with limited time frames implies an equally limited capacity to mirror all the real-world contexts of listening performance.
"There is no such thing as stated Buck (200 1, p. 92)."Every test requires some communicative language ability, and no test covers them all. Similarly, with the notion of authenticity, every task shares some characteristics with target-language tasks, and no test is completely authentic." Here are some possibilities.
• Note-taking. In the academic world, classroom lectures by professors are common features of a non-native English-user's experience. One form of a midterm examination at the American Language Institute at San Francisco State University (Kahn, 2002) uses a 15-minute lecture as a stimulus.
• Editing. Another authentic task provides both a written and a spoken stimulus and requires the test-taker to listen for discrepancies. Scoring achieves relatively high reliability because there are usually a small number of specific differences that must be identified.
• Interpretive tasks. One of the intensive listening tasks described above was paraphrasing a story or conversation. An interpretive task extends the stimulus material to a longer stretch of discourse and forces the test-taker to infer a response.
• Retelling. In a related task, test-takers listen to a story or news event and simply retell it, or summarize it, either orally (on an audiotape) or in writing. In so doing, test-takers must identify the gist, main idea, purpose, supporting points, and/or conclusion to show full comprehension. Scoring is partially predetermined by specifying a minimum number of elements that must appear in the retelling.
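Retelling scoring of this kind can be made reasonably objective by listing the required content elements in advance. The sketch below is only an illustration of that idea; the story, the element keywords, and the passing threshold are all invented, and a real scorer would judge paraphrases far more flexibly than keyword matching can.

```python
# Hypothetical sketch of retelling scoring: credit an element when any of its
# keywords appears in the retold version; pass if a minimum count is reached.
# Story, elements, and threshold are invented for illustration.

def score_retelling(retelling, elements, minimum):
    text = retelling.lower()
    found = [name for name, keywords in elements.items()
             if any(kw in text for kw in keywords)]
    return {"found": found, "passed": len(found) >= minimum}

elements = {
    "main_event": ["flood", "river overflowed"],
    "location": ["riverside", "village"],
    "outcome": ["evacuated", "rescued"],
}
report = score_retelling(
    "A flood hit a small village and everyone was evacuated in time.",
    elements, minimum=2,
)
```

Specifying the elements beforehand is what makes the scoring "partially predetermined": two raters using the same element list should reach the same pass/fail decision even if their holistic impressions differ.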
ASSESSING SPEAKING
BASIC TYPES OF SPEAKING
In Chapter 6, we cited four categories of listening performance assessment tasks. A similar taxonomy emerges for oral production.
1. Imitative. At one end of a continuum of types of speaking performance is the ability to simply parrot back (imitate) a word or phrase or possibly a sentence. While this is a purely phonetic level of oral production, a number of prosodic, lexical, and grammatical properties of language may be included in the criterion performance.
2. Intensive. A second type of speaking frequently employed in assessment contexts is the production of short stretches of oral language designed to demonstrate competence in a narrow band of grammatical, phrasal, lexical, or phonological relationships (such as prosodic elements-intonation, stress, rhythm, juncture). The speaker must be aware of semantic properties in order to be able to respond, but interaction with an interlocutor or test administrator is minimal at best.
3. Responsive. Responsive assessment tasks include interaction and test comprehension but at the somewhat limited level of very short conversations, standard greetings and small talk, simple requests and comments, and the like. The stimulus is almost always a spoken prompt (in order to preserve authenticity), with perhaps only one or two follow-up questions or retorts.
4. Interactive. The difference between responsive and interactive speaking is in the length and complexity of the interaction, which sometimes includes multiple exchanges and/or multiple participants. Interaction can take the two forms of transactional language, which has the purpose of exchanging specific information, or interpersonal exchanges, which have the purpose of maintaining social relationships.
5. Extensive (monologue). Extensive oral production tasks include speeches, oral presentations, and storytelling, during which the opportunity for oral interaction from listeners is either highly limited (perhaps to nonverbal responses) or ruled out altogether.
MICRO- AND MACROSKILLS OF SPEAKING
The micro- and macroskills total roughly 16 different objectives to assess in speaking.
Microskills
1. Produce differences among English phonemes and allophonic variants.
2. Produce chunks of language of different lengths.
3. Produce English stress patterns, words in stressed and unstressed positions, rhythmic structure, and intonation contours.
4. Produce reduced forms of words and phrases.
5. Use an adequate number of lexical units (words) to accomplish pragmatic purposes.
6. Produce fluent speech at different rates of delivery.
7. Monitor one's own oral production and use various strategic devices (pauses, fillers, self-corrections, backtracking) to enhance the clarity of the message.
8. Use grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), word order patterns, rules, and elliptical forms.
9. Produce speech in natural constituents: in appropriate phrases, pause groups, breath groups, and sentence constituents.
10. Express a particular meaning in different grammatical forms.
11. Use cohesive devices in spoken discourse.

Macroskills
12. Appropriately accomplish communicative functions according to situations, participants, and goals.
13. Use appropriate styles, registers, implicature, redundancies, pragmatic conventions, conversation rules, floor-keeping and -yielding, interrupting, and other sociolinguistic features in face-to-face conversations.
14. Convey links and connections between events and communicate such relations as focal and peripheral ideas, events and feelings, new information and given information, generalization and exemplification.
15. Convey facial features, kinesics, body language, and other nonverbal cues along with verbal language.
16. Develop and use a battery of speaking strategies, such as emphasizing key words, rephrasing, providing a context for interpreting the meaning of words, appealing for help, and accurately assessing how well your interlocutor is understanding you.
There is such an array of oral production tasks that a complete treatment is almost impossible within the confines of one chapter in this book. Below is a consideration of the most common techniques, with brief allusions to related tasks. As already noted in the introduction to this chapter, consider three important issues as you set out to design tasks:
1. No speaking task is capable of isolating the single skill of oral production. Concurrent involvement of the additional performance of aural comprehension, and possibly reading, is usually necessary.
2. Eliciting the specific criterion you have designated for a task can be tricky because, beyond the word level, spoken language offers a number of productive options to test-takers. Make sure your elicitation prompt achieves its aims as closely as possible.
3. Because of the above two characteristics of oral production assessment, it is important to carefully specify scoring procedures for a response so that ultimately you achieve as high a reliability index as possible.
DESIGNING ASSESSMENT TASKS: IMITATIVE SPEAKING
You may be surprised to see the inclusion of simple phonological imitation in a consideration of assessment of oral production. After all, endless repeating of words, phrases, and sentences was the province of the long-since-discarded Audiolingual Method, and in an era of communicative language teaching, many believe that nonmeaningful imitation of sounds is fruitless. Such opinions have faded in recent years as we discovered that an overemphasis on fluency can sometimes lead to the decline of accuracy in speech. And so we have been paying more attention to pronunciation, especially suprasegmentals, in an attempt to help learners be more comprehensible. An occasional phonologically focused repetition task is warranted as long as repetition tasks are not allowed to occupy a dominant role in an overall oral production assessment, and as long as you artfully avoid a negative washback effect. Such tasks range from word level to sentence level, usually with each item focusing on a specific phonological criterion. In a simple repetition task, test-takers repeat the stimulus, whether it is a pair of words, a sentence, or perhaps a question (to test for intonation production).
PHONEPASS TEST
An example of a popular test that uses imitative (as well as intensive) production tasks is PhonePass, a widely used, commercially available speaking test in many countries. Among a number of speaking tasks on the test, repetition of sentences (of 8 to 12 words) occupies a prominent role. It is remarkable that research on the PhonePass test has supported the construct validity of its repetition tasks not just for a test-taker's phonological ability but also for discourse and overall oral production ability (Townshend et al., 1998; Bernstein et al., 2000; Cascallar & Bernstein, 2000).
DESIGNING ASSESSMENT TASKS: INTENSIVE SPEAKING
At the intensive level, test-takers are prompted to produce short stretches of discourse (no more than a sentence) through which they demonstrate linguistic ability at a specified level of language. Many tasks are "cued" tasks in that they lead the test-taker into a narrow band of possibilities. Parts C and D of the PhonePass test fulfill the criteria of intensive tasks as they elicit certain expected forms of language. Antonyms like high and low, happy and sad are prompted so that the automated scoring mechanism anticipates only one word. The either/or task of Part D fulfills the same criterion. Intensive tasks may also be described as limited response tasks (Madsen, 1983), or mechanical tasks (Underhill, 1987), or what classroom pedagogy would label as controlled responses.
Directed Response Tasks
In this type of task, the test administrator elicits a particular grammatical form or a transformation of a sentence. Such tasks are clearly mechanical and not communicative, but they do require minimal processing of meaning in order to produce the correct grammatical output.
Read-Aloud Tasks
Intensive reading-aloud tasks include reading beyond the sentence level, up to a paragraph or two. This technique is easily administered by selecting a passage that incorporates test specs and by recording the test-taker's output; the scoring is relatively easy because all of the test-taker's oral production is controlled. Because of the results of research on the PhonePass test, reading aloud may actually be a surprisingly strong indicator of overall oral production ability.
Sentence/Dialogue Completion Tasks and Oral Questionnaires
Another technique for targeting intensive aspects of language requires test-takers to read dialogue in which one speaker's lines have been omitted. Test-takers are first given time to read through the dialogue to get its gist and to think about appropriate lines to fill in. Then as the tape, teacher, or test administrator produces one part orally, the test-taker responds.
Picture-Cued Tasks
One of the more popular ways to elicit oral language performance at both intensive and extensive levels is a picture-cued stimulus that requires a description from the test-taker. Pictures may be very simple, designed to elicit a word or a phrase; somewhat more elaborate and "busy"; or composed of a series that tells a story or incident. A simple picture-cued elicitation might target, for instance, the production of a minimal pair.
Opinions about paintings, persuasive monologues, and directions on a map create a more complicated problem for scoring. More demand is placed on the test administrator to make calculated judgments, in which case a modified form of a scale such as the one suggested for evaluating interviews (below) could be used:
• grammar
• vocabulary
• comprehension
• fluency
• pronunciation
• task (accomplishing the objective of the elicited task)
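To make the idea of such a modified scale concrete, here is a minimal sketch of how a rater's category judgments might be combined into a single score. The six category names come from the scale above; the 0–5 point range and the equal weighting of categories are assumptions for illustration only, not something the text prescribes.

```python
# Hypothetical scoring sketch for a picture-cued task.
# Assumes each category is rated on a 0-5 scale and weighted equally.

CATEGORIES = ["grammar", "vocabulary", "comprehension",
              "fluency", "pronunciation", "task"]

def overall_score(ratings, max_points=5):
    """Average the rater's 0..max_points judgments across all categories."""
    for name in CATEGORIES:
        if not 0 <= ratings[name] <= max_points:
            raise ValueError("rating for %r out of range" % name)
    return sum(ratings[name] for name in CATEGORIES) / len(CATEGORIES)

# Example: one rater's judgments for a single test-taker
ratings = {"grammar": 4, "vocabulary": 3, "comprehension": 5,
           "fluency": 3, "pronunciation": 4, "task": 5}
print(overall_score(ratings))  # 4.0
```

A real instrument would need decisions this sketch sidesteps, such as whether "task" accomplishment should be weighted more heavily than the linguistic categories, and how to reconcile scores across raters.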
Translation (of Limited Stretches of Discourse)
Translation is a part of our tradition in language teaching that we tend to discount or disdain, if only because our current pedagogical stance plays down its importance. Translation methods of teaching are certainly passé in an era of direct approaches to creating communicative classrooms. But we should remember that in countries where English is not the native or prevailing language, translation is a meaningful communicative device in contexts where the English user is called on to be an interpreter. Also, translation is a well-proven communication strategy for learners of a second language.
DESIGNING ASSESSMENT TASKS: RESPONSIVE SPEAKING
Assessment of responsive tasks involves brief interactions with an interlocutor, differing from intensive tasks in the increased creativity given to the test-taker and from interactive tasks by the somewhat limited length of utterances.
Question and Answer
Question-and-answer tasks can consist of one or two questions from an interviewer, or they can make up a portion of a whole battery of questions and prompts in an oral interview. They can vary from simple questions like "What is this called in English?" to complex questions like "What are the steps governments should take, if any, to stem the rate of deforestation in tropical countries?" The first question is intensive in its purpose; it is a display question intended to elicit a predetermined correct response. We have already looked at some of these types of questions in the previous section. Questions at the responsive level tend to be genuine referential questions in which the test-taker is given more opportunity to produce meaningful language in response.
Giving Instructions and Directions
We are all called on in our daily routines to read instructions on how to operate an appliance, how to put a bookshelf together, or how to create a delicious clam chowder. Somewhat less frequent is the mandate to provide such instructions orally, but this speech act is still relatively common. Using such a stimulus in an assessment context gives the test-taker an opportunity to engage in a relatively extended stretch of discourse, to be very clear and specific, and to use appropriate discourse markers and connectors. The technique is simple: the administrator poses the problem, and the test-taker responds. Scoring is based primarily on comprehensibility and secondarily on other specified grammatical or discourse categories. Here are some possibilities.
Paraphrasing
Another type of assessment task that can be categorized as responsive asks the test-taker to read or hear a limited number of sentences (perhaps two to five) and produce a paraphrase of them. The advantages of such tasks are that they elicit short stretches of output and perhaps tap into test-takers' ability to practice the conversational art of conciseness by reducing the output/input ratio.
TEST OF SPOKEN ENGLISH (TSE)
Somewhere straddling responsive, interactive, and extensive speaking tasks lies another popular commercial oral production assessment, the Test of Spoken English (TSE). The TSE is a 20-minute audiotaped test of oral language ability within an academic or professional environment. TSE scores are used by many North American institutions of higher education to select international teaching assistants. The scores are also used for selecting and certifying health professionals such as physicians, nurses, pharmacists, physical therapists, and veterinarians.
The following content specifications for the TSE represent the discourse and pragmatic contexts assessed in each administration:
1. Describe something physical.
2. Narrate from presented material.
3. Summarize information of the speaker's own choice.
4. Give directions based on visual materials.
5. Give instructions.
6. Give an opinion.
7. Support an opinion.
8. Compare/contrast.
9. Hypothesize.
10. Function "interactively."
11. Define.
DESIGNING ASSESSMENT TASKS: INTERACTIVE SPEAKING
The final two categories of oral production assessment (interactive and extensive speaking) include tasks that involve relatively long stretches of interactive discourse (interviews, role plays, discussions, games) and tasks of equally long duration that involve less interaction (speeches, telling longer stories, and extended explanations and translations). The obvious difference between the two sets of tasks is the degree of interaction with an interlocutor. Also, interactive tasks are what some would describe as interpersonal, while the final category includes more transactional speech events.
Interview
When "oral production assessment" is mentioned, the first thing that comes to mind is an oral interview: a test administrator and a test-taker sit down in a direct face-to-face exchange and proceed through a protocol of questions and directives. The interview, which may be tape-recorded for re-listening, is then scored on one or more parameters such as accuracy in pronunciation and/or grammar, vocabulary usage, fluency, sociolinguistic/pragmatic appropriateness, task accomplishment, and even comprehension.
Every effective interview contains a number of mandatory stages. Two decades ago, Michael Canale (1984) proposed a framework for oral proficiency testing that has withstood the test of time. He suggested that test-takers will perform at their best if they are led through four stages:
1. Warm-up
2. Level check
3. Probe
4. Wind-down
The success of an oral interview will depend on
• clearly specifying administrative procedures of the assessment (practicality),
• focusing the questions and probes on the purpose of the assessment (validity),
• appropriately eliciting an optimal amount and quality of oral production from the test taker (biased for best performance), and
• creating a consistent, workable scoring system (reliability).
Role Play
Role playing is a popular pedagogical activity in communicative language-teaching classes. Within the constraints set forth by the guidelines, it frees students to be somewhat creative in their linguistic output. In some versions, role play allows some rehearsal time so that students can map out what they are going to say. And it has the effect of lowering anxieties as students can, even for a few moments, take on the persona of someone other than themselves.
Discussions and Conversations
As formal assessment devices, discussions and conversations with and among students are difficult to specify and even more difficult to score. But as informal techniques to assess learners, they offer a level of authenticity and spontaneity that other assessment techniques may not provide. Discussions may be especially appropriate tasks through which to elicit and observe such abilities as
• topic nomination, maintenance, and termination;
• attention getting, interrupting, floor holding, control;
• clarifying, questioning, paraphrasing;
• comprehension signals (nodding, "uh-huh," "hmm," etc.);
• negotiating meaning;
• intonation patterns for pragmatic effect;
• kinesics, eye contact, proxemics, body language; and
• politeness, formality, and other sociolinguistic factors.
Games
Among informal assessment devices are a variety of games that directly involve language production. Consider the following types:
1. "Tinkertoy" game: A Tinkertoy (or Lego block) structure is built behind a screen. One or two learners are allowed to view the structure. In successive stages of construction, the learners tell "runners" (who can't observe the structure) how to re-create the structure. The runners then tell "builders" behind another screen how to build the structure.
2. Crossword puzzles are created in which the names of all members of a class are clued by obscure information about them. Each class member must ask questions of others to determine who matches the clues in the puzzle.
3. Information gap grids are created such that class members must conduct mini-interviews of other classmates to fill in boxes, e.g., "born in July," "plays the violin," "has a two-year-old child," etc.
4. City maps are distributed to class members. Predetermined map directions are given to one student who, with a city map in front of him or her, describes the route to a partner, who must then trace the route and get to the correct final destination.
ORAL PROFICIENCY INTERVIEW (OPI)
The best-known oral interview format is one that has gone through a considerable metamorphosis over the last half-century, the Oral Proficiency Interview (OPI). Originally known as the Foreign Service Institute (FSI) test, the OPI is the result of a historical progression of revisions under the auspices of several agencies, including the Educational Testing Service and the American Council on Teaching Foreign Languages (ACTFL). The latter, a professional society for research on foreign language instruction and assessment, has now become the principal body for promoting the use of the OPI. The OPI is widely used across dozens of languages around the world. Only certified examiners are authorized to administer the OPI; certification workshops are available, at costs of around $700 for ACTFL members, through ACTFL at selected sites and conferences throughout the year.
Bachman (1988, p. 149) also pointed out that the validity of the OPI simply cannot be demonstrated "because it confounds abilities with elicitation procedures in its design, and it provides only a single rating, which has no basis in either theory or research." Meanwhile, a great deal of experimentation continues to be conducted to design better oral proficiency testing methods (Bailey, 1998; Young & He, 1998). With ongoing critical attention to issues of language assessment in the years to come, we may be able to solve some of the thorny problems of how best to elicit oral production in authentic contexts and to create valid and reliable scoring methods.
DESIGNING ASSESSMENTS: EXTENSIVE SPEAKING
Extensive speaking tasks involve complex, relatively lengthy stretches of discourse. They are frequently variations on monologues, usually with minimal verbal interaction.
Oral Presentations
In the academic and professional arenas, it would not be uncommon to be called on to present a report, a paper, a marketing plan, a sales idea, a design of a new product, or a method. A summary of oral assessment techniques would therefore be incomplete without some consideration of extensive speaking tasks. Once again the rules for effective assessment must be invoked: (a) specify the criterion, (b) set appropriate tasks, (c) elicit optimal output, and (d) establish practical, reliable scoring procedures. And once again, scoring is the key assessment challenge.
For oral presentations, a checklist or grid is a common means of scoring or evaluation. Holistic scores are tempting to use for their apparent practicality, but they may obscure the variability of performance across several subcategories, especially the two major components of content and delivery. Following is an example of a checklist for a prepared oral presentation at the intermediate or advanced level of English.
Picture-Cued Story-Telling
One of the most common techniques for eliciting oral production is through visual stimuli: pictures, photographs, diagrams, and charts. We have already looked at this elicitation device for intensive tasks, but at this level we consider a picture or a series of pictures as a stimulus for a longer story or description.
Retelling a Story, News Event
In this type of task, test-takers hear or read a story or news event that they are asked to retell. This differs from the paraphrasing task discussed above (pages 161-162) in that it is a longer stretch of discourse and a different genre. The objectives in assigning such a task vary from listening comprehension of the original to production of a number of oral discourse features (communicating sequences and relationships of events, stress and emphasis patterns, "expression" in the case of a dramatic story), fluency, and interaction with the hearer. Scoring should of course meet the intended criteria.
Translation (of Extended Prose)
Translation of words, phrases, or short sentences was mentioned under the category of intensive speaking. Here, longer texts are presented for the test-taker to read in the native language and then translate into English. Those texts could come in many forms: dialogue, directions for assembly of a product, a synopsis of a story or play or movie, directions on how to find something on a map, and other genres. The advantage of translation is in the control of the content, vocabulary, and, to some extent, the grammatical and discourse features.
References:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.