From time to time we delve down into the archive of this blog in order to revive a post that deserves not to be forgotten amidst the mass of hundreds. Here is one such post.
FRIDAY, AUGUST 3, 2012
Marking Positively: How to Score Natural Translations
At the Forli conference in May (enter forli in the Search
box), I noticed that some people are still using the old subtractive scoring
method to rate NT.
What is the subtractive method? It means starting from 100 points and knocking
off a point, or several points, for each mistake of any kind; typically a point
or two for minor errors of content or expression and up to five points for
major ones. The 'pass mark' is usually expressed as a positive percentage, but
it's really a 'failure score'. That's how students' written translations are
marked, and likewise the examinations of the professional associations like the
Canadian one to which I belong. It can also be used for interpretations,
especially if they're transcribed.
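To make the arithmetic concrete, here is a minimal sketch of subtractive marking in Python. The penalty values (1 point for a minor error, 5 for a major one) are assumptions chosen from the ranges mentioned above, and the function name is purely illustrative; real marking schemes vary.

```python
# Hypothetical penalty table: the text above only says "a point or two" for
# minor errors and "up to five points" for major ones.
PENALTIES = {"minor": 1, "major": 5}


def subtractive_score(errors, start=100):
    """Start from `start` points and deduct a penalty for every error found."""
    score = start
    for severity in errors:
        score -= PENALTIES[severity]
    # Assumption: a score is never reported below zero.
    return max(score, 0)


# Ten minor and four major errors: 100 - 10*1 - 4*5 = 70
errors = ["minor"] * 10 + ["major"] * 4
print(subtractive_score(errors))
```

Note that everything the translator got right is invisible here: only the errors are ever counted.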
Two objections can be raised. The first is a didactic one: that the approach is
negative and therefore discouraging. True, mathematically speaking, -30% of
mistakes is equivalent to +70% correct, but the psychological effect is
different. Anyway, it's not so important as the second objection, which is that
the approach reinforces 'nit-picking' by the markers, because small details are
allowed to affect the score significantly. I still squirm at a sequence in an
old film about an interpretation exercise for European Commission interpreters
(see References) in which a student is berated in front of the other students
for his translation of a single word.
When evaluating NT, we need to take the opposite approach. Although mistakes
are of great interest insofar as they reveal the limitations and the
'pathology' of NT, in NT research our primary interest should be in
what subjects can translate and not in what they can't. A
score of only 40% because of numerous distortions and omissions would probably
entail failure for an Expert or Professional translator or a translation school
student; but for a Natural Translator it represents a non-negligible
translating ability and we should focus on it and analyse what that 40%
consists of.
How can we build a positive scoring method?
In the 1990s I became involved in the design of tests for candidates who wanted
to work as community interpreters for public services in Ontario, Canada. These
became known as the CILISAT tests and are still in use. The Government of
Ontario funded the necessary research. The candidates were almost always Native
Interpreters, because the pay was too low to attract Professional Experts and
because the languages were not taught in Canada. We decided we needed a test
instrument that would be better suited to Native, i.e. untrained, Interpreters
than those used by the translation schools and in the profession. So we turned
to a method called propositional analysis. It's used by
psychologists among others, and in fact I'd been introduced to it by the late
David Gerver, who was one of the pioneer researchers on interpreters and was
also a clinical psychologist. The form of it that we used can be described this way:
"To analyze the text, propositional analysis – a description of the text in terms of its semantic content – is used. The units of analysis are propositions, or units of meaning containing one verbal element plus one or more nouns. The corresponding units are then selected on the basis of meaning rather than structure."
In practice this meant that we broke
down the scripts for the interpretation tests into simple, single-clause
sentences representing propositions and then awarded points according to whether
the meaning of each proposition as a whole was conveyed in translation: zero
points for an omission or a meaning contrary to that of the proposition; 1
point for a meaning conveyed but not clearly or not completely; 2 points for a
complete and true rendering. There was a weighting that distinguished between
important and unimportant propositions. This scale was solely for meaning.
Other factors, for example correct language, were scored separately and
globally, not proposition by proposition.
For example, the statement, "At around 6 o'clock I saw a blue sports car
waiting on the other side of the road," might be broken down into:
The time was approximately 6 pm.
I saw a car.
The car was blue.
The car was a sports car.
The car was waiting.
The car was on the other side of the road.
A paraphrase like, "I seed a
sport car stopping at the kerb of our street before supper" would score 7
points for informational meaning before being weighted for importance. (Work it
out! 1+2+0+2+1+1.) The maximum possible points varied with each script.
Small language mistakes like "seed" were relegated to a separate
evaluation.
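The example can also be worked through in a few lines of Python. This is only a sketch of the 0/1/2 scale described above, not the actual CILISAT scoring sheet: the class and function names are invented for illustration, and since no weighting values are given here, every proposition gets an assumed default weight of 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    text: str
    weight: float = 1.0  # hypothetical importance weight; no real values are given


def raw_points(ratings):
    """Sum the ratings: 0 = omitted or contrary, 1 = unclear or incomplete, 2 = complete and true."""
    for rating in ratings:
        if rating not in (0, 1, 2):
            raise ValueError(f"rating must be 0, 1 or 2, got {rating!r}")
    return sum(ratings)


def weighted_percentage(propositions, ratings):
    """Weighted points earned, as a percentage of the weighted maximum (2 per proposition)."""
    if len(propositions) != len(ratings):
        raise ValueError("one rating per proposition is required")
    raw_points(ratings)  # reuse the range check on each rating
    earned = sum(p.weight * r for p, r in zip(propositions, ratings))
    maximum = sum(p.weight * 2 for p in propositions)
    return 100 * earned / maximum


propositions = [
    Proposition("The time was approximately 6 pm."),
    Proposition("I saw a car."),
    Proposition("The car was blue."),
    Proposition("The car was a sports car."),
    Proposition("The car was waiting."),
    Proposition("The car was on the other side of the road."),
]
ratings = [1, 2, 0, 2, 1, 1]  # the paraphrase, rated in the order listed above

print(raw_points(ratings))  # 7 points out of a possible 12
print(round(weighted_percentage(propositions, ratings), 1))
```

Changing a `weight` (say, doubling it for "I saw a car.") shows how the importance weighting shifts the final percentage while the raw 7 points stay the same.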
References
Guadalupe Barrera Valdes and Manuel Rosalinda Cardenas. Constructing matching
tests in two languages: the application of propositional analysis. NABE:
The Journal for the National Association for Bilingual Education, vol. 9
no. 1, pp. 3-19. 1984. There’s an abstract here.
Roda P. Roberts. Interpreter assessment tools for different settings. In R. P.
Roberts et al. (eds.), The Critical Link 2: Interpreters in the
Community, Amsterdam, Benjamins, 1999. Most of it is here.
David Gerver. A psychological approach to simultaneous interpretation. Meta,
vol. 20, no. 2, pp. 119-128, 1975. "A slightly altered version of a paper
presented at the 18th International Congress of Applied Psychology in Montreal
in July 1974". The text is here.
André Delvaux (director). Les Interprètes. Brussels: Commission of
the European Communities. c1975. 16 mm film. c15 mins.
Comments
The post drew comments. Here are a couple of them.
To those of you who have commented on the post about positive marking...
I ought to have acknowledged that even before I
heard about propositional analysis from David Gerver, I'd learnt about positive marking
from Daniel Gouadec, a well-known French translation teacher who came to teach
for a couple of years at the University of Ottawa in the late seventies (see
References). He was working at the time on a marking system for the Canadian
government Translation Bureau's quality assessment section, but I don't know
whether they ever used it.
In reply to SEO Translator: the subtractive method is usually applied to short texts, say 300-500 words.
For purposes of comparison, texts of about the same length as one another are
used; and also, obviously, of the same level of difficulty. The 'pass mark'
varies according to the expectations of the markers or examiners, taking
account of the purpose of the exercise (professional examination, translation
school assignment, etc.), the institution, the difficulty of the text, the
level of the examinees, and so on. I've seen pass marks of 60% to 90%.
Logically, tests for Expert Translators should have a high pass mark.
In the CILISAT tests, using positive scoring, we actually had two pass marks: one for 'ready to work' and
a lower one for 'shows promise but needs training'. As I recall, they were 80
and 60 respectively, but that was after combining with the separate assessment
for quality of target language. I haven't thought about automating these or
other scorings. Possibly.
I want to express my appreciation for the way you described how to properly score a natural language translation. You made some excellent points, and both scholars and translation teachers can benefit from the strategies you outlined.