Sunday, April 18, 2010

Machine and Human Translation 5: Self-help

In the report of the BBC's experiment in conferencing with MT (March 24 post), there’s an example given of the output from Google Translate:
Dowry Allowathb of Khartoum, Sudan, submitted a comment in Arabic on the topic "If you could say one thing to the world, what would it be?" which came out in English as:
That the budget of one war enough to satisfy the hungry Africa, not to mention the budget arm of one of the major powers.
“Not perfect, but intelligible,” is the reporter’s verdict. Is it intelligible to you? Here’s a simple test: rewrite it in standard English, or in standard whatever your language happens to be. What you already know about the state of the world will help as a referent for your COMAL.

As the reporter says, a proposition doesn’t have to be perfectly stated for it to be intelligible. I call this patching-up process reader/listener input tolerance, or reader self-help for short. The ability of readers, and listeners for that matter, to make sense of imperfectly formed input is not peculiar to translations. We do it all the time in our own languages. The sense we arrive at may sometimes be wrong, but we strive for it and we do so instinctively. Of course it’s better if the translation is well expressed and we don’t have to strive so hard, but faute de mieux…

Here’s another story about self-help.

Back around 1990, a branch of the Canadian government was using MT to translate into French the notices of job vacancies that were posted up each day in government employment offices. The MT system was rather primitive, and at one point it became notorious for a classic mistake. The French for man is homme. But Man. (with the initial capital and the dot) is also a standard abbreviation for Manitoba, one of the Canadian provinces. So when there was a job vacant in a town in that province, the location would come out, for example, as Winnipeg, Homme. One day I ran into a senior official from the ministry and asked him about this. He replied,
“The only people who complain about our translations are university professors like you. Our clients are happy with them because they get them the same day. If they had to wait even 24 hours while we sent them to the government translation bureau, the chances are that the vacancy would already be filled. And as for Manitoba / Homme, well they soon learn that Homme means Manitoba.”
That brings us up against two other phenomena of input tolerance. The first is that if a mistranslation is regular, readers who encounter it often will probably learn from its context what it really means. The second is that motivation plays a role. The more people need something, the more accommodating they are prepared to be in order to get it.

Yet another reason people may be tolerant of faulty MT is that they don’t really want a translation in the normal sense. All they want is to get an idea, the gist, of what a document or speech is about. For documents, this is called scanning. The earliest successful large-scale MT system was SYSTRAN, which became operational around 1970. Nowadays anyone can buy it and it’s on the internet, but at first its sole client was the Foreign Technology Division of the United States Air Force. The USAF used it for scanning Russian technical literature after the Sputnik awakening. The purpose wasn’t to produce usable translations but to help identify texts that might be worth closer attention. And then the translating was done by human experts.

To sum up, MT so far, and with rare exceptions, demands special reading skills that are still the preserve of humans.


Dave Lee. BBC debate demonstrates power of machine translation. BBC News, March 18, 2010.

Walter Daelemans and Véronique Hoste (eds.). Evaluation of Translation Technology. University Press Antwerp (Belgium), 2009. 262 p. 35 euro. Just out! This one is for specialists.


Peter Toma, the Hungarian-American founder of SYSTRAN. I met him briefly when he came to Ottawa to help try and sell SYSTRAN to the Canadian government. The photo comes from a very interesting website about early computers,