Friday, April 2, 2010

Machine and Human Translation 2: Recycling COMAL

In my post about COMAL last week, I said that MT lacked the monitoring made possible by the human competence of comparison of meaning across languages. I ought to have qualified that by saying that the sophisticated MT systems available today do possess a sort of recycled COMAL. They inherit it because they borrow segments (words, phrases, sentences) from previously existing translations. The latter are stored alongside their originals in large data banks called translation memories, and the systems recycle them by using statistical methods to match segments of new text with segments of old ones. But the translations in the translation memories have all been done by humans; so in importing a translation from a memory, the MT system benefits at second hand from the COMAL of a human translator. This approach has proven a real breakthrough in MT, but it still has drawbacks. One of them is that the matching algorithms work on the outward form of the segments, not their meaning, and the same form may have more than one translation in different real-world contexts. Here’s an example from Google Translate:
Spanish input: En estos momentos, el presidente español ocupa la presidencia de la Unión Europea.
English output: At present, the Spanish president holds the presidency of the European Union.
Apparently very good. Grammatically perfect. And Google will even speak the translation for you! Except that my human COMAL instantly alerts me that something is amiss here: hey, Spanish president doesn't mean the same as presidente español. Spain doesn’t have a president! Its head of state is the king. Presidente refers to what in English is called the prime minister. And that brings us to the crux of the matter. Human COMAL works because we relate both the source text and the translated text to independent referents, in this case a political institution. The referents of both must coincide.
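To make the form-based limitation concrete, here is a minimal sketch of how such a lookup might work. The data and function names are invented for illustration; this is not Google’s actual system. The point is that the stored translation is chosen purely on the surface similarity of the source segments, with no access to meaning or referents:

```python
from difflib import SequenceMatcher

# Toy translation memory: human-translated Spanish-English segment
# pairs (invented data, standing in for a large data bank).
memory = [
    ("el presidente ocupa la presidencia",
     "the president holds the presidency"),
    ("la reunión tendrá lugar mañana",
     "the meeting will take place tomorrow"),
]

def best_match(segment, memory):
    # Score every stored source segment purely on surface-form
    # similarity to the new segment; meaning plays no part.
    scored = [(SequenceMatcher(None, segment, source).ratio(), target)
              for source, target in memory]
    score, translation = max(scored)
    return translation, score

new_segment = "el presidente español ocupa la presidencia"
translation, score = best_match(new_segment, memory)
print(translation)  # the president holds the presidency
```

The match succeeds on form alone; nothing in the lookup can notice that presidente, in the Spanish context, names the office English calls prime minister.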

The team at Google led by principal linguist Franz Och (see photo) are well aware of the frustration their clientele sometimes feels. So they offer a way to vent the frustration constructively. They have added a link to Google Translate that invites users to “Contribute a better translation”, thereby modifying the translation memory. Note that no qualifications are required to contribute. Expert, Native and even Natural Translators are all welcome. In this way the system becomes marginally interactive and more collective. It is already collective insofar as the contents of the translation memory represent the combined production of many human translators. No wonder this kind of MT system is called hybrid.
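The feedback loop itself is easy to picture. Here is a minimal sketch, with invented data, of how a contributed correction could take precedence over what the memory would otherwise return; it is an illustration of the idea, not Google’s actual mechanism:

```python
# Toy memory plus a layer of user-contributed corrections
# (invented data; not Google's actual mechanism).
memory = {
    "el presidente español": "the Spanish president",
}
corrections = {}

def contribute_better_translation(source, better):
    # A user's correction is stored and takes precedence from now on.
    corrections[source] = better

def translate(source):
    # Corrections win over the memory; unknown segments pass through.
    return corrections.get(source, memory.get(source, source))

print(translate("el presidente español"))   # the Spanish president
contribute_better_translation("el presidente español",
                              "the Spanish prime minister")
print(translate("el presidente español"))   # the Spanish prime minister
```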

In the same way as they ‘borrow’ COMAL, MT systems can import another competence of human translators. See the next post in this series.

Peter F. Brown and colleagues at the IBM Thomas J. Watson Research Center. A statistical approach to machine translation. Paper to the 12th International Conference on Computational Linguistics (COLING 88), Budapest, 1988. A later version was published in Computational Linguistics 16(2):79-85, 1990. This was the seminal paper on mining translation memories for MT.

Miguel Helft. Google's computing power refines translation tool. The New York Times, New York edn., 9 March 2010, p. A1.

Google Translate.

There’s an article on Translation Memory in Wikipedia.

Photo: Peter DaSilva for The New York Times.
