Tuesday, April 6, 2010

Machine and Human Translation 3: Pragmatics

I ended the previous post in this series (April 2) by saying that, just as MT systems ‘borrow’ COMAL, they can recycle another competence of human translators.

Let me illustrate by an example, a very old one. Back in 1970, the MT project where I was working in Montreal received a visit from a distinguished French computer scientist, the late Bernard Vauquois of the University of Grenoble (see photo). Incidentally, he was co-founder of the International Committee on Computational Linguistics, which I mentioned in the preceding post. One of our team, Michel van Canaghem, went to the airport to drive him into town. We wanted to impress him, and to that end we set up a little demonstration of the system we were in the process of developing. It was called Q-Systems - Q for Quebec - and it had just been invented for us by another brilliant French computer scientist, Alain Colmerauer. We got together with Prof. Vauquois in our offices, where we had recently received our first terminal, a clunky Telex machine. From there we submitted an English sentence for translation into French on the university’s mainframe computer three storeys below. It could only be a single sentence, because we only had 128K of CPU (equivalent to RAM) at our disposal on the mainframe. Remember it was 1970! The sentence was this:
Prof Vauquois was met at the airport by Michel.
And the translation came back:
Michel a rencontré M Vauquois à l’aéroport.
We were proud. On the face of it a simple sentence, but the system had transformed the English passive construction to a French active one, which was better French, and it had changed the order of the constituents to put à l’aéroport at the end. We waited for Prof. Vauquois to congratulate us.

Not so. Instead he exclaimed, “Ah, mais ça ne va pas!” (“Oh, but there’s something wrong!”). And he continued, “Michel didn’t meet me by chance. He was there waiting for me. The verb should be accueilli (received me, welcomed me), not rencontré.”

Disappointed, we nevertheless saw what he meant. We might argue over the use of rencontré, but it’s undoubtedly ambiguous as to intent (just as met is), whereas accueilli isn’t. But worse than that, we knew our system had no way to resolve such an ambiguity. It had no pragmatics. Pragmatics?
Pragmatics is a subfield of linguistics which studies the ways in which context contributes to meaning.
The ability to understand language according to context is something humans have and MT systems don’t, at least not unless they borrow it from human translations. The experience was a revelation to me; that’s why I remember it so clearly.

There’s a sequel to this story. Forty years later, we might expect that MT and the computers supporting it had advanced enough to make child’s play of translating such a simple sentence. So let’s find out. First I submitted it to Microsoft’s Bing Translator. Here’s the output:
Prof Vauquois a été rencontré à l'aéroport par Michel.
’Nuf said. We did as well or better in 1970. So let’s move on to the leader at the moment, Google Translate:
Prof Vauquois a été accueilli à l'aéroport par Michel.
Bravo! There’s accueilli. My guess is that Google Translate found it in human translations of other texts where meet is accompanied by at the airport or something very similar.
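To make that guess concrete, here is a minimal sketch of the idea: look through stored human translations for source sentences that share the context at the airport, and let them vote on how met was rendered. The toy translation memory, the candidate list and the function name are all invented for illustration; this is an assumption about the general principle, not a description of how Google Translate actually works.

```python
from collections import Counter

# Hypothetical toy "translation memory": (English source, French human translation).
TOY_MEMORY = [
    ("she was met at the airport by her brother",
     "elle a été accueillie à l'aéroport par son frère"),
    ("the delegation was met at the airport",
     "la délégation a été accueillie à l'aéroport"),
    ("i met him by chance in the street",
     "je l'ai rencontré par hasard dans la rue"),
    ("we met an old friend at the market",
     "nous avons rencontré un vieil ami au marché"),
]

# Candidate renderings of "met", matched by stem so inflected forms also count.
CANDIDATE_STEMS = {"accueilli": "accueill", "rencontré": "rencontr"}


def choose_rendering(context_phrase, memory=TOY_MEMORY):
    """Return the candidate used most often in human translations of
    source sentences containing context_phrase, or None if none match."""
    votes = Counter()
    for source, target in memory:
        if context_phrase in source:
            for label, stem in CANDIDATE_STEMS.items():
                if stem in target:
                    votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


print(choose_rendering("at the airport"))
print(choose_rendering("by chance"))
```

With this toy memory, the context at the airport favours accueilli and by chance favours rencontré. The program knows nothing about intent; whatever pragmatics it appears to have is borrowed from the human translators whose work it counts.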

So far so good. And yet there’s still a disappointment. It concerns Prof Vauquois in the output. It’s grammatically incorrect. If we want to use this title, we have to write Le professeur Vauquois with the definite article. In 1970, we dealt with it in a different way, so we can’t compare on this point; but it’s very disappointing that a system which does well on simulated pragmatics falls down on a fairly elementary point of grammar. It’s all very well to find fragments of translation in the translation memory, but we would prefer them to be strung together into correct sentences and paragraphs, and it seems that’s still a problem for this method.

More to come.

Christian Boitet et al. Bernard Vauquois, pioneer of machine translation. Computational Linguistics, 12:1.43-47, 1986.

Alain Colmerauer. Les Systèmes-Q ou un formalisme pour analyser et synthétiser les phrases sur ordinateur (Q-Systems, a formalism for parsing and generating sentences by computer). Publication interne Nº 43. Université de Montréal, Département d'informatique, September 1970.


Bing Translator. http://www.microsofttranslator.com/Default.aspx.

Google Translate. http://translate.google.com.

Photo: Harcourt, Paris
