This post is intended as a tribute to one of my students, Bruce McHaffie, whose pioneering thesis has been relegated to the oblivion that is the fate of so many MA theses. Bruce was my last student before I left Canada for Spain in the late 1990s. At that time he was working at the ill-fated Canadian telecommunications giant Nortel. As I was not competent to advise on or assess the computing aspect of his work, I turned to Mario Marchand, a professor in the computer science department, to be co-advisor.
It was only an MA thesis, but it could have been the basis for a doctorate. It is best summarised in Bruce’s own abstract:
The characteristics of automated learning and generalization, and of graceful degradation in the face of unforeseen input, give neural networks interesting potential for machine translation (MT). However, the field of connectionist MT has been little explored by researchers. This thesis provides an introduction to neural network concepts and summarizes and reviews the research in the field of connectionist MT. It describes the building, training and testing of TransNet, an embryonic neural network that translates weather reports from English into French. TransNet uses an innovative bigram (word pair) data representation which makes it possible to take some account of word order in the processing.
This is a case where I must unashamedly say that I learnt more from my student than my student learnt from me; for the first chapters of the thesis provide an excellent, clear introduction to connectionist (i.e. neural network) computing, and if you want to learn about it, even today, you would do well to start there. He dealt with the training of networks as well as their architecture.
He did not claim to be the first to think of using AI for MT. In the second part of the thesis he reviewed the tentative attempts at it within the application of AI to natural language processing more generally. His conclusion was this:
Overall, given the relative youth of neurocomputing, a surprising variety of approaches has been used to attack natural language translation. Nonetheless, on first sight at least two significant areas of investigation have been overlooked. First, no language-to-language network has been trained on a real-world corpus of any size; and second, none have been developed specifically to operate on domain-specific texts (thereby restricting vocabulary size naturally). The network developed as part of this thesis attempts to address both oversights.
Now we get to chapter 6, which describes TransNet, “a neural network designed to translate natural language weather reports from English into French.” The first caveat to add is that it is not intended to be practical, marketable software. It is no more than a ‘proof of concept’, that is to say a pilot project which is executed to demonstrate that an idea is feasible. In this case the output was actually pointless, since the input was Canadian official weather report texts that were already routinely translated by an existing non-connectionist MT system. However, the duplication had an advantage since it provided a standard by which to judge TransNet’s output. To be considered successful, TransNet’s output had to be at least as acceptable as that of the existing system. Notice that this is an MT to MT comparison, not MT to human translation.
The main objectives were to train a network using a relatively large natural language corpus (weather reports) with an interesting vocabulary (meteorology) and to account for word order in the data representation for input.
Bruce’s accomplishment has to be considered in the context of the very limited resources that were available to him. He had no funds to build (or employ others to build) his own dedicated networker. He had to work with free off-the-shelf software called Xerion from the University of Toronto that dated back to 1992. At times it was not up to the task.
Finding and compiling the training corpus was facilitated – as it has been for other MT researchers – by Canadian bilingualism. The government service called Environment Canada must by law publish its weather reports several times a day in both the official languages, English and French, and it makes them available on the internet.
One of the novelties in TransNet was an input representation that accounted for word order (whereas earlier researchers had done so inside the network itself). Bruce was inspired by an approach IBM had used for speech recognition. It consisted of forming overlapping word triplets or trigrams as the units of translation. He explained how this was done and how it helped with disambiguation. But it turned out that his software and hardware could not cope with trigrams, so he scaled back to bigrams and also reduced the input corpus. He ended up with 506 French bigrams and 465 English bigrams drawn from 400 English sentences and their 400 French translations.
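For readers who have not met the idea before, here is a minimal sketch of what forming overlapping word pairs or triplets means. It is my own illustration and not Bruce's code: the function name word_ngrams, the lower-casing, the splitting on spaces and the sample sentence are all assumptions, and the thesis should be consulted for how TransNet actually prepared and encoded its input.

```python
def word_ngrams(sentence, n=2):
    """Return the overlapping word n-grams of a sentence, in order.

    Hypothetical helper: splitting on whitespace and lower-casing are
    assumptions made for this illustration only.
    """
    words = sentence.lower().split()
    # Slide a window of n words along the sentence, one word at a time.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# An invented weather-style sentence, not taken from the thesis corpus.
sentence = "Cloudy with sunny periods today"

bigrams = word_ngrams(sentence, 2)   # overlapping word pairs
trigrams = word_ngrams(sentence, 3)  # overlapping word triplets

print(bigrams)
print(trigrams)
```

Each word except the first and last appears in two bigrams, once with its left neighbour and once with its right, which is how a set of bigrams retains some information about word order. Trigrams carry more context, but the number of distinct units grows quickly with the size of the corpus.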
Let’s turn now to the outcome. In brief, using the accuracy measurement that Bruce devised, the 10 testing sets had an average accuracy of 79.97%. Although TransNet had over 230,000 connections, he would have benefited from a larger sample. His conclusion:
Part of the motivation for this thesis was the difficulty of developing sufficiently flexible rules for translating natural language sentences. We thought it might be easier to have the computer do the work of analyzing text and then inducing rules for reproducing the text in another language. However, as it turns out this approach does not make research any easier: the emphasis shifts from linguistic analysis to building appropriate corpora, conditioning corpus text, developing data representations, designing network architectures, and building, training and testing networks… In short, the neural network approach is sadly not the lazy man’s substitute for morphological, syntactic, and semantic textual analysis. On the other hand, the onerous steps involved in implementing the neural network approach are mechanistic and automatable. Rule extraction is an art and thus inexact, error-prone, and incomplete.
This short post can do scant justice to a well-written thesis, though the latter is only of historical interest now. So much has happened since 1997. What was then an esoteric endeavour by a small coterie of enthusiasts with little or no funding has boomed into a multimillion-dollar sector of consumer products. Every day I receive in my mailbox advertising from the latest startup that is jumping on the bandwagon. I am grateful to Bruce that I learnt about neural networks and their potential for translation so early.
Bruce McHaffie. The application of neural networks to natural language translation. Advisors Brian Harris and Mario Marchand. Dissertation for the degree of Master of Arts in Translation, University of Ottawa School of Translation and Interpretation, 1997. https://ruor.uottawa.ca/bitstream/10393/4421/1/MO28444.PDF
XERION: neural network simulator. CMU Artificial Intelligence Repository, https://www.cs.cmu.edu/Groups/AI/areas/neural/systems/xerion/0.html.
From Pisana Ferrari, Working at the intersection of linguistics and artificial intelligence to advance machine translation performance, 2019.