Wednesday, May 3, 2023

A Chatbot Experiment

 



In the previous post we lauded some of the wonders of which the new generation of chatbots is capable. In this post we will look at some of the drawbacks.

 

To be fair, it must be said that the home screen of ChatGPT displays a whole list of caveats: not up to date by several years, may occasionally produce biased content, may give information that is plain wrong, etc. The internet is buzzing with examples. Rather than make an invidious selection, I decided to conduct my own experiment. Here’s a foretaste.

 

There’s a long-established Irish-Spanish family in the city of Valencia, Spain, where I live, which played a prominent role in its modernisation in the nineteenth and early twentieth centuries. The family name is Trenor. I’d heard of them but wanted to know more. To my consternation, the answer came back from ChatGPT that it had no information about the family. The answer is all the harder to understand because there’s a long article about the Trenors in Wikipedia! True, ChatGPT adds, “I cannot access up-to-date information on specific families or individuals without violating their privacy;” but I didn’t specify “up to date.”

 

Undaunted, I decided to try a second experiment, this time on a topic that should be more familiar to regular readers of this blog and so make it easier for them to judge the result. The question was, “Who first proposed the natural translation hypothesis?” This time I did get a meaningful response – only it was a wrong one. It was that the inventor was Eugene Nida. Nida was an immensely influential translatologist; there was a time, when I was beginning in translation studies, that no thesis was complete without a reference to him. And it’s true that Nida recommended the use of natural-sounding target language. However, he did not formulate a natural translation hypothesis and he used “natural” in quite a different sense. Why then did ChatGPT make this mistake? We can only speculate, because chatbots are black boxes whose workings are not revealed by their designers, if they even understand them themselves. It could not be for lack of data, since a cursory search with Google finds lots of references to the hypothesis. So a more likely explanation is failure to apply the longest match principle. The principle states that when seeking to match strings of characters or words, only the longest match is acceptable. Could it be that ChatGPT mistakenly stopped seeking after “translation” and missed “hypothesis”?
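To make the longest match idea concrete, here is a minimal sketch in Python. It is purely illustrative: the workings of ChatGPT are not public, so this is not a description of what it actually does, and the function names and the little phrase list are invented for the example. One function stops at the first phrase it recognises; the other accepts only the longest.

```python
def first_match(words, known_phrases):
    """Return the first (shortest) known phrase that starts the word list."""
    # Grow the candidate one word at a time and stop at the first hit.
    for end in range(1, len(words) + 1):
        candidate = " ".join(words[:end])
        if candidate in known_phrases:
            return candidate
    return None


def longest_match(words, known_phrases):
    """Return the longest known phrase that starts the word list."""
    # Start from the full word list and shrink until something matches.
    for end in range(len(words), 0, -1):
        candidate = " ".join(words[:end])
        if candidate in known_phrases:
            return candidate
    return None


# Hypothetical phrase list, invented for this illustration.
known_phrases = {"natural translation", "natural translation hypothesis"}
query = "natural translation hypothesis".split()

print(first_match(query, known_phrases))
print(longest_match(query, known_phrases))
```

With this toy phrase list, the impatient matcher settles for “natural translation” while the longest-match version holds out for “natural translation hypothesis”.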

 

The next thing that happened was my mistake. I pressed a wrong keyboard key and as a result the previous question was repeated. Again I soon got an answer – but it was not the same one! It was this:

“The Natural Translation Hypothesis was proposed by Gideon Toury, an Israeli scholar and translation theorist, in his influential book "Descriptive Translation Studies and Beyond" published in 1995. Toury's Natural Translation Hypothesis suggests that translations are influenced by various factors such as the translator's language competence, the target culture, and the translation norms of the time, which shape the translation process and result in translations that are shaped by the target language and culture, rather than being an exact replica of the source text. Toury's work has had a significant impact on the field of translation studies, and the Natural Translation Hypothesis is considered a fundamental concept in translation theory.”

This is much better. In fact Toury was very sympathetic to the hypothesis, as I know from personal correspondence with him. However, he had his own model of how the translation ability develops and he had his own term for it, which was native translation. And he never called it a hypothesis.

Notwithstanding the disappointments, I decided to have one last try. By this time there were already imitators and competitors of ChatGPT on the market, so I turned to one of them, BingAI. Here’s its equally fast response:

"Brian Harris is credited with proposing the Natural Translation Hypothesis. According to Harris, natural translation is a way of translating that occurs when a person translates without any formal training or knowledge of translation theory. The hypothesis suggests that natural translation is a more effective way of translating than traditional methods. Harris' work has been influential in the field of translation studies and has led to further research into natural translation."

This is a correct answer to the query, though the hypothesis doesn't suggest that natural translation is “more effective.” It seems that BingAI is capable of going beyond its data and making things up, producing what are now called hallucinations.

A single example isn’t enough to declare one chatbot superior to another. However, if there’s a lesson we can learn from it, it's not to rely on only one chatbot. If there is the slightest doubt, try another.

 

 

Sources

ChatGPT. https://chat.openai.com

BingAI. https://bing.com/chat.

Friday, March 24, 2023

ChatGPT on Child Translators and Bilingualism

 



 

If you haven’t heard of ChatGPT you must have been sleeping for the past two months. Never has software or firmware caused such a kerfuffle. There’s so much talk about ChatGPT and the like – let’s call them generically Chats – which are fed vast amounts of stored text, from which they extract and reformulate their output. This is what it seems to me that I do when I formulate my texts. The notion that we generate sentences by building a combination of syntax and vocabulary is greatly over-simplified. Most of the time we draw on phraseology we have previously read or heard, that is to say most of our language output is built up of prefabricated elements strung together. It’s not so much generative as imitative, although we have the advantage over the Chats for now that we can check and postedit as we go along. This is important because Chats reproduce errors in their input: as the old saying goes, garbage in, garbage out.

 

Use of the Chats is arousing much controversy. Some schools and universities have banned them and are even going back to pencil and paper for exams. Other people see them as an educational aid. They are a particular boon to students whose first language is not English, because they make few language errors.

 

So on the principle that the proof of the pudding is in the eating, I decided to try ChatGPT out on a couple of topics that are bound to be of interest to readers of this blog. The following are the results. 

 

Q: Can children do translation?

Chat: Yes, children can do translation, however, their ability to translate accurately and fluently will depend on many factors such as their age, language proficiency, and level of education. Children who are exposed to multiple languages from an early age tend to develop better language skills, including the ability to translate. However, it is important to note that translation is a complex task that requires a deep understanding of the source and target languages, as well as cultural knowledge and context.

 

Q: How many languages can a person learn?

Chat: A person can learn as many languages as they have time, resources, and motivation to learn. There is no theoretical limit to the number of languages a person can learn. However, the number of languages that a person can be fluent in or near-native in is often limited by their cognitive and linguistic abilities, as well as the amount of time and effort they are able to put into learning each language.

So far so good. A student who gave those replies in an exam could hardly be failed, although references are lacking. Now let’s try something that requires more thought.

 

 

Q: Write one page about the advantages of being bilingual.

Chat: Being bilingual has numerous advantages that can positively impact an individual's personal and professional life.

Firstly, bilingual individuals have a competitive advantage in the job market. Many employers value bilingual employees, as they can expand the company's customer base, enhance communication with clients, and facilitate business opportunities in other countries. Bilingual employees are often sought after for roles in customer service, marketing, translation, and other fields that require language skills.

Additionally, being bilingual has cognitive benefits. Studies have shown that speaking two languages can enhance problem-solving and decision-making skills, improve memory, and delay the onset of dementia and Alzheimer's disease. Bilingual individuals often have to switch between two languages, which can help improve cognitive flexibility and multitasking abilities.

Furthermore, being bilingual can also lead to cultural competence and a better understanding of other cultures. Learning another language opens up new perspectives and can broaden an individual's world view. Bilingual individuals often have a better appreciation for cultural diversity and are able to communicate effectively with people from different backgrounds.

Moreover, being bilingual can enhance personal relationships and travel experiences. Knowing another language allows individuals to converse with people they may not have been able to communicate with otherwise, making it easier to form new relationships and connect with others. When traveling, knowing a second language can make navigating a foreign country much easier and more enjoyable.

In conclusion, being bilingual has numerous advantages that can enhance an individual's personal and professional life. It can lead to a competitive advantage in the job market, cognitive benefits, cultural competence, and improved personal relationships and travel experiences. The benefits of being bilingual are numerous and should be considered by anyone looking to expand their skills and opportunities.

Q: Now write one page in French about the advantages of being bilingual.

Chat: Les avantages d'être bilingue

Être bilingue signifie être capable de parler deux langues couramment et avec aisance. Cela peut apporter de nombreux avantages, tant sur le plan personnel que professionnel.

Tout d'abord, être bilingue peut améliorer les compétences linguistiques en général. Apprendre une deuxième langue peut renforcer la compréhension et la maîtrise de la première langue. De plus, cela peut également ouvrir la voie à l'apprentissage de langues supplémentaires, car les compétences linguistiques acquises dans l'apprentissage de la première langue peuvent faciliter l'apprentissage d'autres langues.

Sur le plan personnel, être bilingue peut enrichir la vie en permettant de communiquer avec un plus grand nombre de personnes. Cela peut également renforcer les liens avec les amis et la famille qui parlent une autre langue, ainsi que faciliter les voyage dans des pays où la deuxième langue est parlée.

Enfin, être bilingue peut également apporter des avantages professionnels considérables. Les employeurs sont souvent très intéressés par les employés bilingues, car cela signifie qu'ils peuvent communiquer avec un plus grand nombre de clients et de fournisseurs. De plus, les compétences linguistiques peuvent également être un atout pour les opportunités de carrière dans les domaines internationaux, tels que le commerce, la diplomatie, ou le tourisme.

En résumé, être bilingue peut améliorer les compétences linguistiques, enrichir la vie personnelle, et apporter des avantages professionnels considérables. Il est donc fortement recommandé d'apprendre une deuxième langue pour profiter de tous ces avantages.

Notice that the French is not an exact translation of the English so it must have been composed separately, perhaps from different data.

 

More soon on this topic.

Friday, February 3, 2023

Bruce McHaffie, Pioneer of Artificial Intelligence in Machine Translation



  

This post is intended as a tribute to one of my students, Bruce McHaffie, whose pioneering thesis has been relegated to the oblivion that is the fate of so many MA theses. Bruce was my last student before I left Canada for Spain in the late 1990s. At that time he was working at the ill-fated Canadian telecommunications giant Nortel. As I was not competent to advise on or assess the computing aspect of his work, I turned to Mario Marchand, a professor in the computer science department, to be co-advisor.

 

It was only an MA thesis, but it could have been the basis for a doctorate. It is best summarised in Bruce’s own abstract:

 

The characteristics of automated learning and generalization, and of graceful degradation in the face of unforeseen input, give neural networks interesting potential for machine translation (MT). However, the field of connectionist MT has been little explored by researchers. This thesis provides an introduction to neural network concepts and summarizes and reviews the research in the field of connectionist MT. It describes the building, training and testing of TransNet, an embryonic neural network that translates weather reports from English into French. TransNet uses an innovative bigram (word pair) data representation which makes it possible to take some account of word order in the processing.

 

This is a case where I must unashamedly say that I learnt more from my student than my student learnt from me; for the first chapters of the thesis provide an excellent, clear introduction to connectionist (i.e. neural network) computing, and if you want to learn about it, even today, you would do well to start there. He dealt with the training of networks as well as their architecture.

 

He did not claim to be the first to think of using AI for MT. In the second part of the thesis he reviewed the tentative attempts at it within the application of AI to natural language processing more generally. His conclusion was this:


Overall, given the relative youth of neurocomputing, a surprising variety of approaches has been used to attack natural language translation. Nonetheless, on first sight at least two significant areas of investigation have been overlooked. First, no language-to-language network has been trained on a real-world corpus of any size; and second, none have been developed specifically to operate on domain-specific texts (thereby restricting vocabulary size naturally). The network developed as part of this thesis attempts to address both oversights.

 

Now we get to chapter 6, which describes TransNet, “a neural network designed to translate natural language weather reports from English into French.” The first caveat to add is that it is not intended to be practical, marketable software. It is no more than a ‘proof of concept’, that is to say a pilot project which is executed to demonstrate that an idea is feasible. In this case the output was actually pointless, since the input was Canadian official weather report texts that were already routinely translated by an existing non-connectionist MT system. However, the duplication had an advantage since it provided a standard by which to judge TransNet’s output. To be considered successful, TransNet’s output had to be at least as acceptable as that of the existing system. Notice that this is an MT to MT comparison, not MT to human translation.

 

The main objectives were to train a network using a relatively large natural language corpus (weather reports) with an interesting vocabulary (meteorology) and to account for word order in the data representation for input.

 

Bruce’s accomplishment has to be considered in the context of the very limited resources that were available to him. He had no funds to build (or employ others to build) his own dedicated networker. He had to work with free off-the-shelf software called Xerion from the University of Toronto that dated back to 1992. At times it was not up to the task.

 

Finding and compiling the training corpus was facilitated – as it has been for other MT researchers – by Canadian bilingualism. The government service called Environment Canada must by law publish its weather reports several times a day in both the official languages, English and French, and it makes them available on the internet.

 

One of the novelties in TransNet was an input representation that accounted for word order (whereas earlier researchers had done so inside the network itself). Bruce was inspired by an approach IBM had used for speech recognition. It consisted of forming overlapping word triplets or trigrams as the units of translation. He explained how this was done and how it helped with disambiguation. But it turned out that his software and hardware could not cope with this, so he scaled back from trigrams to bigrams and also reduced the input corpus. He ended up with 506 French bigrams and 465 English bigrams drawn from 400 English sentences and their 400 French translations.
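For readers who want to see what overlapping bigrams and trigrams look like, here is a small Python sketch. It is not Bruce's code, nor TransNet's actual input encoding; the function name and the sample sentence are made up, and it only shows how the overlapping word groups are formed.

```python
def ngrams(sentence, n):
    """Split a sentence into overlapping groups of n consecutive words."""
    words = sentence.lower().split()
    # Slide a window of n words along the sentence, one word at a time.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# Invented weather-report phrase, used only for illustration.
sentence = "Cloudy with sunny periods"

bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)

print(bigrams)
print(trigrams)
```

Each word except the first and last appears in two bigrams, which is how a representation of this kind can carry some information about word order. A sentence of four words yields three bigrams but only two trigrams, each of them a longer and therefore rarer unit.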

 

Let’s turn now to the outcome. In brief, using the accuracy measurement that Bruce devised, the 10 testing sets had an average accuracy of 79.97%. He would nevertheless have benefited from a larger sample, even though TransNet had over 230,000 connections. His conclusion:


Part of the motivation for this thesis was the difficulty of developing sufficiently flexible rules for translating natural language sentences. We thought it might be easier to have the computer do the work of analyzing text and then inducing rules for reproducing the text in another language. However, as it turns out this approach does not make research any easier: the emphasis shifts from linguistic analysis to building appropriate corpora, conditioning corpus text, developing data representations, designing network architectures, and building, training and testing networks… In short, the neural network approach is sadly not the lazy man’s substitute for morphological, syntactic, and semantic textual analysis. On the other hand, the onerous steps involved in implementing the neural network approach are mechanistic and automatable. Rule extraction is an art and thus inexact, error-prone, and incomplete.

 

This short post can do scant justice to a well-written thesis, though the latter is only of historical interest now. So much has happened since 1997. What was then an esoteric endeavour by a small coterie of enthusiasts with little or no funding has boomed into a multimillion sector of consumer products. Every day I receive in my mailbox advertising by the latest startup that is jumping on the bandwagon. I am grateful to Bruce that I learnt about neural networks and their potential for translation so early.

 

Sources

Bruce McHaffie. The application of neural networks to natural language translation. Advisors Brian Harris and Mario Marchand. Dissertation for the degree of Master of Arts in Translation, University of Ottawa School of Translation and Interpretation, 1997. https://ruor.uottawa.ca/bitstream/10393/4421/1/MO28444.PDF

 

XERION: neural network simulator. CMU Artificial Intelligence Repository, https://www.cs.cmu.edu/Groups/AI/areas/neural/systems/xerion/0.html.

 

Image

From Pisana Ferrari, Working at the intersection of linguistics and artificial intelligence to advance machine translation performance, 2019.

Thursday, January 19, 2023

The Arabic Interpreter Who Saved A Thousand Jews

 


Si Kaddour Benghabrit

 

In January 2019, in a post on this blog celebrating 100 years of conference interpreting, there was a mention of the 1906 Algeciras Conference because it was the first major international conference to use an Arabic interpreter. (To retrieve the post, enter algeciras in the Search box on the right.) That was some 70 years before there were Arabic interpreters at the United Nations. The conference was convened by the major European powers and the United States to ratify European intervention in nearby Morocco, only 16 nautical miles away directly across the Straits of Gibraltar to Tangier. The Germans wanted it but eventually the other powers ganged up on them and awarded it to the French. The conference was a long one; it lasted from January to April. It might have been conducted in French, the standard diplomatic language of the period, had it not been that a key delegate, the Moroccan Vizier Mohammed Ben Abdelsalem El-Mokri, and likewise his companions, only spoke Arabic, so they needed an interpreter. Luckily one was found not far away. He was Elie Cohen from the thriving Jewish community in Tangier. (There were still remnants of the community, mostly old people, when I was teaching in Tangier in the 1980s. Tangier is an Arabic-French-Spanish trilingual city.) Elie was perhaps the first modern Arabic conference interpreter. You can still stay for a reasonable price at the beautiful Reina Cristina Hotel in Algeciras where the conference took place, an oasis amidst the modern developments of a large container port. When I visited it in 2000 there was a photo of Elie in the hallway together with his visiting card.

 

Why Algeciras? Obviously its proximity to Morocco and its ferry to Tangier, but there were other reasons. Algeciras in the early twentieth century was a resort township that catered especially for the British garrison at Gibraltar, through which it could be reached easily by sea. It had a rail connection to inland Spain and hence to the rest of Europe. And the Reina Cristina was no ordinary hotel. Its architect, Thomas Edward Collcutt, had been the architect of the prestigious Savoy Hotel in London.

 

And why was the USA invited? To understand this one must know about the long history of American relations with Morocco. They go back to the late eighteenth century and the need for Moroccan cooperation to combat piracy. As a result, the old American consulate in the medina of Tangier is the oldest American diplomatic building outside of the United States.

 

However, I made one important mistake in the 2019 post. I wrote as if Elie Cohen was the only Arabic interpreter at the conference. I now know that there was also a second Arabic interpreter; and that much more is known about the second interpreter than about Elie Cohen. His name was Abdelkader Ben Ghabrit but he was better known as Si Kaddour Benghabrit. (Si is a dialectal abbreviation of Sidi, a title of respect.)

 

There are substantial biographies of Benghabrit in Wikipedia and elsewhere, so I will confine myself to two aspects of his life: his work as an interpreter and his relationship with Jews.


He was well prepared to be a conference interpreter, both linguistically and culturally. In his day his country Algeria was under French rule. He received the typical education of the son of a Muslim notable in the Maghreb at the madrasa (local school), memorizing the Koran and learning classical Arabic (the language of the Koran, which is very different from Algerian Arabic). At the same time, he also received an education that reflected the ideology of France's mission civilisatrice (‘civilising mission’) under which France would ‘civilise’ the Algerians by assimilating them into the French language and culture. Benghabrit became a Francophile who embraced the ideal of France's mission civilisatrice in Algeria, and as such he was deeply loyal to France and its values.

 

After studying at the University of al-Karouine at Fez in Morocco, he started his career in Algeria in the judiciary. As a cosmopolitan, sophisticated man, able to straddle two very different cultural worlds, he was able to make himself useful to those who held power. In 1892, he became assistant interpreter at the Legation of France to Tangier; he served as a liaison between North African officials and the French Ministry of Foreign Affairs. He served as chief of the French Legation in Tangier in the period 1900-1901. Benghabrit was fluent in Moroccan Arabic (which is closely related to Algerian Arabic), which gave the French legation an immense advantage over the other legations to Morocco, which lacked personnel capable of speaking Moroccan Arabic, and furthermore he was always well informed about Moroccan affairs. The French diplomat Charles de Beaupoil rated Benghabrit as one of the ablest diplomats he had ever worked with, and as the most able in Morocco. By then Benghabrit held a position in Morocco in the court of the sultan as an unofficial French diplomat.


Sultan Abdelaziz, the ruler of Morocco, was represented at Algeciras by his Vizier Muhammed al-Muqri. Al-Muqri expressed frustration at the translation situation and commented: "We're sitting here like statues; we can't understand a thing of what is said." The Moroccan delegation had no choice but to use Benghabrit although he was officially there in the service of France.

 

In 1912, he interpreted the negotiations between Sultan Abd al-Hafid of Morocco and the French diplomat Eugène Regnault which culminated in the signing of the Treaty of Fes, and which established the French Protectorate of Morocco. French Resident General Lyautey then rewarded him with a position as head of protocol to the sultan.

 

At this point I must skip a decade and the First World War and fast-forward to the 1920s.


It was then that the French government decided to construct a mosque in Paris to symbolise the eternal friendship of France and Islam, and memorialise the sacrifice of the tens of thousands of Muslim soldiers who died fighting in support of France during World War 1. Thus the building known as the Great Mosque of Paris was completed in 1926. At the opening of the Great Mosque, Benghabrit in his speeches in both French and Arabic praised the "eternal union" of France and Islam. He was the natural choice to be its first head (Imam).

 

Now we must fast-forward again, this time to WW2. By 1940 the Germans were masters of Paris and the Vichy French government was beginning to collaborate with their antisemitic campaigns. These went against Benghabrit’s personal feelings and also against the Koran, which accords protected status to the Jews. He took the dangerous step of helping French Jews in two ways. One was to use his religious authority to issue them with false papers certifying that they were Muslims. The other was to hide Jews, along with some Allied and Resistance escapees, in the Great Mosque itself. The story goes that he would hide them in a section of the mosque that he declared was reserved for women, and then he would prohibit any visiting Germans, being all men, from entering it.

 

It will never be known for sure how many he saved, because most of them had false papers. Estimates vary between 500 and 1,500. Anyway a lot.

 

For his contributions, Benghabrit was awarded the Grand Cross of the Legion of Honour. He is buried in a reserved area to the north of the Mosque. The Bâtisseurs de Paix, an association of Jewish and Muslim women working for inter-community harmony, submitted a petition in 2005 to the Council of Yad Vashem [The World Holocaust Remembrance Center] to recognise that the Mosque of Paris saved many Jews between 1942 and 1944, and that Yad Vashem should thus recognise Si Kaddour Benghabrit as one of the Righteous Among the Nations. Alas, this request remains unfulfilled, as no survivors have been found because of the false passports.

 

In the perspective of the history of interpreting, Benghabrit was a distinguished member of the lineage of French interpreter-diplomats that started under Colbert in the 17th century.

 

 

Sources

Algeciras Conference. Wikipedia, 2023. There is a photo of the conference in session in the earlier post on this blog.

 

Treaty of Fes [sic]. Wikipedia, 2023.

 

Si Kaddour Benghabrit. Wikipedia, 2022.

 

Grand Mosque of Paris. Wikipedia, 2023.