Some ethics for MT related to endangered languages

Let us sketch what could be some ethical requirements related to machine translation regarding endangered languages.

  • A first requirement could be: do not publish a translation pair involving an endangered language until its success rate has reached at least 90%. Otherwise, instead of helping, it could harm the endangered language in question: people could publish these low-quality translations, which would tend to devalue the language. There is probably room for discussion here, since even below 90% a translator could be helpful to some people. But at the very least, an MT system for a given endangered language should display its current success rate.
  • Another ethical point is the need to preserve the diversity inherent to a given endangered language, since most endangered languages come in several variants. Accordingly, we should take the main variants into account and provide, as far as possible, translations into each of them. There is a kind of recursion in this process: if we support endangered languages in order to preserve language diversity, we should also preserve diversity within each single language.

Posted in blog | Leave a comment

Word sense disambiguation: a hard case

Let us consider a hard case for word sense disambiguation, in the context of French to Corsican MT (the same goes for French to English MT). It relates to French words such as: ‘accomplit’, ‘affaiblit’, ‘affranchit’, ‘alourdit’, ‘amortit’. The corresponding verbs ‘accomplir’ (to fulfill, to accomplish), ‘affaiblir’ (to weaken), ‘affranchir’ (to free), ‘alourdir’ (to burden), ‘amortir’ (to damp) have the same form in the simple present and the simple past of the third person singular. The upshot is that a single sentence such as ‘Il affaiblit sa position.’ can be translated either as he weakens his position or as he weakened his position. If the context makes the tense of the discourse unambiguous, the correct tense can be chosen. But in the absence of an informative context, it would be preferable to preserve the ambiguity.

It should be pointed out that such verbs are not rare. A more complete list includes: accomplit, affaiblit, affranchit, alourdit, amortit, anéantit, anoblit, aplatit, arrondit, assombrit, bannit, bâtit, blanchit, blondit, démolit, éblouit, emplit, enfouit, enhardit, enlaidit, ennoblit, envahit, épaissit, étourdit, exclut, franchit, glapit, investit, jaunit, jouit, munit, noircit, obéit, obscurcit, occit, périt, réagit, régit, réjouit, remplit, répartit, resplendit, rétrécit, rit, rougit, saisit, sévit, surgit.
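
To make the idea concrete, here is a minimal Python sketch of how a module could keep both readings when the context is silent. The word set is only a sample of the list above, and the cue words, the function name and the 'present'/'past' labels are assumptions made for illustration, not part of any existing engine.

```python
# Sample of the ambiguous third person singular forms listed above.
AMBIGUOUS_3SG = {
    "accomplit", "affaiblit", "affranchit", "alourdit", "amortit",
    "bâtit", "franchit", "remplit", "saisit",
}

# Hypothetical context cues: a real module would need far richer evidence.
PAST_CUES = {"hier", "jadis", "autrefois"}
PRESENT_CUES = {"aujourd'hui", "actuellement", "désormais"}


def possible_tenses(sentence: str) -> list[str]:
    """Return the tense readings that remain possible for the sentence.

    An empty list means that no ambiguous form was found.
    """
    tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
    if not any(t in AMBIGUOUS_3SG for t in tokens):
        return []
    has_past = any(t in PAST_CUES for t in tokens)
    has_present = any(t in PRESENT_CUES for t in tokens)
    if has_past and not has_present:
        return ["past"]
    if has_present and not has_past:
        return ["present"]
    # No informative context: keep both readings rather than guess.
    return ["present", "past"]


print(possible_tenses("Il affaiblit sa position."))
print(possible_tenses("Hier, il affaiblit sa position."))
```

Returning a list rather than a single tense is the point of the sketch: the caller can then decide whether to output both translations or to look at a wider context.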



More on grammatical type disambiguation

Let us focus on grammatical type disambiguation, which is a subproblem of word disambiguation. General grammatical types are: verb, noun, adjective, adverb, preposition, gerundive, etc. But for disambiguation purposes, more precision is in order. Instances of grammatical types are then: masculine singular noun, feminine singular noun, masculine plural noun, feminine plural noun, masculine singular adjective, feminine singular adjective, masculine plural adjective, feminine plural adjective, adverb, preposition, gerundive, etc. Now an ambiguity can occur between two different grammatical types (in the above-mentioned form), for example between adverb and gerundive. In French, this is notably the case for ‘devant’ and ‘maintenant’. ‘Devant’ can be either an adverb (in front) or a gerundive, i.e. the present participle of the verb ‘devoir’ (to have to). Similarly, ‘maintenant’ can be either an adverb (now) or a gerundive (from the verb ‘maintenir’, to maintain). Both words are thus ambiguous with regard to their grammatical type. Depending on the relevant grammatical type, ‘devant’ translates into English as either having to or in front. In the same way, ‘maintenant’ translates as either now or maintaining.
In order to disambiguate French words ‘devant’ or ‘maintenant’, rule-based MT needs a disambiguation module that is able to distinguish whether ‘devant’ or ‘maintenant’ are adverbs or gerundives.

(For the sake of clarity, we leave aside the fact that ‘devant’ can also be a preposition.)
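
As an illustration only, such a module could start from rules that look at the next word. The Python sketch below assumes two toy heuristics that are not taken from any existing engine: ‘devant’ is read as a gerundive when an infinitive follows (devoir + infinitive), and ‘maintenant’ is read as a gerundive when a determiner follows (maintenir + object). Both rules misfire on many real sentences, which is precisely why a full module is needed.

```python
# Toy heuristics, assumed for illustration: they look only at the next token.
INFINITIVE_ENDINGS = ("er", "ir", "re")  # 'oir' is covered by 'ir'
DETERMINERS = {"le", "la", "les", "un", "une", "des", "son", "sa", "ses"}


def grammatical_type(tokens: list[str], i: int) -> str:
    """Guess whether tokens[i] ('devant' or 'maintenant') is an adverb or a gerundive."""
    word = tokens[i].lower()
    following = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    if word == "devant":
        # 'devant partir' (having to leave) versus 'il marche devant' (in front)
        return "gerundive" if following.endswith(INFINITIVE_ENDINGS) else "adverb"
    if word == "maintenant":
        # 'maintenant la pression' (maintaining) versus 'maintenant, il part' (now)
        return "gerundive" if following in DETERMINERS else "adverb"
    raise ValueError(f"no rule for {word!r}")


print(grammatical_type("Devant partir tôt , il se lève".split(), 0))
print(grammatical_type("Il marche devant".split(), 2))
```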


New insight on the issue of pair reversal (updated)

The issue of pair reversal goes as follows: suppose you have a given translation pair A>B that translates language A into language B. How hard is it to build the reverse pair B>A? The current instance of this problem is: given the French>Italian pair, how hard is it to build an Italian>French pair? To state it more explicitly: could AI help build a reverse pair in a very short time? Arguably, if AI could build such a reverse pair quickly, it would be some kind of breakthrough. Of course, we do not expect 100% efficiency and accuracy in this reversal process, but if some 98% or 99% were possible, it would do the job. For AI within MT is not only aimed at translating, it is also aimed at constructing translation engines.

Just tested pair reversal from French-Italian to Italian-French. Some 70% can be done automatically, but a big issue remains, which relates to the disambiguation of Italian words. The disambiguation engine seems to be the crux of the matter here. The upshot is that the entire disambiguation module needs to be rewritten, in order (if possible) to be language-related. The new module must be more AI-focused. If successful, it could open the path to the (somewhat) fast construction of a multi-language ecosystem with a rule-based MT architecture.
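
To see why disambiguation is the bottleneck, here is a minimal Python sketch of the reversal of a lexicon. The four-entry French>Italian lexicon and the function names are illustrative assumptions; a real pair also involves grammar rules, which are not modelled here.

```python
from collections import defaultdict


def reverse_lexicon(forward: dict[str, str]) -> dict[str, list[str]]:
    """Turn an A>B lexicon into a B>A lexicon, keeping every candidate."""
    reverse = defaultdict(list)
    for source_word, target_word in forward.items():
        reverse[target_word].append(source_word)
    return dict(reverse)


def automatic_share(reverse: dict[str, list[str]]) -> float:
    """Share of reversed entries that have a single candidate, i.e. need no disambiguation."""
    if not reverse:
        return 0.0
    unambiguous = sum(1 for candidates in reverse.values() if len(candidates) == 1)
    return unambiguous / len(reverse)


# Illustrative French>Italian entries.
french_to_italian = {"vents": "venti", "vingt": "venti", "maison": "casa", "chat": "gatto"}

italian_to_french = reverse_lexicon(french_to_italian)
print(italian_to_french["venti"])          # two candidates: a rule is needed
print(automatic_share(italian_to_french))  # share that reverses without extra work
```

Every target word with more than one candidate is a place where the reversed pair needs a disambiguation rule that the original pair never had to contain.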


How to translate ‘Cette phrase est en français’ ? (This sentence is in French) – updated

Let us consider the following French sentence: Le comté de Kronoberg est un comté suédois dont le nom signifie en français ‘Couronne de montagne’. It translates literally into Corsican as: A cuntea di Kronoberg hè una cuntea svedese chì u so nome significheghja in francese ‘Curona di muntagna’. (The County of Kronoberg is a Swedish county whose name means in French ‘Mountain crown’.) But it should be translated more accurately as: A cuntea di Kronoberg hè una cuntea svedese chì u so nome significheghja in corsu ‘Curona di muntagna’, since the words significheghja in francese (means in French) are false once the gloss itself is given in Corsican.

Now a semantic difficulty is lurking here, whose core can be related to self-reference: how should we translate ‘Cette phrase est en français’? Self-reference stems here from ‘cette phrase’ (this sentence). Literally, it translates into: This sentence is in French. But a sense-preserving translation would be: This sentence is in English.

A more complicated instance of self-reference within translation is as follows: ‘Cette phrase ne comprend que sept mots’ (This sentence contains only seven words). It translates into Corsican: ‘Ss’infrasata ùn cumprendi ch’è setti paroli’. It is also true of the Corsican translation, but false of the English one, which includes only six words. Arguably, a better English translation, which is sense-preserving, is then: This sentence contains only six words.

Such a translation ability is currently beyond the scope of present MT. We can tag it as an ability that would be required from superintelligent MT. It would then include: identifying self-referent parts of discourse, such as: this sentence, these words, this proposition, this paragraph, this text, … But not all self-referring discourse is concerned here. For example, the Liar paradox (this sentence is false) is irrelevant here, since we only place ourselves from the standpoint of MT. Interestingly, such a superintelligent ability also requires some meta-knowledge, i.e. knowing the language of the source text and of the target text. For a shift from the source language to the target language is needed here.
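
The word-count case can be sketched in a few lines of Python. This is only a toy under strong assumptions: the sentence is already known to be self-referential, it contains exactly one English number word, and words are counted by splitting on spaces. The function and table names are hypothetical.

```python
NUMBER_WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 6: "six",
    7: "seven", 8: "eight", 9: "nine", 10: "ten", 11: "eleven", 12: "twelve",
}
WORD_TO_NUMBER = {word: number for number, word in NUMBER_WORDS.items()}


def fix_word_count(sentence: str) -> str:
    """Make a sentence that states its own word count true of itself.

    The sentence is returned unchanged when the toy assumptions do not hold.
    """
    words = sentence.rstrip(".").split()
    slots = [i for i, w in enumerate(words) if w.lower() in WORD_TO_NUMBER]
    if len(slots) != 1 or len(words) not in NUMBER_WORDS:
        return sentence
    # Swapping one number word for another leaves the word count unchanged.
    words[slots[0]] = NUMBER_WORDS[len(words)]
    return " ".join(words) + ("." if sentence.endswith(".") else "")


literal = "This sentence contains only seven words"
print(fix_word_count(literal))
```

The hard part, which the sketch skips entirely, is deciding that ‘this sentence’ refers to the very sentence being translated.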


What is required from Artificial General Intelligence with regard to Machine Translation?

Illustration from www.pixabay.com

In a series of posts, we will try to define what is required of an AGI (Artificial General Intelligence) in order to reach the level of superintelligence in MT (machine translation). (All this is highly speculative, but we shall give it a try.)
One of the difficulties that arise in machine translation relates to the translation of expressions. This leads us to mention one of the required skills of a superintelligence: the ability to identify an expression within a text in a given language and then to translate it into another language. Expressions are of different types: verbal, nominal, adjectival, adverbial, … To fix ideas, we can focus here on verbal expressions. Consider for example the French expression ‘couper les cheveux en quatre’ (literally, to cut hairs in four, i.e. to split hairs), which translates into Corsican as either castrà i falchetti (literally, to castrate the hawks) or castrà i cucchi (literally, to castrate the cuckoos). In order to properly translate such an expression, a superintelligence must be able to:

  • identify ‘couper les cheveux en quatre’ as a verbal expression in a French corpus
  • identify castrà i falchetti as a verbal expression within a Corsican corpus
  • associate the two expressions as the proper translation of each other

It appears here that such an aptitude falls under the scope of AGI (Artificial general intelligence).
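
Once the association between the two expressions is known, applying it is the easy part, as the following Python sketch suggests. It assumes that the input has already been lemmatized and that the expression table exists; the table layout and the function name are hypothetical. What the sketch does not do, and what would require the aptitude discussed above, is discover the expressions and their pairing from the corpora.

```python
# Hypothetical expression table: French lemma sequence -> Corsican renderings.
EXPRESSIONS = {
    ("couper", "les", "cheveux", "en", "quatre"): ["castrà i falchetti", "castrà i cucchi"],
}


def find_expressions(lemmas: list[str]) -> list[tuple[int, int, list[str]]]:
    """Return (start, end, translations) for each known expression found in the lemmas."""
    hits = []
    for start in range(len(lemmas)):
        for key, translations in EXPRESSIONS.items():
            if tuple(lemmas[start:start + len(key)]) == key:
                hits.append((start, start + len(key), translations))
    return hits


# 'Il coupe les cheveux en quatre', assumed already lemmatized.
print(find_expressions(["il", "couper", "les", "cheveux", "en", "quatre"]))
```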


Follow-up to the ‘issue of pair reversal’ and first steps for Italian to Corsican

Here is a short follow-up to the ‘issue of pair reversal’ regarding language pairs. It seems some 90% could be achieved in this reversal process. What is lacking is an adequate handling of disambiguation. Let us focus on one example: Italian ‘venti’ is ambiguous between a masculine plural noun (Corsican venti, winds) and a numeral (Corsican vinti, twenty). But this specific ambiguity between grammatical types does not exist in French. The upshot is that disambiguation between grammatical types is, at least in part, specific to a given source language. If this difficulty could be overcome, roughly 95% of the process could finally be automated.

(Obviously the current translation is not of an acceptable quality for publication: some 90% at least is in order…)

Anyway, handling disambiguation successfully in many languages appears to be the crux of the matter here. If AI could successfully build such disambiguation modules, it seems rule-based translation as a fast-growing ecosystem would be feasible.
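
One way to picture the language-specific part is a table of ambiguous words keyed by source language, as in the short Python sketch below. The table layout, the language codes and the function name are assumptions made for illustration.

```python
# Hypothetical table of grammatical type ambiguities, keyed by source language.
AMBIGUOUS_TYPES = {
    "it": {"venti": ("masculine plural noun", "numeral")},
    "fr": {"devant": ("adverb", "gerundive"), "maintenant": ("adverb", "gerundive")},
}


def competing_types(language: str, word: str) -> tuple[str, ...]:
    """Return the competing grammatical types of a word, or () if it is unambiguous."""
    return AMBIGUOUS_TYPES.get(language, {}).get(word.lower(), ())


print(competing_types("it", "venti"))  # ambiguous in Italian
print(competing_types("fr", "vingt"))  # no such ambiguity in French
```

Building a new pair with a new source language then amounts, at least in part, to filling in a new entry of this table together with its rules.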


Superintelligent machine translation (updated)


Let us consider superintelligence with regard to machine translation. To fix ideas, we can propose a rough definition: it consists of a machine with the ability to translate with 99% (or above) accuracy from any one of the 8000 languages to another. It seems relevant here to mention the present 8000 human languages, including some 4000 or 5000 languages which are at risk of extinction before the end of the 21st century. The definition could also relevantly include some extinct languages which are reasonably well described and meet the conditions for building rule-based translation. But arguably, this definition needs some additional criteria. What appears to be the most important is the ability to self-improve its performance. In practice, this could be done by reading or hearing texts. The superintelligent translation machine should be able to acquire new vocabulary from its readings or hearings: not only words, but also locutions (noun locutions, adjective locutions, adverbial locutions, verbal locutions, etc.). It should also be able to acquire new sentence structures from its readings and enrich its database of grammatical sentence structures. It should also be able to grow its database of word meanings for ambiguous words and instantly build the associated disambiguation rules. In addition, it should be capable of detecting and implementing specific grammatical structures.
It seems superintelligence will be reached when the superintelligent translation machine is able to perform all that without any human help.

Also relevant in this discussion is the fact, previously argued, that rule-based translation is better suited to endangered languages than statistical translation. Why? Because large-scale corpora do not exist for endangered languages. From the above definition of superintelligent machine translation (SMT), it follows that rule-based translation is also best suited to SMT, since SMT massively includes endangered languages (but arguably, statistical MT could still be used for translating major languages into one another).

Let us speculate now on how this path to superintelligent translation will be achieved. We can mention here:

  • a quantitative scenario: (i) acquire, first, an ability to translate very accurately, say, 100 languages; (ii) develop, second, the ability to self-improve; (iii) extend, third, the translation ability to the whole set of 8000 human languages.
  • alternatively, there could be a qualitative scenario: (i) acquire, first, an ability to translate somewhat accurately the 8000 languages (the accuracy could vary from language to language, especially with rare endangered languages); (ii) suggest, second, improvements to vocabulary, locutions, sentence structures, disambiguation rules, etc. that are verified and validated by humans; (iii) acquire, third, the ability to self-improve by reading texts or hearing conversations.
  • it is worth mentioning a third alternative that would consist of a hybrid scenario, i.e. a mix of quantitative and qualitative improvements. It will be our preferred scenario.

But we should provide more details on how these steps could be achieved. To fix ideas, let us focus on the word self-improvement module: it allows the superintelligent translation machine to extend its vocabulary in any language. This could be accomplished by reading or hearing new texts in any language. When facing a new word, superintelligent machine translation (SMT, for short) should be able to translate it instantly into all the other languages and add it to its vocabulary database.
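
The bookkeeping side of such a module is simple, as this Python sketch suggests. Everything named here is a hypothetical stand-in: the vocabulary is a plain dictionary for a single language pair, and the `translate` argument represents the capability that does not exist yet, namely producing a reliable translation of a word never seen before.

```python
from typing import Callable


def learn_new_words(text: str, vocabulary: dict[str, str],
                    translate: Callable[[str], str]) -> list[str]:
    """Add every unknown word of the text to the vocabulary and return the new words."""
    learned = []
    for token in text.split():
        word = token.strip(".,;:!?\"()").lower()
        if word and word not in vocabulary:
            # 'translate' is a placeholder for the hard, still missing ability.
            vocabulary[word] = translate(word)
            learned.append(word)
    return learned


# Illustrative French>Corsican vocabulary with a dummy translator.
vocabulary = {"le": "u"}
print(learn_new_words("Le chat, le chat.", vocabulary, lambda w: f"<{w}?>"))
print(vocabulary)
```

In the qualitative scenario described above, the entries produced this way would first be submitted to human validation instead of being added directly.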

To give another example, another module would be the locution self-improvement module: it allows the superintelligent translation machine to extend its knowledge of locutions in any language.

Also relevant to this topic is the following question: could SMT be achieved without AGI (artificial general intelligence)? We shall address this question later.
