Priority pairs for endangered languages

Let us discuss the question of priority pairs with regard to endangered languages. The priority pairs of a given endangered language are its most wanted translation pairs, in keeping with the main language with which it is associated. To take an example: French-Corsican is the priority pair for the Corsican language. In the same way, Italian-Gallurese is the priority pair for the Gallurese language, and so on. Expanding on that idea, the priority pairs are:

  • Corsican: (i) French-Corsican (ii) Italian-Corsican (iii) English-Corsican
  • Sardinian Gallurese: (i) Italian-Gallurese (ii) English-Gallurese
  • Sardinian Sassarese: (i) Italian-Sassarese (ii) English-Sassarese
  • Sardinian Logudorese: (i) Italian-Logudorese (ii) English-Logudorese
  • Sicilian: (i) Italian-Sicilian (ii) English-Sicilian
  • Manx: (i) English-Manx
  • Munegascu: (i) French-Munegascu (ii) Italian-Munegascu (iii) English-Munegascu
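
The list above can be read as a small lookup table. The following is a minimal sketch of that reading; the names `PRIORITY_PAIRS`, `priority_pair` and `ranked_pairs` are hypothetical and chosen for illustration only.

```python
# Target language -> source languages, most wanted first (taken from the list above).
# All identifiers here are illustrative assumptions, not part of any existing tool.
PRIORITY_PAIRS = {
    "Corsican": ["French", "Italian", "English"],
    "Gallurese": ["Italian", "English"],
    "Sassarese": ["Italian", "English"],
    "Logudorese": ["Italian", "English"],
    "Sicilian": ["Italian", "English"],
    "Manx": ["English"],
    "Munegascu": ["French", "Italian", "English"],
}


def ranked_pairs(language):
    """Return all priority pairs for a language as (source, target) tuples, in order."""
    return [(source, language) for source in PRIORITY_PAIRS[language]]


def priority_pair(language):
    """Return the single top-priority pair for a language."""
    return ranked_pairs(language)[0]


if __name__ == "__main__":
    print(priority_pair("Corsican"))
    print(ranked_pairs("Munegascu"))
```

Keeping the sources in an ordered list makes the rank (i), (ii), (iii) implicit in the position, so no separate rank field is needed.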

More on two-sided grammatical analysis

Let us give some further examples of two-sided grammatical analysis:

  • “à dessein” (purposely), “à volonté” (at will), “à tort” (mistakenly): from an analytic standpoint, these are prepositions followed by a singular noun. From a synthetic viewpoint, they are adverbs (adverbial locutions).
  • “à jamais” (forever): from an analytic standpoint, it is a preposition followed by an adverb. From a synthetic viewpoint, it is an adverb (adverbial locution).
  • “à genoux” (on my/his/her/… knees), “à torrents” (in torrents): from an analytic standpoint, these are prepositions followed by a plural noun. From a synthetic viewpoint, they are adverbs (adverbial locutions).

Two-sided grammatical analysis

Let us call two-sided grammatical analysis the type of grammatical analysis described below. Two-sided grammatical analysis contrasts with one-sided analysis, which sees a sequence of words either as a locution type (adverbial locution, verbal locution, noun locution, etc.) or as the sequence of the types of its constituent words. From the standpoint of two-sided grammatical analysis, a given sequence of words can be attributed (synthetically) one single type, and (analytically) several grammatical types corresponding one-to-one to its constituent words. The upshot is that a given sequence of words can be described from two different viewpoints, synthetic and analytic.

What, then, is the status of ‘de fait’ from the viewpoint of two-sided grammatical analysis? From a synthetic standpoint, it is an adverb. From an analytic viewpoint, it is made up of a preposition (‘de’) followed by a common noun (‘fait’). The two viewpoints are complementary, and each casts light on one facet of the same reality. (I lack the time to write a scholarly article, but I hope the main idea is clear…)
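
As an illustration, a two-sided analysis can be stored as one record that carries both viewpoints. This is only a sketch under my own assumptions: the class name `TwoSidedAnalysis` and its fields are hypothetical, not taken from an existing implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoSidedAnalysis:
    """Hypothetical record holding both viewpoints on one sequence of words."""
    words: tuple      # the constituent words, in order
    synthetic: str    # the single type attributed to the whole sequence
    analytic: tuple   # one grammatical type per constituent word

    def __post_init__(self):
        # The analytic side must match the words one-to-one.
        if len(self.words) != len(self.analytic):
            raise ValueError("analytic types must match the words one-to-one")

    def pairs(self):
        """Return the analytic view as (word, type) pairs."""
        return list(zip(self.words, self.analytic))


# The 'de fait' example from the text: an adverb synthetically,
# a preposition followed by a common noun analytically.
DE_FAIT = TwoSidedAnalysis(
    words=("de", "fait"),
    synthetic="adverb",
    analytic=("preposition", "common noun"),
)

if __name__ == "__main__":
    print(DE_FAIT.synthetic)
    print(DE_FAIT.pairs())
```

The length check enforces the one-to-one correspondence between analytic types and constituent words that the definition requires.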


A hard case for disambiguation: polymorphic disambiguation

Let us investigate a hard case of disambiguation that needs to be addressed. For reasons that will become clearer below, I shall call it polymorphic disambiguation. Let us take an example, which relates to the translation of the two consecutive words ‘de fait’. The first French sentence, ‘De fait, il part.’, translates into ‘Difatti, parti.’ (Actually, he’s leaving.): in this case, ‘de fait’ is an adverbial locution. The second French sentence, ‘Il n’y a rien de fait.’, translates correctly into ‘Ùn ci hè nienti di fattu.’ (There is nothing done.), where ‘fait’ is now identified as a past participle. The instance at hand concerns French to Corsican, but it should be clear that the issue arises in the same way in French to English translation. To sum up: the two consecutive words ‘de fait’ can be identified either as an adverbial locution, or as a preposition (‘de’) followed by a past participle (‘fait’, done).

We are now in a position to formulate the problem in a more general way. It concerns two or more consecutive words that may be grammatically interpreted in different ways within the sentence, and that may thus be translated differently. Generally speaking, disambiguation concerns one word in most cases, but it may also concern a group of words. Polymorphic disambiguation relates to such groups of words, i.e. sequences of 2 words, 3 words, 4 words, etc.
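
To make the problem concrete, here is a minimal sketch of a rule that separates the two readings of ‘de fait’ in the two example sentences. The rule itself (a preceding indefinite such as ‘rien’ signals the preposition + participle reading) is an assumption made for illustration; the names `tokenize`, `INDEFINITE_HEADS` and `analyse_de_fait` are hypothetical, and a real module would need far more context.

```python
import re

# Hypothetical trigger words: an indefinite right before 'de fait'
# suggests the reading "preposition + past participle" ("rien de fait").
INDEFINITE_HEADS = {"rien", "chose"}


def tokenize(sentence):
    """Lowercase the sentence and split it into words and punctuation marks."""
    return re.findall(r"[\w’']+|[^\w\s]", sentence.lower())


def analyse_de_fait(sentence):
    """Return the assumed reading of 'de fait', or None if the sequence is absent."""
    tokens = tokenize(sentence)
    for i in range(len(tokens) - 1):
        if tokens[i:i + 2] == ["de", "fait"]:
            previous = tokens[i - 1] if i > 0 else None
            if previous in INDEFINITE_HEADS:
                return "preposition + past participle"
            return "adverbial locution"
    return None


if __name__ == "__main__":
    for sentence in ("De fait, il part.", "Il n’y a rien de fait."):
        print(sentence, "->", analyse_de_fait(sentence))
```

The point of the sketch is that the decision bears on the two-word sequence as a whole, not on ‘de’ or ‘fait’ taken separately.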

A quick test with online translators suggests that statistical MT does better than rule-based MT with polymorphic disambiguation. That is truly an interesting difference, and it is a gap that rule-based MT should fill.


Some ethics for MT related to endangered languages

Let us sketch what could be some ethical requirements related to machine translation regarding endangered languages.

  • Perhaps a first requirement would be: don’t publish translation pairs for an endangered language until the success rate has reached at least 90%. Otherwise, instead of helping, the pair could harm the endangered language in question: some people could publish these low-quality translations, which could have the effect of depreciating the language concerned. There is probably room for discussion here, for even below 90%, some translators could be helpful to some people. But at the very least, it could be suggested that an MT system for a given endangered language should display its current success rate.
  • Another point that relates to ethics regarding endangered languages could be the need to preserve the diversity that is inherent to a given endangered language, for most of them come in variants. Accordingly, we should take into account the main variants of endangered languages, and provide, as far as possible, translations into these main variants. There is a recursivity of some kind in this process: if we are to promote endangered languages in order to preserve language diversity, we should also take that diversity into account when concerned with a single language.


Word sense disambiguation: a hard case

Let us consider a hard case for word sense disambiguation, in the context of French to Corsican MT (the same goes for French to English MT). It relates to French words such as: ‘accomplit’, ‘affaiblit’, ‘affranchit’, ‘alourdit’, ‘amortit’. The corresponding verbs ‘accomplir’ (to fulfill, to accomplish), ‘affaiblir’ (to weaken), ‘affranchir’ (to free), ‘alourdir’ (to burden), ‘amortir’ (to damp) have the same form in the simple present and the simple past in the third person singular: respectively ‘accomplit’, ‘affaiblit’, ‘affranchit’, ‘alourdit’, ‘amortit’. The upshot is that a single sentence such as ‘Il affaiblit sa position.’ can be translated either into ‘he weakens his position’ or into ‘he weakened his position’. If the context is unambiguous with regard to the tense of the discourse, the correct tense can be chosen. But in the absence of informative context, it would be appropriate to let the ambiguity stand.

It should be pointed out that such verbs are not rare. A more complete list includes: accomplit, affaiblit, affranchit, alourdit, amortit, anéantit, anoblit, aplatit, arrondit, assombrit, bannit, bâtit, blanchit, blondit, démolit, éblouit, emplit, enfouit, enhardit, enlaidit, ennoblit, envahit, épaissit, étourdit, exclut, franchit, glapit, investit, jaunit, jouit, munit, noircit, obéit, obscurcit, occit, périt, réagit, régit, réjouit, remplit, répartit, resplendit, rétrécit, rit, rougit, saisit, sévit, surgit.
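
The policy of letting the ambiguity stand can be sketched as a function that returns both tense readings unless the context settles the matter. This is a hedged illustration only: the small tables `AMBIGUOUS_FORMS`, `PAST_MARKERS` and `PRESENT_MARKERS` and the function `tense_readings` are hypothetical, and the marker words are a crude stand-in for real contextual analysis.

```python
# Third-person singular forms shared by the simple present and the simple past,
# with assumed English renderings (present, past). Illustrative subset only.
AMBIGUOUS_FORMS = {
    "accomplit": ("accomplishes", "accomplished"),
    "affaiblit": ("weakens", "weakened"),
    "alourdit": ("burdens", "burdened"),
}

# Hypothetical context cues; a real system would inspect neighbouring verbs too.
PAST_MARKERS = {"jadis", "autrefois"}
PRESENT_MARKERS = {"maintenant", "actuellement"}


def tense_readings(form, context_words=()):
    """Return the possible English renderings of an ambiguous form.

    One reading is returned when the context points to a single tense;
    both are returned when the context is uninformative or contradictory.
    """
    present, past = AMBIGUOUS_FORMS[form]
    context = {word.lower() for word in context_words}
    has_past = bool(context & PAST_MARKERS)
    has_present = bool(context & PRESENT_MARKERS)
    if has_past and not has_present:
        return [past]
    if has_present and not has_past:
        return [present]
    return [present, past]


if __name__ == "__main__":
    print(tense_readings("affaiblit"))
    print(tense_readings("affaiblit", ["Jadis"]))
```

Returning a list rather than a single string lets the caller decide how to present the remaining ambiguity to the user.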


More on grammatical type disambiguation

Let us focus on grammatical type disambiguation, which is a subproblem of word disambiguation. General grammatical types are: verb, noun, adjective, adverb, preposition, gerundive (present participle), etc. For the purposes of grammatical type disambiguation, however, more accuracy is in order, and the relevant types are then: masculine singular noun, feminine singular noun, masculine plural noun, feminine plural noun, masculine singular adjective, feminine singular adjective, masculine plural adjective, feminine plural adjective, adverb, preposition, gerundive, etc. An ambiguity can occur between two different grammatical types (in the above-mentioned form), for example between adverb and gerundive. In French, this is notably the case for ‘devant’ and ‘maintenant’: ‘devant’ can be either an adverb (in front) or a gerundive (from the verb ‘devoir’, to have to). Similarly, ‘maintenant’ can be either an adverb (now) or a gerundive (from the verb ‘maintenir’, to maintain). Both words are thus ambiguous with regard to their grammatical type. Accordingly, depending on the relevant grammatical type, ‘devant’ translates into English either as ‘in front’ or as ‘having to’, and ‘maintenant’ either as ‘now’ or as ‘maintaining’.

In order to disambiguate the French words ‘devant’ and ‘maintenant’, rule-based MT needs a disambiguation module that is able to determine whether ‘devant’ or ‘maintenant’ is an adverb or a gerundive.

(For the sake of clarity, this leaves aside the fact that ‘devant’ can also be a preposition.)
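
A minimal sketch of such a module is given below. It relies on one crude, assumed cue: a directly preceding ‘en’ (as in ‘en maintenant’) signals the gerundive, and anything else defaults to the adverb. The names `AMBIGUOUS_WORDS`, `grammatical_type` and `gloss` are hypothetical, and the preposition reading of ‘devant’ is deliberately ignored, as in the text.

```python
# English glosses per grammatical type, as given in the text.
AMBIGUOUS_WORDS = {
    "devant": {"adverb": "in front", "gerundive": "having to"},
    "maintenant": {"adverb": "now", "gerundive": "maintaining"},
}


def grammatical_type(tokens, index):
    """Guess the type of the ambiguous word at tokens[index].

    Assumed rule, for illustration only: a directly preceding 'en'
    marks the gerundive; otherwise the word is treated as an adverb.
    """
    word = tokens[index].lower()
    if word not in AMBIGUOUS_WORDS:
        raise ValueError(f"{word!r} is not handled by this sketch")
    previous = tokens[index - 1].lower() if index > 0 else ""
    return "gerundive" if previous == "en" else "adverb"


def gloss(tokens, index):
    """Return the English gloss matching the guessed grammatical type."""
    word = tokens[index].lower()
    return AMBIGUOUS_WORDS[word][grammatical_type(tokens, index)]


if __name__ == "__main__":
    print(gloss(["Il", "part", "maintenant"], 2))
    print(gloss(["En", "maintenant", "le", "cap"], 1))
```

A single cue like this will misfire (for instance, ‘en’ can also be a pronoun), which is precisely why a dedicated disambiguation module is needed.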


New insight on the issue of pair reversal (updated)

The issue of pair reversal goes as follows: suppose you have a given translation pair A>B that translates language A into language B; how hard is it to build the reverse pair B>A? The current instance of this problem is: given the French>Italian pair, how hard is it to build an Italian>French pair? To state it more explicitly: could AI help build a reverse pair in a very short time? Arguably, if AI could build such a reverse pair quickly, it would be some kind of breakthrough. Admittedly, we do not expect 100% efficiency and accuracy in this reversal process, but if some 98% or 99% were possible, it would do the job. For AI within MT is not only targeted at translating; it is also targeted at constructing translation engines.

I have just tested pair reversal from French-Italian to Italian-French. Some 70% can be done automatically, but a big issue remains, which relates to the disambiguation of Italian words. The disambiguation engine seems to be the crux of the matter here. The upshot is that the entire disambiguation module needs to be rewritten in order (if possible) to be language-related. The new module must be more AI-focused. If successful, it could open the path to the (somewhat) fast construction of a multi-language ecosystem with a rule-based MT architecture.
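
One way to see why disambiguation becomes the crux is to invert a lexicon mechanically: entries that were unambiguous in the A>B direction can collide in the B>A direction. The sketch below uses a tiny invented French>Italian sample; `FR_IT`, `reverse_lexicon` and `split_by_ambiguity` are hypothetical names, and this is not a description of how the actual reversal was carried out.

```python
from collections import defaultdict

# Tiny illustrative French>Italian sample. Italian 'nipote' covers
# several French kinship terms, so reversal makes it ambiguous.
FR_IT = {
    "fille": "figlia",
    "fils": "figlio",
    "neveu": "nipote",
    "petit-fils": "nipote",
    "nièce": "nipote",
}


def reverse_lexicon(lexicon):
    """Invert a source>target lexicon, collecting every source word per target word."""
    reversed_lexicon = defaultdict(list)
    for source, target in lexicon.items():
        reversed_lexicon[target].append(source)
    return dict(reversed_lexicon)


def split_by_ambiguity(reversed_lexicon):
    """Separate entries that reverse automatically from those needing disambiguation."""
    automatic = {w: c[0] for w, c in reversed_lexicon.items() if len(c) == 1}
    ambiguous = {w: c for w, c in reversed_lexicon.items() if len(c) > 1}
    return automatic, ambiguous


if __name__ == "__main__":
    automatic, ambiguous = split_by_ambiguity(reverse_lexicon(FR_IT))
    print("automatic:", automatic)
    print("needs disambiguation:", ambiguous)
```

In this toy sample, two of the three Italian entries reverse automatically, while ‘nipote’ is left for a disambiguation step that the original direction never needed.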
