Further reflexions on the status of “I love you” in Corsican language

Let us briefly recall the problem: translating ‘I love you’ might sound trivial, but it’s not. In fact, ‘ti amu’ is not the best translation. The best translation is ‘ti tengu caru’ when addressed to a male person, or ‘ti tengu cara’ when addressed to a female person. Hence the proposed preliminary translation ‘ti tengu caru/cara’. Such a rough translation requires further disambiguation, but on what precise grounds?

Let us look at the issue from an analytical perspective. We need to assign a referent to the pronoun ‘te’ (you, ti), and that referent can be identified from the context, depending on whether the person ‘te’ refers to is male or female. At this stage, it seems better to consider that the personal object pronoun has an inherent gender: masculine or feminine. This gender does not affect the pronoun itself, which remains ‘te’ (you, ti) regardless of gender, but it does affect the words that depend on it, i.e. the adjective caru/cara in the Corsican locution ti tengu caru/cara. The upshot is that, in this case, ‘te’ (you, ti) is a personal object pronoun, masculine or feminine, whose inherent ambiguity can be resolved from the context.
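The idea of a pronoun that carries an inherent gender without changing its surface form can be illustrated with a minimal Python sketch. The names ObjectPronoun and i_love_you are hypothetical and chosen for illustration only; this is not the translator's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectPronoun:
    form: str               # surface form, identical for both genders ("ti")
    gender: Optional[str]   # "m", "f", or None when the context gives no clue


def i_love_you(addressee: ObjectPronoun) -> str:
    # The adjective agrees with the inherent gender of the pronoun;
    # when the gender is unknown, the ambiguity is kept visible.
    adjective = {"m": "caru", "f": "cara", None: "caru/cara"}[addressee.gender]
    return f"{addressee.form} tengu {adjective}"


print(i_love_you(ObjectPronoun("ti", "f")))   # ti tengu cara
print(i_love_you(ObjectPronoun("ti", None)))  # ti tengu caru/cara
```

The pronoun stays ‘ti’ in every case; only the dependent adjective changes once the context has supplied a gender.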


Update to priority pairs for endangered languages

If we were to update the priorities among the language pairs still to be built, from the point of view of endangered languages, the result would be as follows:

  • Corsican language: French to Corsican (already done)
  • Sardinian Gallurese: Italian to Gallurese
  • Sardinian Sassarese: Italian to Sassarese
  • Sicilian: Italian to Sicilian (the Sicilian language is close to Corsican sartinesu or taravesu)
  • Munegascu: French to Munegascu (the Munegascu language bears some similarities to the Corsican language)

Pairs such as French to Gallurese, French to Sassarese, English to Gallurese, English to Sassarese and English to Sicilian do not have priority, as they can be handled through an intermediate pair. For instance, French to Gallurese can be done with the French to Italian pair (e.g. with DeepL) followed by the Italian to Gallurese pair.
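Chaining through an intermediate pair can be sketched as a simple composition of translators. In the Python sketch below, compose, fr_to_it and it_to_gallurese are hypothetical names; the two engines are stand-ins that merely tag the text, so only the chaining logic is shown, not any real translation.

```python
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]


def compose(pairs: Dict[Tuple[str, str], Translator], route: List[str]) -> Translator:
    """Build a translator that follows `route`, one available pair at a time."""
    steps = [pairs[(src, dst)] for src, dst in zip(route, route[1:])]

    def translate(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return translate


# Stand-in engines (assumptions): they only wrap the text in a language tag.
def fr_to_it(text: str) -> str:
    return f"it({text})"


def it_to_gallurese(text: str) -> str:
    return f"gallurese({text})"


pairs = {("fr", "it"): fr_to_it, ("it", "gallurese"): it_to_gallurese}
fr_to_gallurese = compose(pairs, ["fr", "it", "gallurese"])
print(fr_to_gallurese("bonjour"))  # gallurese(it(bonjour))
```

Errors made by the first pair are passed on to the second, so a direct pair would still be preferable wherever one exists.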


The enigmatic grammatical status of “I love you” in Corsican language

Translating ‘I love you’ might sound trivial, but it’s not. In fact, ‘ti amu’ is not the best translation. The best translation is ‘ti tengu caru’ when addressed to a male person, or ‘ti tengu cara’ when addressed to a female person. Hence the proposed translation ‘ti tengu caru/cara’, whose (difficult) disambiguation must be done according to the context.

It is worth sketching a few ideas to get some insight into this issue. First of all, let’s look at the problem synthetically. The difficulty lies in the grammatical status of the sentence ‘je t’aime’ in French (or ‘I love you’ in English): it is not known whether it is addressed to a male or a female person. If one were to assign a gender to this sentence, it would therefore be masculine or feminine, with an inherent ambiguity. Assigning a gender (masculine or feminine) to a sentence may seem strange prima facie, but it could prove useful (to be confirmed). In this case, the gender associated with the sentence would be inherited from the pronoun ‘t’ (short for ‘te’), which remains ambiguous when the sentence ‘je t’aime’ (I love you, ti tengu caru/cara) is taken alone.

Second, let’s look at the issue from an analytical perspective. Another way to solve the problem could be to assign a referent to the pronoun ‘te’ (you), which could be identified from the context. This sounds more promising and more in line with the well-known problem of pronoun resolution.


The taxonomy optimization problem

Let us add further reflexions on the remaining 1% problem. As hinted at previously, the remaining 1% problem may only be solvable by general AI (GAI). Let us sketch, in a series of posts, what features general AI requires in this context. One feature of GAI would be the ability to solve the ‘taxonomy optimization problem’. Let’s focus on defining it (very roughly, to begin with). Consider a given language, defined by a certain number of words and a corpus of sentences (or a set of rules defining the licit sentences of this language). In this context, the ‘taxonomy optimization problem’ is the question of deciding which taxonomy, together with its associated rules, is the simplest one that resolves the type ambiguities existing in this language. A GAI with this feature would notably be capable of defining the best taxonomy for resolving those type ambiguities. And it is possible that such a feature would revolutionize grammar and our present grammatical taxonomy.


One hundred users for the Traduttore corsu app for Android

The Traduttore corsu application for Android now has more than a hundred users. Moving on…


More on polymorphic disambiguation…

Let’s take another look at polymorphic disambiguation. We shall consider the French word sequence ‘nombre de’. Its translation into Corsican (the same goes for English and other languages) cannot always be the same, because ‘nombre de’ can be translated in two different ways. In the sequence ‘mais nombre de poissons sont longs’ (but many fish are long), ‘nombre de’ is an indefinite determiner: it translates as bon parechji (many). On the other hand, in the sequence ‘mais le nombre de poissons est supérieur à dix’ (but the number of fish is greater than ten), ‘nombre de’ is a common noun followed by the preposition ‘de’: it translates as numaru di (number of). Statistical MT usually does better than human-like (rule-based) MT at polymorphic disambiguation (I tested both sentences with DeepL and Google Translate, and both resolved the ambiguity correctly), but it turns out that human-like (rule-based) MT is also capable of handling it.
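A rule of this kind can be sketched in a few lines of Python. The cue used here (a determiner such as ‘le’ immediately before ‘nombre’ signals the noun reading) is an assumption made for illustration, as are the names DETERMINERS and classify_nombre_de; it covers the two example sentences and is not meant as the rule the translator actually applies.

```python
import re

# Assumed cue words: a determiner right before 'nombre' suggests the noun reading.
DETERMINERS = {"le", "un", "ce"}

# Corsican renderings given in the post for each reading.
CORSICAN = {"noun": "numaru di", "determiner": "bon parechji"}


def classify_nombre_de(sentence: str) -> str:
    """Return 'noun' or 'determiner' for the first 'nombre de' in the sentence."""
    tokens = re.findall(r"\w+", sentence.lower())
    for i in range(len(tokens) - 1):
        if tokens[i] == "nombre" and tokens[i + 1] == "de":
            preceded_by_det = i > 0 and tokens[i - 1] in DETERMINERS
            return "noun" if preceded_by_det else "determiner"
    raise ValueError("no 'nombre de' in sentence")


for s in ("mais nombre de poissons sont longs",
          "mais le nombre de poissons est supérieur à dix"):
    print(CORSICAN[classify_nombre_de(s)])
# bon parechji
# numaru di
```

A single left-context cue is enough for these two sentences; real text would need a longer list of determiners and probably a look at the verb agreement as well.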


Performing our first open test of the year

Let us comment on the remaining errors encountered in the above open test:

  • French ‘carrière’ remains undisambiguated, either carriera (career) or cava (quarry): two occurrences
  • ‘de’: French ‘de’ is perhaps the most difficult word to translate into another language, due to its general polymorphism
  • ‘national-socialiste’: missing vocabulary
  • l’ within ‘l’empeche’: pronoun error
  • it should be pointed out that ‘Etats-Unis’ remains untranslated because it is misspelled in the source, with an initial E instead of É

The result is 1 – (5/169) = 97.04%. Note that the ambiguous French word ‘partie’ (‘durant la première partie’, during the first part) is correctly disambiguated into parti (part), instead of partita (game, match).
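The score above is simply one minus the error rate. As a minimal Python check, assuming that 5 is the number of errors counted and 169 the number of units (presumably words) in the test text, with accuracy as a hypothetical helper name:

```python
def accuracy(errors: int, total: int) -> float:
    """Share of correctly translated units: 1 - errors / total."""
    return 1 - errors / total


print(f"{accuracy(5, 169):.2%}")  # 97.04%
```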

It seems that an average result of 95% is currently being consolidated, and that an average result of 96% is a target that should be achievable within a year.


More on the remaining 1% problem

The analysis of the French Wikipedia article of the day is interesting, in that it sheds light on the skills a machine translation system will need in order to achieve a 100% accurate translation. The error that appears here is characteristic and probably belongs to the missing 1% (the problem of the remaining 1%). The phrase ‘Her father studied at the University of Oregon and then at Yale Law School’ (quoted here in English) has, in the French source, a definite article with elision: l’. The translation given (u/a, i.e. indeterminate between the masculine definite article u and the feminine definite article a) is not correct, in that it fails to determine the gender, masculine or feminine, of Yale Law School, the English name of a school. In order to provide the correct translation, it is necessary to know how to translate Yale Law School into Corsican, and thus to determine that school is translated by scola, which is feminine. The correct translation should therefore have been: po à a Yale Law School prima di ….
This finally shows that a translator with 100% accuracy must be able (i) to identify the language of those parts of the text that are written in a language other than the source language, and (ii) to translate those parts into the target language. In other words, the skills needed to solve the remaining 1% include the ability to determine the language of a subtext and the ability to translate a subtext from any language into the target language.
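The reasoning applied to Yale Law School (find the head noun, translate it, inherit its gender, pick the article) can be sketched as follows. This Python sketch rests on loud assumptions: HEAD_NOUNS is a hypothetical one-entry lexicon containing only the school/scola pair given above, and the head noun is taken to be the last word of the English name, which does not hold for names such as ‘University of Oregon’.

```python
# Hypothetical mini-lexicon: English head noun -> (Corsican noun, gender).
HEAD_NOUNS = {"school": ("scola", "f")}

# Corsican definite articles; "u/a" keeps the ambiguity when the gender is unknown.
ARTICLES = {"m": "u", "f": "a", None: "u/a"}


def article_for_name(name: str) -> str:
    """Pick the definite article for an embedded English name via its head noun."""
    head = name.split()[-1].lower()          # assumption: the head noun comes last
    _, gender = HEAD_NOUNS.get(head, (None, None))
    return ARTICLES[gender]


print(article_for_name("Yale Law School"))  # a
print(article_for_name("Yale Bowl"))        # u/a (head noun not in the lexicon)
```

The fallback to u/a reproduces the current behaviour of the translator; the open question is how far such a lexicon would have to extend to cover embedded text in any language.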

Presently, we can only conjecture that this ability to solve the remaining 1% requires artificial general intelligence (AGI). Providing concrete and detailed examples may help to confirm or disprove that hypothesis.
