Let us briefly recall the problem: translating ‘I love you’ might sound trivial, but it’s not. In fact, ‘ti amu’ is not the best translation. The best translation is ‘ti tengu caru’ when addressed to a male person, or ‘ti tengu cara’ when addressed to a female person. Hence the proposed preliminary translation ‘ti tengu caru/cara’. Such a rough translation requires further disambiguation, but on what precise grounds?
Let us look at the issue from an analytical perspective. It appears that we need to assign a reference to the pronoun ‘te’ (you, ti). This reference could be identified from the context, depending on whether the person ‘te’ refers to is male or female. At this stage, it seems preferable to consider that the personal object pronoun has an inherent gender: masculine or feminine. This gender does not affect the pronoun itself, which remains ‘te’ (you, ti) regardless of gender, but it does affect the words that depend on it, i.e. the adjective caru/cara in the Corsican locution ti tengu caru/cara. The upshot is that, in this case, ‘te’ (you, ti) is a personal object pronoun, masculine or feminine, whose inherent ambiguity can be resolved from the context.
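The idea of an inherent but possibly unresolved gender can be sketched as a small data structure. In this minimal Python sketch the class and function names are hypothetical: the pronoun keeps the surface form ti, its gender stays unset until the context supplies it, and the dependent adjective agrees with it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectPronoun:
    """Personal object pronoun 'te' (you, ti), hypothetical representation.

    The surface form never changes, but the pronoun carries an inherent
    gender that stays None (ambiguous) until the context resolves it.
    """
    form: str = "ti"
    gender: Optional[str] = None  # "m", "f", or None when unresolved


# Corsican forms of the adjective agreeing with the pronoun's gender.
CARU = {"m": "caru", "f": "cara"}


def translate_i_love_you(pronoun: ObjectPronoun) -> str:
    """Render 'je t'aime' as 'ti tengu ...', agreeing with the pronoun's gender."""
    if pronoun.gender in CARU:
        adjective = CARU[pronoun.gender]
    else:
        # Gender not yet resolved: keep both forms, as in the preliminary translation.
        adjective = "caru/cara"
    return f"{pronoun.form} tengu {adjective}"
```

Keeping None as an explicit "unresolved" value lets the same function produce the preliminary translation ti tengu caru/cara and, once the context supplies the gender, the final one.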
If we were to update the priorities for language pairs to be achieved, from the point of view of endangered languages, the result would be as follows:
Corsican language: French to Corsican (already done)
Sardinian Gallurese: Italian to Gallurese
Sardinian Sassarese: Italian to Sassarese
Sicilian: Italian to Sicilian (Sicilian is close to the Corsican varieties sartinesu and taravesu)
Munegascu: French to Munegascu (Munegascu bears some similarities with Corsican)
Pairs such as French to Gallurese, French to Sassarese, English to Gallurese, English to Sassarese and English to Sicilian do not have priority, as they can be handled through an intermediate pair. For example, French to Gallurese can be done with the French to Italian pair (e.g. with DeepL) followed by the Italian to Gallurese pair, and so on.
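The pivot idea amounts to finding a path in a graph of available language pairs. A minimal sketch, assuming the set of pairs below (the French to Italian and English to Italian pairs stand for an external system such as DeepL; the set and function names are illustrative), uses breadth-first search to find the shortest chain of pairs:

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Directed language pairs assumed to be available (illustrative only).
AVAILABLE_PAIRS: Set[Tuple[str, str]] = {
    ("French", "Corsican"),
    ("French", "Munegascu"),
    ("French", "Italian"),     # external system, e.g. DeepL
    ("English", "Italian"),    # external system, e.g. DeepL
    ("Italian", "Gallurese"),
    ("Italian", "Sassarese"),
    ("Italian", "Sicilian"),
}


def find_route(source: str, target: str,
               pairs: Set[Tuple[str, str]] = AVAILABLE_PAIRS) -> Optional[List[str]]:
    """Return the shortest chain of languages from source to target, or None."""
    graph: Dict[str, List[str]] = {}
    for src, dst in pairs:
        graph.setdefault(src, []).append(dst)

    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of available pairs reaches the target
```

For instance, find_route("English", "Sicilian") returns ["English", "Italian", "Sicilian"]. Breadth-first search keeps the chain as short as possible, which matters because each extra hop can add its own translation errors.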
Translating ‘I love you’ might sound trivial, but it’s not. In fact, ‘ti amu’ is not the best translation. The best translation is ‘ti tengu caru’ when addressed to a male person, or ‘ti tengu cara’ when addressed to a female person. Hence the proposed translation ‘ti tengu caru/cara’, whose (difficult) disambiguation must be done according to the context.
It is worth sketching a few ideas to gain some insight into this issue. First, let’s look at the problem synthetically. This highlights the problem inherent in the grammatical status of the sentence ‘je t’aime’ (I love you) in French or in English: it is not known whether it is addressed to a male or a female person. If one were to assign a gender to this sentence, it would therefore be masculine or feminine, with an inherent ambiguity. Assigning a gender (masculine or feminine) to a sentence may seem strange prima facie, but it could prove useful (to be confirmed). In this case, the gender associated with the sentence would be inherited from the pronoun ‘t’ (short for ‘te’), which remains ambiguous when the sentence ‘je t’aime’ (I love you, ti tengu caru/cara) is taken alone.
Second, let’s look at the issue from an analytical perspective. Another way to solve the problem would be to assign a reference to the pronoun ‘te’ (you), identified from the context. This sounds more promising and is more in line with the well-known problem of pronoun resolution.
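A deliberately naive sketch of context-based resolution: scan the surrounding French text for words whose form reveals the addressee's gender (here the vocatives chéri/chérie and ami/amie) and let that gender drive the agreement of caru/cara. The cue list and function names are hypothetical; real pronoun resolution would need much richer context than a handful of cue words.

```python
import re
from typing import Optional

# Hypothetical cue lexicon: French words whose form reveals the addressee's gender.
GENDER_CUES = {"chéri": "m", "chérie": "f", "ami": "m", "amie": "f"}


def resolve_addressee_gender(context: str) -> Optional[str]:
    """Return 'm' or 'f' from the last gender cue found in the context, else None."""
    gender = None
    for token in re.findall(r"\w+", context.lower()):
        if token in GENDER_CUES:
            gender = GENDER_CUES[token]  # a later cue overrides an earlier one
    return gender


def translate_je_t_aime(context: str) -> str:
    """Translate 'je t'aime' into Corsican, using the context to pick caru/cara."""
    gender = resolve_addressee_gender(context)
    adjective = {"m": "caru", "f": "cara"}.get(gender, "caru/cara")
    return f"ti tengu {adjective}"
```

With the context "Ma chérie, je t'aime." the sketch yields ti tengu cara; with no cue at all it falls back to the ambiguous ti tengu caru/cara.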
Let us add further reflections on the remaining 1% problem. As hinted at previously, the remaining 1% problem may only be solvable by general AI (GAI). Let us sketch, in a series of posts, which features general AI would require in this context. One feature of GAI would be the ability to solve the ‘taxonomy optimization problem’. Let’s focus on defining it (very roughly, to begin with). Consider a given language, defined by a certain number of words and a corpus of sentences (or a set of rules defining the licit sentences of this language). In this context, the ‘taxonomy optimization problem’ is the question of determining the simplest taxonomy, with its associated rules, that resolves the type ambiguities existing in this language. A GAI with this feature would thus be able to define the best taxonomy for resolving the type ambiguities of this language, and it is possible that such a capability would revolutionize grammar and our present grammatical taxonomy.
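One possible way to make this rough definition a little more precise, assuming that the simplicity of a taxonomy is measured by the number of its types plus the number of its rules (that measure and the notation below are assumptions, not part of the original definition):

```latex
% V: vocabulary, C: corpus of licit sentences over V
% T: candidate taxonomy (set of types), R: associated typing rules
% types_{T,R}(w, s): set of types that (T, R) assigns to word w in sentence s
(T^{*}, R^{*}) = \operatorname*{arg\,min}_{(T,\,R)} \bigl(|T| + |R|\bigr)
\quad \text{subject to} \quad
\forall s \in C,\ \forall w \in s:\ \bigl|\mathrm{types}_{T,R}(w, s)\bigr| = 1
```

A different simplicity measure (for instance the total length of the rules) would yield different optima; the point is only that the problem has the shape of a constrained minimization.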
Let’s take another look at polymorphic disambiguation. Consider the French word sequence ‘nombre de’. Its translation into Corsican (the same goes for English and other languages) cannot always be the same, because ‘nombre de’ can be translated in two different ways. In the sequence ‘mais nombre de poissons sont longs’ (but many fish are long), ‘nombre de’ is an indefinite determiner: it translates as bon parechji (many). On the other hand, in the sequence ‘mais le nombre de poissons est supérieur à dix’ (but the number of fish is greater than ten), ‘nombre’ is a common noun followed by the preposition ‘de’: it translates as numaru di (number of). Statistical MT usually does better than human-like (rule-based) MT at polymorphic disambiguation (I tested both sentences with DeepL and Google Translate, and both successfully resolve this polymorphic disambiguation), but it turns out that human-like (rule-based) MT is also capable of handling it.
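The rule-based side of this claim can be sketched as a simple context rule: look at the word just before ‘nombre’. The sketch assumes that a preceding determiner (le, ce, du, un) signals the noun reading; the determiner list and function name are hypothetical, and a real grammar would need more context (e.g. ‘un grand nombre de’, or the elided form ‘nombre d’’).

```python
import re

# Hypothetical, incomplete list: determiners that, placed right before
# 'nombre', signal that 'nombre' is a common noun ('le nombre de' -> numaru di).
NOUN_DETERMINERS = {"le", "ce", "du", "un"}


def translate_nombre_de(sentence: str) -> str:
    """Choose the Corsican rendering of French 'nombre de' from its left context."""
    tokens = re.findall(r"\w+", sentence.lower())
    for i in range(len(tokens) - 1):
        if tokens[i] == "nombre" and tokens[i + 1] == "de":
            previous = tokens[i - 1] if i > 0 else None
            if previous in NOUN_DETERMINERS:
                return "numaru di"     # common noun + preposition 'de'
            return "bon parechji"      # indefinite determiner (many)
    raise ValueError("no 'nombre de' in sentence")
```

On the two example sentences above, the rule returns bon parechji for ‘mais nombre de poissons sont longs’ and numaru di for ‘mais le nombre de poissons est supérieur à dix’.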
Let us comment on the remaining errors encountered in the above open test:
French ‘carrière’ remains undisambiguated: either carriera (career) or cava (quarry): two occurrences
‘de’: French ‘de’ is perhaps the most difficult word to translate into another language, due to its general polymorphism
‘national-socialiste’: missing vocabulary
l’ in ‘l’empêche’: pronoun error
It should be pointed out that ‘Etats-Unis’ remains untranslated because it is erroneously written with an initial E instead of É
The result is 1 – (5/169) = 97.04%. Note that the ambiguous French word ‘partie’ (‘durant la première partie’, during the first part) is correctly disambiguated as parti (part) rather than partita (game, match).
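The score computation can be reproduced in a few lines. The counts for ‘de’, ‘national-socialiste’ and l’ are assumed to be one each, which matches the stated total of 5; 169 is taken from the formula above and is presumably the number of words in the test text.

```python
# Errors from the open test; carrière has two occurrences, the other items
# are assumed to count once each (consistent with the stated total of 5).
errors = {
    "carrière (carriera/cava)": 2,
    "de": 1,
    "national-socialiste": 1,
    "l' in l'empêche": 1,
}
total_words = 169  # presumably the word count of the test text

score = 1 - sum(errors.values()) / total_words
print(f"{score:.2%}")  # → 97.04%
```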
It seems that an average result of 95% is currently being consolidated, and that an average result of 96% is a target that should be achievable within a year.
The analysis of the French Wikipedia article of the day is interesting in that it sheds light on the skills a machine translation system will need to achieve a 100% accurate translation. The error that appears here is characteristic and probably belongs to the missing 1% needed to reach 100% accuracy (the problem of the remaining 1%). The phrase ‘Her father studied at the University of Oregon and then at Yale Law School’ contains a definite article with elision: l’. The translation given (u/a, i.e. undetermined between the masculine definite article u and the feminine definite article a) is incorrect in that it fails to determine the gender (masculine or feminine) of Yale Law School, the English name of a school. To provide the correct translation, it is necessary to know how to translate Yale Law School into Corsican, and thus to determine that school is translated by scola, which is feminine. The correct translation should therefore have been: po à a Yale Law School prima di …. This shows that solving the remaining 1% requires (i) the ability to determine the language of a subtext written in another language and (ii) the ability to translate such a subtext from any language into the target language.
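The two skills can be illustrated on the Yale Law School example with a toy sketch: (i) recognize the segment as English, here naively by looking up its head noun in a small English lexicon, and (ii) translate that head noun to obtain its Corsican gender and hence the article. The lexicon, helper names and head-noun heuristic are all hypothetical; only school → scola (feminine) comes from the text above.

```python
from typing import Optional

# Hypothetical mini-lexicon: English head noun -> (Corsican noun, gender).
# Only 'school' -> 'scola' (feminine) is taken from the text.
ENGLISH_HEADS = {"school": ("scola", "f")}
CORSICAN_ARTICLES = {"m": "u", "f": "a"}


def head_noun(segment: str) -> Optional[str]:
    """Naive heuristic: the head noun of an English name is its last word."""
    words = segment.split()
    return words[-1].lower() if words else None


def article_for(segment: str) -> str:
    """Pick the Corsican definite article for a foreign proper-noun segment."""
    head = head_noun(segment)
    if head in ENGLISH_HEADS:                # (i) segment recognized as English
        _, gender = ENGLISH_HEADS[head]      # (ii) head noun translated: scola
        return CORSICAN_ARTICLES[gender]
    return "u/a"  # gender undetermined, as in the erroneous output
```

With this sketch, f"po à {article_for('Yale Law School')} Yale Law School" gives po à a Yale Law School, while an unknown segment still falls back to u/a.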
For now, we can only conjecture that the ability to solve the remaining 1% requires artificial general intelligence (AGI). Providing concrete, detailed examples may help confirm or refute that hypothesis.
A jeweler examines an emerald. “Aha,” he says, “another green emerald. In all my years in this business, I must have seen thousands of emeralds, and every one has been green.” We think the jeweler reasonable to hypothesize that all emeralds are green. Next door is another jeweler having equally comprehensive experience with emeralds. He speaks only the Choctaw Indian language. Color distinctions are not as universal as might be thought. The Choctaw Indians made no distinction between green and blue—the same words applied to both. The Choctaws did make a linguistic distinction between okchamali, a vivid green or blue, and okchakko, a pale green or blue. The Choctaw-speaking jeweler says: All emeralds are okchamali. He maintains that all his years in the jewelry business confirm this hypothesis. (William Poundstone, Labyrinths of reason)