Now we face false positives again: the French proper noun ‘Détroit’ is erroneously translated into Strittonu when it should have been left untranslated, being a proper noun. The ambiguity of ‘Détroit’ lies in the fact that it can be translated either into:
- Détroit, the city
- Strittonu, the Corsican word strittonu/strittone being the corresponding word for the French noun ‘détroit’ (strait, e.g. the Strait of Messina).
This raises the general issue of the proper disambiguation of proper nouns.
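The two readings of ‘Détroit’ can be illustrated with a minimal sketch. It assumes a simple capitalization heuristic (a capitalized form that is not sentence-initial is taken to be the proper noun); this is an illustrative assumption, not the method used by the translation software, and the function name `translate_detroit` is hypothetical.

```python
from typing import Optional


def translate_detroit(tokens: list[str], index: int) -> Optional[str]:
    """Pick a translation for 'détroit'/'Détroit' at position `index`.

    Hypothetical heuristic (an assumption, not the author's method):
    - lowercase form            -> common noun, translated as 'strittonu'
    - capitalized, mid-sentence -> proper noun, left untranslated
    - capitalized, sentence-initial -> undecided (None)
    """
    word = tokens[index]
    if word.lower() != "détroit":
        raise ValueError(f"unexpected word: {word!r}")
    if word[0].islower():
        return "strittonu"
    if index > 0:
        return "Détroit"
    return None  # capitalization alone cannot decide at sentence start


print(translate_detroit("Il habite à Détroit".split(), 3))
print(translate_detroit("le détroit de Messine".split(), 1))
print(translate_detroit("Détroit est loin".split(), 0))
```

The sentence-initial case is exactly where capitalization gives no information, so the sketch returns `None` there instead of guessing.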
Evaluation of machine translation is usually done via external tools (to cite some instances: ARPA, BLEU, METEOR, LEPOR, …). But let us investigate the idea of self-evaluation. For it seems that the software itself is capable of having an accurate idea of its possible errors.
In the above example, human evaluation yields a score of 1 – 5/88 = 94.32%. Contrast this with self-evaluation, which sums its possible errors (unknown words and disambiguation errors) and arrives at 1 – 7/88 = 92.05%, due to 7 hypothesized errors. In this case, self-evaluation computes the maximum error rate. But even here, there are some false positives: ‘apellation’ is left untranslated, being unrecognized. In fact, the correct spelling is ‘appellation’. To sum up: the software identifies an unknown word (and leaves it untranslated) and counts it as a possible error.
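The arithmetic behind both scores can be checked with a short sketch. The counts (88 words, 5 errors found by the human evaluator, 7 errors hypothesized by the software) come from the example above; the function name `accuracy` is an assumption made for illustration, and results are rounded to two decimals.

```python
def accuracy(total_words: int, errors: int) -> float:
    """Return the percentage of words not counted as errors."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= errors <= total_words:
        raise ValueError("errors must be between 0 and total_words")
    return 100.0 * (1 - errors / total_words)


human_score = accuracy(88, 5)  # errors confirmed by a human evaluator
self_score = accuracy(88, 7)   # errors hypothesized by the software itself

print(f"human evaluation: {human_score:.2f}%")
print(f"self-evaluation:  {self_score:.2f}%")
```

Since every suspected error is counted, whether confirmed or not, the self-evaluation score is the lower of the two here.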
Let us sketch what could be the pros and cons of MT self-evaluation. To begin with, the pros:
- it could provide a detailed taxonomy of possible errors: unknown words, unresolved grammatical disambiguation, unresolved semantic disambiguation, …
- it could identify precisely the suspected errors
- evaluation would be very fast and inexpensive
- self-evaluation would work with any text or corpus
- self-evaluation could pave the way to further self-improvement and self-correction of errors
- its reliability could be good
And the cons:
- MT may be unaware of some types of errors, e.g. errors related to expressions and locutions
- it would sometimes produce false positives, so identifying those false positives would itself become an issue
- MT would be unaware of erroneous disambiguations
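A minimal sketch of what a self-evaluation report built on such a taxonomy might look like follows. The class and function names, the word positions and the 40-word total are all hypothetical, chosen only for illustration; the three error categories are the ones listed among the pros above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ErrorKind(Enum):
    UNKNOWN_WORD = "unknown word"
    GRAMMATICAL = "unresolved grammatical disambiguation"
    SEMANTIC = "unresolved semantic disambiguation"


@dataclass(frozen=True)
class SuspectedError:
    position: int      # index of the word in the source text (hypothetical)
    source_word: str
    kind: ErrorKind


def self_evaluate(total_words: int, suspected: list[SuspectedError]):
    """Return (score in percent, number of suspected errors per category)."""
    by_kind = Counter(error.kind for error in suspected)
    score = 100.0 * (1 - len(suspected) / total_words)
    return score, by_kind


# Illustrative data only: positions and text length are made up.
suspected = [
    SuspectedError(3, "apellation", ErrorKind.UNKNOWN_WORD),
    SuspectedError(12, "Détroit", ErrorKind.SEMANTIC),
]
score, by_kind = self_evaluate(40, suspected)

print(f"self-evaluation: {score:.2f}%")
for kind, count in by_kind.items():
    print(f"{kind.value}: {count}")
```

A report of this shape pinpoints each suspected error, but it cannot contain errors the software never suspected, which is the limitation noted in the cons.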
In Corsican, the French word ‘femme’ can be translated, depending on the context:
- either into donna (woman)
- or into moglia (wife)
The above sample still contains many vocabulary and grammatical disambiguation errors (easy/medium difficulty), but it successfully handles the semantic disambiguation (hard) of ‘femme’, two instances of which are properly translated into moglia (wife). As the Corsican proverb says, in a cianga l’oru luci sempri (in the mud, gold is still shining).
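To make the donna/moglia choice concrete, here is a toy sketch. It assumes one very simple contextual cue, a possessive determiner right before ‘femme’ (as in ‘sa femme’, his wife); this rule is an assumption chosen for illustration and is not claimed to be how the software performs the disambiguation. The names `POSSESSIVES` and `translate_femme` are hypothetical.

```python
# French possessive determiners that can precede the feminine noun 'femme'.
POSSESSIVES = {"ma", "ta", "sa", "notre", "votre", "leur"}


def translate_femme(tokens: list[str], index: int) -> str:
    """Translate 'femme' at `index` as 'moglia' (wife) or 'donna' (woman).

    Toy rule (an assumption): a possessive determiner immediately
    before the word suggests the 'wife' reading.
    """
    if tokens[index].lower() != "femme":
        raise ValueError(f"unexpected word: {tokens[index]!r}")
    previous = tokens[index - 1].lower() if index > 0 else ""
    return "moglia" if previous in POSSESSIVES else "donna"


print(translate_femme("il aime sa femme".split(), 3))
print(translate_femme("une femme chante".split(), 1))
```

A single cue like this misses cases such as ‘la femme de Pierre’ (Pierre's wife), which is part of why this disambiguation is rated hard.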
French samples are from the French corpora of the University of Leipzig.
After improper anaphora resolution
Anaphora resolution usually refers to pronouns. But we face here a special case of anaphora resolution that relates to an adjective. The following sentence: ‘un vase de Chine authentique’ (an authentic vase of China) is translated erroneously as un vasu di China autentica, due to erroneous anaphora resolution. In this sample, the adjective ‘authentique’ refers to ‘vase’ (English: vase) and not to ‘Chine’ (China).
The same goes for ‘une chanson du Portugal mythique’, where ‘mythique’ refers to ‘chanson’ and not to ‘Portugal’.
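The effect of the attachment decision on the Corsican output can be sketched as follows. The sketch assumes a pattern of the form "noun + de + noun + adjective" and a small hand-written lexicon; the masculine form autenticu is an assumption made by analogy with the feminine autentica quoted above, and the names `Noun`, `ADJECTIVE_FORMS` and `agree_adjective` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    corsican: str
    gender: str  # "m" or "f"


# Hypothetical mini-lexicon; 'autenticu' is an assumed masculine form.
ADJECTIVE_FORMS = {"authentique": {"m": "autenticu", "f": "autentica"}}


def agree_adjective(french_adjective: str, head: Noun, complement: Noun,
                    attach_to_head: bool = True) -> str:
    """Inflect the adjective according to the noun it is attached to."""
    target = head if attach_to_head else complement
    return ADJECTIVE_FORMS[french_adjective][target.gender]


vase = Noun("vasu", "m")
chine = Noun("China", "f")

# Attachment to the head noun 'vase' (the intended reading).
good = agree_adjective("authentique", vase, chine, attach_to_head=True)
# Attachment to the nearest noun 'Chine' (the erroneous reading).
bad = agree_adjective("authentique", vase, chine, attach_to_head=False)

print(f"un {vase.corsican} di {chine.corsican} {good}")
print(f"un {vase.corsican} di {chine.corsican} {bad}")
```

The French adjective ‘authentique’ has the same form in both genders, so the source sentence gives no morphological hint; the error only becomes visible once Corsican forces a gendered ending.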
After appropriate anaphora resolution
Translating the following sentence: ‘ce fait est unique’ is not as easy as it might seem at first glance. In fact, it is made up of four consecutive ambiguous words:
- ‘ce’: ‘ssu (demonstrative pronoun, this) or ciò (it, relative pronoun)
- ‘fait’: fattu (masculine singular noun, fact), fattu (past participle, done) or faci (does, third person singular of the verb to do in the present tense)
- ‘est’: estu (masculine singular noun, east) or hè (is, third person singular of the verb to be in the present tense)
- ‘unique’: unicu (masculine singular adjective, unique in English) or unica (feminine singular adjective, unique in English)
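The size of the search space created by these four ambiguous words can be made explicit with a short sketch. The readings are those listed above; the dictionary name `READINGS` and the labels in parentheses are assumptions added to tell the two homographic fattu readings apart.

```python
from itertools import product

# Candidate Corsican readings for each French word, as listed above.
READINGS = {
    "ce": ["'ssu", "ciò"],
    "fait": ["fattu (noun)", "fattu (past participle)", "faci"],
    "est": ["estu", "hè"],
    "unique": ["unicu", "unica"],
}

sentence = ["ce", "fait", "est", "unique"]

# Every combination of one reading per word: 2 * 3 * 2 * 2 candidates.
combinations = list(product(*(READINGS[word] for word in sentence)))
print(len(combinations))

# The reading matching 'this fact is unique' according to the glosses above.
intended = ("'ssu", "fattu (noun)", "hè", "unicu")
print(intended in combinations)
```

Only one of the 24 combinations corresponds to the gloss "this fact is unique", which shows how quickly ambiguity compounds even in a four-word sentence.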