Now performing large-scale testing with self-evaluation on full French Wikipedia articles:
- Italie: Self-evaluation: 1 – 708/14451 = 95.10% (708 errors in 14451 words)
- Aristote: Self-evaluation: 1 – 1264/22885 = 94.48%
- Everest: Self-evaluation: 1 – 606/10530 = 94.25%
- Mer méditerranée: Self-evaluation: 1 – 235/5088 = 95.38%
- démocratie: Self-evaluation: 1 – 515/11430 = 95.49%
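The scores in this list all follow the same formula, 1 – errors/words. As a minimal sketch (the function name `self_eval_score` and the dictionary layout are illustrative assumptions, not part of the translator itself), the figures can be recomputed like this:

```python
# (errors, words) per article, copied from the list above.
results = {
    "Italie": (708, 14451),
    "Aristote": (1264, 22885),
    "Everest": (606, 10530),
    "Mer méditerranée": (235, 5088),
    "démocratie": (515, 11430),
}


def self_eval_score(errors: int, words: int) -> float:
    """Return the self-evaluation score 1 - errors/words, as a percentage."""
    return 100.0 * (1.0 - errors / words)


# Per-article scores, rounded to two decimals as in the list.
scores = {title: round(self_eval_score(e, w), 2) for title, (e, w) in results.items()}
for title, score in scores.items():
    print(f"{title}: {score:.2f}%")

# Pooled score: all errors over all words, which weights long articles more
# heavily than a plain average of the five percentages would.
total_errors = sum(e for e, _ in results.values())
total_words = sum(w for _, w in results.values())
pooled = self_eval_score(total_errors, total_words)
print(f"pooled: {pooled:.2f}%")
```

The pooled figure is only one possible way of summarising the five articles; a simple mean of the per-article percentages gives a slightly different number.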
French to Corsican: performance on the French Wikipedia sample test currently averages 94%. Below is a rough typology of the remaining errors (presumably an average score of 95% on the open test should be attainable by correcting the errors tagged ‘easy’):
- unknown vocabulary: 40% (easy)
- basic disambiguation: 25% (easy or medium difficulty)
- false positives: 5% (medium difficulty or hard). This type of error mostly concerns proper nouns, i.e. English terms that should remain untranslated. For example, ‘North American Aviation’ is erroneously translated as ‘North American Aviazione’, whereas ‘Aviation’ should remain untranslated.
- inadequate locution: 10% (medium difficulty or hard)
- anaphora resolution related to complex sentence structure: 5% (hard)
- semantic disambiguation: 5% (hard). For example, disambiguating French ‘échecs’ = fiaschi/scacchi (failures/chess)
- erroneous agreement related to gender mismatch from French to Corsican, i.e. (i) words that are masculine in French and feminine in Corsican; and (ii) words that are feminine in French and masculine in Corsican: 1% (medium difficulty).
- erroneous agreement related to number mismatch from French to Corsican, i.e. (i) words that are singular in French and plural in Corsican; and (ii) words that are plural in French and singular in Corsican (for example, French ‘la canicule’ translates into ‘i sulleoni’ in Corsican): 1% (medium difficulty).
- specific grammatical case: 2% (hard)
- anaphora resolution associated with gender or number mismatch: 1% (hard)
- unknown, unclassified: 6% (hard)
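To make the "95% should be attainable" estimate concrete, here is a rough back-of-the-envelope sketch. It assumes the 94% baseline and the category shares listed above, and it assumes that correcting a fraction of one category removes that same fraction of its errors with no side effects. The function `projected_accuracy` and its `fix_rate` parameter are hypothetical names used for illustration only.

```python
baseline_accuracy = 0.94  # average reported for the French Wikipedia sample

# Share of the remaining errors per category, as listed above.
# Note: the shares as published add up to 101%, presumably due to rounding.
error_shares = {
    "unknown vocabulary": 0.40,
    "basic disambiguation": 0.25,
    "false positives": 0.05,
    "inadequate locution": 0.10,
    "anaphora (complex sentence structure)": 0.05,
    "semantic disambiguation": 0.05,
    "gender mismatch": 0.01,
    "number mismatch": 0.01,
    "specific grammatical case": 0.02,
    "anaphora (gender or number mismatch)": 0.01,
    "unknown, unclassified": 0.06,
}


def projected_accuracy(baseline, shares, fixed, fix_rate=1.0):
    """Accuracy after removing `fix_rate` of the errors in the `fixed` categories."""
    error_rate = 1.0 - baseline
    removed = sum(shares[c] for c in fixed) * fix_rate * error_rate
    return baseline + removed


# Fixing every unknown-vocabulary error versus fixing only half of them.
full = projected_accuracy(baseline_accuracy, error_shares, ["unknown vocabulary"])
half = projected_accuracy(baseline_accuracy, error_shares, ["unknown vocabulary"], 0.5)
print(f"all unknown vocabulary fixed: {full:.1%}")
print(f"half of unknown vocabulary fixed: {half:.1%}")
```

Under these simplifying assumptions, even a partial clean-up of the ‘easy’ vocabulary errors would be enough to pass 95%, which is consistent with the estimate above.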
Here is an overview of the average performance on the open test. But we should not extrapolate too fast… Presumably the hardest step will be from 98% to 99%, and from 99% to 100% harder still (if attainable at all…). It could even be extremely difficult to go from 99.00% to 99.10%… And it may well take two years to go from 97% to 98%. Who knows?
An open question is now lurking: how hard will it be to go from 98% to near 100%? To simplify matters and put it in a sharper form: how hard will it be to go from 99% to near 100%? Call it the last percent problem. It seems such an ability would at least require multi-language translation capability. But it could even require full-featured AI (not limited to machine translation). To put it more clearly: will near-human-quality translation require full-featured AI?
Now testing large-scale self-evaluation. In the present sample, self-evaluation is applied to a 7,693-word (45,437-character) text from the French Wikipedia article on Constance II (Constantius II): 414 errors found, i.e. 1 – 414/7693 = 94.62%.
The present test nicely illustrates the benefits of self-evaluation: it runs fast and gives a rough estimate of MT accuracy (± 2%).
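As a small illustration of what "rough estimate (± 2%)" means for this sample, the sketch below turns the error count into a score with a band around it. It assumes the ± 2% is meant in percentage points; `score_with_band` and its `margin` parameter are illustrative names, not part of the actual tool.

```python
def score_with_band(errors: int, words: int, margin: float = 2.0):
    """Return (low, score, high) in percent for a self-evaluation run.

    `margin` is the assumed uncertainty of self-evaluation, in percentage points.
    """
    score = 100.0 * (1.0 - errors / words)
    return score - margin, score, score + margin


# Constance II sample: 414 errors found in 7693 words.
low, score, high = score_with_band(414, 7693)
print(f"self-evaluation: {score:.2f}% (true accuracy roughly {low:.2f}% to {high:.2f}%)")
```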
Now handling a kind of false positive related to the translation of proper nouns. As this type of error is fairly widespread, fixing it could yield a 0.2% increase in overall accuracy.
Of interest in the present case:
- recall that ‘détroit’ is the French word for strittonu (strait, as in the Strait of Gibraltar)
- ‘Tours’ (the French city) is also left untranslated, although it is ambiguous with torri (towers) and ghjiri (turns)
- ‘12th Street riot’ and ‘Michigan’ are left untranslated
- self-evaluation erroneously reports 2 vocabulary errors: ‘riot’ and the ‘th’ in ‘12th’
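The last two points can be illustrated with a toy model. The internals of self-evaluation are not described here, so everything below is an assumption made for illustration: a checker that flags any lowercase alphabetic token missing from a lexicon, treats capitalised tokens as proper nouns, and optionally skips a hypothetical list of protected spans. Under those assumptions it reproduces the two spurious flags and shows how protecting the whole span would remove them.

```python
import re


def tokenize(text):
    # Split into alphabetic runs and digit runs, so "12th" yields "12" and "th".
    return re.findall(r"[^\W\d_]+|\d+", text)


def flag_unknown(text, lexicon, protected=()):
    """Return tokens that a naive lexicon-based checker would report as errors.

    Assumptions (hypothetical, not the actual tool): digits are ignored,
    capitalised tokens are taken to be proper nouns, and any span listed in
    `protected` is removed before checking.
    """
    for span in protected:
        text = text.replace(span, " ")
    flagged = []
    for token in tokenize(text):
        if token.isdigit() or token[0].isupper():
            continue
        if token.lower() not in lexicon:
            flagged.append(token)
    return flagged


sample = "12th Street riot, Michigan"
toy_lexicon = set()  # empty on purpose: no English word is in the target lexicon

# Without protection, the lowercase fragments are reported as vocabulary errors.
print(flag_unknown(sample, toy_lexicon))

# With the full proper noun protected, nothing is reported.
print(flag_unknown(sample, toy_lexicon, protected=["12th Street riot"]))
```

The design point is simply that the evaluator and the translator need to share the same notion of what counts as an untranslatable span; otherwise correct output is counted as an error.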