Large-scale testing and self-evaluation

Now performing large-scale testing with self-evaluation on full-length French Wikipedia articles:

  • Italie: Self-evaluation: 1 – 708/14451 = 95.10% (708 errors in 14451 words)
  • Aristote: Self-evaluation: 1 – 1264/22885 = 94.48%
  • Everest: Self-evaluation: 1 – 606/10530 = 94.25%
  • Mer méditerranée: Self-evaluation: 1 – 235/5088 = 95.38%
  • démocratie: Self-evaluation: 1 – 515/11430 = 95.49%
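Each score above is 1 – errors/words. The sketch below recomputes the five scores from the reported counts and also pools the articles into one word-weighted score (total errors over total words). The pooling step and the function names are illustrative additions, not part of the original tests.

```python
# Reported self-evaluation counts per article: (errors flagged, words).
RESULTS = {
    "Italie": (708, 14451),
    "Aristote": (1264, 22885),
    "Everest": (606, 10530),
    "Mer méditerranée": (235, 5088),
    "démocratie": (515, 11430),
}


def self_eval_score(errors: int, words: int) -> float:
    """Self-evaluation score 1 - errors/words, expressed as a percentage."""
    return 100.0 * (1.0 - errors / words)


def pooled_score(results: dict[str, tuple[int, int]]) -> float:
    """Word-weighted score: total errors divided by total words, all articles."""
    total_errors = sum(errors for errors, _ in results.values())
    total_words = sum(words for _, words in results.values())
    return self_eval_score(total_errors, total_words)


if __name__ == "__main__":
    for title, (errors, words) in RESULTS.items():
        print(f"{title}: {self_eval_score(errors, words):.2f}%")
    print(f"Pooled over all articles: {pooled_score(RESULTS):.2f}%")
```

Pooling weights longer articles more heavily; by this computation the five articles together give 3328 errors in 64384 words, i.e. about 94.83%.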

Posted in blog | Tagged | Leave a comment

Rough typology of remaining errors (updated March 2018)

French to Corsican: performance on the French Wikipedia sample test currently averages 94%. Presumably, an average score of 95% on the open test should be attainable by correcting the errors tagged ‘easy’. Below is a rough typology of the remaining errors:


  • unknown vocabulary: 40% (easy)
  • basic disambiguation: 25% (easy or medium difficulty)
  • false positives: 5% (medium difficulty or hard). This type of error is mostly related to proper nouns, i.e. English terms that should remain untranslated. For example, ‘North American Aviation’ is erroneously translated as ‘North American Aviazione’, whereas ‘Aviation’ should remain untranslated.
  • inadequate translation of locutions: 10% (medium difficulty or hard)
  • anaphora resolution related to complex sentence structure: 5% (hard)
  • semantic disambiguation: 5% (hard). For example, French ‘échecs’ must be disambiguated between fiaschi (failures) and scacchi (chess)
  • erroneous agreement due to gender mismatch between French and Corsican, i.e. (i) words that are masculine in French and feminine in Corsican; and (ii) words that are feminine in French and masculine in Corsican: 1% (medium difficulty).
  • erroneous agreement due to number mismatch between French and Corsican, i.e. (i) words that are singular in French and plural in Corsican; and (ii) words that are plural in French and singular in Corsican (for example, the French singular ‘la canicule’ translates into the Corsican plural ‘i sulleoni’): 1% (medium difficulty).
  • specific grammatical case: 2% (hard)
  • anaphora resolution associated with gender or number mismatch: 1% (hard)
  • unknown, unclassified: 6% (hard)
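The 95% estimate can be checked with simple arithmetic. At 94% average accuracy, 6% of words are wrong, so fully fixing a category that accounts for a share s of the remaining errors would raise accuracy by roughly 6 × s percentage points. Reaching 95% therefore needs only one sixth (about 16.7%) of the remaining errors to disappear, well under the 40% attributed to unknown vocabulary alone. The sketch below assumes, as a simplification, that a fix removes its category's errors outright and introduces no regressions; the listed shares sum to 101%, presumably because of rounding.

```python
# Approximate share of remaining errors per category, as listed in the typology.
# The shares are rough and sum to 1.01, presumably because of rounding.
ERROR_SHARES = {
    "unknown vocabulary": 0.40,
    "basic disambiguation": 0.25,
    "false positives": 0.05,
    "inadequate translation of locutions": 0.10,
    "anaphora resolution (complex structure)": 0.05,
    "semantic disambiguation": 0.05,
    "gender agreement mismatch": 0.01,
    "number agreement mismatch": 0.01,
    "specific grammatical case": 0.02,
    "anaphora resolution (gender/number)": 0.01,
    "unknown, unclassified": 0.06,
}


def projected_accuracy(baseline: float, fixed_share: float) -> float:
    """Accuracy in % after removing `fixed_share` of the remaining errors.

    Simplifying assumption: a fix removes those errors outright and
    introduces no new ones.
    """
    return baseline + (100.0 - baseline) * fixed_share


if __name__ == "__main__":
    baseline = 94.0  # current average on the French Wikipedia sample test
    print(f"Total share listed: {sum(ERROR_SHARES.values()):.2f}")
    for category, share in ERROR_SHARES.items():
        print(f"{category}: {projected_accuracy(baseline, share):.2f}% if fully fixed")
```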

Diachronic overview of performance on the open test

[Figure: average accuracy on the open test over time]

Here is an overview of the average performance on the open test. But we should not extrapolate too quickly. Presumably the step from 98% to 99% will be very difficult, and the step from 99% to 100% (if attainable at all) even harder. Even passing from 99.00% to 99.10% could prove extremely difficult, and it may take as long as two years to pass from 97% to 98%. Who knows?
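One way to make the intuition concrete is to measure each step against the errors that remain: going from a start accuracy to an end accuracy removes (end – start)/(100 – start) of the remaining errors. Treating this ratio as a rough proxy for difficulty is an assumption made here, not a claim from the original post, and the function name is illustrative.

```python
def remaining_error_cut(start: float, end: float) -> float:
    """Share of the remaining errors that must be eliminated to move
    from `start`% to `end`% accuracy."""
    return (end - start) / (100.0 - start)


# Steps mentioned in the discussion of diminishing returns.
STEPS = [(97.0, 98.0), (98.0, 99.0), (99.0, 99.1), (99.0, 100.0)]

if __name__ == "__main__":
    for start, end in STEPS:
        cut = remaining_error_cut(start, end)
        print(f"{start:.2f}% -> {end:.2f}%: remove {cut:.0%} of remaining errors")
```

By this measure, 97% to 98% means eliminating a third of the remaining errors, 98% to 99% half of them, and 99% to 100% all of them, while even 99.00% to 99.10% requires removing a tenth.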

An open question now lurks: how hard will it be to pass from 98% to near 100%? To put it in a simpler and sharper form: how hard will it be to pass from 99% to near 100%? Call it the last percent problem. Such an ability would seem to require at least multi-language translation capability, and it might even require full-featured AI (not limited to machine translation). To put it more plainly: will near-human-quality translation require full-featured AI?



Testing large-scale self-evaluation

Now testing large-scale self-evaluation. In the present sample, self-evaluation was run on a 7693-word (45437-character) text from the French Wikipedia article on Constance II (Constantius II) and found 414 errors.

The present test illustrates the benefits of self-evaluation well: it runs fast and gives a rough estimate of MT accuracy (± 2%).
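Applying the same 1 – errors/words formula used in the earlier tests to the Constance II sample gives about 94.62%. The sketch below attaches the quoted ± 2% as a band around that score; reading ± 2% as ± 2 percentage points is an assumption made here, and the function name is illustrative.

```python
def self_eval_estimate(errors: int, words: int, margin: float = 2.0):
    """Return (score, low, high) in %, where score = 1 - errors/words.

    `margin` is the rough +/- 2 point uncertainty assumed for self-evaluation;
    the bounds are clamped to the valid range [0, 100].
    """
    score = 100.0 * (1.0 - errors / words)
    return score, max(0.0, score - margin), min(100.0, score + margin)


if __name__ == "__main__":
    score, low, high = self_eval_estimate(414, 7693)  # Constance II sample
    print(f"Constance II: {score:.2f}% (roughly {low:.1f}% to {high:.1f}%)")
```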


Proper nouns: handling some false positives

Now handling a kind of false positive related to the translation of proper nouns. As this type of error is fairly widespread, fixing it could increase overall accuracy by about 0.2%.

Of interest in the present case:

  • recall that ‘détroit’ is the French word for strittonu (strait, as in the Strait of Gibraltar); the city name Détroit is nevertheless left untranslated
  • ‘Tours’ (the French city) is likewise left untranslated, although it is ambiguous with torri (towers) or ghjiri (turns)
  • ‘12th Street riot’ and ‘Michigan’ are left untranslated
  • self-evaluation erroneously reports 2 vocabulary errors: ‘riot’ and the ‘th’ in ‘12th’
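A common way to handle such false positives is a do-not-translate list: token spans matching known proper nouns are copied through unchanged, and the same spans are skipped when self-evaluation counts unknown vocabulary. The sketch below illustrates that idea with a toy word-for-word lexicon. The protected-name list, the lexicon, and the functions `protected_spans`, `translate` and `unknown_words` are all hypothetical and do not describe the actual system; input is assumed to be already tokenized, so ‘12th’ stays one token.

```python
# Hypothetical do-not-translate list; multi-word names are token tuples.
PROTECTED = {
    ("North", "American", "Aviation"),
    ("12th", "Street", "riot"),
    ("Michigan",),
    ("Tours",),
    ("Détroit",),
}

# Toy French -> Corsican lexicon, for illustration only.
LEXICON = {"les": "i", "tours": "torri", "détroit": "strittonu", "aviation": "aviazione"}

MAX_SPAN = max(len(name) for name in PROTECTED)


def protected_spans(tokens: list[str]) -> list[tuple[int, int]]:
    """Greedy longest-match scan returning (start, end) spans of protected names."""
    spans, i = [], 0
    while i < len(tokens):
        for length in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + length]) in PROTECTED:
                spans.append((i, i + length))
                i += length
                break
        else:  # no protected name starts at position i
            i += 1
    return spans


def _covered(tokens: list[str]) -> set[int]:
    """Indices of tokens that fall inside a protected span."""
    return {k for start, end in protected_spans(tokens) for k in range(start, end)}


def translate(tokens: list[str]) -> list[str]:
    """Word-by-word toy translation that copies protected spans unchanged."""
    covered = _covered(tokens)
    return [tok if k in covered else LEXICON.get(tok.lower(), tok)
            for k, tok in enumerate(tokens)]


def unknown_words(tokens: list[str]) -> list[str]:
    """Self-evaluation helper: words missing from the lexicon, ignoring protected spans."""
    covered = _covered(tokens)
    return [tok for k, tok in enumerate(tokens)
            if k not in covered and tok.lower() not in LEXICON]


if __name__ == "__main__":
    print(translate(["North", "American", "Aviation"]))
    print(translate(["les", "tours"]), translate(["Tours"]))
    print(unknown_words(["12th", "Street", "riot", "Michigan"]))
```

Because `unknown_words` reuses the same spans as `translate`, a name that is deliberately left untranslated is no longer counted as a vocabulary error.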