Here is an overview of average performance on the open test. We should not extrapolate too fast, though… Presumably the hardest step will be from 98% to 99%, and the step from 99% to 100% (if attainable at all…) harder still. Moving from 99.00% to 99.10% could itself prove extremely difficult… And it may take two years just to go from 97% to 98%. Who knows?
An open question: how hard will it be to go from 98% to near 100%? Such an ability seems to require multi-language translation capability, and it could even require full-featured AI (not limited to machine translation). To put it more clearly: will near-human-quality translation require full-featured AI?
Now testing large-scale self-evaluation. In the present sample, self-evaluation is applied to a 7,693-word (45,437-character) text from the French Wikipedia article on Constance II (Constantius II): 414 errors found.
The present test illustrates the benefits of self-evaluation well: it runs fast and gives a rough estimate of MT accuracy (± 2%).
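The arithmetic behind such an estimate can be sketched as follows. This is a minimal illustration, assuming that accuracy is computed per word (one flagged error counts against one word); the function name `estimated_accuracy` is hypothetical and not taken from the actual tool.

```python
def estimated_accuracy(word_count: int, error_count: int) -> float:
    """Return the share of words not flagged as errors, as a percentage.

    Assumption: each flagged error counts against exactly one word.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 100.0 * (1.0 - error_count / word_count)


# Figures from the Constance II sample: 7,693 words, 414 errors found.
accuracy = estimated_accuracy(7693, 414)
print(f"{accuracy:.2f}%")  # prints 94.62%
```

Under that per-word assumption, the sample comes out at roughly 94.6%, to be read with the ± 2% margin mentioned above.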
Now handling a kind of false positive related to the translation of proper nouns. As this type of error is fairly widespread, handling it could result in a 0.2% increase in overall accuracy.
Of interest in the present case:
- recall that ‘détroit’ is the French word for strittonu (strait, as in the Strait of Gibraltar)
- ‘Tours’ (the French city) is also left untranslated, although it is ambiguous with torri (towers) or ghjiri (turns)
- ‘12th Street riot’ and ‘Michigan’ are left untranslated
- self-evaluation erroneously reports 2 vocabulary errors: ‘riot’ and the ‘th’ in ‘12th’
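One way to suppress this kind of false positive can be sketched as below. It is only an illustration under stated assumptions: the list `PROTECTED_NAMES` and the function `drop_proper_noun_flags` are hypothetical, and the filter is deliberately crude (it ignores context, so a flagged ‘riot’ would be dropped anywhere in the text, not only inside the proper noun).

```python
import re

# Hypothetical list of proper nouns that are meant to stay untranslated.
PROTECTED_NAMES = ["12th Street riot", "Michigan", "Tours"]


def drop_proper_noun_flags(flagged, protected_names=PROTECTED_NAMES):
    """Remove flagged tokens that are fragments of a protected proper noun."""
    protected_tokens = set()
    for name in protected_names:
        # Split into runs of letters or runs of digits,
        # so that '12th' yields both '12' and 'th'.
        for token in re.findall(r"[^\W\d_]+|\d+", name):
            protected_tokens.add(token.lower())
    return [t for t in flagged if t.lower() not in protected_tokens]


# 'riot' and 'th' are the two false positives reported above;
# 'exemple' is a made-up flag added only to show that other flags are kept.
flags = ["riot", "th", "exemple"]
remaining = drop_proper_noun_flags(flags)
print(remaining)  # prints ['exemple']
```

A position-aware version, which only discards flags located inside the span of an untranslated proper noun, would be safer but needs token offsets from the evaluator.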
Testing self-evaluation accuracy: in the present case, it yields a 100% performance. However, there is one error: ‘par des explosifs’ should read da splusivi or even da i splusivi (by explosives), a problem with the partitive article. Arguably, there is a second grammatical error to which self-evaluation is blind: ‘sont ensuite détruits’ should read sò distrutti dopu (are then destroyed); the problem lies in the fact that the preposition dopu should be placed more adequately before the verb. In short: human evaluation yields 98.14% performance in the present case. (By the way, it seems average performance on the MT open test is currently nearing 94%.)
A hard case of disambiguation is successfully handled here (amid some vocabulary errors): ‘les premiers parachutistes des 82e et 101e divisions…’ is properly translated into … di i 82a è 101a divisioni… It is a hard case, since ultima facie ‘82e’ and ‘101e’ are both feminine singular whereas ‘divisions’ is feminine plural (since there are two divisions). But prima facie, ‘82e’ and ‘101e’ in French (in full ‘82ème’ and ‘101ème’, i.e. 82nd and 101st) are both ambiguous between masculine singular and feminine singular.
Also of interest:
- self-evaluation in this case is rather reliable, as it gives roughly the same result as human evaluation
- there is a false positive: ‘CG-4A’ is erroneously translated into CG-4Hà, which raises the issue of the correct translation of proper nouns that are composed of letters and numbers
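A possible way to keep such letter-and-number designators intact is to mask them before translation and restore them afterwards. The sketch below is an assumption, not the translator's actual method: the pattern `DESIGNATOR`, the placeholder format and both function names are hypothetical, and it presumes the translation step leaves the placeholders untouched.

```python
import re

# Assumed shape of a designator: 1-4 capital letters, a hyphen,
# digits, then optional capital letters (e.g. CG-4A).
DESIGNATOR = re.compile(r"\b[A-Z]{1,4}-\d+[A-Z]*\b")


def mask_designators(text):
    """Replace each designator with a numbered placeholder."""
    found = []

    def _stash(match):
        found.append(match.group(0))
        return f"__DSG{len(found) - 1}__"

    return DESIGNATOR.sub(_stash, text), found


def unmask_designators(text, found):
    """Put the original designators back in place of the placeholders."""
    for index, designator in enumerate(found):
        text = text.replace(f"__DSG{index}__", designator)
    return text


# Illustrative phrase, not taken from the test text.
masked, found = mask_designators("un planeur CG-4A")
print(masked)                             # prints un planeur __DSG0__
print(unmask_designators(masked, found))  # prints un planeur CG-4A
```

The translation itself would run on `masked`, so the final ‘A’ of ‘CG-4A’ is never seen as the French verb form ‘a’.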