French to Corsican: performance on the French Wikipedia sample test currently averages 93%. Below is a rough typology of the remaining errors (an average of 95% should presumably be attainable by correcting the errors tagged ‘easy’):
- unknown vocabulary: 50% (easy)
- basic disambiguation: 15% (easy)
- erroneous agreement (relates to (i) words that are masculine in French and feminine in Corsican; and (ii) words that are feminine in French and masculine in Corsican): 5% (medium difficulty)
- inadequate locution: 10% (medium difficulty or hard)
- false positives: 5% (medium difficulty or hard)
- semantic disambiguation: 5% (hard). For example, disambiguating French ‘échecs’ = fiaschi/scacchi (failures/chess)
- specific grammatical case: 2% (hard)
- word reference error: 2% (hard)
- unknown, unclassified: 6% (hard)
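The arithmetic behind the 95% estimate can be checked with a short sketch. It assumes (this is an interpretation, not stated in the post) that each percentage in the typology is a share of the remaining 7 points of error. The variable names are illustrative only.

```python
# Baseline accuracy reported on the French Wikipedia sample test.
ACCURACY = 93.0

# Share of the remaining errors per category, with the difficulty tag.
# Assumption: shares are percentages of the residual error, not of all words.
ERROR_SHARES = {
    "unknown vocabulary": (50, "easy"),
    "basic disambiguation": (15, "easy"),
    "erroneous agreement": (5, "medium"),
    "inadequate locution": (10, "medium/hard"),
    "false positives": (5, "medium/hard"),
    "semantic disambiguation": (5, "hard"),
    "specific grammatical case": (2, "hard"),
    "word reference error": (2, "hard"),
    "unknown, unclassified": (6, "hard"),
}

# Total share of errors tagged 'easy'.
easy_share = sum(share for share, tag in ERROR_SHARES.values() if tag == "easy")

# Accuracy reached if every 'easy' error were corrected.
ceiling = ACCURACY + (100.0 - ACCURACY) * easy_share / 100.0

print(f"easy errors: {easy_share}% of remaining errors")
print(f"ceiling if all easy errors are fixed: {ceiling:.2f}%")
```

Under that assumption, fixing all ‘easy’ errors would give a ceiling of about 97.5%, so the 95% target corresponds to correcting a bit under half of them.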
Some improvements made to French to Italian translation:
- fixed several contractions (della, dello, …)
- the nice thing is that semantic disambiguation is working: ‘échecs’ = fallimenti/scacchi (failures/chess) translates properly into scacchi
Now testing French to Italian translation: this is the very first draft, at roughly 80%. A lot of things remain to be fixed.
Progress on grammatical structures: some improvements, to be included in the future version 1.2, yield another Feigenbaum hit: 100%. In the present case, the Corsican language variety is taravese.
What is the average percentage of ambiguous words in a French sentence, from a French to Corsican translation perspective? In the above example, 20 of 99 words are ambiguous, i.e. approximately 20%. Not all semantic ambiguities are taken into account here, so the real average should be at least 25%. Examples of such ambiguous words:
- le = u/lu: definite article or pronoun (the/it)
- est = livanti/hè: masculine noun or verb (east/is)
- culminant = culminanti/culminendu: adjective or gerund
- émerge = emerghju/emerghji: first person or third person verb
- commence = principiu/principia: first person or third person verb (begin/begins)
- cesse = cessu/cessa: first person or third person verb (cease/ceases)
- volcanique = vulcanicu/vulcanica: adjective, masculine or feminine (volcanic, unambiguous from a French to English translation perspective)
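The ratio discussed above can be computed mechanically once an ambiguity lexicon exists. The following is a minimal sketch, not the translator's actual code: the lexicon contains only the pairs listed above, and the sample sentence is a hypothetical one made up for illustration (the 99-word example is not reproduced here).

```python
# French word -> the two competing Corsican renderings listed above.
AMBIGUOUS = {
    "le": ("u", "lu"),
    "est": ("livanti", "hè"),
    "culminant": ("culminanti", "culminendu"),
    "émerge": ("emerghju", "emerghji"),
    "commence": ("principiu", "principia"),
    "cesse": ("cessu", "cessa"),
    "volcanique": ("vulcanicu", "vulcanica"),
}


def ambiguity_rate(tokens):
    """Return (ratio of ambiguous tokens, list of the ambiguous tokens)."""
    hits = [t for t in tokens if t.lower() in AMBIGUOUS]
    return len(hits) / len(tokens), hits


# Hypothetical, pre-tokenized sentence used only to exercise the function.
sample = ["Le", "volcan", "commence", "à", "l'", "est"]
rate, hits = ambiguity_rate(sample)
print(f"{len(hits)}/{len(sample)} ambiguous words ({rate:.0%}): {hits}")
```

A tiny lexicon like this one undercounts, which is consistent with the remark that the real average is higher than the measured figure.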
Testing the improved disambiguation engine. This is a special case: the disambiguation of two consecutive ambiguous words. French ‘au terme de plusieurs mois’ translates into à u capu di parechji mesa (at the end of several months) in Corsican (taravese variant). In this case, both ‘plusieurs’ and ‘mois’ are ambiguous:
- ‘plusieurs’ (several) as an indefinite plural pronoun can be either masculine or feminine.
- ‘mois’ as a noun can be either singular (month, mesi) or plural (months, mesa: the plural with a final –a is reminiscent of the Latin neuter)
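One way to picture how two consecutive ambiguous words can resolve each other is mutual agreement filtering: each word keeps only the readings compatible with a reading of its neighbour. The sketch below is an assumption about how such a step could work, not a description of the actual engine. The feature names and the `READINGS` table are hypothetical, and only the Corsican forms quoted above (mesi, mesa) are used.

```python
# Candidate readings for each ambiguous French word, as feature dictionaries.
# 'plusieurs' is always plural but may be masculine or feminine;
# 'mois' is always masculine but may be singular or plural.
READINGS = {
    "plusieurs": [
        {"gender": "m", "number": "pl"},
        {"gender": "f", "number": "pl"},
    ],
    "mois": [
        {"gender": "m", "number": "sg", "form": "mesi"},
        {"gender": "m", "number": "pl", "form": "mesa"},
    ],
}


def agree(a, b):
    """Two readings are compatible when gender and number both match."""
    return a["gender"] == b["gender"] and a["number"] == b["number"]


def resolve_pair(first, second):
    """Return every (first, second) pair of readings that agree."""
    return [
        (r1, r2)
        for r1 in READINGS[first]
        for r2 in READINGS[second]
        if agree(r1, r2)
    ]


pairs = resolve_pair("plusieurs", "mois")
for quantifier, noun in pairs:
    print(quantifier, "->", noun["form"])
```

Here ‘plusieurs’ forces the plural reading of ‘mois’ (mesa), while ‘mois’ forces the masculine reading of ‘plusieurs’, so a single combination survives out of four.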
There is only one error in the above translation: da latu should be replaced by da cantu.
We face here a special case of disambiguation: ‘un général byzantin du vie siècle’ (a Byzantine general of the sixth century) should translate as: un generali bizantinu di u 6esimu seculu. French ‘vie’ is ambiguous between vita and 6esimu or VIesimu (life/sixth). Indeed, lowercase ‘vi’ is sometimes used for the Roman numeral ‘VI’. Written in capitals, ‘VIe’ is unambiguous.
This also raises an interesting and more general issue: are ambiguities a weakness for a language? Is it better for a language to have few ambiguities?
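The ‘vie siècle’ case can be illustrated with a simple context rule: read a token as a Roman-numeral ordinal only when it is followed by ‘siècle’ and denotes a plausible century. This is a hedged sketch, not the translator's implementation. The function names, the `MAX_CENTURY` bound and the general use of the -esimu suffix beyond the attested 6esimu are all assumptions.

```python
import re

ROMAN = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}
ORDINAL = re.compile(r"^([ivxlcdm]+)e$", re.IGNORECASE)
MAX_CENTURY = 21  # assumption: rejects 'le' (50), 'ce' (100), 'de' (500), 'me' (1000)


def roman_to_int(numeral):
    """Convert a lowercase Roman numeral to an integer (subtractive notation)."""
    total = 0
    for ch, nxt in zip(numeral, numeral[1:] + " "):
        value = ROMAN[ch]
        # A smaller value placed before a larger one is subtracted (e.g. 'ix').
        total += -value if nxt in ROMAN and ROMAN[nxt] > value else value
    return total


def century_ordinal(tokens, i):
    """Return a Corsican ordinal if tokens[i] reads as a century number, else None."""
    match = ORDINAL.match(tokens[i])
    following = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    if not match or following not in ("siècle", "siècles"):
        return None
    number = roman_to_int(match.group(1).lower())
    if not 1 <= number <= MAX_CENTURY:
        return None
    return f"{number}esimu"  # assumption: suffix generalized from '6esimu'


print(century_ordinal(["du", "vie", "siècle"], 1))
print(century_ordinal(["la", "vie", "est", "belle"], 1))
print(century_ordinal(["le", "siècle"], 0))
```

The upper bound matters: without it, common words such as ‘le’ or ‘ce’ before ‘siècle’ would also parse as Roman numerals, which shows how quickly a simple rule creates new ambiguities.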
Okchakko Traduttori: light version 1.1 is available. New features:
- translates from French into the three main varieties of the Corsican language: cismuntincu, sartinesu, taravesu
- improvements to the help file
- improved handling of elision
- expanded vocabulary