Will progress in MT require some breakthrough in grammar conceptualization? Rule-based MT requires a detailed typology of grammatical categories, but the present state of grammar conceptualization seems insufficient for MT purposes. An enhanced typology is in order. In particular, the common typology relating to:
- the partitive article, i.e. the disambiguation of French ‘de’
is in need of a more accurate conceptualization.
Now handling a few locutions.
What does it mean to build rule-based translation software for a given language pair?
It amounts to building a part of brain emulation, dedicated to translating one language into another, i.e. emulating the brain of a bilingual individual. Arguably, ‘human cognition emulation’ is a better term here than ‘human brain emulation’, since this kind of emulation does not bear on neurons or synapses.
It should be noted that this applies specifically to rule-based translation. But it casts light on the fact that the present kind of translation would be better termed ‘human translation emulation’, since it aims at emulating human reasoning in the process of translating one language into another, notably by taking into account:
Let us now handle synonyms:
- ‘amélioration’ = migliuramentu/migliuranza (improvement)
- ‘lendemain’ = lindumane/ghjornu dopu (next day)
- ‘survenu’ = successu/accadutu (occurred)
- ‘don’ = daziu/donu (gift)
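One way to picture such synonym entries is a lexicon that maps each French word to an ordered list of Corsican candidates. The sketch below is only an illustration: the names `SYNONYMS` and `translate_word` are hypothetical, and treating the first candidate as the default is an assumption, not the method actually used by the translator.

```python
# Hypothetical synonym lexicon: French word -> ordered Corsican candidates.
# The entries come from the list above; the ordering is an assumption.
SYNONYMS = {
    "amélioration": ["migliuramentu", "migliuranza"],
    "lendemain": ["lindumane", "ghjornu dopu"],
    "survenu": ["successu", "accadutu"],
    "don": ["daziu", "donu"],
}


def translate_word(word, variant=0):
    """Return the chosen Corsican candidate, or the word itself if unknown."""
    candidates = SYNONYMS.get(word)
    if candidates is None:
        return word  # unknown vocabulary is left untranslated
    if 0 <= variant < len(candidates):
        return candidates[variant]
    return candidates[0]  # fall back to the default candidate


print(translate_word("lendemain"))
print(translate_word("lendemain", variant=1))
```

Keeping all candidates in the lexicon, rather than a single translation, leaves room for a later rule to pick one of them from context.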
Let us try Italian: here are the first steps in translating from French to Italian.
Let us consider here the disambiguation of ‘nombre de’ which, depending on the case, can be:
- a singular masculine noun followed by a preposition: in this case, ‘nombre de’ translates into numaru di (number of)
- an indefinite pronoun: in this case, French ‘nombre de’ translates into Corsican as bon parechji (many, a great many)
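The two cases above can be sketched as a small rule. The post does not say how the two readings are told apart, so the test used here is purely an assumption for illustration: ‘nombre de’ preceded by a determiner is read as the noun (‘le nombre de’), otherwise as the indefinite (‘Nombre de personnes…’). The names `DETERMINERS` and `translate_nombre_de` are hypothetical.

```python
# Hypothetical determiner list: an assumption for illustration only,
# not the rule set actually used by the translator.
DETERMINERS = {"le", "un", "ce", "du", "au", "son", "leur", "quel"}


def translate_nombre_de(sentence):
    """Return the Corsican rendering of the first 'nombre de', or None."""
    tokens = sentence.split()
    for i in range(len(tokens) - 1):
        if tokens[i].lower() == "nombre" and tokens[i + 1].lower() == "de":
            previous = tokens[i - 1].lower() if i > 0 else ""
            if previous in DETERMINERS:
                return "numaru di"  # noun followed by a preposition
            return "bon parechji"  # indefinite: many, a great many
    return None  # no occurrence of 'nombre de'


print(translate_nombre_de("Le nombre de personnes augmente"))
print(translate_nombre_de("Nombre de personnes pensent cela"))
```

A real rule would also need to cover the elided form ‘nombre d’’ and intervening adjectives (‘le grand nombre de’), which this sketch leaves out.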
Now performing large-scale testing with self-evaluation on full French Wikipedia articles:
- Italie: Self-evaluation: 1 - 708/14451 = 95.10% (708 errors on 14451 words)
- Aristote: Self-evaluation: 1 - 1264/22885 = 94.48%
- Everest: Self-evaluation: 1 - 606/10530 = 94.25%
- Mer méditerranée: Self-evaluation: 1 - 235/5088 = 95.38%
- démocratie: Self-evaluation: 1 - 515/11430 = 95.49%
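The self-evaluation figures above follow the formula score = 1 - errors/words. A minimal sketch that recomputes them from the counts given in the list; the names `self_evaluation`, `pooled_score` and `ARTICLES` are hypothetical, and the pooled score (all errors over all words) is an extra illustration, not a figure reported in the post.

```python
# (errors, words) per article, taken from the list above.
ARTICLES = {
    "Italie": (708, 14451),
    "Aristote": (1264, 22885),
    "Everest": (606, 10530),
    "Mer méditerranée": (235, 5088),
    "démocratie": (515, 11430),
}


def self_evaluation(errors, words):
    """Score as the share of words translated without error."""
    return 1 - errors / words


def pooled_score(articles):
    """Score over all articles taken together, weighted by length."""
    total_errors = sum(errors for errors, _ in articles.values())
    total_words = sum(words for _, words in articles.values())
    return self_evaluation(total_errors, total_words)


for title, (errors, words) in ARTICLES.items():
    print(f"{title}: {self_evaluation(errors, words):.2%}")
print(f"pooled: {pooled_score(ARTICLES):.2%}")
```

The pooled score weights long articles more heavily than a plain average of the five percentages would.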
French to Corsican: performance on the French Wikipedia sample test currently averages 94%. Below is a rough typology of the remaining errors (presumably an average score of 95% on the open test should be attainable by correcting the errors tagged ‘easy’):
- unknown vocabulary: 40% (easy)
- basic disambiguation: 25% (easy or medium difficulty)
- false positives: 5% (medium difficulty or hard). This type of error is mostly related to proper nouns, i.e. English terms that should remain untranslated. For example: ‘North American Aviation’ translates erroneously into ‘North American Aviazione’. In this case, ‘Aviation’ should remain untranslated.
- inadequate locution: 10% (medium difficulty or hard)
- anaphora resolution related to complex sentence structure: 5% (hard)
- semantic disambiguation: 5% (hard). For example, disambiguating French ‘échecs’ = fiaschi/scacchi (failures/chess)
- erroneous agreement related to gender mismatch from French to Corsican, i.e. (i) words that are masculine in French and feminine in Corsican; and (ii) words that are feminine in French and masculine in Corsican: 1% (medium difficulty).
- erroneous agreement related to number mismatch from French to Corsican, i.e. (i) words that are singular in French and plural in Corsican; and (ii) words that are plural in French and singular in Corsican (for example, French ‘la canicule’ translates into ‘i sulleoni’ in Corsican): 1% (medium difficulty).
- specific grammatical case: 2% (hard)
- anaphora resolution associated with gender or number mismatch: 1% (hard)
- unknown, unclassified: 6% (hard)
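To see how this typology relates to the 95% target, here is a rough back-of-the-envelope sketch. It assumes that a category's share applies uniformly to the remaining 6% of errors and that fixing a category removes all of its errors; both are simplifying assumptions, and the function names are hypothetical.

```python
def projected_score(current_score, fixed_share):
    """Score after removing a given share of the remaining errors."""
    error_rate = 1 - current_score
    return 1 - error_rate * (1 - fixed_share)


def required_share(current_score, target_score):
    """Share of remaining errors to remove in order to reach the target."""
    return 1 - (1 - target_score) / (1 - current_score)


# Removing all 'unknown vocabulary' errors (40% of the remaining errors):
print(f"{projected_score(0.94, 0.40):.1%}")
# Share of remaining errors to remove to move from 94% to 95%:
print(f"{required_share(0.94, 0.95):.1%}")
```

Under these assumptions, moving from 94% to 95% requires removing about one sixth of the remaining errors, which is less than the share tagged as unknown vocabulary alone.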