Si tratta di uni pochi di locuzioni – Now handling some locutions

Cattura di screnu di Okchakko traduttori 1.0 (Android)

Si tratta quì di uni pochi di locuzioni.

Now handling some locutions.

Posted in blog | Tagged , | Leave a comment

Brain Emulation

What does it mean to build rule-based translation software for a given language pair?
It amounts to building one part of a Brain Emulation: the part dedicated to translating one language into another, i.e. emulating the brain of a bilingual individual. Arguably, ‘human cognition emulation’ is a better term here than ‘human brain emulation’, since this kind of emulation does not bear on neurons or synapses.

It should be noted that this applies specifically to rule-based translation. It also sheds light on why the present kind of translation would be better termed ‘human translation emulation’: it aims at emulating human reasoning in the process of translating one language into another, notably by taking into account:

(Updated May 2018. Illustration from Wikimedia Commons: Roux, Wilhelm, 1850-1924)

Trattendu si di sinonimi – Now handling synonyms

A ci pruvemu quì incù i sinonimi:

  • ‘amélioration’ = migliuramentu/migliuranza
  • ‘lendemain’ = lindumane/ghjornu dopu
  • ‘survenu’ = successu/accadutu
  • ‘don’ = daziu/donu

 

Let us now handle synonyms:

  • ‘amélioration’ = migliuramentu/migliuranza (improvement)
  • ‘lendemain’ = lindumane/ghjornu dopu (next day)
  • ‘survenu’ = successu/accadutu (occurred)
  • ‘don’ = daziu/donu (gift)
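
One way to picture these entries is as a lexicon that stores several Corsican variants per French word. The sketch below is only an illustration under that assumption: the `SYNONYMS` table and the helpers `variants` and `default_translation` are hypothetical names, not taken from the translator itself, and treating the first variant as the default is an arbitrary choice.

```python
# Hypothetical lexicon: each French entry maps to its Corsican variants,
# in the order they are listed above.
SYNONYMS = {
    "amélioration": ["migliuramentu", "migliuranza"],
    "lendemain": ["lindumane", "ghjornu dopu"],
    "survenu": ["successu", "accadutu"],
    "don": ["daziu", "donu"],
}


def variants(french_word):
    """Return all known Corsican variants, or an empty list if unknown."""
    return SYNONYMS.get(french_word, [])


def default_translation(french_word):
    """Return the first listed variant, or the word unchanged if unknown."""
    options = variants(french_word)
    return options[0] if options else french_word


print(variants("don"))                   # ['daziu', 'donu']
print(default_translation("lendemain"))  # lindumane
```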

 

 

A ci pruvemu incù u talianu – First steps from French to Italian

A ci pruvemu incù u talianu: quì si facini i prima passi par traducia da u francesu à u talianu.


First steps from French to Italian.

Disambiguating ‘nombre de’

Let us consider here the disambiguation of ‘nombre de’ which, depending on the case, can be:

  • a singular masculine noun followed by a preposition: in this case, ‘nombre de’ translates to numaru di (number of)
  • an indefinite pronoun: in this case, French ‘nombre de’ translates into Corsican as bon parechji (many, a great many)

Si tratta quì di a disambiguazioni di ‘nombre de’ chì pò essa siont’è i casi:

  • un nomu maschili singulari suvitatu da una pripusizioni: in ‘ssu casu, ‘nombre de’ si traduci pà numaru di
  • un prunomu indefinitu: in ‘ssu casu, ‘nombre de’ pò essa traduttu in corsu da bon parechji
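
The post does not say which cue is used to tell the two readings apart. As a minimal sketch, assume that a determiner immediately before ‘nombre’ signals the noun reading and that anything else signals the indefinite reading. The `DETERMINERS` set and the function `translate_nombre_de` are hypothetical names introduced for illustration, and the heuristic is deliberately crude (it would misread ‘un grand nombre de’, for instance).

```python
# Assumption (not from the original post): a determiner right before
# 'nombre' signals the noun reading; otherwise the indefinite reading.
DETERMINERS = {"le", "un", "ce", "du", "au", "son", "leur", "quel"}


def translate_nombre_de(tokens, i):
    """Translate 'nombre de', where tokens[i] is the word 'nombre'."""
    previous = tokens[i - 1].lower() if i > 0 else ""
    if previous in DETERMINERS:
        return "numaru di"    # noun + preposition: 'number of'
    return "bon parechji"     # indefinite: 'many, a great many'


print(translate_nombre_de("le nombre de visiteurs".split(), 1))
# numaru di
print(translate_nombre_de("nombre de visiteurs sont venus".split(), 0))
# bon parechji
```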

Large-scale testing and self-evaluation

Now performing large-scale testing with self-evaluation on full French Wikipedia articles:

  • Italie: Self-evaluation: 1 – 708/14451 = 95.10% (708 errors out of 14451 words)
  • Aristote: Self-evaluation: 1 – 1264/22885 = 94.48%
  • Everest: Self-evaluation: 1 – 606/10530 = 94.25%
  • Mer Méditerranée: Self-evaluation: 1 – 235/5088 = 95.38%
  • Démocratie: Self-evaluation: 1 – 515/11430 = 95.49%

Eccu i risultati di i pruvaturi à grandi scala incù autovalutazioni, fatti annantu à l’articuli cumpletti di Wikipedia in francesu:

Italie: autovalutazioni: 1 – 708/14451 = 95,10% (708 arrori annantu à 14451 paroli)
Aristote: autovalutazioni: 1 – 1264/22885 = 94,48%
Everest: autovalutazioni: 1 – 606/10530 = 94,25%
Mer méditerranée : autovalutazioni: 1 – 235/5088 = 95,38%
démocratie: autovalutazioni: 1 – 515/11430 = 95,49%
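
The self-evaluation figures above all follow the same formula, score = 1 – errors/words. The short sketch below simply recomputes them from the error and word counts quoted in this post; the names `RESULTS` and `self_evaluation` are illustrative and are not part of the translator.

```python
# (errors, words) per article, copied from the figures above.
RESULTS = {
    "Italie": (708, 14451),
    "Aristote": (1264, 22885),
    "Everest": (606, 10530),
    "Mer Méditerranée": (235, 5088),
    "Démocratie": (515, 11430),
}


def self_evaluation(errors, words):
    """Score = 1 - errors/words, as a fraction between 0 and 1."""
    return 1 - errors / words


for title, (errors, words) in RESULTS.items():
    print(f"{title}: {self_evaluation(errors, words):.2%}")
# Italie: 95.10%
# Aristote: 94.48%
# Everest: 94.25%
# Mer Méditerranée: 95.38%
# Démocratie: 95.49%
```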

Rough typology of remaining errors (updated March 2018)

French to Corsican: performance on the French Wikipedia test sample currently averages 94%. Below is a rough typology of the remaining errors (presumably an average score of 95% on the open test should be attainable by correcting the errors tagged ‘easy’):

  • unknown vocabulary: 40% (easy)
  • basic disambiguation: 25% (easy or medium difficulty)
  • false positives: 5% (medium difficulty or hard). This type of error is mostly related to proper nouns, i.e. English terms that should remain untranslated. For example, ‘North American Aviation’ is erroneously translated as ‘North American Aviazione’. In this case, ‘Aviation’ should remain untranslated.
  • inadequate locution: 10% (medium difficulty or hard)
  • anaphora resolution related to complex sentence structure: 5% (hard)
  • semantic disambiguation: 5% (hard). For example, disambiguating French ‘échecs’ = fiaschi/scacchi (failures/chess)
  • erroneous agreement related to gender mismatch from French to Corsican, i.e. (i) words that are masculine in French and feminine in Corsican; and (ii) words that are feminine in French and masculine in Corsican: 1% (medium difficulty).
  • erroneous agreement related to number mismatch from French to Corsican, i.e. (i) words that are singular in French and plural in Corsican; and (ii) words that are plural in French and singular in Corsican (for example, French ‘la canicule’ translates into ‘i sulleoni’ in Corsican): 1% (medium difficulty).
  • specific grammatical case: 2% (hard)
  • anaphora resolution associated with gender or number mismatch: 1% (hard)
  • unknown, unclassified: 6% (hard)
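
To see how the typology relates to the 95% target, here is a rough back-of-the-envelope sketch. It assumes that the category shares apply uniformly to the 6% of words currently in error and that fixing one category introduces no new errors; both are simplifications, and the function names are illustrative only.

```python
BASELINE = 0.94  # average open-test score quoted above


def score_after_fixing(baseline, fixed_share):
    """Score once a given share of the remaining errors is corrected."""
    remaining_errors = (1 - baseline) * (1 - fixed_share)
    return 1 - remaining_errors


def share_needed(baseline, target):
    """Share of current errors that must be corrected to reach target."""
    return 1 - (1 - target) / (1 - baseline)


# Correcting all 'unknown vocabulary' errors (40% of the remaining errors):
print(f"{score_after_fixing(BASELINE, 0.40):.1%}")  # 96.4%

# Share of current errors to correct in order to reach 95%:
print(f"{share_needed(BASELINE, 0.95):.1%}")        # 16.7%
```

Under these assumptions, correcting well under half of the ‘unknown vocabulary’ errors alone would already be enough to reach 95%.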

Diachronic overview of performance at open test

Here is an overview of the average performance at open test. But we should not extrapolate too fast… Presumably the hardest stretch will be from 98% to 99%, and harder still from 99% to 100% (if attainable at all). It could even be extremely difficult to move from 99.00% to 99.10%, and it may take two years to move from 97% to 98%. Who knows?

An open question lurks here: how hard will it be to go from 98% to near 100%? To put it in a sharper form: how hard will it be to go from 99% to near 100%? Call it the last percent problem. It seems this would require at least multi-language translation capability. But it could even require full-featured AI (not limited to machine translation). To put it more plainly: will near-human-quality translation require full-featured AI?