Tagger improvement: fixed this issue. French ‘l’Empire allemand’ now translates properly into l’Imperu alimanu (the German Empire). French word ‘fin’ is now identified as a preposition when followed by a year number.
The above excerpt is translated into the ‘sartinesu’ variant of the Corsican language.
This issue relates to the more general problem of the grammatical status of numbers, a problem to which we shall return later.
There is one informative error here: ‘a agité l’Empire allemand fin 1913’ (agitated the German Empire at the end of 1913) should translate into chì hà agitatu l’Imperu alimanu à a fini di u 1913. The translation error (l’Imperu alimana instead of l’Imperu alimanu) arises because the adjective alimana (feminine singular, German) erroneously agrees with the feminine word fini (end). French ‘fin’ is short here for ‘à la fin de’. This shows that French ‘fin’ was erroneously tagged as a feminine noun, whereas here it is in reality a preposition meaning ‘à la fin de’ (at the end of). The same goes when ‘fin’ is followed by a word denoting a month: ‘fin août’, ‘fin janvier’ (at the end of August, at the end of January). The expansion is needed in both Corsican and English.
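The tagging rule described above can be sketched as follows. This is only a minimal illustration, not the translator's actual code: the function name `tag_fin`, the tag labels `PREP` and `NOUN`, the month lexicon and the year pattern are all assumptions made for the example.

```python
import re

# Hypothetical month lexicon and year pattern, for illustration only.
FRENCH_MONTHS = {
    "janvier", "février", "mars", "avril", "mai", "juin",
    "juillet", "août", "septembre", "octobre", "novembre", "décembre",
}
YEAR = re.compile(r"^\d{3,4}$")


def tag_fin(tokens, i):
    """Tag tokens[i] == 'fin' as 'PREP' when a year or a month follows, else 'NOUN'."""
    if tokens[i].lower() != "fin":
        raise ValueError("token at index i is not 'fin'")
    # Look at the next token, if there is one.
    nxt = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    if YEAR.match(nxt) or nxt in FRENCH_MONTHS:
        return "PREP"
    return "NOUN"
```

With this rule, ‘fin’ in ‘l’Empire allemand fin 1913’ is tagged as a preposition, so the preceding adjective is no longer made to agree with it, while ‘la fin du film’ keeps the noun reading.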
French ‘il existe 29 parcs nationaux’ (there are 29 national parks) translates into Corsican: esistenu 29 parchi naziunali. When the verb ‘to exist’ is used and its object is plural, a plural form of the verb is required in Corsican. The same goes for English, although the case is somewhat different, the translation switching here to the verb ‘to be’.
One error. Scoring 1 – (1/105) = 99.04%.
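The scoring formula can be written out as a small helper. This is a sketch under two assumptions: that the denominator is the number of words in the excerpt, and that the percentages quoted in these posts are truncated (not rounded) to two decimals, which is what the quoted figures suggest. The function name `score` is made up for the example.

```python
def score(errors, words):
    """Return 100 * (1 - errors / words), truncated to two decimals.

    Integer arithmetic is used so the truncation is exact.
    """
    if words <= 0 or not 0 <= errors <= words:
        raise ValueError("expected 0 <= errors <= words and words > 0")
    # Hundredths of a percent, truncated, then converted back to a percentage.
    return (words - errors) * 10000 // words / 100
```

For instance, `score(1, 105)` gives 99.04.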
Is it a successful Feigenbaum hit? Certainly, since this kind of error is not a gross one. Undoubtedly, it can be considered a type of error a human could make.
‘au stade de Wembley’ (at Wembley Stadium) should translate into in u stadiu di Wembley.
We face the issue of translating the preposition ‘à’, since ‘au’ is short for ‘à le’ (to the), in particular when ‘à’ is followed by a noun phrase denoting a location. This is a matter of disambiguating French ‘à’, which can translate either into à (to) or into in (in).
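One simple way to approach this disambiguation is a lexicon lookup on the noun that follows. The sketch below is only illustrative: the `LOCATION_NOUNS` set and the function `translate_au` are hypothetical, and a real system would need a much richer notion of "location".

```python
# Hypothetical lexicon of French nouns denoting locations (illustration only).
LOCATION_NOUNS = {"stade", "parc", "musée", "théâtre"}


def translate_au(following_noun):
    """Translate French 'au' (= 'à le') according to the noun that follows it.

    Returns 'in u' before a location noun, 'à u' otherwise.
    """
    if following_noun.lower() in LOCATION_NOUNS:
        return "in u"
    return "à u"
```

Under these assumptions, ‘au stade’ yields in u, while a non-location noun such as ‘siècle’ keeps à u.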
Now scoring 1 – 2/128 = 98.43%. There are only two related errors, both a special case of adjective agreement: ‘aux xixe et XXe siècles’ (in the 19th and 20th centuries) should translate into: à i XIXu è XXu seculi. There are 3 ambiguous words here:
- ‘aux’ i.e. ‘à les’ (in the): à i (masculine plural)/à e (feminine plural)
- ‘xixe’ i.e. ‘dix-neuvième’ (19th): XIXu (masculine singular)/XIXa (feminine singular)
- ‘xxe’ i.e. ‘vingtième’ (20th): XXu (masculine singular)/XXa (feminine singular)
Proper agreement should be performed as follows:
- ‘aux’: à i (masculine plural): depends on ‘siècles’ (centuries), masculine plural
- ‘xixe’ i.e. dix-neuvième (19th): XIXu (masculine singular)
- ‘xxe’ i.e. vingtième (20th): XXu (masculine singular)
Of the same type are:
- ‘les langues italienne et française’: e lingue taliana è francesa : the Italian and French languages (English is ambiguous in this case, since ‘les langues italiennes et françaises’ translates the same way, although the meaning is different, referring explicitly to the several varieties of the Italian and French languages. In French, the ambiguity only concerns spoken text, since the written sentence is unambiguous. In Corsican, both written and spoken sentences are unambiguous.)
- ‘les codes pénal et civil’: i codici penale è civile : the penal and civil codes
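The agreement pattern for the ‘siècles’ case (plural article agreeing with the noun, each coordinated ordinal staying singular but taking the noun's gender) can be sketched as follows. The function name, its parameters and the gender codes "m"/"f" are assumptions made for this example; only the Corsican forms à i, à e and the endings -u/-a come from the text above.

```python
def coordinated_ordinals(numerals, noun_plural, gender):
    """Build 'à i/à e' + singular ordinals joined by 'è' + plural noun.

    numerals: Roman numerals as strings, e.g. ["XIX", "XX"]
    gender:   "m" or "f", the gender of the plural noun
    """
    article = {"m": "à i", "f": "à e"}[gender]   # plural, agrees with the noun
    suffix = {"m": "u", "f": "a"}[gender]        # singular ordinal ending
    ordinals = " è ".join(numeral + suffix for numeral in numerals)
    return f"{article} {ordinals} {noun_plural}"
```

Called as `coordinated_ordinals(["XIX", "XX"], "seculi", "m")`, this returns à i XIXu è XXu seculi. The same idea (singular modifiers distributed over a plural noun) underlies the ‘langues’ and ‘codes’ examples, although there the modifiers follow the noun.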
Now should it be considered an instance of a successful Feigenbaum test? Arguably, yes (although this is debatable). These two errors cannot be considered gross errors from a Feigenbaum test perspective. They can be considered errors a human could make.
But caution: at present, this is only one exceptional successful instance. Call it a Feigenbaum hit. What we are interested in is passing the Feigenbaum test regularly. For the moment the software is not capable of that. New target: 99% and/or more frequent successful Feigenbaum hits.
What is it to make a rule-based translation software for a given language pair?
It amounts to building a part of brain emulation dedicated to translating one language into another, i.e. emulating the brain of a bilingual individual. Arguably, ‘human cognition emulation’ is better suited here than ‘human brain emulation’, since this kind of emulation does not bear on neurons or synapses.
Now scoring 1 – 2/129 = 98.44%.
- The issue of past participle agreement again: ‘une session du parlement tenue à Nuremberg’ (a session of the Parliament held in Nuremberg) should translate into una sessione di u parlamentu tenuta in Nuremberg. Past participle tenuta should agree with sessione (feminine, session) and not with parlamentu (masculine, Parliament). This may call for dependency parsing, but that could be insufficient. Perhaps (harder) semantic disambiguation is required in this case.
- One false positive: ‘des’, being a German word, should remain untranslated.
In the present case, it should read, custruitu à u seculu XII (built in the 12th century). The error relates to the disambiguation of French ‘construit’. It can translate into:
- custruitu (built): past participle, masculine, singular
- custruisce (builds): present simple, third person
MT should (i) find the proper referent of ‘construit’, i.e. ‘clocher’ (church tower), but above all (ii) determine whether ‘construit’ is a past participle or a present simple. Some kind of dependency parser is in order…
Scoring 1 – 2/127 = 98.42%. Of interest:
- ‘de 839 à sa mort’ (from 839 to his death) should read: da u 839 à a so morte. French ‘de’ translates either into di or into da in Corsican (to simplify matters, since in certain cases, being a partitive article, it translates into nothing).
- now we face again the multi-ambiguous French ‘fils’, which can translate into: i) figliolu, masculine, singular (son) ii) figlioli, masculine, plural (sons) iii) fili, masculine, plural (wires). In the present case, ‘Fils du roi…’ should translate into Figliolu di u rè… (Son of King…).
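A first, shallow way to narrow down the three readings of ‘fils’ is to look at the neighbouring tokens. The sketch below is a heuristic made up for illustration, not the translator's actual rule: the determiner sets, the function name `fils_candidates` and the sentence-initial ‘Fils de/du’ shortcut are all assumptions, and the function returns the remaining candidates instead of forcing a single answer.

```python
# Hypothetical determiner lists (illustration only).
SINGULAR_DETERMINERS = {"le", "un", "son", "mon", "ton", "ce"}
PLURAL_DETERMINERS = {"les", "des", "ses", "mes", "tes", "ces"}


def fils_candidates(previous_token, next_token):
    """Return the Corsican candidates for French 'fils' given its neighbours.

    previous_token is None when 'fils' opens the sentence.
    """
    prev = (previous_token or "").lower()
    nxt = (next_token or "").lower()
    if prev in SINGULAR_DETERMINERS:
        return ["figliolu"]                # singular: only 'son' is possible
    if prev in PLURAL_DETERMINERS:
        return ["figlioli", "fili"]        # plural: 'sons' or 'wires'
    if previous_token is None and nxt in {"de", "du"}:
        return ["figliolu"]                # assumed apposition, as in 'Fils du roi…'
    return ["figliolu", "figlioli", "fili"]
```

The plural case still leaves two candidates, which is where semantic information about the context would be needed.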
Note: five consecutive 100% sentences.
With regard to the Feigenbaum test: failed again. Arguably, the first error is of an acceptable kind in this context. But the ‘fils’ error is a gross one that a human would not make…
Can translation help with self-teaching an endangered language? It seems so, if the translation is accurate. Let us check with the verb parlà (to speak). In this case, the translation is 100% accurate, so it can help (but we need to check other verb categories and other tenses). Other verbs of the same group are those ending in -à: manghjà (to eat), saltà (to jump), cantà (to sing), etc.
To begin with: conjugations in the present simple, imperfect and future:
- je parle (I speak), tu parles (you speak), il/elle parle (he/she speaks), nous parlons (we speak), vous parlez (you speak), ils/elles parlent (they speak)
- je parlais (I was speaking), tu parlais (you were speaking), il/elle parlait (he/she was speaking), nous parlions (we were speaking), vous parliez (you were speaking), ils/elles parlaient (they were speaking)
- je parlerai (I will speak), tu parleras (you will speak), il/elle parlera (he/she will speak), nous parlerons (we will speak), vous parlerez (you will speak), ils/elles parleront (they will speak).
- French ‘parle’ is ambiguous since it can translate into parlu (I speak) or parla (he/she speaks).
- French ‘parlais’ is ambiguous since it can translate into parlavu (I was speaking) or parlavi (you were speaking).
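These two ambiguities disappear once the subject pronoun is taken into account. The lookup below is a minimal sketch of that idea: the table and the function name `translate_verb` are made up for the example, and only the four Corsican forms quoted above are included, so the table is deliberately incomplete.

```python
# Only the forms quoted in the text are listed; everything else is left out.
CORSICAN_FORMS = {
    ("parle", "je"): "parlu",
    ("parle", "il"): "parla",
    ("parle", "elle"): "parla",
    ("parlais", "je"): "parlavu",
    ("parlais", "tu"): "parlavi",
}


def translate_verb(french_form, subject_pronoun):
    """Return the Corsican form for a French verb form and its subject pronoun.

    Returns None when the pair is not covered by the table.
    """
    key = (french_form.lower(), subject_pronoun.lower())
    return CORSICAN_FORMS.get(key)
```

For example, `translate_verb("parle", "je")` returns parlu, whereas `translate_verb("parle", "elle")` returns parla.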