- Interesting case of first name disambiguation
- Superintelligent machine translation
- Writing differences between Corsican and Gallurese
- What are the conditions for a given endangered language to be a candidate for rule-based machine translation?
- Quandu da la forza à la raghjoni cuntrasta Tandu vinci la forza è la raghjoni ùn basta
- How rule-based and statistical machine translation can help each other
- Why rule-based translation is (presently) best suited to endangered languages
- Rough typology of remaining errors
- Enhancing French to Italian translation
- Very first draft on French to Italian
- Improvement in grammatical structures: another 100% hit
- Percentage of ambiguous words in a French sentence (from a French to Corsican translation perspective)
- Disambiguation of two consecutive ambiguous words: ‘plusieurs mois’
- Disambiguation of ‘vie’
- A virsioni 1.1 hè dispunibuli (Version 1.1 is available)
A jeweler examines an emerald. "Aha," he says, "another green emerald. In all my years in this business, I must have seen thousands of emeralds, and every one has been green." We think the jeweler reasonable to hypothesize that all emeralds are green. Next door is another jeweler having equally comprehensive experience with emeralds. He speaks only the Choctaw Indian language. Color distinctions are not as universal as might be thought. The Choctaw Indians made no distinction between green and blue—the same words applied to both. The Choctaws did make a linguistic distinction between okchamali, a vivid green or blue, and okchakko, a pale green or blue. The Choctaw-speaking jeweler says: All emeralds are okchamali. He maintains that all his years in the jewelry business confirm this hypothesis. (William Poundstone, Labyrinths of reason)
UNESCO currently classifies Corsican as a "definitely endangered" language. This site aims to help revive the Corsican language by providing translation into Corsican. It translates French and Italian into one of the three main Corsican variants: 'cismuntincu', 'sartinesu' or 'taravesu'.
Most illustrations are from Wikimedia Commons.
Author Archives: pilinu
Here is an interesting case of first name disambiguation for machine translation. Consider the first name ‘Camille’, which can apply to both genders. In Corsican (taravese or sartinese variants), it translates either as Cameddu (masculine) or Camedda (feminine). In …
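One simple way to frame this choice is to look for gender cues in the surrounding French sentence. The sketch below is a hypothetical illustration, not the translator's actual method: the cue sets (`il`/`elle`, `né`/`née`, `monsieur`/`madame`) and the function name `translate_camille` are assumptions, and a real system would need to check that the cue actually refers to Camille rather than to someone else in the sentence.

```python
import re

# Hypothetical gender cues: French words suggesting the referent is male or female.
MASCULINE_CUES = {"il", "né", "monsieur"}
FEMININE_CUES = {"elle", "née", "madame"}

# Corsican forms of 'Camille' in the taravese/sartinese variants (from the post).
CAMILLE_FORMS = {"masculine": "Cameddu", "feminine": "Camedda"}


def translate_camille(french_sentence: str) -> str:
    """Pick the Corsican form of 'Camille' from gender cues in the sentence."""
    # \w+ also splits elisions such as "qu'il" into "qu" and "il".
    tokens = re.findall(r"\w+", french_sentence.lower())
    masculine = sum(token in MASCULINE_CUES for token in tokens)
    feminine = sum(token in FEMININE_CUES for token in tokens)
    if feminine > masculine:
        return CAMILLE_FORMS["feminine"]
    if masculine > feminine:
        return CAMILLE_FORMS["masculine"]
    # No decisive cue: keep both readings so the ambiguity stays visible.
    return f"{CAMILLE_FORMS['masculine']}/{CAMILLE_FORMS['feminine']}"


print(translate_camille("Camille est née à Ajaccio."))           # → Camedda
print(translate_camille("Camille est arrivé, il est content."))  # → Cameddu
print(translate_camille("Camille parle corse."))                 # → Cameddu/Camedda
```

Returning both forms when no cue is found is a design choice for the sketch: it leaves the ambiguity visible instead of silently defaulting to one gender.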
Let us consider superintelligence as it relates to machine translation. To fix ideas, we can propose a rough definition: a machine able to translate from any one of the 8000 languages into any other with 99% accuracy or above. It seems relevant here …
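As a rough illustration of the scale this definition implies, assume that "from one of the 8000 languages to another" means every ordered pair of distinct languages (translating A into B and B into A counted separately). With n = 8000 languages, the number of translation directions is:

```latex
N_{\text{directions}} = n(n-1) = 8000 \times 7999 = 63\,992\,000
```

Counting each unordered pair of languages only once halves this figure to 31,996,000.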
Here are some writing differences between Corsican and Sardinian Gallurese that result from historical writing habits. These differences persist even when the words are the same: ghj becomes gghj, e.g. acciaghju (corsu), acciagghju (gallurese), ‘steel’; chj is …
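Such spelling differences lend themselves to a small table of rewrite rules applied to each word. The sketch below fills in only the rule actually quoted above (ghj becomes gghj); the remaining rules are elided in the excerpt and are not guessed. The names `GALLURESE_SPELLING_RULES` and `to_gallurese_spelling` are hypothetical, and the `(?<!g)` lookbehind is an added assumption so that re-applying the rule does not triple the consonant. Any positional restrictions on the rule are not stated in the source.

```python
import re

# Spelling rules as (regex pattern, replacement) pairs. Only the ghj -> gghj
# rule from the post is included; the other rules (starting with chj) are elided.
GALLURESE_SPELLING_RULES = [
    # (?<!g) skips an already doubled 'gghj', so the rule is safe to re-apply.
    (re.compile(r"(?<!g)ghj"), "gghj"),  # acciaghju -> acciagghju ('steel')
]


def to_gallurese_spelling(corsican_word: str) -> str:
    """Rewrite a Corsican spelling according to Gallurese writing habits."""
    word = corsican_word
    for pattern, replacement in GALLURESE_SPELLING_RULES:
        word = pattern.sub(replacement, word)
    return word


print(to_gallurese_spelling("acciaghju"))   # → acciagghju
print(to_gallurese_spelling("acciagghju"))  # → acciagghju (unchanged)
```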
What are the conditions for a given endangered language to be a candidate for rule-based machine translation? Certain requirements must be met. In particular, there is a need for: – …
Quandu da la forza à la raghjoni cuntrasta Tandu vinci la forza è la raghjoni ùn basta. This is a rare Corsican proverb, roughly: “When force and reason clash, then force wins and reason is not enough.” In French, literally: “Lorsque la force et la raison s’opposent, alors la force gagne car la raison …
Here are a few suggestions on how rule-based and statistical machine translation can help each other (this is a follow-up to the previous post). To begin with, rule-based and statistical machine translation are often contrasted and compared: it would be …
Here are some arguments in favor of choosing rule-based translation for the machine translation of endangered languages (this relates to the philosophy of language policy): at present, there is no reliable corpus between the given endangered …
French to Corsican: performance on a sample test drawn from French Wikipedia currently averages 93%. Below is a rough typology of the remaining errors (an average of 95% should presumably be attainable by correcting the errors tagged as ‘easy’): …
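The 93% to 95% projection is simple arithmetic: the 7 percentage points of error must shrink to 5, so the errors tagged ‘easy’ need to account for about 2 of those 7 points, i.e. roughly 29% of the remaining errors. The sketch below makes that calculation explicit. The tag names and counts are invented for illustration (chosen only to match the reported 93%) and are not the site's actual test data; the function names are hypothetical.

```python
from typing import Dict, Iterable


def accuracy(total_words: int, errors: int) -> float:
    """Share of words translated correctly."""
    return (total_words - errors) / total_words


def projected_accuracy(total_words: int,
                       errors_by_tag: Dict[str, int],
                       fixed_tags: Iterable[str]) -> float:
    """Accuracy once every error carrying one of `fixed_tags` is corrected."""
    fixed = set(fixed_tags)
    remaining = sum(n for tag, n in errors_by_tag.items() if tag not in fixed)
    return accuracy(total_words, remaining)


# Invented sample: 1000 words, 70 errors (93%), of which 20 are tagged 'easy'.
errors_by_tag = {"easy": 20, "other": 50}
total_errors = sum(errors_by_tag.values())
print(accuracy(1000, total_errors))                       # → 0.93
print(projected_accuracy(1000, errors_by_tag, ["easy"]))  # → 0.95
```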
Some improvements made to French to Italian translation: fixed several contractions (della, dello, …). Notably, semantic disambiguation works: ‘échecs’ (fallimenti/scacchi, i.e. failures/chess) is properly translated as scacchi.
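To give an idea of what choosing between ‘scacchi’ and ‘fallimenti’ involves, here is a minimal context-cue sketch. It is a swapped-in, deliberately simple technique (counting sense-specific cue words in the sentence), not the translator's actual disambiguation method; the cue lists, the fallback sense and the name `translate_echecs` are all assumptions.

```python
import re

# Hypothetical context cues for each Italian sense of French 'échecs'.
SENSE_CUES = {
    "scacchi": {"jouer", "joue", "partie", "échiquier", "tournoi", "champion"},
    "fallimenti": {"succès", "réussite", "projet", "subi", "série", "répétés"},
}
DEFAULT_SENSE = "fallimenti"  # assumed fallback when no cue is found


def translate_echecs(french_sentence: str) -> str:
    """Choose the Italian sense of 'échecs' by counting cue words in context."""
    tokens = set(re.findall(r"\w+", french_sentence.lower()))
    scores = {sense: len(tokens & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)  # ties go to the first sense listed
    return best if scores[best] > 0 else DEFAULT_SENSE


print(translate_echecs("Il aime jouer aux échecs."))                            # → scacchi
print(translate_echecs("Après plusieurs échecs, le projet a été abandonné."))  # → fallimenti
```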
Now testing French to Italian translation: this is the very first draft, at roughly 80%. Many things remain to be fixed.