Tag Archives: machine translation

The status of adjective modifiers

What is the status of adjective modifiers (tant, tout juste, un rien, un tantinet, très, extrêmement, … = so much, just a little, a little, a little, very, extremely, …) in the present grammatical typology? Adjectives are defined as noun …

Posted in blog | Tagged , , , , , | Leave a comment

The two-language matching problem

Here is a problem for a human intelligence (or an AGI): we have a dictionary (with words, lemmas and grammatical types) in a language A and a second dictionary in a language B. If we have an extensive corpus of …

Why it’s worth it to engage in rule-based translation

Rule-based translation is difficult to implement. The main difficulty is handling groups of words, so as to be on a par with statistics-based translation. The main problems in this regard are (i) polymorphic disambiguation and (ii) …
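
Handling groups of words often begins by grouping known multiword units before any translation rule applies. Here is a minimal sketch of greedy longest-match segmentation, assuming a hypothetical lexicon that contains only ‘près de’ and ‘nombre de’ (both discussed in other posts here). It groups the words but does not by itself resolve the disambiguation problems mentioned above.

```python
# Hypothetical multiword lexicon: tuples of lowercase French tokens.
MULTIWORD_UNITS = {
    ("près", "de"),
    ("nombre", "de"),
}
MAX_UNIT_LEN = max(len(unit) for unit in MULTIWORD_UNITS)


def segment(tokens: list[str]) -> list[str]:
    """Greedy left-to-right longest match over the multiword lexicon."""
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for length in range(min(MAX_UNIT_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + length])
            if candidate in MULTIWORD_UNITS:
                segments.append(" ".join(tokens[i:i + length]))
                i += length
                break
        else:
            # No known unit starts here: keep the single token.
            segments.append(tokens[i])
            i += 1
    return segments


print(segment("la maison est près de la gare".split()))
# ['la', 'maison', 'est', 'près de', 'la', 'gare']
```

Greedy longest match is the simplest choice; a real rule-based system would still have to decide, for a unit such as ‘nombre de’, which of its translations applies in context.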

Reflections on grammatical typologies

It is useful to point out the differences between grammatical typologies. The classical grammatical taxonomy is aimed essentially at teaching and comprehension; its purpose is therefore pedagogical. By contrast, the taxonomy that is …

Analyzing relative pronouns

What is the status of the ‘relative pronouns’ of classical grammar within the present conceptual framework? Traditionally, a distinction is made between simple relative pronouns (qui, que, dont, où; who, that, whose, where) and compound relative pronouns (à qui, pour …

Powering MT with two-sided grammar: the case of ‘près de’

‘près de’ (near) is considered a prepositional locution (locution prépositive). From the viewpoint of two-sided grammar, it is (synthetically) a preposition, made up (analytically) of an adverb (‘près’) followed by the preposition ‘de’. In Corsican, this is translated as …
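
The two-sided view can be represented as a lexical entry carrying both a synthetic type for the whole unit and an analytic breakdown of its parts. Here is a minimal sketch; the Entry class and the tag names are hypothetical, and the Corsican equivalent is deliberately left out since the excerpt does not give it.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A lexical unit seen from two sides (hypothetical representation)."""
    form: str
    synthetic_pos: str  # grammatical type of the unit as a whole
    analytic: list[tuple[str, str]] = field(default_factory=list)  # (form, type) of each part


PRES_DE = Entry(
    form="près de",
    synthetic_pos="PREPOSITION",
    analytic=[("près", "ADVERB"), ("de", "PREPOSITION")],
)


def describe(entry: Entry) -> str:
    """Render both views of the entry on one line."""
    parts = " + ".join(f"{form}/{pos}" for form, pos in entry.analytic)
    return f"{entry.form}: {entry.synthetic_pos} = {parts}"


print(describe(PRES_DE))
# près de: PREPOSITION = près/ADVERB + de/PREPOSITION
```

A translation rule can then match on the synthetic side (a preposition) while still being able to inspect the analytic parts when needed.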

Expanding on noun modulators

Let’s take a closer look at noun modulators, especially common noun modulators. We have seen that, in the present conceptual framework, adjectives can be considered noun modulators. The question then arises: are there other forms of noun …

New: Part-of-speech tagger API for French

I have just published the part-of-speech tagger API for French on RapidAPI. Use of the API is free for up to 1000 requests per month. No training is necessary: it works out of the box.

Further reflections on the status of “I love you” in Corsican

Let us briefly recall the problem: translating ‘I love you’ might sound trivial, but it’s not. In fact, ‘ti amu’ is not the best translation. The best translation is ‘ti tengu caru’ when addressed to a male person, or ‘ti …

More on polymorphic disambiguation…

Let’s take another look at polymorphic disambiguation. We shall consider the French word sequence ‘nombre de’. Its translation into Corsican (the same goes for English and other languages) cannot always be the same, because ‘number of’ can be translated in two different …

Autonomous MT system

Let us speculate about what an autonomous MT system could be. In the present state of MT, we either provide rules and a dictionary to the software (rule-based translation) or feed it a corpus for a given language pair …

Word sense disambiguation: a hard case

Let us consider a hard case of word sense disambiguation in French-to-Corsican MT (the same goes for French-to-English MT). It concerns French words such as ‘accomplit’, ‘affaiblit’, ‘affranchit’, ‘alourdit’ and ‘amortit’. The …

New insight on the issue of pair reversal (updated)

The issue of pair reversal goes as follows: suppose you have a translation pair A>B that translates language A into language B; how hard is it to build the reverse pair B>A? Now the current instance of this …
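
One way to see why reversal is not trivial: inverting an A>B lexicon turns one-to-many entries into many-to-one ones, so B>A needs disambiguation rules of its own. Here is a minimal sketch using the ‘femme’ entry (donna/moglia) from the post on ‘femme’, plus an ‘épouse’ → moglia entry added purely as an illustrative assumption; reverse_pair is a hypothetical helper, not part of any existing system.

```python
from collections import defaultdict


def reverse_pair(a_to_b: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert an A>B lexicon into B>A, keeping every source word per target."""
    b_to_a: dict[str, list[str]] = defaultdict(list)
    for a_word, b_words in a_to_b.items():
        for b_word in b_words:
            b_to_a[b_word].append(a_word)
    return dict(b_to_a)


# Illustrative French>Corsican entries ('épouse' is an assumption).
fr_co = {"femme": ["donna", "moglia"], "épouse": ["moglia"]}
co_fr = reverse_pair(fr_co)

# Targets that now need a disambiguation rule in the reverse direction.
ambiguous = {word: sources for word, sources in co_fr.items() if len(sources) > 1}
print(ambiguous)
# {'moglia': ['femme', 'épouse']}
```

The word list inverts mechanically; the context rules that chose between donna and moglia in A>B do not, and choosing between ‘femme’ and ‘épouse’ in B>A calls for new rules.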

How to translate ‘Cette phrase est en français’? (This sentence is in French) – updated

Let us consider the following French sentence: Le comté de Kronoberg est un comté suédois dont le nom signifie en français ‘Couronne de montagne’ (Kronoberg County is a Swedish county whose name means ‘Mountain crown’ in French). It translates into Corsican: A cuntea di Kronoberg hè una cuntea svedese chì u so nome significheghja in …

What is required from Artificial General Intelligence with regard to Machine Translation?

In a series of posts, we will try to define what is required of an AGI (Artificial General Intelligence) to reach the level of superintelligence in MT (machine translation). (All this is highly speculative, but …

Superintelligent machine translation (updated)

Let us consider superintelligence with regard to machine translation. To fix ideas, we can propose a rough definition: a machine able to translate with 99% (or higher) accuracy from one of the 8000 languages to …

Is rule-based MT more ethical than statistical MT?

In the ongoing debate on safe AI, whether rule-based MT is more ethical than statistical MT is a relevant open question. Here are some arguments in favor of rule-based MT in this context (without blaming statistical MT, which has its …

Rough typology of remaining errors (updated March 2018)

French to Corsican: performance on a French Wikipedia sample test currently averages 94%. Below is a rough typology of the remaining errors (presumably an average score of 95% on the open test should be attainable on the basis of correction …

Evaluation of machine translation: why not self-evaluation?

Machine translation is usually evaluated with external tools (for instance ARPA, BLEU, METEOR, LEPOR, …). But let us investigate the idea of self-evaluation, for it seems that the software itself is capable of having an accurate …

Semantic disambiguation of French ‘femme’: in the mud, gold is still shining

In Corsican, the French word ‘femme’ can be translated, depending on the context, either as donna (woman) or as moglia (wife). The above sample still contains a lot of vocabulary and grammatical disambiguation errors (easy/medium difficulty), but it handles …
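
A context rule for this choice can be very simple to start with. Here is a minimal sketch of one possible heuristic, assuming that a possessive determiner right before ‘femme’ signals the ‘wife’ reading and that donna is the default; the cue list and the function name are hypothetical, not the rules actually used by the software.

```python
# Hypothetical cue list: a possessive determiner right before 'femme'
# is taken as a sign of the 'wife' reading.
POSSESSIVES = {"ma", "ta", "sa", "notre", "votre", "leur"}


def translate_femme(tokens: list[str], i: int) -> str:
    """Return a Corsican rendering for tokens[i], assumed to be 'femme'."""
    previous = tokens[i - 1].lower() if i > 0 else ""
    if previous in POSSESSIVES:
        return "moglia"  # wife
    return "donna"       # woman: default reading


print(translate_femme("il aime sa femme".split(), 3))  # moglia
print(translate_femme("une femme chante".split(), 1))  # donna
```

Such a rule misses cases like ‘la femme de Pierre’ (Pierre's wife), which would need a further cue.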

A Special Case of Anaphora Resolution

Anaphora resolution usually concerns pronouns. Here, however, we face a special case of anaphora resolution that relates to an adjective. The phrase ‘un vase de Chine authentique’ (an authentic vase from China) is erroneously translated as un vasu …
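
Once the adjective in ‘N1 de N2 ADJ’ has been attached to N1 (here ‘vase’), its Corsican form must agree with N1 rather than with the nearer noun N2. Here is a minimal sketch of that agreement step, assuming a hypothetical lexicon in which vasu is masculine and Cina feminine, with autenticu/autentica as the adjective forms; these entries are illustrative assumptions, and the attachment decision itself, the genuinely hard part, is simply taken as given.

```python
# Hypothetical mini-lexicon; genders and adjective forms are illustrative
# assumptions, not taken from the post.
NOUN_GENDER = {"vasu": "m", "Cina": "f"}
ADJ_FORMS = {"authentique": {"m": "autenticu", "f": "autentica"}}


def agree_adjective(adjective: str, head_noun: str) -> str:
    """Inflect the Corsican form of a French adjective on head_noun's gender."""
    return ADJ_FORMS[adjective][NOUN_GENDER[head_noun]]


# 'un vase de Chine authentique': the adjective qualifies 'vase' (N1).
print(agree_adjective("authentique", "vasu"))  # autenticu
# Agreeing with the nearer noun 'Chine' (N2) would give a different form.
print(agree_adjective("authentique", "Cina"))  # autentica
```

Attaching to N1 is not always right: in ‘un vase de porcelaine blanche’ the adjective qualifies the second noun, so a real system needs lexical or agreement cues to decide.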

Four consecutive ambiguous words

Translating the sentence ‘ce fait est unique’ (this fact is unique) is not as easy as it might seem at first glance. In fact, it is made up of four consecutive ambiguous words: ‘ce’: ‘ssu (demonstrative pronoun, this) or ciò (it, relative pronoun) …

What are the conditions for a given endangered language to be a candidate for rule-based machine translation?

For a given endangered language to be a candidate for rule-based machine translation, some requirements must be met. In particular, there is a need for: a …

Solving fivefold ambiguity: translation for French ‘poste’

The French word ‘poste’ is (at least) fivefold ambiguous. It can designate: ‘poste’ (masculine singular noun): postu, masculine singular noun (set, i.e. television set); ‘poste’ (masculine singular noun): posta, feminine singular noun (position), erroneously translated as postu in the present case …

Another case of first name ambiguity: ‘Noël’

Translating the French word ‘Noël’ yields another case of ambiguity, for ‘Noël’ can be translated either as Natali (Christmas, Christmas Day), the annual festival commemorating the birth of Jesus Christ, or, identically, as Natali (‘Noel’), the first name. Now it seems there is no case of …

Interesting case of first name disambiguation

Here is an interesting case of first name disambiguation for machine translation. Consider the first name ‘Camille’, which can be given to both men and women. In Corsican (Taravese or Sartinese variants), it translates either as Cameddu (masculine) or as Camedda (feminine). In …
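
Choosing between Cameddu and Camedda amounts to finding a gender cue in the context. Here is a minimal sketch, assuming that titles (Monsieur, Madame, …) and the subject pronouns il/elle are usable cues; the cue lists and the function name are hypothetical, and returning None when cues are missing or conflicting is a design choice that leaves the decision to another rule.

```python
from typing import Optional

# Hypothetical gender cues (titles and subject pronouns); not exhaustive.
MASCULINE_CUES = {"monsieur", "m.", "il"}
FEMININE_CUES = {"madame", "mme", "mademoiselle", "elle"}

# Taravese/Sartinese forms given in the post.
CAMILLE = {"m": "Cameddu", "f": "Camedda"}


def translate_camille(context: list[str]) -> Optional[str]:
    """Pick Cameddu or Camedda from gender cues in the surrounding words.

    Returns None when there is no cue, or when masculine and feminine
    cues conflict.
    """
    words = {w.lower() for w in context}
    masculine = bool(words & MASCULINE_CUES)
    feminine = bool(words & FEMININE_CUES)
    if masculine and not feminine:
        return CAMILLE["m"]
    if feminine and not masculine:
        return CAMILLE["f"]
    return None


print(translate_camille("Madame Camille Dupont".split()))            # Camedda
print(translate_camille("Camille arrive ; il est en retard".split()))  # Cameddu
print(translate_camille("Camille arrive demain".split()))             # None
```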

Writing differences between Corsican and Gallurese

Here are some writing differences between Corsican and Sardinian Gallurese that result from historical writing habits. These differences persist even when the words are the same: ghj is replaced by gghj: acciaghju (corsu), acciagghju (gallurese), steel; chj is …
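
The ghj → gghj correspondence lends itself to a simple rewrite rule. Here is a minimal sketch covering only that one rule; the restriction to a preceding vowel is an assumption (so that a word-initial ghj is left alone), and the function name is hypothetical.

```python
import re

# Assumption: the doubling applies only when ghj follows a vowel.
_GHJ_AFTER_VOWEL = re.compile(r"(?<=[aeiouàèìòù])ghj")


def corsu_to_gallurese_spelling(word: str) -> str:
    """Apply the ghj -> gghj spelling change (one rule only)."""
    return _GHJ_AFTER_VOWEL.sub("gghj", word)


print(corsu_to_gallurese_spelling("acciaghju"))  # acciagghju
```

Because the lookbehind requires a vowel right before the g, applying the rule twice does not produce a tripled consonant.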

Quandu da la forza à la raghjoni cuntrasta Tandu vinci la forza è la raghjoni ùn basta

Quandu da la forza à la raghjoni cuntrasta Tandu vinci la forza è la raghjoni ùn basta. This is a rare Corsican proverb. In French, literally: “Lorsque la force et la raison s’opposent, alors la force gagne car la raison …

How rule-based and statistical machine translation can help each other

Here are a few suggestions on how rule-based and statistical machine translation can help each other (this is a follow-up to the previous post). To begin with, rule-based and statistical machine translation are often contrasted and compared: it would be …

Why rule-based translation is (presently) best suited to endangered languages

Here are some arguments in favor of choosing rule-based translation for the machine translation of endangered languages (this relates to the philosophy of language policy): at present, there is no reliable corpus between the given endangered …
