Leaving ambiguity unresolved

Disambiguation is an essential process in machine translation. Sometimes, however, it seems more rational to leave an ambiguity in the translation. This is the case when (i) there is an ambiguous word in the sentence to be translated; and (ii) the context does not provide an objective reason to choose one of the two senses. In this case, the best translation seems to be the one that leaves the ambiguity intact.

Let’s take an example. Consider the following French sentence: ‘Son palais était en feu.’ The French word ‘palais’ is ambiguous, because it corresponds in English and in Corsican to two different words (palace, palazzu and palate, palatu).

Thus, we have three possible translations:

  • His palate was on fire
  • His palace was on fire
  • His palace/palate was on fire

The third translation, in my opinion, is better, because it points out that the context is insufficient to choose one of the two alternatives.

Consider now, on the one hand, the following sentence: ‘Il avait mangé du piment fort. Son palais était en feu.’ Now the context provides an objective reason to choose one of the two senses. This yields the following translation: He had eaten some hot pepper. His palate was on fire.

On the other hand, consider the following sentence: ‘Les ennemis du prince avaient lancé des engins incendiaires. Son palais était en feu.’ Here too we have an objective reason, this time to choose the other alternative. The translation is then: The prince’s enemies had thrown incendiary devices. His palace was on fire.
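
The behaviour described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the cue-word sets and the function name `translate_ambiguous` are hypothetical, and a real translator would rely on a much richer analysis of the context than a bag of cue words.

```python
# Hypothetical cue words for each sense of 'palais' (illustration only).
SENSES = {
    "palais": {
        "palate": {"mangé", "piment", "goût", "bouche"},
        "palace": {"prince", "roi", "ennemis", "incendiaires"},
    }
}


def translate_ambiguous(word, context):
    """Return one sense if the context supports exactly one sense,
    otherwise keep the ambiguity and return all senses joined by '/'."""
    senses = SENSES[word]
    # Crude tokenisation: split on spaces and strip surrounding punctuation.
    tokens = {t.strip(".,;:!?'’‘").lower() for t in context.split()}
    supported = [sense for sense, cues in senses.items() if cues & tokens]
    if len(supported) == 1:
        return supported[0]
    # No objective reason to choose: leave the ambiguity intact.
    return "/".join(sorted(senses))


print(translate_ambiguous("palais", "Son palais était en feu."))
print(translate_ambiguous("palais", "Il avait mangé du piment fort. Son palais était en feu."))
```

With these assumed cue words, the first call keeps both senses and the second one selects ‘palate’.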

Posted in blog | Tagged , , , , , | Leave a comment

Dictionary = Corpus?

As far as machine translation is concerned, it seems best to combine the strengths of the two approaches: rule-based and statistics-based. If the two approaches could be made to converge, the benefit could be great. Let us try to define what could allow such a convergence, based on the two-sided grammatical approach, and illustrate it with an example.
To begin with, u soli sittimbrinu = ‘le soleil de septembre’ (the sun of September). In Corsican, sittimbrinu is a masculine singular adjective that means ‘de septembre’ (of September). In French, ‘de septembre’ is, from an analytic perspective, a preposition followed by a common masculine singular noun. But according to the two-sided analysis, ‘de septembre’ (of September) is also, from a synthetic perspective, a masculine singular adjective. This double nature of ‘de septembre’ is what allows it to be aligned with sittimbrinu.
More generally, if we define words or groups of words in the dictionary according to the two-sided grammatical analysis, we also have an alignment tool, which can be used for a statistics-based translation system in the same way as a corpus. Thus, if it is sufficiently rich, the dictionary is also a corpus, and even more, an aligned corpus.
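
To make the idea concrete, here is a minimal sketch of dictionary entries carrying both an analytic and a synthetic type. The class `Entry`, the type labels and the `LINKS` table are assumptions made for this illustration; they are not the actual format of the dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    text: str
    analytic: tuple  # word-by-word grammatical types
    synthetic: str   # type of the group taken as a whole


# Hypothetical entries for the example discussed in the post.
FRENCH = [Entry("de septembre", ("preposition", "noun_ms"), "adjective_ms")]
CORSICAN = [Entry("sittimbrinu", ("adjective_ms",), "adjective_ms")]
LINKS = {"de septembre": "sittimbrinu"}  # translation links of the dictionary


def aligned_pairs(source, target, links):
    """Yield (source text, target text, type) for linked entries
    that share the same synthetic grammatical type."""
    by_text = {entry.text: entry for entry in target}
    for src in source:
        tgt = by_text.get(links.get(src.text))
        if tgt is not None and src.synthetic == tgt.synthetic:
            yield (src.text, tgt.text, src.synthetic)


print(list(aligned_pairs(FRENCH, CORSICAN, LINKS)))
```

The pairs produced this way play the role of aligned segments, which is the sense in which a sufficiently rich dictionary can be read as an aligned corpus.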


Grammatical taxonomy again: the case of prepositions

Let’s look at the translation of the French word ‘dont’. Depending on the case, ‘dont’ can be a

  • relative pronoun: ‘la difficulté dont je t’ai parlé’ (the difficulty I told you about), ‘voilà le professeur dont j’apprécie beaucoup les cours’ (this is the teacher whose classes I really enjoy.)
  • or, more rarely, a preposition: ‘il y avait cinq couleurs, dont le rouge et le bleu’. (there were five colours, including red and blue.)

It is the latter case that we will be looking at. In this case, ‘dont’ is translated into English as ‘including’. In Corsican, the translation is: c’eranu cinque culori, frà i quali u rossu è u turchinu. But if we translate ‘il y avait cinq plantes, dont le ciste et la bruyère’ (‘there were five plants, including cistus and heather’), we get: c’eranu cinque piante, frà e quale u muchju è a scopa. Thus the translation of ‘dont’ (including) as a preposition is either frà i quali (masculine plural, culore being masculine in Corsican) or frà e quale (feminine plural), depending on which noun ‘dont’ refers to.

Thus ‘dont’ is translated into the masculine plural or the feminine plural, depending on the noun – either masculine or feminine – to which it refers. This casts doubt on the ‘prepositional’ nature of ‘dont’, and leads to further analysis to determine whether there might not be a more suitable grammatical type.
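
A minimal sketch of this agreement rule follows. The small gender table and the function name are hypothetical and cover only the two nouns used in the examples above; a real system would read the gender from its dictionary.

```python
# Gender of the Corsican plural nouns used in the examples (assumed mini-lexicon).
GENDER = {"culori": "m", "piante": "f"}

# Corsican rendering of 'dont' in the sense of 'including'.
AMONG_WHICH = {"m": "frà i quali", "f": "frà e quale"}


def translate_dont_including(antecedent):
    """Choose the Corsican form according to the gender of the plural
    noun that 'dont' refers to."""
    return AMONG_WHICH[GENDER[antecedent]]


print(translate_dont_including("culori"))
print(translate_dont_including("piante"))
```

The lookup on the antecedent is what distinguishes this word from an ordinary invariable preposition.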

It is worth noting that ‘dont’ (including) can be replaced by ‘parmi lesquels’ (among which, frà i quali) or ‘parmi lesquelles’ (among which, frà e quale) depending on whether the noun to which ‘dont’ refers is in the masculine plural or the feminine plural. This suggests that ‘dont’ could be conceived of as a preposition followed by a pronoun. In the spirit of this analysis, the BDL site notes: ‘Dont’ is probably the relative pronoun whose use is the most delicate. To use it correctly, one must know that dont always ‘hides’ the preposition ‘de’; ‘dont’ is equivalent to ‘de qui’, ‘de quoi’, ‘duquel’, etc. This link between ‘dont’ and ‘de’ goes back to the Latin origin of ‘dont’, which is from ‘unde’ “from where”.

More generally, this suggests that further analysis of some prepositions may be needed.


Creating new grammatical types

Italian has ‘prepositions followed by articles’ (preposizioni articolate). This is a specific grammatical type, which refers to a word (e.g. della) that replaces a preposition (di) followed by an article (la):

	il	lo	l’	la	i	gli	le
di	del	dello	dell’	della	dei	degli	delle
a	al	allo	all’	alla	ai	agli	alle
da	dal	dallo	dall’	dalla	dai	dagli	dalle
in	nel	nello	nell’	nella	nei	negli	nelle
su	sul	sullo	sull’	sulla	sui	sugli	sulle

This specific grammatical type also corresponds to:

  • in French: du = de le, des = de les
  • in Corsican and especially in the Sartenese variant: ‘llu = di lu, ‘lla = di la, etc.

This raises the general problem of the number of grammatical types we should retain. Should we create new grammatical types beyond the classical ones, in order to optimise translators and NLP in general? What is the best grammatical type to retain for ‘prepositions followed by an article’: a new primitive one or a compound one (always keeping Occam’s razor in mind)? A preposition followed by an article behaves like a preposition for words on its left, and like an article for words on its right.
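
One way to picture the ‘compound type’ option is to analyse each articulated preposition as a pair (preposition, article). The sketch below simply encodes the Italian table given above; the names `SPLIT` and `analyse` are assumptions made for the illustration.

```python
ARTICLES = ["il", "lo", "l’", "la", "i", "gli", "le"]

# The table of Italian articulated prepositions, row by row.
TABLE = {
    "di": ["del", "dello", "dell’", "della", "dei", "degli", "delle"],
    "a": ["al", "allo", "all’", "alla", "ai", "agli", "alle"],
    "da": ["dal", "dallo", "dall’", "dalla", "dai", "dagli", "dalle"],
    "in": ["nel", "nello", "nell’", "nella", "nei", "negli", "nelle"],
    "su": ["sul", "sullo", "sull’", "sulla", "sui", "sugli", "sulle"],
}

# Reverse index: contracted form -> (preposition, article).
SPLIT = {
    form: (prep, art)
    for prep, forms in TABLE.items()
    for form, art in zip(forms, ARTICLES)
}


def analyse(word):
    """Return (preposition, article) for an articulated preposition,
    or None if the word is not in the table."""
    return SPLIT.get(word)


print(analyse("della"))
print(analyse("negli"))
```

Under the compound reading, the first member of the pair governs the behaviour towards the words on the left and the second member governs the behaviour towards the words on the right.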


Evaluation of the performance after changes

I have just performed a series of open tests, using the (pseudo-random) article of the day from the French Wikipedia. The results for the Taravese variant of the Corsican language are the following:
95.76
95.76
94.34
95.76
99.25
95.04
95.48
that is, an average of about 95.9%. Note that the ‘cismuntinca’ variant generally obtains a slightly lower result, because its masculine and feminine plurals differ (whereas they are identical in Taravese).


Evaluating the performance of the translation after the changes made

The Corsican translator is changing. Let’s go back to the tests with the French Wikipedia article of the day, to get a better idea of the progress made (if any).
There are two errors here (partitive article). The evaluation is: 1 – (2/102) = 98.04%.
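
The evaluation formula can be written as a small helper. The function name is hypothetical; the formula is the one used above, i.e. one minus the ratio of erroneous words to the total number of words, expressed as a percentage.

```python
def open_test_score(errors, words):
    """Score of an open test as a percentage: 100 * (1 - errors / words)."""
    if words <= 0:
        raise ValueError("the test text must contain at least one word")
    return 100 * (1 - errors / words)


# Two errors in a text of 102 words.
print(f"{open_test_score(2, 102):.2f}%")  # 98.04%
```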


Grammatical word-disambiguation again and again

The main difficulty here seems to lie in the adaptation of the grammatical disambiguation module. Indeed, for the French language, such a module performs disambiguation with respect to about 100 categories. The number of pairs (or 3-tuples, 4-tuples, etc.) of disambiguation, for French, is about 250. The question is: when we change languages, how many categories of n-tuples of disambiguation does this result in? In particular, when one switches from French to Italian, does this result in a big change in the categories to be disambiguated?

Let’s take an example, with a particular category of words to disambiguate. One such category is for example AQfs/Vsing3present (feminine singular adjective or verb in the 3rd person singular present tense). A word in Italian that belongs to this type is ‘stanca’. So we have both uses:

  • ‘è stanca’ (she is tired): AQfs
  • ‘stanca il cavallo’ (it tires the horse): Vsing3present

In French, we don’t have this kind of disambiguation category directly because the category concerned is broader than that: it includes at least the 1st person singular of the present. Thus we have the word ‘sèche’, which belongs to this type of disambiguation category:

  • ‘la feuille est sèche’ (the leaf is dry): AQfs
  • ‘je sèche mes cheveux’ (I dry my hair): Vsing1present
  • ‘il sèche sa chemise’ (he dries his shirt): Vsing3present

Of course, the code that allows the disambiguation of AQfs/Vsing1present/Vsing3present should also allow the derivation of the disambiguation of AQfs/Vsing3present. But this gives an idea of the kind of problems that arise and the adaptation needed.
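
As a rough sketch of what ‘deriving’ the narrower category could look like, the rule below disambiguates the broader French class from the grammatical types of the neighbouring words, and a generic wrapper restricts it to the Italian class. Everything here is hypothetical: the tag names for the context, the rules themselves and the helper `restrict` are assumptions for illustration, not the actual module.

```python
def rule_aqfs_v1_v3(prev_tag, next_tag):
    """Toy rule for the class AQfs/Vsing1present/Vsing3present (e.g. 'sèche'),
    based only on the types of the previous and next words."""
    if prev_tag == "pronoun_sing1":
        return "Vsing1present"
    if prev_tag == "pronoun_sing3" or next_tag == "determinant":
        return "Vsing3present"
    if prev_tag == "copula":
        return "AQfs"
    return None  # undecided


def restrict(rule, allowed):
    """Derive a rule for a narrower ambiguity class by discarding
    any answer that falls outside the allowed types."""
    def narrowed(prev_tag, next_tag):
        tag = rule(prev_tag, next_tag)
        return tag if tag in allowed else None
    return narrowed


# Narrower class needed for Italian words such as 'stanca'.
rule_aqfs_v3 = restrict(rule_aqfs_v1_v3, {"AQfs", "Vsing3present"})

print(rule_aqfs_v3("copula", None))       # 'è stanca'
print(rule_aqfs_v3(None, "determinant"))  # 'stanca il cavallo'
```

The wrapper is cheap to write, but it also shows the limit of the approach: the contextual cues themselves may still need to be adapted to the new language.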

If the types of disambiguation are very different from one language to another, it will be necessary to have a disambiguation module which is capable of adapting to many new types of disambiguation and which is therefore very flexible. This appears to be a considerable difficulty for the creation of an ecosystem. It seems that Apertium, faced with this difficulty, has chosen a statistical module as a solution for its ecosystem. However, whether such a flexible module, adaptable without difficulty from one language to another, is feasible in the context of rule-based MT remains an open question.


First feasibility test: dictionary morphing

The first test carried out to transform the dictionary (in the extended sense) based on the French-Corsican pair into a dictionary for the Italian-Gallurese pair shows that it is feasible. The result, of acceptable but perfectible quality, is obtained in 21 minutes (with 16 GB of RAM and an Intel Core i7-8550U CPU). We start with a multilingual dictionary based on French entries, and the final result is an Italian-Gallurese dictionary.


Translation from Italian to Gallurese

Our new project will be to try to implement translation from Italian into Gallurese, since this pair is essential for the Gallurese language and is therefore a priority. The major difficulties in doing this are:
– on the one hand, to (automatically) transform the dictionary (in the extended sense) based on the French-Corsican pair into a dictionary for the Italian-Gallurese pair
– on the other hand, to adapt the other modules automatically (without having to rewrite them entirely), in particular the one that performs grammatical disambiguation.

The stakes here seem high. It is a question of transforming a system that can translate one pair of languages (i.e. French into Corsican) into an ecosystem that can translate several pairs of languages, each with an endangered language as its target.


Adjective modifiers again

We will consider again a category of words such as ‘very’, when they precede an adjective. Traditionally, this category is termed ‘adverbs’ or ‘adverbs of degree’, but we prefer ‘adjective modifier’, because (i) analytically, they change the meaning of an adjective and (ii) synthetically, an adjective modifier followed by an adjective is still an adjective. A more complete list is: almost, absolutely, badly, barely, completely, decidedly, deeply, enormously, entirely, extremely, fairly, fully, greatly, hardly, highly, how, incredibly, intensely, less, most, much, nearly, perfectly, positively, practically, pretty, purely, quite, rather, really, scarcely, simply, somewhat, strongly, terribly, thoroughly, totally, utterly, very, virtually, well.

If we look at sentences such as: il est bien content (he is very happy, hè beddu cuntenti), ils étaient bien contents (they were very happy, erani beddi cuntenti), elle serait bien contente (she would be very happy, saria bedda cuntenti), elles sont bien contentes (they are very happy, sò beddi cuntenti), we can see that the modifier of the adjective ‘bien’ is rendered as very in English and in Corsican as:

  • bellu/beddu: singular masculine
  • belli/beddi: plural masculine
  • bella/bedda: singular feminine
  • belle/beddi: plural feminine

This shows that the adjective modifier is invariable in French and English, but varies in gender and number in Corsican. Thus, in Corsican grammar, it seems appropriate to distinguish between:

  • singular masculine adjective modifier
  • plural masculine adjective modifier
  • singular feminine adjective modifier
  • plural feminine adjective modifier

On the other hand, such a distinction does not seem useful in English and French, where the category of ‘adjective modifier’ is sufficient and there is no need for further detail.
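
The contrast can be sketched as follows: the French and English modifier is a constant, whereas the Corsican one is looked up by gender and number. The table below uses the forms from the example sentences above; the names `BIEN_CO` and `render_bien` are hypothetical.

```python
# Corsican rendering of the adjective modifier 'bien', indexed by the
# gender and number of the adjective it modifies (forms taken from the
# example sentences above).
BIEN_CO = {
    ("m", "sg"): "beddu",
    ("m", "pl"): "beddi",
    ("f", "sg"): "bedda",
    ("f", "pl"): "beddi",
}


def render_bien(gender, number, adjective):
    """Render 'bien + adjective' in Corsican with the agreeing modifier."""
    return f"{BIEN_CO[(gender, number)]} {adjective}"


print(render_bien("m", "sg", "cuntenti"))  # il est bien content
print(render_bien("f", "sg", "cuntenti"))  # elle serait bien contente
```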


On ‘reflexive pronouns’

Pursuing the reflection on grammatical categories, we will now examine “reflexive pronouns”. These are:

  • me te se nous vous se (French)
  • mi ti si ci vi si (Corsican)
  • myself yourself himself/herself/itself ourselves yourselves themselves (English)

Let us take an example:

  • je me promène, tu te promènes, il se promène, nous nous promenons, vous vous promenez, ils se promènent
  • I walk, you walk, he walks, we walk, you walk, they walk
  • spassieghju, spassieghji, spassieghja, spassiemu, spassieti, spassièghjani

These reflexive pronouns are usually associated with so-called pronominal verbs.
From our point of view, this classification as ‘pronouns’ is unsatisfactory, because they always precede a verb,1 but are placed after a personal subject pronoun, an indefinite pronoun, or a nominal group. In particular, the notion of a pronoun following a pronoun is not coherent from the point of view of our analysis, where the main criterion for typology is the position of a given grammatical type in relation to another.

Let us recall here that the idea behind this reconstruction of grammatical typology is the hypothesis that traditional classification lacks coherence and that this considerably hinders the development of natural language analysis and, at the same time, the development of machine translation modules based on the emulation of human reasoning.

This example suggests that the classic ‘reflexive pronoun’ is a word that introduces into the verb to which it refers a notion of reflexivity of action. In this sense, it is more of a specialized verb modifier. It is thus more akin to the adverb in the sense that we have defined it, i.e. a verb modifier in the broad sense. The adverb in this sense can be placed before or after the verb. On the other hand, the reflexive verb modifier as we have defined it can only be placed in French before the verb.

1 I oversimplify here, since there are also some structures like: tu t’en souviens (you remember it, ti n’inveni).


Grammatical word-disambiguation again

The challenge is above all that of generalizing grammatical word-disambiguation to several languages. Creating a module of grammatical word-disambiguation for each language appears to be a long and arduous task. This seems to be the main difficulty. But if a module specific to a given language can be generalized to several other languages, this could be an important advance in the field of rule-based machine translation (for which ‘machine translation that simulates human reasoning’ seems to me a more appropriate term).

We can describe the problem more precisely. We have about 100 grammatical categories for a given language. We also have about 300 ambiguous grammatical types – to fix ideas – which are: e.g., adverb or preposition, singular masculine noun or singular masculine adjective, etc. The problem is to describe an algorithm to remove the ambiguity and determine the corresponding grammatical type according to the context.

I am now rewriting the complete module of disambiguation by grammatical type, so that it can be used and adapted to other languages (Italian in the first place). It remains to be seen if this can be done.


First steps in Gallurese language

The translator is taking its first steps in translating from French into the Gallurese language. The first tests show a score of 75-80%, with many errors in grammar, spelling and vocabulary. It will be necessary to reach a score of 90% before the result can be published.

The ideal would have been Italian-Gallurese translation, but this is not yet possible: it will be necessary to translate (i) Italian into French, then (ii) French into Gallurese.


Hinting at the Control problem

The question of choosing the best system to solve the problems posed by word disambiguation in the field of translation seems to be linked to the AGI control problem (how to prevent an AGI from eventually turning out to be harmful to its creators). It seems that when we have the choice between several methods to develop an AI, it is wiser to choose the one that allows better control of the AGI. As far as machine translation is concerned, we should thus prefer the method that emulates human reasoning and that produces a response that can be broken down step by step into the reasoning that leads to it. This makes it possible not only to determine the cause of an error accurately, but also to remedy it. This problem does not only concern machine translation; its scope is somewhat wider, since grammatical disambiguation concerns machine translation but also the understanding of natural language, and disambiguation according to context, even in the absence of any translation.


On the implementation of grammatical disambiguation

Grammatical disambiguation (e.g. deciding whether ‘maintenant’ is an adverb (now) or the present participle (maintaining) of the verb ‘maintenir’) seems to be the crucial issue in choosing between the rule-based model and the statistical model for machine translation. This problem is widespread and seems to concern all languages. For the French language, grammatical disambiguation concerns about 1 word out of 7. Effective grammatical disambiguation is difficult to implement. The advantage of adopting the statistical method for grammatical disambiguation is that the same method can be generalized and used for several languages. In the case of the rule-based model, the module of grammatical disambiguation must be rewritten for each language, which generates considerable complexity and requires a very significant development time. Therefore, a rule-based method for grammatical disambiguation that can be easily applied to several languages would be of great interest. This seems to be the main difficulty that rule-based machine translation has to overcome.

But if we want an artificial intelligence that does not merely provide a (mostly accurate) answer without being able to really explain its reasoning, but is truly able to emulate human reasoning and to justify and describe step by step the reasoning that leads to its answer, then it is worth the effort.
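
To illustrate what a step-by-step justification might look like, here is a deliberately tiny sketch for ‘maintenant’. It is an assumption-laden toy: it uses a single contextual rule (the word is preceded by ‘en’, as in ‘en maintenant’) and falls back to the adverb reading otherwise, which a real module would refine with many more rules. The function name and tag names are hypothetical.

```python
def disambiguate_maintenant(prev_word):
    """Return (tag, trace) for 'maintenant', where trace lists, in order,
    the rules that were examined to reach the decision."""
    trace = []
    if prev_word is not None and prev_word.lower() == "en":
        trace.append("rule 1: preceded by 'en' -> present participle of 'maintenir'")
        return "verb_present_participle", trace
    trace.append("rule 1: not preceded by 'en' -> skipped")
    trace.append("default: adverb ('now')")
    return "adverb", trace


tag, trace = disambiguate_maintenant("en")
print(tag)
for step in trace:
    print(" ", step)
```

Because every decision carries its trace, a wrong answer can be traced back to the rule that produced it and corrected there.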


The 90% rule

The translation from French to Gallurese is currently under development. An application for Android is planned first. It will be called ‘traducidori gaddhuresu’. Currently the French-Gallurese translator is undergoing testing. It will only be published if its performance (evaluated by an open test) is above 90%. This is a rule that we apply to ourselves, and is specific to endangered languages. We consider that for them, a poor or low-quality translation can be more harmful than useful.


A “traducidori gaddhuresu” in preparation

After the Corsican language, the second endangered language for which we would like to develop a translator is the Gallurese language (“traducidori gaddhuresu”). For it, we are considering an Android application and a Windows version.

The priority pair for Gallurese is Italian-Gallurese. However, it will not be possible to build an Italian-Gallurese translator straight away: a French-Gallurese translator is being prepared first. Initially, it will therefore be necessary to translate a text from Italian into French (for instance with DeepL, which is of very good quality), and then to use the French-Gallurese translator.


Gallurese language

Our next project will be to implement the translation from Italian into Gallurese (gaddhuresu), or from French into Gallurese. The Gallurese language is close to the Corsican language, in particular to the ‘Rucchisgiana’ (Alta Rocca) or ‘Sartinese’ variant of the Corsican language. However, there are significant differences in writing and morphology between Gallurese and Corsican. A difficulty will be, as for the Corsican language, the management of the variants. The ideal would be to manage the main variants. In a first step, we will try to implement one of the main variants of the Gallurese language (we will preferably choose a well documented variant, such as the one used in the writings of Maria Teresa Inzaina).


Updating our grammatical typology

We now have the following categories in our grammatical taxonomy:

  • determinants
  • nouns
  • pronouns
  • verbs
  • prepositions and postpositions
  • determinant modifiers
  • noun modifiers, i.e. adjectives
  • adjective modifiers
  • verb modifiers, i.e. adverbs (but in a restricted sense with regard to classical grammar)
  • adverb (still in a restricted sense) modifiers

To be noted: the classical category of adverbs comprises here the following categories:

  • adjective modifiers
  • verb modifiers
  • adverb modifiers


On the category of adverb modifiers

Let’s continue to rethink the problematic (or so it is argued here) category of adverbs (in the classical sense). Let’s now turn our attention to the category of ‘adverb modifiers’. Adverbs are understood here in a restricted sense: they are either verb modifiers or proposition modifiers. In this context, we are likely to encounter adverb modifiers. In general, the adverb modifier precedes the adverb. Thus, very (‘très’) is an adverb modifier in the sequence he was eating very rarely (‘il mangeait très rarement’, manghjava mori raramenti).

Likewise more (‘plus’, più) is in some cases an adverb modifier. This is the case in the sequence he was drinking more frequently (‘il buvait plus fréquemment’, biia più suventi).


The case of adjective modifiers and the notion of grammatical proof

Let’s consider again the case of adjective modifiers (in classical grammar, this category of words is considered as degree adverbs). These include the following: peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, … = pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, … = not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, … We have argued that words of this category are ‘adjective modifiers’ when they precede an adjective. But can such an assertion be proven, or is there at least some form of evidence available? Grammar, like other disciplines, requires that assertions be justified, and if possible proven. The notion of proof in grammar, however, is uncommon. Let’s see if we can provide such proof or justification.

Consider the case of ‘tellement’ (so much), which we consider to be an adjective modifier when it precedes an adjective. Now, let us consider the following translations, where ‘tellement’ is used:

  • in French: il est tellement beau, ils sont tellement petits, elle est tellement belle, elles sont tellement intelligentes
  • in English: it is so beautiful, they are so small, she is so beautiful, they are so smart
  • in Corsican: hè tantu bellu, sò tanti chjuchi, hè tanta bella, sò tante intelligente (an alternative translation is: hè cusì bellu, sò cusì chjuchi, hè cusì bella, sò cusì intelligente)
  • in Italian: è così bello, sono così piccoli, è così bella, sono così intelligenti

It is clear here that ‘tellement’ preceding an adjective is translated in Corsican by:

  • tantu, when the adjective is singular masculine
  • tanti, when the adjective is plural masculine
  • tanta, when the adjective is singular feminine
  • tante, when the adjective is plural feminine

Thus ‘tellement’ (so much, tantu/tanti/tanta/tante), employed in this usage, i.e. preceding an adjective, agrees with the adjective to which it refers. This serves as a justification of its classification as an adjective modifier.


The status of adverbs

What are adverbs in the present grammatical taxonomy? Adverbs have a much more restrictive definition here than in their traditional definition. Adverbs in this typology are verb modifiers. Therefore, adverbs are distinct from:

  • adjective modifiers (such as peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, … = pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, … = not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, …)
  • proposition modifiers, which change the meaning of a proposition

The status of adjective modifiers

What is the status of adjective modifiers (tant, tout juste, un rien, un tantinet, très, extrêmement, … = so much, only just, a touch, a tad, very, extremely, …) in the present grammatical typology? Adjectives are defined as noun modifiers. So adjective modifiers would be modifiers of noun modifiers? This sounds intriguing. In reality, we do not have the concept of ‘modifiers of modifiers’. In fact, we have the following rules:

  • a verb modifier followed by a verb is a verb
  • a determinant modifier followed by a determinant is a determinant
  • and generally speaking, a modifier of an X followed by an X is an X (where X is a given grammatical type)

So a noun modifier followed by a noun is a noun, i.e. an adjective followed by a noun is a noun. For example: ‘un très beau livre’ (a very nice book), where ‘very’ is an adjective modifier, ‘nice’ is an adjective, i.e. a noun modifier, and ‘book’ is a noun.

Hence finally, ‘an adjective modifier is a modifier of a noun modifier’ reads as follows: an adjective modifier is a modifier of [noun modifier].
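
The general rule ‘a modifier of an X followed by an X is an X’ lends itself to a small reduction procedure. The sketch below is one possible reading, under assumptions: the type labels, the `MODIFIES` table and the leftmost-first reduction order are choices made for the illustration, not a description of the actual system.

```python
# Which type each modifier type modifies (labels are hypothetical).
MODIFIES = {
    "adjective_modifier": "adjective",
    "adjective": "noun",  # adjectives are noun modifiers
    "verb_modifier": "verb",  # adverbs in the restricted sense
    "adverb_modifier": "verb_modifier",
    "determinant_modifier": "determinant",
}


def reduce_types(types):
    """Repeatedly apply 'modifier of X followed by X is an X',
    always reducing the leftmost applicable pair first."""
    types = list(types)
    i = 0
    while i < len(types) - 1:
        if MODIFIES.get(types[i]) == types[i + 1]:
            del types[i]  # the pair collapses into the modified type
            i = 0  # restart from the left
        else:
            i += 1
    return types


# 'un très beau livre': determinant, adjective modifier, adjective, noun
print(reduce_types(["determinant", "adjective_modifier", "adjective", "noun"]))
```

In this reading, ‘très beau’ first collapses into an adjective, and ‘[très beau] livre’ then collapses into a noun, so no separate notion of ‘modifier of a modifier’ is needed.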

Grammatical typology again

What are the characteristics of the resulting grammatical typology? We now have the following categories:

  • determinants
  • nouns
  • pronouns
  • verbs
  • prepositions and postpositions
  • determinant modifiers
  • noun modifiers, i.e. adjectives
  • adjective modifiers
  • verb modifiers, i.e. adverbs but in a restricted sense


The status of adjectives

What is the status of adjectives in the present grammatical typology? The notion of modifier is central to this taxonomy. Thus, the adjective is a noun modifier. In the expression ‘the blue sky’, ‘blue’ is a modifier of the noun ‘sky’. The definition of the adjective as a noun modifier is quite in line with the definition given for example by Merriam-Webster: ‘a word belonging to one of the major form classes in any of numerous languages and typically serving as a modifier of a noun to denote a quality of the thing named, to indicate its quantity or extent, or to specify a thing as distinct from something else’.


The case of new words for machine translation

Another case that argues for the use of rule-based, i.e. human-like, translation is the following. Frequently we come across a new word, a word we have never seen before. More often than not, a human knows how to translate it, because there are rules that make it possible to translate a word from one language into another even if we do not know the meaning of the word. For example, ‘acide anthranilique’ can be translated precisely as ‘anthranilic acid’ by a human, even if he has no knowledge of the acid in question. For this ability to translate newly encountered words, the statistical method is not adequate: the machine translator must be able to (i) determine the grammatical nature of the word in question; and (ii) translate the new word based on the morphological rules for translating words of this grammatical type from one language to another. An AGI capable of translating should possess this type of ability.
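
A minimal sketch of such morphological rules, for the chemical-name example, could look like this. It assumes that the French source term is ‘acide anthranilique’; the suffix table and the function names are hypothetical and far from complete.

```python
# Hypothetical French -> English suffix rules for adjectives.
SUFFIX_RULES = [("ique", "ic"), ("eux", "ous")]


def translate_new_adjective(word):
    """Translate an unknown French adjective by rewriting its suffix."""
    for fr_suffix, en_suffix in SUFFIX_RULES:
        if word.endswith(fr_suffix):
            return word[: -len(fr_suffix)] + en_suffix
    return word  # no rule applies: leave the word unchanged


def translate_acid_name(term):
    """Translate 'acide X' into 'X acid', moving the adjective before the
    noun; return None if the term does not have this shape."""
    head, _, modifier = term.partition(" ")
    if head != "acide" or not modifier:
        return None
    return f"{translate_new_adjective(modifier)} acid"


print(translate_acid_name("acide anthranilique"))  # anthranilic acid
```

The word ‘anthranilique’ does not need to be in any dictionary: its grammatical type and its suffix are enough to produce the translation.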


Characteristics of an AGI (artificial general intelligence)

What are the characteristics we want for an AGI (artificial general intelligence)? An AGI should have a very advanced capacity in NLP and language comprehension. One of the qualities we expect from an AGI is respect for multilingualism. Hopefully, the AGI should have extensive NLP capabilities, which apply to a large number of languages, and even to the 8000 languages of the planet, i.e. also to the 90% of endangered languages. The AGI could thus help to solve an important problem inherent to the problem of language extinction, which affects human cultural diversity (it can be assumed that some languages will be extinct at the time of the AGI event, but the AGI could thus help to revitalize them).


The two-language matching problem

Here is a problem for a human intelligence (or an AGI): we have a dictionary (with words, lemmas and grammatical types) in a language A and a second dictionary in a language B. If we have an extensive corpus of each of the two languages, is it possible to create a translation dictionary from A to B, and how? To take an example: if the two languages were French and English, we would have to associate ‘cheval’ with ‘horse’, etc. in the final translation dictionary, and so on for all the words of language A.

A closely related paper seems to be: Deciphering Undersegmented Ancient Scripts Using Phonetic Prior.


Prototype of text search with optional grammatical type

Unconditional search

Let us expand the idea of text analysis derived from rule-based translation. Above is an example of a classic word-based search. In this particular case, it is the French word ‘été’. This word is ambiguous because it can be a common noun (‘summer’), or a past participle (‘been’). Below is an example of a search for the word ‘été’ associated with the grammatical type ‘common noun’.

Conditional search based on ‘noun’ grammatical type

Finally, we have below an example of a search for the word ‘été’ associated with the grammatical type ‘past participle’.

Conditional search based on ‘past participle’ grammatical type
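
The three searches can be sketched on a text that has already been tagged with grammatical types. The sample sentence, the tag names and the function `search` are assumptions made for this illustration; in the prototype, the tags would come from the disambiguation module.

```python
# 'L’été a été chaud' (the summer has been hot), already tagged
# with hypothetical grammatical type labels.
TAGGED = [
    ("l’", "determinant"),
    ("été", "common_noun"),
    ("a", "auxiliary"),
    ("été", "past_participle"),
    ("chaud", "adjective"),
]


def search(tagged_tokens, word, gram_type=None):
    """Return the positions of `word`; if `gram_type` is given, keep only
    the occurrences that carry this grammatical type."""
    return [
        i
        for i, (token, tag) in enumerate(tagged_tokens)
        if token == word and (gram_type is None or tag == gram_type)
    ]


print(search(TAGGED, "été"))                     # unconditional search
print(search(TAGGED, "été", "common_noun"))      # 'summer'
print(search(TAGGED, "été", "past_participle"))  # 'been'
```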

Why it’s worth it to engage in rule-based translation

Rule-based translation is difficult to implement. The main difficulty encountered is taking into account the groups of words, so as to be on a par with statistics-based translation. The main problems in this regard are (i) polymorphic disambiguation; and (ii) building a fair typology of grammatical types. But once these steps begin to be mastered, there are many advantages. What seems essential here is that with the same piece of software, both machine translation and text analysis can be carried out. Among the modules that are easy to implement are the following:

  • lemmatizer
  • part-of-speech tagger
  • singularizer
  • pluralizer
  • grammar checker
  • type extractor: a module that allows you to extract words from a text according to their grammatical category

This is because implementing rule-based translation gives the machine some inherent understanding of the text, in the same way as a human being. In a nutshell, it is better artificial intelligence.

Finally, other, more advanced modules seem possible (to be confirmed).
