Adjective modifiers again

We will consider again a category of words such as ‘very’, when they precede an adjective. Traditionally, this category is termed ‘adverbs’ or ‘adverbs of degree’, but we prefer ‘adjective modifier’, because (i) analytically, they change the meaning of an adjective and (ii) synthetically, an adjective modifier followed by an adjective is still an adjective. A more complete list is: almost, absolutely, badly, barely, completely, decidedly, deeply, enormously, entirely, extremely, fairly, fully, greatly, hardly, highly, how, incredibly, intensely, less, most, much, nearly, perfectly, positively, practically, pretty, purely, quite, rather, really, scarcely, simply, somewhat, strongly, terribly, thoroughly, totally, utterly, very, virtually, well.

If we look at sentences such as: il est bien content (he is very happy, hè beddu cuntenti), ils étaient bien contents (they were very happy, erani beddi cuntenti), elle serait bien contente (she would be very happy, saria bedda cuntenti), elles sont bien contentes (they are very happy, sò beddi cuntenti), we can see that the adjective modifier ‘bien’ is rendered as ‘very’ in English and, in Corsican, as:

  • bellu/beddu: singular masculine
  • belli/beddi: plural masculine
  • bella/bedda: singular feminine
  • belle/beddi: plural feminine

This shows that the adjective modifier is invariable in French and English, but varies in gender and number in Corsican. Thus, in Corsican grammar, it seems appropriate to distinguish between:

  • singular masculine adjective modifier
  • plural masculine adjective modifier
  • singular feminine adjective modifier
  • plural feminine adjective modifier

On the other hand, such a distinction does not seem useful in English and French, where the category of ‘adjective modifier’ is sufficient and there is no need for further detail.

On ‘reflexive pronouns’

Pursuing the reflection on grammatical categories, we will now examine ‘reflexive pronouns’. These are:

  • me te se nous vous se (French)
  • mi ti si ci vi si (Corsican)
  • myself yourself himself/herself/itself ourselves yourselves themselves (English)

Let us take an example:

  • je me promène, tu te promènes, il se promène, nous nous promenons, vous vous promenez, ils se promènent
  • I walk, you walk, he walks, we walk, you walk, they walk
  • spassieghju, spassieghji, spassieghja, spassiemu, spassieti, spassièghjani

These reflexive pronouns are usually associated with so-called pronominal verbs. From our point of view, this classification as ‘pronouns’ is unsatisfactory, because they always precede a verb,1 but are placed after a personal subject pronoun, an indefinite pronoun, or a nominal group. In particular, the notion of a pronoun following a pronoun is not coherent from the point of view of our analysis, where the main criterion for typology is the position of a given grammatical type in relation to another.

Let us recall here that the idea behind this reconstruction of grammatical typology is the hypothesis that traditional classification lacks coherence and that this considerably hinders the development of natural language analysis and, at the same time, the development of machine translation modules based on the emulation of human reasoning.

This example suggests that the classic ‘reflexive pronoun’ is a word that introduces a notion of reflexivity of action into the verb to which it refers. In this sense, it is more of a specialized verb modifier. It is thus more akin to the adverb in the sense that we have defined it, i.e. a verb modifier in the broad sense. The adverb in this sense can be placed before or after the verb. On the other hand, the reflexive verb modifier as we have defined it can, in French, only be placed before the verb.

1 I oversimplify here, since there are also some structures like: tu t’en souviens (you remember it, ti n’inveni).

Grammatical word-disambiguation again

The challenge is especially that of generalizing grammatical word-disambiguation to several languages. Creating a module of grammatical word-disambiguation for each language appears to be a long and arduous task. This seems to be the main difficulty. But if a module specific to a given language can be generalized to several other languages, this could be an important advance in the field of rule-based machine translation (‘machine translation that simulates human reasoning’ seems to me a more appropriate term).

We can describe the problem more precisely. We have about 100 grammatical categories for a given language. We also have about 300 ambiguous grammatical types – to fix ideas – which are: e.g., adverb or preposition, singular masculine noun or singular masculine adjective, etc. The problem is to describe an algorithm to remove the ambiguity and determine the corresponding grammatical type according to the context.
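
To make the shape of such an algorithm concrete, here is a minimal sketch in Python. It is not the actual module: the rule table, the word lists and the names `RULES`, `AMBIGUOUS` and `disambiguate` are all assumptions made for illustration. Each ambiguous type is mapped to an ordered list of context rules, and the first rule that matches decides the grammatical type.

```python
# Hypothetical word lists, deliberately tiny; a real module would need far more.
AUXILIARIES = {"a", "ai", "as", "avait", "avais", "avons", "avez", "ont"}
DETERMINANTS = {"l'", "un", "cet", "chaque", "quel"}


def previous_word(tokens, i):
    """Return the lowercased token before position i, or '' at sentence start."""
    return tokens[i - 1].lower() if i > 0 else ""


# One entry per ambiguous type: ordered (condition, resulting type) pairs.
RULES = {
    "common noun or past participle": [
        (lambda tokens, i: previous_word(tokens, i) in AUXILIARIES, "past participle"),
        (lambda tokens, i: previous_word(tokens, i) in DETERMINANTS, "common noun"),
    ],
}

# Words known to be ambiguous, with their ambiguous type (assumed lexicon).
AMBIGUOUS = {"été": "common noun or past participle"}


def disambiguate(tokens, i):
    """Return the grammatical type of tokens[i] from its context.

    Returns None if the word is not listed as ambiguous, and 'unresolved'
    if no rule of its ambiguous type applies.
    """
    ambiguous_type = AMBIGUOUS.get(tokens[i].lower())
    if ambiguous_type is None:
        return None
    for condition, grammatical_type in RULES[ambiguous_type]:
        if condition(tokens, i):
            return grammatical_type
    return "unresolved"
```

For instance, `disambiguate(["il", "a", "été", "malade"], 2)` yields the past participle reading, while `disambiguate(["l'", "été", "est", "chaud"], 1)` yields the common noun reading. Adapting such a module to another language would then amount to replacing the tables while keeping the engine.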

I am now rewriting the complete module of disambiguation by grammatical type, so that it can be used and adapted to other languages (Italian in the first place). It remains to be seen if this can be done.

First steps in Gallurese language

The translator takes its first steps in translating from French into the Gallurese language. The first tests show a score of 75-80%, with many errors in grammar, spelling and vocabulary. It will be necessary to reach a score of 90% before the result can be published.

The ideal would have been Italian-Gallurese translation, but this is not yet possible: it will be necessary to translate (i) Italian into French, then (ii) French into Gallurese.

Hinting at the Control problem

The question of choosing the best system to solve the problems posed by word disambiguation in the field of translation seems to be linked to the AGI control problem (how to prevent an AGI from finally turning out to be harmful to its creators). It seems that when we have the choice between several methods to develop an AI, it is wiser to choose the one that allows better control of the AGI. As far as machine translation is concerned, we should thus prefer in this regard the method that emulates human reasoning, and that produces a response that can be broken down step by step into the reasoning that leads to it. This makes it possible to accurately determine the cause of an error, but also to remedy it. This problem does not only concern machine translation, but has a somewhat wider scope, since grammatical disambiguation concerns not only machine translation but also the understanding of natural language and disambiguation according to context, even in the absence of any translation.

On the implementation of grammatical disambiguation

Grammatical disambiguation, e.g. deciding whether ‘maintenant’ is an adverb (now) or the present participle (maintaining) of the verb ‘maintenir’, seems to be the crucial issue in choosing between the rule-based model and the statistical model for machine translation. This problem is widespread and seems to concern all languages. For the French language, it concerns about 1 word out of 7. Effective grammatical disambiguation is difficult to implement. The advantage of adopting the statistical method for grammatical disambiguation is that the same method can be generalized and used for several languages. In the case of the rule-based model, the module of grammatical disambiguation must be rewritten for each language, which generates considerable complexity and requires a very significant development time. Therefore, a rule-based method for grammatical disambiguation that can be easily applied to several languages would be of great interest. This seems to be the main difficulty that rule-based machine translation has to overcome.

But if we want an artificial intelligence that does not merely provide a (mostly accurate) answer without really being able to explain its reasoning, but that is truly able to emulate human reasoning and to justify and describe step by step the reasoning that leads to its answer, then it is worth the effort.

The 90% rule

The translation from French to Gallurese is currently under development. An application for Android is planned first. It will be called ‘traducidori gaddhuresu’. The French-Gallurese translator is currently undergoing testing. It will only be published if its performance (evaluated by an open test) is above 90%. This is a rule that we apply to ourselves, and it is specific to endangered languages. We consider that for them, a poor or low quality translation can be more harmful than useful.

A “traducidori gaddhuresu” in preparation

After the Corsican language, the second endangered language for which we would like to develop a translator is the Gallurese language (“traducidori gaddhuresu”). As far as the ‘traducidori gaddhuresu’ is concerned, we are considering an Android application and a Windows version.

The priority pair for Gallurese is Italian-Gallurese. However, it will not be possible to make an Italian-Gallurese translator at first: it is a French-Gallurese translator that is in preparation. Initially, it will therefore be necessary to translate a text from Italian into French (for instance with DeepL, which is of very good quality), and then to use the French-Gallurese translator.

Gallurese language

Our next project will be to implement the translation from Italian into Gallurese (gaddhuresu), or from French into Gallurese. The Gallurese language is close to the Corsican language, in particular to the ‘Rucchisgiana’ (Alta Rocca) or ‘Sartinese’ variant of the Corsican language. However, there are significant differences in writing and morphology between Gallurese and Corsican. A difficulty will be, as for the Corsican language, the management of the variants. The ideal would be to manage the main variants. In a first step, we will try to implement one of the main variants of the Gallurese language (we will preferably choose a well documented variant, such as the one used in the writings of Maria Teresa Inzaina).

Updating our grammatical typology

We now have the following categories in our grammatical taxonomy:

  • determinants
  • nouns
  • pronouns
  • verbs
  • prepositions and postpositions
  • determinant modifiers
  • noun modifiers, i.e. adjectives
  • adjective modifiers
  • verb modifiers, i.e. adverbs (but in a restricted sense with regard to classical grammar)
  • adverb (still in a restricted sense) modifiers

To be noted: the classical category of adverbs comprises here the following categories:

  • adjective modifiers
  • verb modifiers
  • adverb modifiers

On the category of adverb modifiers

Let’s continue to rethink the gruesome (so it is argued here) category of adverbs (in the classical sense). Let’s now turn our attention to the category of ‘adverb modifiers’. Adverbs are understood here in a restricted sense: they are either verb modifiers or proposition modifiers. In this context, we are likely to encounter adverb modifiers. In general, the adverb modifier precedes the adverb. Thus, very (‘très’) is an adverb modifier in the sequence he was eating very rarely (‘il mangeait très rarement’, manghjava mori raramenti).

Likewise more (‘plus’, più) is in some cases an adverb modifier. This is the case in the sequence he was drinking more frequently (‘il buvait plus fréquemment’, biia più suventi).

The case of adjective modifiers and the notion of grammatical proof

Let’s consider again the case of adjective modifiers (in classical grammar, this category of words is considered as degree adverbs). These include the following: peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, … = pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, … = not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, … We have argued that words of this category are ‘adjective modifiers’ when they precede an adjective. But can such an assertion be proven, or is there at least some form of evidence available? Grammar, like other disciplines, requires that assertions be justified, and if possible proven. The notion of proof in grammar, however, is uncommon. Let’s see if we can provide such a proof or justification.

Consider the case of ‘tellement’ (so much), which we consider to be an adjective modifier when it precedes an adjective. Now, let us consider the following translations, where ‘tellement’ is used:

  • in French: il est tellement beau, ils sont tellement petits, elle est tellement belle, elles sont tellement intelligentes
  • in English: it is so beautiful, they are so small, she is so beautiful, they are so smart
  • in Corsican: hè tantu bellu, sò tanti chjuchi, hè tanta bella, sò tante intelligente (an alternative translation is: hè cusì bellu, sò cusì chjuchi, hè cusì bella, sò cusì intelligente)
  • in Italian: è così bello, sono così piccoli, è così bella, sono così intelligenti

It is patent here that ‘tellement’ preceding an adjective is translated in Corsican by:

  • tantu, when the adjective is singular masculine
  • tanti, when the adjective is plural masculine
  • tanta, when the adjective is singular feminine
  • tante, when the adjective is plural feminine

Thus ‘tellement’ (so much, tantu/tanti/tanta/tante), employed in this usage, i.e. preceding an adjective, agrees with the adjective to which it refers. This serves as a justification of its classification as an adjective modifier.
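
For a rule-based translator, this agreement can be written as a simple lookup keyed on the gender and number of the adjective that follows. The sketch below is only an illustration under that assumption; the names `TANTU_FORMS`, `translate_tellement` and `render` are hypothetical, while the four forms are those listed above.

```python
# Corsican forms of 'tellement' before an adjective, keyed by the
# gender and number of that adjective.
TANTU_FORMS = {
    ("masculine", "singular"): "tantu",
    ("masculine", "plural"): "tanti",
    ("feminine", "singular"): "tanta",
    ("feminine", "plural"): "tante",
}


def translate_tellement(gender, number):
    """Return the Corsican form that agrees with the following adjective."""
    return TANTU_FORMS[(gender, number)]


def render(adjective, gender, number):
    """Build the modifier + adjective group, e.g. 'tanta bella'."""
    return f"{translate_tellement(gender, number)} {adjective}"
```

With this table, `render("bella", "feminine", "singular")` gives the group used in hè tanta bella. For French or English the table would collapse to a single invariable form, which is the contrast drawn in the text.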

The status of adverbs

What are adverbs in the present grammatical taxonomy? Adverbs have a much more restrictive definition here than in their traditional definition. Adverbs in this typology are verb modifiers. Therefore, adverbs are distinct from:

  • adjective modifiers (such as peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, … = pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, … = not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, …)
  • proposition modifiers, which change the meaning of a proposition

The status of adjective modifiers

What is the status of adjective modifiers (tant, tout juste, un rien, un tantinet, très, extrêmement, … = so much, just a little, a little, a little, very, extremely, …) in the present grammatical typology? Adjectives are defined as noun modifiers. So adjective modifiers would be modifiers of noun modifiers? This sounds intriguing. In reality, we do not have the concept of ‘modifiers of modifiers’. In fact, we have the following rules:

  • a verb modifier followed by a verb is a verb
  • a determinant modifier followed by a determinant is a determinant
  • and generally speaking, a modifier of an X followed by an X is an X (where X is a given grammatical type)
    So a noun modifier followed by a noun is a noun, i.e. an adjective followed by a noun is a noun. For example: ‘un très beau livre’ (a very nice book), where ‘very’ is an adjective modifier, ‘nice’ is an adjective, i.e. a noun modifier, and ‘book’ is a noun.
    Hence finally, ‘an adjective modifier is a modifier of a noun modifier’ reads as follows: an adjective modifier is a modifier of [noun modifier].
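
The rule ‘a modifier of an X followed by an X is an X’ can be read as a reduction over a sequence of grammatical types. The following is a minimal sketch of that reading, not the author's implementation: the table `MODIFIES` and the function `reduce_types` are hypothetical names, and only modifiers that precede their target are handled, as in the rules above.

```python
# Which type each modifier type applies to (assumed from the rules above).
MODIFIES = {
    "adjective modifier": "adjective",
    "adjective": "noun",  # an adjective is a noun modifier
    "adverb": "verb",  # an adverb is a verb modifier
    "determinant modifier": "determinant",
}


def reduce_types(types):
    """Repeatedly absorb a modifier into the type it precedes and modifies."""
    result = list(types)
    changed = True
    while changed:
        changed = False
        for i in range(len(result) - 1):
            # 'a modifier of an X followed by an X is an X'
            if MODIFIES.get(result[i]) == result[i + 1]:
                del result[i]
                changed = True
                break
    return result
```

Applied to ‘un très beau livre’, i.e. determinant, adjective modifier, adjective, noun, the sequence first reduces to determinant, adjective, noun and then to determinant, noun.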

Grammatical typology again

What are the characteristics of the resulting grammatical typology? We now have the following categories:

  • determinants
  • nouns
  • pronouns
  • verbs
  • prepositions and postpositions
  • determinant modifiers
  • noun modifiers, i.e. adjectives
  • adjective modifiers
  • verb modifiers, i.e. adverbs but in a restricted sense

The status of adjectives

What is the status of adjectives in the present grammatical typology? The notion of modifier is central to this taxonomy. Thus, the adjective is a noun modifier. In the expression ‘the blue sky’, ‘blue’ is a modifier of the noun ‘sky’. The definition of the adjective as a noun modifier is quite in line with the definition given for example by Merriam-Webster: ‘a word belonging to one of the major form classes in any of numerous languages and typically serving as a modifier of a noun to denote a quality of the thing named, to indicate its quantity or extent, or to specify a thing as distinct from something else’.

The case of new words for machine translation

Another case that argues for the use of rule-based, i.e. human-like, translation is the following. Frequently we come across a new word, a word we have never seen before. More often than not, a human knows how to translate it, because there are rules that make it possible to translate a word from a given language into another language, even if we do not know the meaning of that word. For example, ‘acide anthranilique’ can be translated precisely as ‘anthranilic acid’ by a human, even if he has no knowledge of the acid in question. For this type of ability to translate new words, the statistical method is not adequate: the machine translator must be able to (i) determine the grammatical nature of the word in question; and (ii) translate the new word based on the morphological rules for translating words of this grammatical type from one language to another. An AGI capable of translating should possess this type of ability.
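
As an illustration of steps (i) and (ii), here is a small sketch that translates an unknown French adjective by a suffix rule. Everything in it is an assumption made for the example: the French source phrase ‘acide anthranilique’, the single rule ‘-ique’ → ‘-ic’, the one-word lexicon and the function names.

```python
# Known words (assumed lexicon) and morphological rules for unknown adjectives.
LEXICON = {"acide": "acid"}
ADJECTIVE_SUFFIX_RULES = [("ique", "ic")]  # e.g. French '-ique' -> English '-ic'


def guess_adjective(word):
    """Translate an unknown adjective by suffix rule, or return None."""
    for french_suffix, english_suffix in ADJECTIVE_SUFFIX_RULES:
        if word.endswith(french_suffix):
            return word[: -len(french_suffix)] + english_suffix
    return None


def translate_noun_adjective(phrase):
    """Translate a French 'noun adjective' group into English 'adjective noun'.

    The noun must be in the lexicon; the adjective may be a new word.
    Returns None if no rule applies.
    """
    noun, adjective = phrase.split()
    noun_en = LEXICON.get(noun)
    adjective_en = guess_adjective(adjective)
    if noun_en is None or adjective_en is None:
        return None
    return f"{adjective_en} {noun_en}"
```

Here `translate_noun_adjective("acide anthranilique")` produces ‘anthranilic acid’ although ‘anthranilique’ is absent from the lexicon: the grammatical type (adjective following a noun) selects the rule, and the rule does the rest.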

Characteristics of an AGI (artificial general intelligence)

What are the characteristics we want for an AGI (artificial general intelligence)? An AGI should have a very advanced capacity in NLP and language comprehension. One of the qualities we expect from an AGI is respect for multilingualism. Hopefully, the AGI should have extensive NLP capabilities, which apply to a large number of languages, and even to the 8000 languages of the planet, i.e. also to the 90% of endangered languages. The AGI could thus help to solve an important problem inherent to the problem of language extinction, which affects human cultural diversity (it can be assumed that some languages will be extinct at the time of the AGI event, but the AGI could thus help to revitalize them).

The two-language matching problem

Here is a problem for a human intelligence (or an AGI): we have a dictionary (with words, lemmas and grammatical types) in a language A and a second dictionary in a language B. If we have an extensive corpus of each of the two languages, is it possible to create a translation dictionary from A to B, and how? To take an example: if the two languages were French and English, we would have to associate ‘cheval’ with ‘horse’, etc. in the final translation dictionary, and so on for all the words of language A.

Highly related seems to be this paper: Deciphering Undersegmented Ancient Scripts Using Phonetic Prior.

Prototype of text search with optional grammatical type

Unconditional search

Let us expand the idea of text analysis derived from rule-based translation. Above is an example of a classic word-based search. In this particular case, it is the French word ‘été’. This word is ambiguous because it can be a common noun (‘summer’), or a past participle (‘been’). Below is an example of a search for the word ‘été’ associated with the grammatical type ‘common noun’ (‘summer’).

Conditional search based on ‘noun’ grammatical type

Finally, we have below an example of a search for the word ‘été’ associated with the grammatical type ‘past participle’ (‘been’).

Conditional search based on ‘past participle’ grammatical type
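
The three searches can be sketched as a single function with an optional grammatical type. This is only an illustration, assuming the text has already been tagged; the `Token` structure, the tag names and the `search` function are hypothetical and do not describe the prototype itself.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    word: str
    grammatical_type: str


def search(tokens: List[Token], word: str, grammatical_type: Optional[str] = None) -> List[int]:
    """Return the positions of `word`, optionally restricted to one grammatical type."""
    return [
        position
        for position, token in enumerate(tokens)
        if token.word.lower() == word.lower()
        and (grammatical_type is None or token.grammatical_type == grammatical_type)
    ]


# Hand-tagged sample: "L'été a été chaud" (the summer has been hot).
SAMPLE = [
    Token("L'", "determinant"),
    Token("été", "common noun"),
    Token("a", "verb"),
    Token("été", "past participle"),
    Token("chaud", "adjective"),
]
```

On this sample, `search(SAMPLE, "été")` finds both occurrences, `search(SAMPLE, "été", "common noun")` only the first, and `search(SAMPLE, "été", "past participle")` only the second.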

Why it’s worth it to engage in rule-based translation

Rule-based translation is difficult to implement. The main difficulty encountered is taking into account the groups of words, so as to be on a par with statistics-based translation. The main problems in this regard are (i) polymorphic disambiguation; and (ii) building a fair typology of grammatical types. But once these steps begin to be mastered, there are many advantages. What seems essential here is that with the same piece of software, both machine translation and text analysis can be carried out. Among the modules that are easy to implement are the following:

  • lemmatizer
  • part-of-speech tagger
  • singularizer
  • pluralizer
  • grammar checker
  • type extractor: a module that allows you to extract words from a text according to their grammatical category

The implementation of rule-based translation provides the machine with some inherent understanding of the text, in the same way that a human being has one. To put it in a nutshell, it is better artificial intelligence.

Finally, other modules, more advanced, seem possible (to be confirmed).

A two-sided analysis of postpositions

#preposition #postposition Consider the following adverbs: après (after, dopu) (he would eat after), avant (before, nanzi) (they had seen them before). They can also be considered as prepositions:

  • après la fête: after the feast, dopu à a festa
  • avant le mois de juin: before the month of June, nanzi u mesi di ghjunghju

Likewise, durant (during) is also a preposition: durant la procession, during the procession, mentri a prucissioni.

But après, avant, durant can also be used differently:

  • deux jours après: two days after, dui ghjorni dopu
  • une semaine avant: one week before, una sittimana innanzi
  • deux mois durant: for two months, mentri dui mesi

From our point of view, these are postpositions, because they are then followed by punctuation (in general), and preceded by a common noun.

If we now extend this analysis to locutions, the following locutions are also postpositions:

  • plus tard: later, dopu; deux jours plus tard: two days later, dui ghjorni dopu
  • plus loin: further, più luntanu; trois mètres plus loin: three meters further
  • plus près: closer, più vicinu; dix centimètres plus près: ten centimeters closer
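
The positional criterion used here (preceded by a common noun, followed by punctuation) lends itself to a very small context rule. The sketch below is a simplified illustration under that criterion only; the function `classify`, the tag names and the fallback order are assumptions, and real text would need more cases.

```python
PUNCTUATION = {".", ",", ";", ":", "!", "?"}
TARGETS = {"après", "avant", "durant"}


def classify(tagged, i):
    """Classify après/avant/durant at position i of a list of (word, type) pairs.

    The type of the target word itself is ignored; only its neighbours count.
    """
    word = tagged[i][0].lower()
    if word not in TARGETS:
        return None
    previous_type = tagged[i - 1][1] if i > 0 else None
    next_word = tagged[i + 1][0] if i + 1 < len(tagged) else None
    at_end = next_word is None or next_word in PUNCTUATION
    if at_end and previous_type == "common noun":
        return "postposition"  # deux jours après.
    if at_end:
        return "adverb"  # il mangerait après
    return "preposition"  # après la fête
```

For example, in `[("deux", "determinant"), ("jours", "common noun"), ("après", "?"), (".", "punctuation")]` the word at index 2 is classified as a postposition, whereas in `[("après", "?"), ("la", "determinant"), ("fête", "common noun")]` the word at index 0 is classified as a preposition.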

More on two-sided grammar

Let’s focus on analyzing the following phrases:

  • à force de courage (bravely)
  • à force de courage et de persévérance (by dint of courage and perseverance)
  • avec beaucoup d’abnégation (selflessly)
  • d’une manière ou d’une autre (in any way)
  • d’une façon vraiment admirable (in a very admirable way)
  • au moment le plus opportun (when most appropriate)

What is their grammatical nature? From the point of view of two-sided grammar, what are they?

From a synthetic standpoint, first of all, they are adverbs. Let us turn now to their nature from an analytical point of view.

  • à force de courage (bravely): analytically, it is a preposition, followed by a common noun, then another preposition, then another common noun: PS-NC-PS-NC.
  • à force de courage et de persévérance (by dint of courage and perseverance): analytically, it is a preposition, followed by a common noun, then another preposition, then another common noun, then a conjunction, then another preposition and then another common noun: PS-NC-PS-NC-CONJ-PS-NC.
  • and so on
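
The two sides of the analysis can be kept together in one small structure: the analytic side is the sequence of word-level types, and the synthetic side is the type of the whole locution. The sketch below assumes a hand-made tag table and a hand-made table of patterns; `TAGS`, `SYNTHETIC`, `analytic` and `two_sided` are hypothetical names, and the patterns listed are only the two worked out above.

```python
# Word-level tags for the example phrases (PS: preposition, NC: common noun,
# CONJ: conjunction), as used in the analysis above.
TAGS = {
    "à": "PS",
    "de": "PS",
    "force": "NC",
    "courage": "NC",
    "persévérance": "NC",
    "et": "CONJ",
}

# Synthetic type assigned to each analytic pattern (assumed table).
SYNTHETIC = {
    "PS-NC-PS-NC": "adverb",
    "PS-NC-PS-NC-CONJ-PS-NC": "adverb",
}


def analytic(phrase):
    """Return the analytic pattern of a phrase, e.g. 'PS-NC-PS-NC'."""
    return "-".join(TAGS[word] for word in phrase.split())


def two_sided(phrase):
    """Return (synthetic type, analytic pattern); the type is None if unknown."""
    pattern = analytic(phrase)
    return SYNTHETIC.get(pattern), pattern
```

So `two_sided("à force de courage")` returns the pair made of the synthetic type adverb and the analytic pattern PS-NC-PS-NC. In practice a pattern alone would probably not be enough to decide the synthetic type, so the table should be read as a placeholder for a richer rule.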

Lemmatizer for French language updated

I just updated the lemmatizer for French language. Many new options are available.

The API can be tested here:

Reflections on grammatical typologies

It is useful to point out the differences that may exist between different grammatical typologies. The classical grammatical taxonomy is essentially aimed at teaching and comprehension. It therefore has a pedagogical purpose. On the other hand, the taxonomy that is useful for rule-based machine translation has a different purpose: it aims essentially at allowing disambiguation, both grammatically and semantically, because ambiguity is a fundamental and very common problem in this particular context. Such a typology essentially focuses on the location of word types, on the structures encountered in the sentence. This explains why typologies can be different, as they have different goals and purposes.

Analyzing relative pronouns

What is the status of the ‘relative pronouns’ of classical grammar within the present conceptual framework? Traditionally, a distinction is made between simple relative pronouns (qui, que, dont, où; who, that, whose, where) and compound relative pronouns (à qui, pour lesquelles, à côté duquel, etc.; to whom, for whom, beside whom, etc.). If we look first at simple relative pronouns, the category does not seem satisfactory, in particular because of the presence of both ‘qui’ (who) and ‘que’ (that), whose grammatical roles appear, in the present context, to be quite different.

Consider the two short sentences ‘la maison que j’habite est grande’ and ‘l’homme qui parle est grand’ (the house I live in is big; the man who speaks is tall). As these two examples illustrate, the structures following ‘que’ and ‘qui’ appear different. Here, ‘que’ is followed by a personal pronoun and a conjugated verb (‘j’habite’: I live), whereas ‘qui’ is followed directly by a conjugated verb (‘parle’: speaks). From our present perspective, these are inherently different structures. It turns out that ‘dont’ and ‘où’ admit the same type of structure as ‘que’. Thus, the homogeneous category, from our point of view, is formed here by ‘que’, ‘dont’, ‘où’, but not by ‘qui’.

If we extend this analysis to other words, by searching for those that could fit into this category, we also find: ‘duquel’ (= de lequel; from which), ‘de laquelle’, ‘desquels’ (= de lesquels; from which), ‘desquelles’ (= de lesquelles; from which), ‘auquel’ (= à lequel), ‘à laquelle’, ‘auxquels’ (= à lesquels), ‘auxquelles’ (= à lesquelles). But we also have all forms of the same type built from a preposition other than ‘de’ or ‘à’: ‘sur lequel’, ‘sur laquelle’, …, ‘par lequel’, ‘par laquelle’, ‘avec lequel’, etc. The classical compound relative pronouns such as ‘à qui’, ‘pour lesquelles’, ‘à côté duquel’, etc. (to whom, for whom, beside whom, etc.) also fit naturally into this category.

But from the point of view of two-sided grammar, ‘à l’aide duquel’, ‘au moyen de laquelle’, ‘à la suite de quoi’, ‘à l’aide de qui’, etc. (with the help of which, by means of which, as a result of which, with the help of whom, etc.) also belong to this category. (to be continued)

Powering MT with two-sided grammar: the case of ‘près de’

‘près de’ (near) is considered to be a prepositive locution. From the viewpoint of two-sided grammar, it is (synthetically) a preposition, made up (analytically) of an adverb (‘près’) followed by the preposition ‘de’. In Corsican language, this is translated as vicinu à. But this grammatical analysis does not solve all cases, as the example above shows: in the sentence ‘depuis près de dix ans, il travaillait’ (for almost ten years, he had been working), ‘près de’ (almost; guasgi) has a different grammatical role. According to classical analysis, it would rather be an adverb.

In the present conceptual framework, we will analyze ‘près de’ (almost; guasgi) in this sentence as a modulator of the cardinal determinant ‘dix’ (ten). A prototype implemented with this type of grammatical analysis then gives the correct translation, where vicinu à (near) is replaced by guasgi (nearly). It seems that two-sided grammar is beginning to produce interesting results (to be confirmed).
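
The decision described here depends only on what follows ‘près de’. A minimal sketch of that rule, assuming a tiny hand-made list of cardinal determinants and a hypothetical function name, could be:

```python
# A few French cardinal determinants (assumed, far from complete).
CARDINALS = {"deux", "trois", "dix", "vingt", "cent", "mille"}


def translate_pres_de(next_word):
    """Translate 'près de' into Corsican according to the word that follows.

    Before a cardinal determinant it is a modulator of that determinant
    (guasgi, 'nearly'); otherwise it is treated as a preposition (vicinu à, 'near').
    """
    if next_word.lower() in CARDINALS or next_word.isdigit():
        return "guasgi"
    return "vicinu à"
```

Thus `translate_pres_de("dix")` selects guasgi, as in ‘depuis près de dix ans’, while `translate_pres_de("la")` keeps vicinu à, as in ‘près de la maison’ (an example phrase added here for contrast).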

Expanding on noun modulators

Let’s take a closer look at noun modulators, especially common noun modulators. We have seen that adjectives could be considered, in the present conceptual framework, as noun modulators. In this context, the question arises, are there other forms of noun modulators? It seems that there are.

Let us consider elements of sentences such as ‘bois de châtaignier’ (chestnut wood; legnu castagninu) or ‘oiseau de proie’ (bird of prey; aceddu di preda). In ‘bois de châtaignier’, ‘de châtaignier’ seems to play the role of noun modulator, in the same way as an adjective. In traditional grammar, ‘de châtaignier’ is considered as a noun complement. In the present framework, it would be a noun modulator, since it clarifies and restricts the meaning of the noun ‘bois’ (wood; legnu). The role of ‘de proie’ in ‘oiseau de proie’ is identical, as it acts as a modulator of the noun ‘oiseau’ (bird).

Interestingly, it turns out that the comparison between languages tends to validate this type of analysis. Indeed, ‘bois de châtaignier’ is better translated in Corsican language by legnu castagninu than literally by legnu di castagnu (chestnut wood); and in this case, castagninu (of chestnut) is an adjective, i.e. a noun modulator. Thus castagninu and di castagnu are equivalent here, which confirms that both have the same nature, that of a noun modulator.

Modulators: the case of adjectives

Using the notion of modulator again, we can now insert adjectives into this framework: in this context, they are noun modulators (mostly of common nouns, but sometimes of proper nouns as well). The adjective, as a noun modulator, is placed either before or after the noun.

So we have the following categories:

  • modulators of nouns (= adjectives)
  • modulators of adjectives
  • modulators of verbs, i.e. adverbs in a restrictive but classical sense
  • modulators of determinants

New: Part-of-speech tagger for French language API

I have just published the POS-tagger for French language API, on RapidAPI. The use of the API is free for 1000 requests / month. No training necessary, it works immediately.

