New open test

Here is a new open test, using the first 100 or so words of the ‘article of the day’ from French Wikipedia: the score is 1 – (8 / 134) = 94.03%. Five of the errors result from lack of vocabulary. There are also some grammatical errors (da instead of par o, in diducendu instead of diducendu ni) and, lastly, an erroneous disambiguation of polaccu (Polish). The disambiguation of ‘partie’ is correct, since it can be translated either as parti (part) or as partita (gone, party).

Repeated open tests show that, on average, 50% of errors result from lack of vocabulary. This type of error should be easy to tackle, inasmuch as it does not concern rare words. On this basis, a target of 96% or 97% should reasonably be attainable.
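The scoring used in these open tests can be reproduced in a few lines of Python. This is only a sketch: the function name `accuracy` is made up for illustration, and the second figure is a hypothetical projection which assumes that the 5 vocabulary errors of the test above are all fixed and that nothing else changes.

```python
def accuracy(errors: int, words: int) -> float:
    """Score of an open test: 1 - (errors / words)."""
    if words <= 0:
        raise ValueError("words must be positive")
    return 1 - errors / words


# Figures from the test above: 8 errors over 134 words.
current = accuracy(8, 134)

# Hypothetical projection: the 5 vocabulary errors are fixed, 3 errors remain.
projected = accuracy(8 - 5, 134)

print(f"current:   {current:.2%}")
print(f"projected: {projected:.2%}")
```

The projection applies to this single test only; it says nothing about the average over repeated tests.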

The test is ‘open’ in the sense that it can be verified here.

Posted in blog | Comments Off on New open test

Minor breakthrough

It is kind of a minor breakthrough. The translation of French ‘en même temps que’ (at the same time as) is somewhat hard, in that it can take two different forms: either à tempu à or à tempu ch’è, depending on the context. The above examples tackle this sort of difficulty (although not exhaustively).


Priority pairs regarding endangered languages

From the standpoint of endangered languages, there exist priority translation pairs. The notion of a priority pair is relative to a given endangered language: it is the pair most useful to the current users of that language. For example, French to Corsican is a priority pair for Corsican, compared with other pairs such as Gallurese-Corsican, English-Corsican or Spanish-Corsican. In this sense, every endangered language has its own priority pair. A priority pair for Sardinian Gallurese is Italian-Gallurese; likewise, a priority pair for Sardinian Sassarese is Italian-Sassarese and, analogously, a priority pair for Sicilian is Italian-Sicilian.


Should machine translation software go open source?

There is an ongoing debate on whether AI software should go open source (see, for example, Bostrom’s paper Strategic Implications of Openness in AI Development). Our concern here is whether MT software should go open source. Prima facie, for safety reasons, it would be better to make MT code public, thus allowing anyone to check the code and find possible errors, … Such openness would notably be a defense against the AI control problem, in short, the risk that a superintelligence could harm humans. From this standpoint, public code seems much preferable to private code. For rule-based translation, open code would allow people to check the resulting translation step by step, and so achieve better transparency. (The distinction between statistical and rule-based MT is not as clear-cut as one might think at first glance, since some rules could be applied on a statistical basis.)

Illustration from pixabay.com

Another advantage of publishing the code would be to allow anyone to improve it and extend its capabilities, notably by adding new modules targeted at new languages (there are around 7000 human languages).


Some thoughts on the remaining 1% problem

To begin with, let us state the 1% problem for machine translation: some 99% accuracy seems attainable, but the remaining 1% may be hard or even very hard to reach (the 1% figure is chosen somewhat arbitrarily, but it is useful to fix ideas). Now a question arises: is progress on the remaining 1% attainable without general-purpose AI? Prima facie, the answer is no, for such progress seems to require abilities such as finding the translation of a given word in external databases. It will sometimes happen that part of the untranslated 1% is due to a new word, for instance one coined very recently, which is therefore missing from the MT system's internal dictionary. To find the relevant translation, the machine should be able to search external databases (say, the web), just as a human would. So solving the remaining 1% problem requires, among other capabilities, abilities of this kind, which are part of a general-purpose AI.

Illustration from Pixabay.com

Artificial general intelligence (AGI) is prima facie a somewhat abstract notion that needs to be refined and made more explicit. Problems encountered in implementing machine translation systems can help make this notion more precise and concrete. The ability to find the translation of a given word in external databases is just one of the abilities needed to solve the remaining 1% problem; we shall mention some others of the same type later.
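The dictionary-then-external-search behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: `translate_word`, `INTERNAL` and `fake_external` are hypothetical names, the dictionary holds a single illustrative entry, and the external lookup is a stand-in that calls no real service and returns a placeholder string.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical internal dictionary (French -> Corsican), one illustrative entry.
INTERNAL: Dict[str, str] = {"polonais": "polaccu"}


def translate_word(
    word: str,
    internal: Dict[str, str],
    external: Optional[Callable[[str], Optional[str]]] = None,
) -> Tuple[Optional[str], str]:
    """Return (translation, source), trying the internal dictionary first."""
    if word in internal:
        return internal[word], "internal"
    if external is not None:
        found = external(word)  # e.g. a search over external databases
        if found is not None:
            return found, "external"
    return None, "untranslated"


def fake_external(word: str) -> Optional[str]:
    # Stand-in for a real external search; the value is a placeholder, not a translation.
    return {"néologisme": "<found externally>"}.get(word)


print(translate_word("polonais", INTERNAL, fake_external))
print(translate_word("néologisme", INTERNAL, fake_external))
print(translate_word("inconnu", INTERNAL, fake_external))
```

The hard part, of course, is the external search itself, which is exactly where the general-purpose abilities discussed above come in.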


Just powered the new grammatical engine

Just powered up the new grammatical engine: it seems to allow for some interesting things, notably regarding expressions. The case at hand is the French expression “parler comme un moulin” (talk the leg off a chair), which translates into Corsican as dì quant’è sette. To be continued…


Is rule-based MT more ethical than statistical MT?

In the ongoing debate on safe AI, a relevant open question is whether rule-based MT is more ethical than statistical MT. Here are some arguments in favor of rule-based MT in this context (without faulting statistical MT, which has its own strengths):

  • it emulates human reasoning: it translates a text just as a human would do
  • rule-based MT offers a high degree of control, since the resulting translated text can be traced back: a detailed step-by-step account of the translation process can be provided if required
  • rule-based MT can be integrated consistently into a wider project of brain emulation, which emulates general human reasoning
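The traceability argument can be illustrated with a toy example. This is a minimal sketch, not the engine's actual design: the rule table `EXPRESSION_RULES` and the function `translate` are hypothetical, and the only rule shown is the French-Corsican expression pair mentioned elsewhere on this blog.

```python
from typing import List, Tuple

# Hypothetical rule table: (French expression, Corsican rendering).
EXPRESSION_RULES: List[Tuple[str, str]] = [
    ("parler comme un moulin", "dì quant’è sette"),
]


def translate(text: str) -> Tuple[str, List[str]]:
    """Apply each rule in order and record every step that fired."""
    trace: List[str] = []
    result = text
    for source, target in EXPRESSION_RULES:
        if source in result:
            result = result.replace(source, target)
            trace.append(f"expression rule: '{source}' -> '{target}'")
    if not trace:
        trace.append("no rule applied")
    return result, trace


output, steps = translate("parler comme un moulin")
print(output)
for step in steps:
    print(" -", step)
```

Because every rule that fires is logged, the output can be traced back to the rules that produced it, which is the kind of step-by-step account the second argument refers to.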

 


Open test – October 18, 2018

Here is another open test (now fully ‘open’, since it can be verified via http://okchakko.com/translate), using the first 100 words of the daily article of French Wikipedia. There is one full sentence, but some errors remain:

  • dopu à quattordici anni
  • brama di crià: the translation of French verb “souhaiter” is ambiguous, since it can mean either augurà (to wish), or bramà (to hope for), depending on the context
  • induv’eddu figura

Scoring: 1 – (7/105) = 93.33%.
