Why rule-based translation is (presently) best suited to endangered languages

Here are some arguments in favor of rule-based translation for the machine translation of endangered languages (they relate to the philosophy of language policy):
  • at present, there is no reliable parallel corpus between the given endangered language and other languages
  • endangered languages are often polynomic, i.e. several main variants of the language coexist. It is important both to preserve these variants, since they are (i) a feature of diversity and (ii) an inherent feature of the given endangered language, and to distinguish between them. In addition, a translation should never mix these variants. This also complicates the process of building a proper corpus, since the scarce existing corpus is made up of different variants of the language.
  • in the absence of an adequate corpus, statistical machine translation cannot provide quality translation of the given endangered language (whereas it succeeds with widely spoken languages for which excellent corpora are available). Arguably, providing low-quality translation, however commendable the attempt, could harm these endangered languages, which are by definition vulnerable, since people could use and spread the resulting low-quality translations. Given this vulnerability, it could be argued that a translation quality of at least 80% is needed for any pair involving an endangered language.
  • in addition, endangered languages are usually in a ‘diglossic’ relationship with another language: the priority is therefore to provide translation between the two languages of this pair

(to be continued)

