Thursday, March 05, 2009

Ubiquity in Portugal's Portuguese

I've been following some of the activity around Ubiquity's development, and one aspect that caught my attention was the adaptation of that amazing interpreter to multiple languages. I've commented before on Aza Raskin's blog about the implications of using Ubiquity in native Portuguese. Now Mozilla's Mitcho Erlewine and Felipe Gomes have brought up the issue again, and this time work on Portuguese support for Ubiquity appears to be moving forward at Mozilla Labs, which is great! But please hear me out first on a few small, but relevant, points that need to be considered. I'd like to lay them out in the remainder of this post. Note that these are personal opinions: I'm not a linguistics expert, only an interested software engineer.

Felipe's entire post is clear and understandable, but his ideas work for the Brazilian variant of the Portuguese language. As a native speaker from Portugal (and without any kind of grudge against other variants of Portuguese), I find it somewhat odd (even repulsive, pardon me!) to read some of the Brazilian syntax; it just looks badly written. One simple example is the "gimme a map" command, which Felipe translates into Brazilian Portuguese as "me dê um mapa", whereas in Portugal we would say "dê-me um mapa". This simple example also exposes another cultural divergence: the friendly versus respectful tone. The English expression "give me" can be translated as "dá-me" or "dê-me" (friendly and respectful, respectively). The same goes for the command verb "map", which would be translated as "mapeia" or "mapeie".

Felipe offers the great suggestion of detecting the word's root, e.g., "map" in "mapeie". This will work for almost every verb, except for some crucial ones that are irregular! (I honestly hate grammar irregularities...) Take the verb "to go", which could be used in a command to go to a specific page/tab/section/<something else>, as in "go to gmail". In Portugal's Portuguese, Felipe's suggestion fails here: the verb translates to "ir", yet it appears in commands such as "vai para o gmail" or "vá para o gmail". Not only can these forms not be easily traced back to their infinitive ("ir"), but the two tones are also spelled differently (the accent in "vá" is very much relevant).

I'm really sorry to be a spoilsport, but this won't work so simply. Obviously, the set of irregular verbs is finite, and we could add their correct usages to a table of syntactic rules. I expect the final code could still be optimized to interpret commands at an acceptable speed. But I'm not so sure that these rules can be, as Mitcho would like, reduced to a common denominator for all languages. (Again, don't get me wrong: I'd love that to be possible; I just don't know enough about other languages to believe it is feasible.)
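To make the rules-table idea a bit more concrete, here is a minimal sketch in Python. It is not Ubiquity's actual parser; it only assumes a hypothetical table of regular verb roots matched by prefix, plus a hand-written table of irregular forms that is checked first. Every table entry and function name below is illustrative, and stripping the enclitic pronoun by splitting on "-" is a deliberate simplification.

```python
from typing import Optional

# Hypothetical table of regular verb roots -> command names.
# "mape" is the assumed root of "mapear" (to map), so both "mapeia"
# (friendly) and "mapeie" (respectful) match, but the noun "mapa" does not.
REGULAR_ROOTS = {
    "mape": "map",
}

# Hypothetical table of irregular forms that root matching cannot handle.
# Each entry maps an exact inflected form to a command; comments note the tone.
IRREGULAR_FORMS = {
    "vai": "go",    # ir, friendly
    "vá": "go",     # ir, respectful (the accent matters)
    "dá": "give",   # dar, friendly
    "dê": "give",   # dar, respectful
}


def resolve_verb(word: str) -> Optional[str]:
    """Return the command a single word refers to, or None if it is not a known verb."""
    # Drop an enclitic pronoun such as "-me" in "dê-me" (a simplification).
    verb = word.lower().split("-", 1)[0]
    # Irregular forms are checked first, as exact matches.
    if verb in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[verb]
    # Regular verbs fall back to prefix (root) matching.
    for root, command in REGULAR_ROOTS.items():
        if verb.startswith(root):
            return command
    return None


def find_command(sentence: str) -> Optional[str]:
    """Return the command of the first recognized verb, wherever it appears."""
    for word in sentence.split():
        command = resolve_verb(word)
        if command is not None:
            return command
    return None


if __name__ == "__main__":
    print(find_command("dê-me um mapa"))    # give
    print(find_command("me dê um mapa"))    # give
    print(find_command("vá para o gmail"))  # go
    print(find_command("mapeia Lisboa"))    # map
```

Scanning the whole sentence for the verb covers both the Brazilian "me dê" and the Portuguese "dê-me" word orders. The irregular table, however, still has to be written by hand for each variant, which is exactly why it matters which Portuguese we are talking about.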

This is why I'm so fond of using web apps' names as commands, even in Portuguese. "twitter <message>", "google <query>" or "wikipedia <expression>" will be understood perfectly in Portugal (and in Brazil, for that matter). I know Ubiquity's future is about natural language interpretation, whatever language you may prefer, and that's really something special. But I can't help feeling that every language is either going to require a complex interpreter suited to it, or suffer from an aggravated case of syntactic-sugar syndrome.

Even if some of the points I've covered can be countered, please remember, in future blog posts, projects, etc., to identify which variant of Portuguese you are referring to when you discuss grammar rules. Again, this is not a matter of any kind of discrimination, but one of correctness, and everybody deserves that.

Thanks for reading!