Tips for translating California's "Quick Add" words

Background

California is a GNOME 3 calendar currently under development. One of its features is "Quick Add", where the user simply types the details of an event in natural language. California parses the words into a formal calendar event and adds it to the user's calendar. It does this with a crude natural-language parser: more advanced than the original ADVENTURE parser, but not as good as, say, Infocom's parser from the 1980s.

However, for this feature to be available to the GNOME community at large, it has to work in a variety of languages. That means the parser does not deal directly in English words or even English grammar, but in something more opaque and generic. This (and, of course, the general limits of available manpower) has led to a parser that is more crude than advanced.

Translation tips

As a translator, be aware of the following issues when translating California's PO file, in particular for lists of words marked as "for Quick Add":

  • The parser operates by scanning the sentence from start to finish, tokenizing it into words delimited by whitespace.
  • Some words are parsed directly. For example, "today" translates directly to today's date. The parser also understands "tomorrow" and "yesterday".
  • The parser can also recognize time-of-day words if it sees the appropriate formatting. For example, "9p", "9pm", "9:00pm", and "21:00" are all parsed to the same time. (The parser does not care whether the user is configured for 12-hour or 24-hour time; it handles all those cases.)
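To make the scanning and time-of-day behavior concrete, here is a minimal Python sketch. It is not California's actual code: the function names, the regular expression, and the decision to reject a bare number such as "9" are all assumptions made for illustration.

```python
import re
from datetime import time

# Hypothetical pattern: hour, optional ":minutes", optional a/am/p/pm marker.
_TIME_RE = re.compile(r"^(\d{1,2})(?::(\d{2}))?(a|am|p|pm)?$", re.IGNORECASE)

def parse_time_of_day(token):
    """Return a datetime.time for tokens like '9p' or '21:00', else None."""
    match = _TIME_RE.match(token)
    if match is None:
        return None
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    meridiem = (match.group(3) or "").lower()
    if not meridiem and match.group(2) is None:
        return None  # assumption: a bare number such as "9" is too ambiguous
    if meridiem:
        if not 1 <= hour <= 12:
            return None
        # 12am maps to hour 0 and 12pm to hour 12.
        hour = hour % 12 + (12 if meridiem.startswith("p") else 0)
    if hour > 23 or minute > 59:
        return None
    return time(hour, minute)

def scan(sentence):
    """Split on whitespace and pair each word with its parsed time, if any."""
    return [(word, parse_time_of_day(word)) for word in sentence.split()]
```

Calling scan("Dinner at 9pm") pairs "Dinner" and "at" with None and "9pm" with a time of 21:00, which is the kind of word-by-word pass described above.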

Prepositions

  • The parser currently looks for four types of prepositions: TIME, LOCATION, DURATION, and DELAY. These words do not become event details themselves, but they signal that the next word(s) are to be interpreted in a particular manner:
    • TIME indicates a moment of time. In English: at 9pm, from 3:00 to 4:00, on May 9.

    • DURATION indicates a span of time. In English: for 1 hour, for 30 minutes.

    • DELAY indicates a moment of time after a duration of time has passed. In English: in 1 hour, in 30 minutes.

    • LOCATION indicates a place. In English: at Disneyland, at the gym.

  • These prepositions are listed in the PO file with semicolon delimiters, e.g. "at;from;to;on;". The translation for these words should be listed the same way (with semicolons separating them and no whitespace).

For example, if I were producing a (fictional) Martian translation of DELAY (which is "in;" in English), and there were three words signifying delay in Martian ("grz", "regrz", and "tyb"), my translation would be "grz;regrz;tyb".

NOTE: If your language requires multiple words to signify a preposition (e.g. "re tyb" in our fake Martian), please file a ticket in the issue tracker. The parser currently can't handle this situation.
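A translator could sanity-check a translated list before submitting it with something like the following Python sketch. The helper names are hypothetical, and the way a trailing semicolon is handled is an assumption based on the English examples; this is not part of California or of any gettext tooling.

```python
def split_word_list(translated):
    """Split a 'for Quick Add' string such as 'grz;regrz;tyb' into words."""
    # A trailing semicolon (as in the English "in;") yields an empty entry; drop it.
    return [word for word in translated.split(";") if word]

def find_problems(translated):
    """Return human-readable warnings for entries the parser could not use."""
    problems = []
    for word in split_word_list(translated):
        if word != word.strip():
            problems.append(f"{word!r}: leading or trailing whitespace")
        elif any(ch.isspace() for ch in word):
            problems.append(f"{word!r}: multi-word entry; file a ticket instead")
    return problems
```

For the Martian example, find_problems("grz;regrz;tyb") returns an empty list, while find_problems("grz;re tyb") flags the multi-word entry.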

  • You do not need to translate each English preposition one-for-one. Since these words are merely signifiers to the parser, you should simply provide a similar set of words in your own language. You can provide fewer or more words, whatever makes sense. This also applies to gendered languages; if a word has two or more forms, simply list all of them.

For example, in my fake Martian translation of "at;from;to;on;" (which are TIME prepositions), I know that the Martian words indicating that I'm speaking about a time of day include "ixl", "frognard", and "bx". In addition, "frognard" has variations to include all three Martian genders: "frognardyl" and "nardfrog". So my translated string would be "ixl;frognard;frognardyl;nardfrog;bx". Note that there are more words in Martian than English and that some of these words don't map to English directly. However, they all signify to the parser that the next word is probably going to be a time of day.

"Probably" is the key word in that last sentence; California is simply doing the best it can to produce an event. The user can edit the event later if they don't get exactly what they want. Which leads to this point:

  • Don't worry about perfect grammar. California's parser is simply doing its best to fill in the blanks for the user. For example, as pointed out here, in Brazilian Portuguese "at" (when used for time) is translated to "à" for 1am/1pm and "às" for other hour numbers. The parser isn't worried about perfect grammar; these words are merely signifiers to assist interpretation of the data. So the Brazilian Portuguese translation should list both: "à;às;". That means the parser will accept "às 1pm" to mean "the event starts at 1pm". This is grammatically incorrect but, at the end of the process, probably matches what the user was asking for when they typed that sentence.

  • Some words cannot be repeated across prepositions. Words in LOCATION may also be listed in any of the other three (TIME, DURATION, or DELAY), but a word may not appear in more than one of TIME, DURATION, and DELAY.

For example, in English, "at" can mean a point in time ("at 9pm") or a location ("at Disneyland"). The parser can distinguish between the two (if it looks like a time, it's treated as time, otherwise it's a location), so it's acceptable to have "at" in LOCATION and TIME. However, in our fictional Martian language "prz" can indicate both a moment in time ("prz 9am") and a duration ("prz 1 hour"). In this case, the Martian translator needs to pick which way "prz" is more commonly used. "prz" should not be listed in both TIME and DURATION, only in one of them.
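The repetition rule can be checked mechanically. The following Python sketch is only an illustration: find_conflicts and the dictionary layout are hypothetical, and the sample word lists are assembled from the examples on this page, not copied from the real PO file.

```python
from itertools import combinations

def find_conflicts(word_lists):
    """word_lists maps a category name to its semicolon-delimited translation.

    Returns (word, category, category) tuples for words that are shared
    between TIME, DURATION, and DELAY.
    """
    # LOCATION is deliberately left out: it may share words with the others.
    exclusive = ("TIME", "DURATION", "DELAY")
    words = {
        name: {w for w in word_lists.get(name, "").split(";") if w}
        for name in exclusive
    }
    conflicts = []
    for first, second in combinations(exclusive, 2):
        for word in sorted(words[first] & words[second]):
            conflicts.append((word, first, second))
    return conflicts
```

With "prz" listed under both TIME and DURATION, the function reports ("prz", "TIME", "DURATION"); a word shared only between LOCATION and TIME, such as the English "at", is not reported.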

Ordinals

The parser also deals with ordinals, such as 1st and 2nd in English. The parser doesn't care whether the ordinal is correct (e.g. "1nd" is nonsense in English); it simply strips the suffix from the word and converts the number. Translators should list all recognized numeric suffixes for their language, without a numeral attached.

For example, some languages use a specific ordinal indicator, such as "1º". If that's true for your language, simply list that: "º;"
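The suffix-stripping described above might look like the following Python sketch. The function name is hypothetical, and the English suffix string "st;nd;rd;th;" is an assumption for illustration; check the PO file for the actual source string.

```python
def parse_ordinal(token, suffixes):
    """Return the integer in tokens like '2nd' or '1º', or None if no match."""
    # Try longer suffixes first so a short suffix cannot shadow a longer one.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and token.endswith(suffix):
            digits = token[: -len(suffix)]
            # Only the number matters; "1nd" is accepted just like "1st".
            if digits.isdecimal():
                return int(digits)
    return None

# Assumed English list, split the same way as the preposition lists.
ENGLISH_SUFFIXES = [s for s in "st;nd;rd;th;".split(";") if s]
```

Here parse_ordinal("1nd", ENGLISH_SUFFIXES) and parse_ordinal("1º", ["º"]) both return 1, while a word such as "first" returns None because nothing numeric remains after the suffix is removed.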

More help

If you have more questions or discover other issues with translating words for Quick Add, try one of the following:

Apps/California/TranslatingQuickAdd (last edited 2019-01-13 12:45:15 by AndreKlapper)