This site has been retired. For up to date information, see handbook.gnome.org or gitlab.gnome.org.



Introduction

This page is about inputting text from keyboards and other input devices, such as microphones or Wacom tablets.

We type every day. (Yes, I am using a keyboard to write this Wiki article.) But such a "simple" activity can be very hard to understand because our GNOME community is international. There is a large variety of human languages, writing systems and dialects. Even people who use the same language may prefer different input methods.

This article introduces how people input text in their languages. GNOME is international: a programmer who does not use your language may maintain your input tools. Please be nice and patient, and help them understand how text in your language is input.

We assume the audience knows only English, which does not require any specialized software for inputting its letters. When editing, please keep in mind that anything you take for granted may come as a surprise to people who speak a different language. English users may not know about the interactive inputting that CJK users rely on. Chinese users may know nothing about keyboard layouts. (Yes, Chinese people use plain U.S. English QWERTY keyboards. Don't expect the average Chinese user to know the Dvorak keyboard, or how to input a letter with dots such as ä or an upside-down question mark ¿.) There is also kana-to-kanji conversion, an issue which only makes sense for Japanese.

Keyboard Layout

(TODO:

)

According to Wikipedia (http://en.wikipedia.org/wiki/Keyboard_layout), a keyboard layout is any specific mechanical, visual, or functional arrangement of the keys, legends, or key-meaning associations (respectively) of a computer, typewriter, or other typographic keyboard.

In GNOME, we care about keyboard layouts because GNOME is international. People need support for the keyboard layout of their respective languages.

The bépo layout is an example of a layout designed with international use in mind. Although primarily designed for typing French, it allows typing nearly any language written in the Latin script (including Vietnamese) by using dead keys for accents. Practice shows that, although this is possible, it is not convenient and is only sufficient for occasional typing.
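The dead-key mechanism mentioned above can be sketched in a few lines of Python. This is only an illustration, not how bépo or any GNOME component is implemented: the `DEAD_KEYS` table and the `compose` function are hypothetical names, and the sketch handles a single dead key followed by one base letter (it does not stack accents, which Vietnamese would need).

```python
import unicodedata

# Hypothetical table: dead-key symbol -> Unicode combining mark.
DEAD_KEYS = {
    "´": "\u0301",  # acute accent
    "`": "\u0300",  # grave accent
    "^": "\u0302",  # circumflex
    "¨": "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
}


def compose(keys):
    """Turn a sequence of key symbols into text, resolving dead keys."""
    out = []
    pending = None  # dead-key symbol waiting for a base letter
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key  # emit nothing yet; wait for the next key
            else:
                out.append(key)
            continue
        combined = unicodedata.normalize("NFC", key + DEAD_KEYS[pending])
        # Emit the precomposed letter if Unicode has one; otherwise fall
        # back to the two symbols as typed.
        out.append(combined if len(combined) == 1 else pending + key)
        pending = None
    if pending is not None:
        out.append(pending)  # a trailing dead key is emitted as-is
    return "".join(out)


print(compose(["´", "e", "t", "´", "e"]))
```

The point of the sketch is that a dead key produces no output by itself; it only modifies the next key, which is why typing heavily accented text this way takes extra keystrokes.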

List of keyboard layouts

Different languages may need different keyboard layouts.

QWERTY is not the only keyboard layout available.

Dvorak is an optimized layout for English. See: http://en.wikipedia.org/wiki/Dvorak_Simplified_Keyboard

European languages like German require specialized keyboard layouts for inputting letters with accents.

(TODO: Mention the shape and the special characters on non-English keyboards. )

TODO: List of keyboard Layouts.

Examples

(TODO: Demonstrate how to input a sentence. It includes:

Most Western European users use only one keyboard layout at a time. Some keyboard layouts do not allow typing all the characters used by the language (for example, the French AZERTY layout lacks many French characters, such as «, », É, À and –), and users rely on word-processor auto-correct for these. An input method engine should provide a convenient way to type those characters.

Switching layouts is not something Western European users are used to, and it is not something you should expect from the average Western European user. Instead, most users typing in another Latin-script language will rely on compose keys if available, or simply use common substitutions, which degrade the text (ñ → n, é → e, ß → ss, ¡ disappearing or replaced by !, č → cx) while keeping it readable.
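The degradation described above can be reproduced with a short Python sketch. It is illustrative only: the `SPECIAL` table and the `degrade` function are hypothetical, the table is not exhaustive, and the sketch simply drops accents, so č becomes c rather than the cx convention mentioned above.

```python
import unicodedata

# Characters that have no accent decomposition need explicit replacements.
# Illustrative table only; "¡" is dropped, as the text above describes.
SPECIAL = {"ß": "ss", "¡": ""}


def degrade(text):
    """Approximate what a user without the right layout ends up typing."""
    out = []
    for ch in text:
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
            continue
        # NFD splits "é" into "e" + a combining acute; drop combining marks.
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(degrade("¡Señor, un café!"))
```

The conversion is lossy: once "señor" has become "senor", software cannot reliably restore the original spelling, which is why convenient input of the real characters matters.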

In French, inputting a phrase with characters that cannot be typed is usually done like this (for example, to type "« À l’abri du vent »"):

This method works fine for typing text in a word processor, but conflicts heavily with typing shell commands or code, for example, since you don’t want substitution to occur there. An input method engine that performs such substitution must therefore know what kind of text the user is typing.
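A minimal sketch of that requirement, in Python, might look like the following. Everything here is an assumption made for illustration: the substitution rules in `RULES`, the `autocorrect` function and the `context` values "prose" and "code" are hypothetical and do not describe any existing word processor or input method engine.

```python
# Hypothetical auto-correct rules, in the spirit of a word processor.
RULES = [("<<", "«"), (">>", "»"), ("'", "’")]


def autocorrect(text, context):
    """Apply typographic substitutions only when the user is writing prose."""
    if context != "prose":
        # Shell commands and code must stay exactly as typed.
        return text
    for typed, replacement in RULES:
        text = text.replace(typed, replacement)
    return text


print(autocorrect("<< A l'abri du vent >>", "prose"))
print(autocorrect("echo 'hi' >> log.txt", "code"))
```

Applying the same rules to the second line would turn a valid shell command into `echo ’hi’ » log.txt`, which is the conflict described above.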

(TODO: Please emphasize the features that an input method engine (or related software) must provide to make inputting such languages possible and convenient.)

Interactive Inputting

By "interactive", we mean that a character is not immediately yielded after hitting a key. Instead, the program shows the user a list of candidate characters to choose from. It is "interactive" because the user must make a decision based on the information the software feeds back.

This example shows the basic ideas.

Languages like Chinese have far too many distinct characters (10000+). Since we cannot put all characters onto the keyboard (actually our ancestors did, but that was a very large keyboard), we use interactive inputting instead.

Interactive inputting works more or less like this:

  1. First the user inputs a sequence of Latin letters, which encodes a phrase/sentence in the target language.
  2. Then the user commands the input method (by pressing the space key) to convert the sequence.
  3. The input method gives the user a list of possible conversions.
  4. The user chooses the correct conversion (by pressing a number key like 1, 2, 3, ...).
  5. Then the user proceeds to input the next phrase/sentence.
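
The five steps above can be sketched as a tiny state machine in Python. This is a toy, not the design of any real input method: the `ToyInputMethod` class, the `CANDIDATES` dictionary and its handful of Pinyin entries are made up for illustration, and a real engine works on whole phrases with a far larger dictionary.

```python
# Toy dictionary: Latin-letter sequence -> candidate conversions.
CANDIDATES = {
    "shi": ["是", "十", "时", "事", "市"],
    "ma": ["吗", "妈", "马", "麻", "骂"],
    "zhongwen": ["中文"],
}
DIGITS = set("123456789")


class ToyInputMethod:
    def __init__(self):
        self.buffer = ""      # Latin letters typed so far (step 1)
        self.candidates = []  # conversions offered to the user (step 3)
        self.committed = ""   # text already delivered to the application

    def press(self, key):
        """Handle one key press and return the current candidate list."""
        if self.candidates and key in DIGITS:
            # Step 4: a number key picks one of the candidates.
            index = int(key) - 1
            if index < len(self.candidates):
                self.committed += self.candidates[index]
                # Step 5: reset, ready for the next phrase.
                self.buffer, self.candidates = "", []
        elif key == " ":
            # Step 2: the space key asks for a conversion.
            self.candidates = CANDIDATES.get(self.buffer, [])
        elif key.isalpha():
            # Step 1: keep collecting letters; nothing is output yet.
            self.buffer += key
            self.candidates = []
        return self.candidates


im = ToyInputMethod()
for key in "shi":
    im.press(key)
print(im.press(" "))  # the candidate list shown to the user
im.press("3")         # the user picks the third candidate
print(im.committed)
```

Note that no text reaches the application until the user has chosen a candidate; the letters typed in step 1 live only in the input method's own buffer.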

Intelligent prediction

Phonetic notations (Pinyin/Bopomofo for Chinese, Romaji/Kana for Japanese) are not 1-to-1 mappings, because many characters share the same pronunciation and, thus, the same "encoded sequence" that the user actually types on the keyboard.

Input methods use intelligent algorithms to "guess" the most probable conversion results, reducing the number of times the user has to "pick a candidate". Such algorithms may resort to natural language processing and machine learning technologies. Models like the hidden Markov model are applicable in intelligent prediction.
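
As a rough sketch of how a hidden Markov model can be applied here, the Python below runs the Viterbi algorithm over a toy model in which the hidden states are characters and the observations are typed syllables. All names and numbers (`READINGS`, `START`, `TRANSITION`, `DEFAULT_TRANSITION`) are invented for illustration; a real input method would estimate such probabilities from a large corpus, and may use a different model altogether. Each character is assumed to emit exactly its own reading, so the emission probability is 1 for the listed candidates and 0 otherwise.

```python
# Toy model with made-up probabilities.
READINGS = {"shi": ["是", "事", "实", "十"]}           # syllable -> characters
START = {"是": 0.5, "事": 0.2, "十": 0.2, "实": 0.1}   # P(first character)
TRANSITION = {("事", "实"): 0.6}                        # P(next | previous)
DEFAULT_TRANSITION = 0.05                               # for unlisted pairs


def viterbi(syllables):
    """Return the most probable character sequence for the typed syllables."""
    # best[c] = (probability, text) of the best path ending in character c
    best = {c: (START.get(c, 0.0), c) for c in READINGS[syllables[0]]}
    for syllable in syllables[1:]:
        step = {}
        for cur in READINGS[syllable]:
            # Keep only the best way of reaching `cur` from any predecessor.
            step[cur] = max(
                (p * TRANSITION.get((prev, cur), DEFAULT_TRANSITION), text + cur)
                for prev, (p, text) in best.items()
            )
        best = step
    return max(best.values())[1]


print(viterbi(["shi"]))
print(viterbi(["shi", "shi"]))
```

With these numbers, a lone "shi" is converted to the most frequent character 是, while "shi shi" is converted to the word 事实 ("fact"), because the model considers 实 a likely successor of 事. The context changes the guess, so the user needs to pick from the candidate list less often.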

Search engine providers are strong in NLP and ML. Sogou, Google, Microsoft, Tencent and Baidu, which are all search engine providers, all "happen" to provide input method software.

Examples

By language

There is already a page for CJK languages. Consider migrating relevant pages to make CJK languages un-special.

Stylish

Smiley and Emoji

TODO

Martian Chinese

Martian Chinese is the counterpart of the Leet-1337 conversion in English. Martian Chinese replaces each character with a similar-looking character to make the sentence unreadable. Unlike Leet-1337 users, Martian Chinese users appear stupid to some other people, but the users themselves may think they are stylish.

Unlike Leet-1337, Martian Chinese must be supported by input method software.

TODO: Give examples.


2024-10-23 10:59