Internationalization (i18n) and Localization (l10n) concepts

Terminology of Internationalization ¶


ISO
International Organization for Standardization
W3C
World Wide Web Consortium
AE
American English
BE
British English
l10n - Localization (AE) or Localisation (BE)
Adaptation of a software product to the language and culture of a particular target market or audience. Localization is not only language translation but also cultural adaptation, including the local currency, paper size (ISO A4 or US Letter), date and time formats, cultural conventions and legal requirements.
i18n - Internationalization (AE), Internationalisation (BE)
Process of building a software application that can adapt to multiple languages and cultures without code changes. In other words, internationalization is the process of building software with a localization infrastructure that allows new localizations to be added without changing the code.
g11n - Globalization
Combination of internationalization and localization. This terminology is used by IBM and Oracle.
Locale
Identifier that combines an ISO language code with an ISO country code, for instance en-US (American English), en-GB (British English), fr-CA (Canadian French) and so on.
CJK Languages
Chinese, Japanese and Korean languages
RTL Languages
Right-To-Left Languages. Languages written from right to left. Examples: Hebrew, Arabic, Persian, Urdu, Phoenician and so on.
LTR Languages
Left-To-Right Languages. Languages written from left to right. Examples: most languages using the Latin script, such as English, and also Greek and so on.
NLV
National Language Version
ICU
International Components for Unicode
ICU4J
Java implementation of the ICU library.
CLDR
Common Locale Data Repository (Unicode Consortium project that provides the locale data used by ICU)
LDML
Locale Data Markup Language (XML-based format used by CLDR).
UTF-8
Unicode Transformation Format, 8-bit. A variable-width encoding in which one "character" (code point) uses 1 to 4 bytes. UTF-8 is the most widely used text encoding, possibly due to its backward compatibility with the old ASCII text encoding. However, it makes programming harder, since one cannot assume that the i-th byte of a UTF-8 string corresponds to the i-th character: a single character may be represented by multiple bytes.
UTF-16
Unicode Transformation Format, 16-bit. One "character" (code point) is represented by one 16-bit code unit (2 bytes) or, for characters outside the Basic Multilingual Plane such as most emoji, by a surrogate pair of two code units (4 bytes). UTF-16 is mostly used internally by programming language implementations and platform APIs, including Java strings and the Windows C API (Application Programming Interface). This encoding is easier to index than UTF-8 only as long as the text contains no surrogate pairs; otherwise the i-th code unit of the array is not necessarily the i-th symbol.
QA Testing
EN - Quality Assurance Testing
PT - Teste de Garantia de Qualidade
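
The difference between counting characters and counting encoded units, described in the UTF-8 and UTF-16 entries above, can be checked with a short Python sketch. The helper name encoded_sizes and the sample strings are illustrative assumptions, not part of any library.

```python
def encoded_sizes(text: str) -> dict:
    """Return the code point count and the encoded sizes of a string."""
    utf8 = text.encode("utf-8")
    # "utf-16-le" is used so that no byte-order mark is added to the output.
    utf16 = text.encode("utf-16-le")
    return {
        "code_points": len(text),             # Python str indexes code points
        "utf8_bytes": len(utf8),              # 1 to 4 bytes per code point
        "utf16_code_units": len(utf16) // 2,  # 1 or 2 units per code point
    }


# Hypothetical samples: ASCII, Latin with diacritic, CJK, and an emoji.
for sample in ["Wu", "João", "日本", "😀"]:
    print(sample, encoded_sizes(sample))
```

For ASCII-only text all three counts agree, which is why indexing bugs of this kind often go unnoticed until non-English data arrives.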

Internationalization Issues ¶


Figure 1: World Regional Languages

Brainstorm of Major Internationalization Issues

Common Mistakes ¶


Common Locales ¶


Locale codes follow the convention <language-code>-<country-code> with the dash character (-) or <language-code>_<country-code> with the underscore character (_).
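
As a rough illustration of this convention, the following Python sketch splits a locale code into its language and country parts and accepts both separators. The names LocaleCode and parse_locale are hypothetical, and the pattern is a simplification: it tolerates an encoding suffix such as .UTF-8 but does not handle script subtags (for example zh-Hans-CN).

```python
import re
from typing import NamedTuple, Optional


class LocaleCode(NamedTuple):
    language: str            # lower-case language code, e.g. "en"
    country: Optional[str]   # upper-case country code, e.g. "US", or None


# language, optional separator + country, optional ".encoding", optional "@modifier"
_LOCALE_RE = re.compile(
    r"^([A-Za-z]{2,3})(?:[-_]([A-Za-z]{2}))?(?:\.[\w-]+)?(?:@\w+)?$"
)


def parse_locale(code: str) -> LocaleCode:
    """Parse codes such as 'en-US', 'pt_BR' or 'de_CH.UTF-8'."""
    match = _LOCALE_RE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized locale code: {code!r}")
    language, country = match.groups()
    return LocaleCode(language.lower(), country.upper() if country else None)


print(parse_locale("en-US"))
print(parse_locale("pt_BR.UTF-8"))
```

Production code would normally rely on a dedicated library instead of a hand-written pattern; the sketch only shows that the two separator styles carry the same information.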

| Country | Locale Code | Language Name |
|---|---|---|
| USA | en-US | American/USA English |
| UK | en-GB | British English (UK - United Kingdom) |
| Ireland | en-IE | Irish English |
| Ireland | ga-IE | Irish language (Irish Gaelic) |
| Canada | en-CA | Canadian English |
| Canada | fr-CA | Canadian French (Français canadien) |
| Australia | en-AU | Australian English |
| New Zealand | en-NZ | New Zealand English |
| Singapore | en-SG | Singaporean English |
| Hong Kong | en-HK | Hong Kong English |
| South Africa | en-ZA | South African English |
| South Africa | af-ZA | Afrikaans (derived from Dutch) |
| Philippines | en-PH | Philippine English (based on American English) |
| India | en-IN | English (India) |
| India | hi-IN | Hindi[1] (India) |
| India | ta-IN | Tamil (India) |
| Germany | de-DE | German (Deutsch) |
| Austria | de-AT | Austrian German (Österreichisches Deutsch) |
| Switzerland | de-CH | Swiss Standard German (Schweizer Hochdeutsch) |
| Switzerland | fr-CH | Swiss French (Français de Suisse) |
| Switzerland | it-CH | Swiss Italian |
| France | fr-FR | French (Français) |
| Italy | it-IT | Italian (Italiano) |
| Greece | el-GR | Modern Greek |
| Türkiye | tr-TR | Turkish[2] |
| Cyprus | el-CY | Modern Greek of Cyprus |
| Cyprus | tr-CY | Turkish language (Cyprus) |
| Spain | es-ES | Spanish (Español) |
| Spain | ca-ES | Catalan (Catalán in Spanish) |
| Spain | eu-ES | Basque (non-Indo-European language) |
| Spain | gl-ES | Galician (sister language of Portuguese) |
| Mexico | es-MX | Mexican Spanish (Español mexicano) |
| USA | es-US | American/USA Spanish |
| Puerto Rico | es-PR | Puerto Rican Spanish (USA) |
| Argentina | es-AR | Argentinian Spanish (Español argentino) |
| Uruguay | es-UY | Uruguayan Spanish (Español uruguayo) |
| Chile | es-CL | Chilean Spanish (Español chileno) |
| Colombia | es-CO | Colombian Spanish (Español colombiano) |
| Peru | es-PE | Peruvian Spanish (Español peruano) |
| Ecuador | es-EC | Ecuadorian Spanish (Español ecuatoriano) |
| Panama | es-PA | Panamanian Spanish (Español panameño) |
| Venezuela | es-VE | Venezuelan Spanish (Español venezolano) |
| Portugal | pt-PT | European Portuguese (Português Europeu) |
| Angola | pt-AO | Portuguese (Angola) |
| Cape Verde | pt-CV | Portuguese (Português de Cabo Verde) |
| Brazil | pt-BR | Brazilian Portuguese (Português Brasileiro) |
| Japan | ja-JP | Japanese language |
| Singapore | zh-SG | Chinese (Singapore) |
| Taiwan | zh-TW | Taiwan Chinese |
| Hong Kong | zh-HK | Hong Kong Chinese |
| China | zh-CN | Chinese (Mandarin Chinese of Mainland China) |

NOTE:

  1. Most English variants around the world are based on British English and use the British spelling. The American English spelling is mainly used in the USA and the Philippines.
  2. India does not have a national language. Hindi is an official language of the central government, alongside English, but it is not the national language of India. Moreover, the majority of the Indian population does not speak Hindi as a first language.
  3. Hong Kong is not a country. It is a SAR - Special Administrative Region of China. Hong Kong has its own currency and Olympic team. In addition, Hong Kong uses its own flag in sports matches.
  4. Puerto Rico is not a country. The island is an unincorporated territory of the USA, even though it has its own Olympic team.
  5. Spanish locales differ little apart from country code, currency and paper size, since most Spanish-speaking countries follow the Royal Spanish Academy[3].

Change User Interface Language on Linux

It is possible to change the user interface language of some applications on Linux from the command line by setting the environment variable LANG to the desired locale code. The default system locale on Linux can be obtained by reading the environment variable $LANG.

In bash or any other POSIX shell:

$ echo $LANG
en_US.UTF-8

In Python,

>>> import os
>>> os.getenv("LANG", "")
'en_US.UTF-8'

The following command temporarily changes the language of KWrite, the KDE text editor, to Swiss German, even if the default language chosen during the Linux distribution installation was not German. This feature is useful for learning vocabulary in other languages.

env LANG=de_CH kwrite

Launch KWrite with the language set to Swiss German, detached from the terminal (without blocking the terminal emulator).

$ env LANG=de_CH.UTF-8 kwrite 1> /dev/null 2> /dev/null & disown

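Explanation:

  1. env LANG=de_CH.UTF-8 kwrite runs kwrite with the LANG variable set only for that process.
  2. 1> /dev/null and 2> /dev/null discard the standard output and the standard error of the process.
  3. & runs the command in the background.
  4. disown removes the background job from the shell's job table, so the shell does not send it a hangup signal (SIGHUP) when the terminal is closed.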
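
The same launch can be sketched from Python with the standard subprocess module. The helper names env_with_locale and launch_detached are hypothetical, and the sketch assumes a POSIX system. It also assumes that the target program honors LANG; other variables such as LC_ALL may take precedence if they are set.

```python
import os
import subprocess
import sys


def env_with_locale(locale_code: str) -> dict:
    """Copy the current environment and override LANG for the child only."""
    env = dict(os.environ)
    env["LANG"] = locale_code
    return env


def launch_detached(command: list, locale_code: str) -> subprocess.Popen:
    """Start a program with another LANG, silenced and in its own session."""
    return subprocess.Popen(
        command,
        env=env_with_locale(locale_code),
        stdout=subprocess.DEVNULL,   # same role as 1> /dev/null
        stderr=subprocess.DEVNULL,   # same role as 2> /dev/null
        start_new_session=True,      # detach from the terminal's session
    )


# Demonstration with a child Python process instead of a GUI application.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['LANG'])"],
    env=env_with_locale("de_CH.UTF-8"),
    capture_output=True,
    text=True,
)
print(result.stdout.strip())
```

With KWrite installed, a call such as launch_detached(["kwrite"], "de_CH.UTF-8") would correspond to the shell command above. The parent process environment is left unchanged.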

Screenshot of KWrite using different locales

Figure 2: KWrite started with the en_US (American English) locale

Figure 3: KWrite started with the de_CH (German for Switzerland) locale

Figure 4: KWrite started with the es_ES (Spanish for Spain) locale

Falsehoods Many Programmers Believe About Names ¶


  1. Names are only written using ASCII characters. Counterexample: "João" (Portuguese version of John) or "Björk" (Icelandic given name).
  2. Names do not contain hyphen (-) or apostrophe (') characters. Counterexample: O'Neil, a common Irish surname.
  3. A person may have only two names, a given name and a surname (family name). Counterexample: the full name of Brazil's emperor Pedro II was "Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael".
  4. People do not change their names, surnames or email.
  5. People will never have identical names.
  6. The name or surname has at least 3 characters. Counterexample: "Wu".
  7. The full name is limited to 100 characters length.
  8. I will never need to deal with foreign names in my database.
  9. Names that sound the same are always written in the same way. Counterexample: there are several different Japanese names that sound like "Akira" and are romanized (written in Latin script) that way, although they are written using different Kanji symbols and have different meanings.
  10. People have at least one surname. Counterexample: in some countries, such as Indonesia, some people have a single given name and no surname. Members of the Japanese imperial family do not have surnames or family names either. Sukarno, the first president of Indonesia, did not have any surname. His full name was just "Sukarno".
  11. People have only a single given name and there is no whitespace within a given name. Counterexample: Hector María GONZALEZ LÓPEZ. The given name is "Hector María". The father's family name is Gonzalez and the mother's family name is López. Many Spanish compound given names have the suffix María. Female Hispanic given names often have a suffix such as "de Dolores" or "Soledad".
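
One way to avoid several of these assumptions in code is to store names as free-form Unicode text and to normalize them before comparison, instead of validating them against a pattern. The sketch below uses the standard unicodedata module; the function names are hypothetical, and the only rule applied (reject empty input) is an assumption that may still be too strict or too loose for a given application.

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """Compose combining marks (NFC) and collapse runs of whitespace."""
    composed = unicodedata.normalize("NFC", raw)
    return " ".join(composed.split())


def is_acceptable_name(raw: str) -> bool:
    """Reject only empty input: no limits on script, length or name parts."""
    return bool(normalize_name(raw))


def same_spelling(first: str, second: str) -> bool:
    """Compare two names after normalization, not byte by byte."""
    return normalize_name(first) == normalize_name(second)


# "i" followed by a combining acute accent versus the precomposed "í".
print(same_spelling("Mari\u0301a", "Mar\u00eda"))
for name in ["Sukarno", "O'Neil", "Wu", "Hector María GONZALEZ LÓPEZ"]:
    print(name, is_acceptable_name(name))
```

Normalization matters because the same visible name can arrive as different code point sequences depending on the keyboard, operating system or source file.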

American English Vs British English ¶


| American English | British English |
|---|---|
| internationalization | internationalisation |
| localization | localisation |
| meter | metre |
| meters | metres |
| program | programme |
| center | centre |
| color | colour |
| favor | favour |
| favorite | favourite |
| labor | labour |
| defense | defence |
| offense | offence |
| store | shop |
| shopping mall, mall | shopping centre |
| tires | tyres |
| while | while, whilst |
| football | American football |
| soccer | football |
| roommates | roommates, flatmates |
| fall | autumn |
| truck (also shorthand for pickup truck) | lorry |
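
Differences like these are one reason why user-visible strings are usually kept in per-locale message catalogs instead of being hard-coded. The following Python sketch shows a minimal catalog lookup with fallback. The CATALOG contents, the key names and the translate function are hypothetical, and treating the base "en" entry as American English is an assumption of the example.

```python
# Hypothetical message catalog: "en" holds the base (American) strings and
# "en-GB" overrides only the entries that differ in British English.
CATALOG = {
    "en": {
        "color_label": "Favorite color",
        "season_label": "Fall",
        "save_label": "Save",
    },
    "en-GB": {
        "color_label": "Favourite colour",
        "season_label": "Autumn",
    },
}


def translate(key: str, locale_code: str, default_locale: str = "en") -> str:
    """Look up a message, falling back from 'en-GB' to 'en' to the default."""
    candidates = [locale_code, locale_code.split("-")[0], default_locale]
    for candidate in candidates:
        message = CATALOG.get(candidate, {}).get(key)
        if message is not None:
            return message
    return key  # last resort: show the key itself


print(translate("color_label", "en-GB"))
print(translate("color_label", "en-US"))
print(translate("save_label", "en-GB"))
```

The fallback chain keeps the British catalog small, because only the strings that actually differ need to be listed.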

Software Libraries ¶


JavaScript

Python

Footnotes ¶


[1] Hindustani

[2] The country was formerly known as Turkey.

[3] Spanish: Real Academia Española

See also ¶


Unicode ¶

Internationalization and Localization Reading ¶

Numbers and Measurement ¶