Internationalization i18n and Localization l10n concepts

Terminology of Internationalization ¶

ISO: International Organization for Standardization
W3C: World Wide Web Consortium
AE: American English
BE: British English
l10n - Localization (AE) or Localisation (BE): Adaptation of a software product to the language and culture of a particular target market or audience. Localization is not only language translation but also cultural adaptation, including currency, paper size (ISO A4 or US Letter), date and time formats, cultural conventions and legal requirements.
i18n - Internationalization (AE) or Internationalisation (BE): Process of building a software application that can adapt to multiple languages and cultures without code changes. In other words, internationalization is the process of building software with a localization infrastructure that allows new localizations to be added without changing the code.
g11n - Globalization: Combination of internationalization and localization. This terminology is used by IBM and Oracle.
Locale: Identifier combining an ISO language code and an ISO country code, for instance en-US (American English), en-GB (British English), fr-CA (Canadian French) and so on.
CJK Languages: Chinese, Japanese and Korean languages
RTL Languages: Right-To-Left Languages. Languages written from right to left. Examples: Hebrew, Arabic, Persian, Urdu, Phoenician and so on.
LTR Languages: Left-To-Right Languages. Languages written from left to right. Examples: most languages using the Latin script, such as English, as well as Greek and so on.
NLV: National Language Version
ICU: International Components for Unicode
ICU4J: Java implementation of the ICU library.
CLDR: Common Locale Data Repository (Unicode Consortium project whose data is used by ICU)
LDML: Locale Data Markup Language (XML-based format used by CLDR).
UTF-8: Unicode Transformation Format, 8-bit. A variable-width encoding in which one "character" (code point) takes from 1 to 4 bytes. UTF-8 is the most used text encoding format, possibly due to its backward compatibility with the old ASCII text encoding. However, it makes programming harder since one cannot assume that the i-th byte of a UTF-8 string corresponds to the i-th character, because a single character may be represented by multiple bytes.
UTF-16: Unicode Transformation Format, 16-bit. One "character" (code point) is represented by one or two 16-bit code units (2 or 4 bytes). Characters outside the Basic Multilingual Plane, such as most emoji, need two code units (a surrogate pair). UTF-16 is mostly used internally by programming language implementations, including Java and JavaScript. The Windows C API (Application Programming Interface) also uses UTF-16 in its wide-character functions. This encoding is somewhat easier to index than UTF-8 because the i-th code unit is the i-th symbol for most commonly used characters, but that assumption breaks for surrogate pairs.
QA Testing: Quality Assurance Testing
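
The difference between characters (code points), UTF-8 bytes and UTF-16 code units described in the UTF-8 and UTF-16 entries can be checked with a short Python sketch. The helper name encoding_sizes and the sample strings are only illustrative assumptions.

```python
def encoding_sizes(text: str) -> dict:
    """Return the number of code points, UTF-8 bytes and UTF-16 code units."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        # "utf-16-le" omits the byte order mark; every code unit is 2 bytes.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


# ASCII text, a Latin letter with a diacritic, CJK text and an emoji.
samples = ["abc", "Jo\u00e3o", "\u65e5\u672c\u8a9e", "\U0001F600"]
for sample in samples:
    print(repr(sample), encoding_sizes(sample))
```

Only the pure ASCII sample has the same count in all three columns. The emoji is a single code point that needs 4 bytes in UTF-8 and 2 code units in UTF-16.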

Internationalization Issues ¶

Figure 1: World Regional Languages

Brainstorm of Major Internationalization Issues

Initial Language
- The website or application attempts to guess the user language based on the HTTP header or IP address.
- The website presents the page in a default language and provides a menu or form allowing the user to switch the language.
Language Switching
Optimization
- Localization strings (text) served or rendered in the front-end (client side).
- Localization strings (text) served or rendered in the back-end (server side).
- Lazy loading of localization strings.
- Load only what is needed.
Language Issues
- Date Format
- Pluralization
- Language Dialects
- Gender-Specific Translations
- A country may have more than one language or more than one official language. For instance, in Canada, English and French are official languages and in Switzerland, German, French and Italian are official languages.
- Spelling of a specific dialect of a language
- RTL - Right-To-Left language support, for instance Arabic and Hebrew are written from right to left.
Cultural Issues
- Cultural Standards
- Cultural Conventions
- Cultural Assumptions
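
As an illustration of the pluralization issue listed above, the following Python sketch selects a message by plural category. The choice of English and Russian, the message keys and the function names are assumptions made for the example. The rules cover whole numbers only; a real application would normally take them from CLDR data or a library rather than hard-coding them.

```python
def plural_category_en(n: int) -> str:
    # English has two forms: singular for exactly 1, plural otherwise.
    return "one" if n == 1 else "other"


def plural_category_ru(n: int) -> str:
    # Russian has three forms for whole numbers.
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


RULES = {"en": plural_category_en, "ru": plural_category_ru}

MESSAGES = {
    "en": {"one": "{n} file", "other": "{n} files"},
    "ru": {"one": "{n} файл", "few": "{n} файла", "many": "{n} файлов"},
}


def format_file_count(lang: str, n: int) -> str:
    """Pick the message variant matching the plural category of n."""
    category = RULES[lang](n)
    return MESSAGES[lang][category].format(n=n)


for count in (1, 3, 11, 21):
    print(format_file_count("en", count), "|", format_file_count("ru", count))
```

The point of the sketch is that a simple "add an s when n is not 1" rule cannot be reused across languages, so the plural rule has to be part of the localization data.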

Common Mistakes ¶

A country flag does not represent a language. For instance, the Brazilian flag should not be used for representing the Portuguese language.
There may be more than one dialect and spelling of the same language. For instance, some words in American English and British English have different spellings, e.g. internationalization (American English) and internationalisation (British English). That is one more reason why country flags should not be used for representing languages.
Localization must not be specific only to a particular language. It should also be specific to each country and standardized dialect of a language, such as American English, British English, European Portuguese or Brazilian Portuguese. For instance, despite American English being ubiquitous on the internet, most English-speaking countries other than the USA and the Philippines use British English spelling and language constructs derived from this English dialect. British English spelling is also widely used in continental Europe by non-English-speaking countries, including Germany, France, Portugal, the Netherlands and so on.
Do not use the IP address or the user's geographic location for selecting the language or locale used by a website. The language should be selected based on the user preference indicated by the HTTP header Accept-Language, and the graphical user interface should always have a button or selection box allowing the user to switch the language. On Linux desktop applications, the environment variable LANG is commonly used for obtaining the user's desired language. This variable is set during the installation of Linux desktop distributions to the language chosen by the user.
Some reasons to avoid using the IP address, geographic location or country for choosing the UI (User Interface) language are: some countries have multiple official languages or multiple spoken languages; nowadays, people travel and may not be able to speak the local language; even native speakers may not want to read in their own native language.
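
A minimal Python sketch of language selection based on the Accept-Language header is shown below. The function names, the list of supported languages and the fallback to the primary language subtag are assumptions for illustration. It handles the common "tag;q=value" form only and is not a complete implementation of HTTP content negotiation.

```python
def parse_accept_language(header: str) -> list:
    """Return language tags ordered by descending q-value (default q=1.0)."""
    entries = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat malformed weights as "not acceptable"
        # Negated quality sorts best first; position keeps header order on ties.
        entries.append((-quality, position, tag.strip()))
    entries.sort()
    # Entries with q=0 mean "not acceptable" and are dropped.
    return [tag for neg_quality, _, tag in entries if neg_quality < 0]


def choose_language(header: str, supported: list, default: str = "en") -> str:
    """Pick the first requested language that the application supports."""
    by_lowercase = {tag.lower(): tag for tag in supported}
    for tag in parse_accept_language(header):
        lowered = tag.lower()
        if lowered in by_lowercase:
            return by_lowercase[lowered]
        primary = lowered.split("-")[0]  # "pt-BR" falls back to "pt"
        if primary in by_lowercase:
            return by_lowercase[primary]
    return default


header = "fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7, *;q=0.5"
print(choose_language(header, ["en", "de"]))
```

The value chosen this way should only be the initial guess. A language switcher in the user interface still has to override it.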

Common Locales ¶

Locale codes follow the convention <language-code>-<country-code> with the dash character (-) or <language-code>_<country-code> with the underscore character (_).
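
Both spellings of the convention can be split into their parts with a few lines of Python. The names LocaleParts and parse_locale are hypothetical, and the sketch only handles the two-part form described here plus the optional POSIX codeset and modifier (as in en_US.UTF-8). Tags with extra subtags, such as a script code, would need a real BCP 47 parser.

```python
from typing import NamedTuple, Optional


class LocaleParts(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]


def parse_locale(code: str) -> LocaleParts:
    """Split codes such as 'en-US', 'en_US' or 'en_US.UTF-8' into parts."""
    # POSIX locale names may carry a modifier after '@' and a codeset after '.'.
    base, _, _modifier = code.partition("@")
    base, _, codeset = base.partition(".")
    language, _, territory = base.replace("-", "_").partition("_")
    return LocaleParts(language.lower(), territory.upper() or None, codeset or None)


for code in ("en-US", "pt_BR", "de_CH.UTF-8", "ja"):
    print(code, "->", parse_locale(code))
```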

Country	Locale Code	Language Name
USA	en-US	American/USA English
UK	en-GB	British English (UK - United Kingdom)
Ireland	en-IE	Irish English
Ireland	ga-IE	Irish language (Irish Gaelic)
Canada	en-CA	Canadian English
Canada	fr-CA	Canadian French (Français Canadien)
Australia	en-AU	Australian English
New Zealand	en-NZ	New Zealand English
Singapore	en-SG	Singaporean English
Hong Kong	en-HK	Hong Kong English
South Africa	en-ZA	South African English
South Africa	af-ZA	Afrikaans (based on Dutch language)
Philippines	en-PH	Philippine English (based on American English)
India	en-IN	English (India)
India	hi-IN	Hindi^[1] (India)
India	ta-IN	Tamil (India)
Germany	de-DE	German (Deutsch)
Austria	de-AT	Austrian German (Österreichisches Deutsch)
Switzerland	de-CH	Switzerland German (Schweizerdeutsch)
Switzerland	fr-CH	Switzerland French (Suisse français)
Switzerland	it-CH	Switzerland Italian
France	fr-FR	French (Français)
Italy	it-IT	Italian (Italiano)
Greece	el-GR	Modern Greek
Türkiye	tr-TR	Turkish^[2]
Cyprus	el-CY	Modern Greek of Cyprus
Cyprus	tr-CY	Turkish language (Cyprus)
Spain	es-ES	Spanish (Español)
Spain	ca-ES	Catalan (Catalán in Spanish)
Spain	eu-ES	Basque (Non indo-european language)
Spain	gl-ES	Galician (Sister language of Portuguese)
Mexico	es-MX	Mexican Spanish (Español mexicano)
USA	es-US	American/USA Spanish
Puerto Rico	es-PR	Puerto Rico Spanish (USA)
Argentina	es-AR	Argentinian Spanish (Español argentino)
Uruguay	es-UY	Uruguayan Spanish (Español uruguayo)
Chile	es-CL	Chilean Spanish (Español chileno)
Colombia	es-CO	Colombian Spanish (Español colombiano)
Peru	es-PE	Peruvian Spanish (Español peruano)
Ecuador	es-EC	Ecuadorian Spanish (Español ecuatoriano)
Panama	es-PA	Panamanian Spanish (Español panameño)
Venezuela	es-VE	Venezuelan Spanish (Español venezolano)
Portugal	pt-PT	European Portuguese (Português Europeu)
Angola	pt-AO	Portuguese (Angola)
Cape Verde	pt-CV	Portuguese (Português de Cabo Verde)
Brazil	pt-BR	Brazilian Portuguese (Português Brasileiro)
Japan	ja-JP	Japanese language
Singapore	zh-SG	Chinese (Singapore)
Taiwan	zh-TW	Taiwan Chinese
Hong Kong	zh-HK	Hong Kong Chinese
China	zh-CN	Chinese (Mandarin Chinese of Mainland China)

NOTE:

Most English variants around the world are based on British English and use British spelling. American English spelling is mainly used by the USA and the Philippines.
India does not have a national language. Hindi and English are the official languages of the central government, and several other languages have official status at the state level. Moreover, Hindi is not the native language of the majority of the Indian population.
Hong Kong is not a country. It is a SAR - Special Administrative Region of China. Hong Kong has its own currency and Olympic team. In addition, in sports matches Hong Kong uses its own flag.
Puerto Rico is not a country. The island is an unincorporated territory of the USA, even though it has its own Olympic team.
Spanish locales differ little other than in country code, currency and paper size, since most Spanish-speaking countries follow the Royal Spanish Academy ^[3]

Change User Interface Language on Linux

It is possible to change the UI language of some applications on Linux from the command line by setting the environment variable LANG to the desired locale code. The default system locale on Linux can be obtained by reading the environment variable $LANG.

In bash shell or any other POSIX shell.

$ echo $LANG
en_US.UTF-8

In Python,

>>> import os 

>>> os.getenv("LANG", "")
'en_US.UTF-8'
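
LANG is not the only variable that applications consult. The sketch below follows the lookup order used by Python's standard gettext module (LANGUAGE, then LC_ALL, then LC_MESSAGES, then LANG). The function name preferred_languages and the "C" fallback are assumptions for illustration, and individual applications may apply different rules.

```python
import os


def preferred_languages(environ=None) -> list:
    """Return the locale codes requested through the environment."""
    environ = os.environ if environ is None else environ
    # The first variable that is set and non-empty wins.
    for name in ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"):
        value = environ.get(name)
        if value:
            # LANGUAGE may hold a colon-separated priority list, e.g. "de_CH:de".
            return value.split(":")
    return ["C"]


print(preferred_languages())
print(preferred_languages({"LANGUAGE": "de_CH:de", "LANG": "en_US.UTF-8"}))
```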

The following command temporarily changes the language of the KDE text editor KWrite to Swiss German, even if the default language chosen during the Linux distribution installation was not German. This feature is useful for learning the vocabulary of other languages.

env LANG=de_CH kwrite

Launch KWrite with the language set to Swiss German, detached from the terminal (without blocking the terminal emulator).

$ env LANG=de_CH.UTF-8 kwrite 1> /dev/null 2> /dev/null & disown

Explanation:

1> /dev/null redirects the kwrite process' stdout (standard output) to Linux pseudo file /dev/null.
2> /dev/null redirects the kwrite process' stderr (standard error output) to Linux pseudo file /dev/null.
& disown => Runs the kwrite process in the background and removes it from the shell's job table, so that it does not block the terminal emulator and is not terminated when the terminal is closed.
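
A rough Python equivalent of the shell command above, using the standard subprocess module, is sketched below. The function name launch_with_locale is hypothetical. The demonstration starts a child Python interpreter instead of kwrite so that it runs on machines without KDE. start_new_session is a POSIX feature and plays a role similar to disown here.

```python
import os
import subprocess
import sys


def launch_with_locale(command: list, locale_code: str) -> subprocess.Popen:
    """Start command in the background with LANG set to locale_code."""
    env = dict(os.environ, LANG=locale_code)
    return subprocess.Popen(
        command,
        env=env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,  # counterpart of 1> /dev/null
        stderr=subprocess.DEVNULL,  # counterpart of 2> /dev/null
        start_new_session=True,     # run the child in its own session (POSIX)
    )


# Demonstration: the child exits with status 0 only if it sees the new LANG value.
check = "import os, sys; sys.exit(0 if os.environ.get('LANG') == 'de_CH.UTF-8' else 1)"
child = launch_with_locale([sys.executable, "-c", check], "de_CH.UTF-8")
print("child saw LANG=de_CH.UTF-8:", child.wait() == 0)

# With KWrite installed, the call would be: launch_with_locale(["kwrite"], "de_CH.UTF-8")
```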

Screenshot of KWrite using different locales

Figure 2: KWrite started with en_US American English locale

Figure 3: KWrite started with de_CH German locale for Switzerland

Figure 4: KWrite started with es_ES Spanish locale for Spain

See

Country Code Language List
- https://www.fincher.org/Utilities/CountryLanguageList.shtml
ISO-3166 Country Codes and ISO-639 Language Codes, Oracle Docs
- https://docs.oracle.com/cd/E13214_01/wli/docs92/xref/xqisocodes.html
Standard locale names, Microsoft
- https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names
ISO Country and Language Codes: The Definitive Guide
- https://centus.com/blog/iso-language-codes

Falsehoods Many Programmers Believe About Names ¶

Names are only written using ASCII characters. Counterexample: "João" (Portuguese version of John) or "Björk" (Icelandic given name).
Names do not contain hyphen (-) or apostrophe (') characters. Counterexample: O'Neill - common Irish surname.
A person may have only two names, a given name and surname (family name). Counterexample: the full name of Brazil's emperor Pedro II of Brazil was "Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael".
People do not change their names, surnames or email.
People will never have identical names.
The name or surname has at least 3 characters. Counterexample: "Wu".
The full name is limited to 100 characters length.
I will never need to deal with foreign names in my database.
Names that sound the same are always written in the same way. Counterexample: There are several different Japanese names that sound like "Akira" and are romanized (written in Latin script) that way, although they are written using different Kanji symbols and have different meanings.
People have at least one surname. Counterexample: In some countries, such as Indonesia and Japan, some people may have a single given name and no surname. Members of the Japanese royal family do not have surnames or family names. Sukarno, the first president of Indonesia, did not have any surname. His full name was just "Sukarno".
People only have a single given name and there is no whitespace within a given name. Counterexample: Hector María GONZALEZ LÓPEZ. The given name is "Hector María". The paternal surname is Gonzalez and the mother's family name is López. Many Spanish compound given names include María. Female Hispanic given names often include elements such as "de Dolores", "Soledad" and so on.
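
One way to avoid several of these falsehoods in code is to store the name as a single free-form Unicode string and to validate as little as possible. The following Python sketch shows this approach. The function names and the decision to only normalize Unicode and whitespace are assumptions, not an established standard.

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """Collapse whitespace and apply Unicode NFC without removing characters."""
    collapsed = " ".join(raw.split())
    # NFC makes "a" + combining tilde compare equal to the precomposed "ã".
    return unicodedata.normalize("NFC", collapsed)


def is_acceptable_name(raw: str) -> bool:
    """Accept any non-empty name: no ASCII, length or surname requirement."""
    return len(normalize_name(raw)) > 0


names = ["Jo\u00e3o", "O'Neill", "Wu", "Sukarno", "Hector Mar\u00eda GONZALEZ L\u00d3PEZ"]
for name in names:
    print(normalize_name(name), is_acceptable_name(name))
```

Normalizing to NFC before storing or comparing matters because the same visible name can arrive as different code point sequences, depending on the keyboard or operating system that produced it.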

See also:

Personal names around the world, W3C
- https://www.w3.org/International/questions/qa-personal-names
- How do people's names differ around the world, and what are the implications of those differences on the design of forms, databases, ontologies, etc. for the Web?
Legal name, Wikipedia
- https://en.wikipedia.org/wiki/Legal_name
Middle name, Wikipedia
- https://en.wikipedia.org/wiki/Middle_name
Name change, Wikipedia
- https://en.wikipedia.org/wiki/Name_change
Surname, Wikipedia
- https://en.wikipedia.org/wiki/Surname
Maiden and married names, Wikipedia
- https://en.wikipedia.org/wiki/Maiden_and_married_names
Patronymic surname, Wikipedia
- https://en.wikipedia.org/wiki/Patronymic_surname
A basic guide to using Asian names, Asia Media Centre
- https://www.asiamediacentre.org.nz/features/a-guide-to-using-asian-names
Chinese Naming Conventions - Chinese Culture, Cultural Atlas
- https://culturalatlas.sbs.com.au/chinese-culture/chinese-culture-naming
Japanese Naming Conventions - Japanese Culture, Cultural Atlas
- https://culturalatlas.sbs.com.au/japanese-culture/japanese-culture-naming
Wikipedia:Naming conventions (Chinese)
- https://en.wikipedia.org/wiki/Wikipedia:Naming_conventions_(Chinese)
East Slavic name, Wikipedia
- https://en.wikipedia.org/wiki/East_Slavic_name
Roman naming conventions, Wikipedia
- https://en.wikipedia.org/wiki/Roman_naming_conventions
Nomen gentilicium, Wikipedia
- https://en.wikipedia.org/wiki/Nomen_gentilicium
Cognomen, Wikipedia
- https://en.wikipedia.org/wiki/Cognomen
Praenomen, Wikipedia
- https://en.wikipedia.org/wiki/Praenomen
Italian name, Wikipedia
- https://en.wikipedia.org/wiki/Italian_name
Spanish naming customs, Wikipedia
- https://en.wikipedia.org/wiki/Spanish_naming_customs
Spanish Names: A Beginner’s Guide to Naming Customs and Traditions, ESLBuzz
- https://eslbuzz.com/spanish-names/
Spanish proper names and their cultural secrets: More than just a name, WorldsAcross
- https://blog.worldsacross.com/index/spanish-proper-names-and-their-cultural-secrets-more-than-just-a-name
Naming - Spanish Culture, Cultural Atlas
- https://culturalatlas.sbs.com.au/spanish-culture/spanish-culture-naming
Naming customs of Hispanic America, Wikipedia
- https://en.wikipedia.org/wiki/Naming_customs_of_Hispanic_America
Portuguese name, Wikipedia
- https://en.wikipedia.org/wiki/Portuguese_name
- A Portuguese name, or Lusophone name – a personal name in the Portuguese language – is typically composed of one or two personal names, the mother's family surname and the father's family surname (rarely only one surname, sometimes more than two). For practicality, usually only the last surname (excluding prepositions) is used in formal greetings.
Arabic name, Wikipedia
- https://en.wikipedia.org/wiki/Arabic_name
How Arabic Names Work: A Guide to Ism, Nasab, Laqab, Nisba, and Kunya
- https://arabic-for-nerds.com/translation/how-are-family-names-constructed-in-arabic/
Mononym, Wikipedia (People with no surname, just a single name)
- https://en.wikipedia.org/wiki/Mononym
List of legally mononymous people, Wikipedia (List of people whose full legal name does not have surname or family name such as members of Japanese royal family)
- https://en.wikipedia.org/wiki/List_of_legally_mononymous_people
O'Neill (surname), Wikipedia
- https://en.wikipedia.org/wiki/O'Neill_(surname)
Akira (given Japanese name), Wikipedia
- https://en.wikipedia.org/wiki/Akira_(given_name)
Category:Compound given names, Wikipedia
- https://en.wikipedia.org/wiki/Category:Compound_given_names
How do I correctly abbreviate compounded first name for academic publications?, Academia
- https://academia.stackexchange.com/questions/154619/how-do-i-correctly-abbreviate-compounded-first-name-for-academic-publications

American English Vs British English ¶

American English	British English
internationalization	internationalisation
localization	localisation
meter	metre
meters	metres
program	programme
center	centre
color	colour
favor	favour
favorite	favourite
labor	labour
defense	defence
offense	offence
store	shop
shopping mall, mall	shopping centre
tires	tyres
while	while, whilst
football	American football
soccer	football
roommates	roommates, flatmates
fall	autumn
truck (large goods vehicle)	lorry
truck (shorthand for pickup truck)	pickup truck
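
Spelling differences like the ones in this table are usually handled with one message catalog per locale plus a fallback chain, so that a regional catalog only lists the strings that differ. The Python sketch below assumes a tiny in-memory catalog. The keys, the use of American spelling for the base "en" catalog and the function names are all illustrative.

```python
CATALOG = {
    # Base English catalog (assumed here to use American spelling).
    "en": {"color_label": "Color", "center_label": "Center", "save": "Save"},
    # The British catalog overrides only the strings that differ.
    "en-GB": {"color_label": "Colour", "center_label": "Centre"},
}


def fallback_chain(locale_code: str, default: str = "en") -> list:
    """Return lookup order, e.g. 'pt_BR' -> ['pt-BR', 'pt', 'en']."""
    parts = locale_code.replace("_", "-").split("-")
    chain = ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if default not in chain:
        chain.append(default)
    return chain


def translate(key: str, locale_code: str) -> str:
    """Look the key up in the most specific catalog that defines it."""
    for candidate in fallback_chain(locale_code):
        messages = CATALOG.get(candidate, {})
        if key in messages:
            return messages[key]
    return key  # last resort: show the key itself


print(translate("color_label", "en-GB"), translate("save", "en-GB"))
print(translate("color_label", "en-US"))
```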

Software Libraries ¶

JavaScript

i18next

Python

Gettext

Footnotes ¶

[1] Hindustani

[2] The country was formerly known as Turkey.

[3] Spanish: Real Academia Española
