Localise names with the CLDR
Using the Unicode Common Locale Data Repository’s standard translations
2022-01-25 #data #DDD
Language localisation, part of the broader internationalisation and localisation topic, includes translating text to a local language for a particular locale. In this context, a locale identifier refers to a language associated with a specific geographic region, such as British English.
The Unicode Common Locale Data Repository (CLDR) contains a wide variety of locale-specific names, data formats, and validation rules, as well as details of various languages and scripts. You can use this data for standard localisations in your software.
CLDR lists
CLDR includes a number of standard lists, translated into each language. For example, each of the following is an entry in a (sometimes long) list.
List | English (en) | French (fr) | Russian (ru) | Japanese (ja) | Thai (th) |
---|---|---|---|---|---|
Languages | Russian | russe | русский | ロシア語 | รัสเซีย |
Scripts | Cyrillic | cyrillique | кириллица | キリル文字 | ซีริลลิก |
Regions | Russia | Russie | Россия | ロシア | รัสเซีย |
Months | January | janvier | января | 1月 | มกราคม |
Days | Monday | lundi | понедельник | 月曜日 | วันจันทร์ |
Quarters | Q1 | 1er trimestre | 1-й квартал | 第1四半期 | ไตรมาส 1 |
Time zones | Moscow Time | heure de Moscou | Москва | モスクワ時間 | เวลามอสโก |
Currencies | Russian Ruble | rouble russe | российский рубль | ロシア ルーブル | รูเบิลรัสเซีย |
Units | meters | mètres | метры | メートル | เมตร |
Typography | italic | italique | курсив | イタリック | ตัวเอียง |
CLDR translations - excluding the more obscure lists - include the names of:
- languages
- scripts (writing systems)
- territories, including countries
- calendar names - quarters, months and weekdays, including abbreviations
- time zones
- currencies
- units of measurement
- typographic styles
This means that you can use CLDR as a source for a list of countries, translated into different languages, where each country is identified by its ISO 3166-1 two-letter country code.
In general, if you display these kinds of lists or selections in software, and you want to localise your software into multiple languages, you can get the translations from CLDR. Each entry has a code that all localisations share for looking up entries, sometimes a standard code as for countries, and sometimes a simple numeric code. You can also use lists of these codes to include or exclude sub-lists.
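As a minimal sketch of that shared-code lookup, the following uses a small in-memory sample built from the region names in the table above. The names `REGION_NAMES` and `localised_name`, and the fallback to English, are assumptions for illustration, not part of CLDR.

```python
# Illustrative sample only: region names taken from the table above,
# keyed by locale and then by the shared ISO 3166-1 code.
REGION_NAMES = {
    "en": {"RU": "Russia"},
    "fr": {"RU": "Russie"},
    "ru": {"RU": "Россия"},
    "ja": {"RU": "ロシア"},
    "th": {"RU": "รัสเซีย"},
}


def localised_name(names, locale, code, fallback_locale="en"):
    """Return the name for `code` in `locale`, with a simple fallback."""
    by_code = names.get(locale, {})
    if code in by_code:
        return by_code[code]
    # Assumed policy: try the fallback locale, then show the bare code.
    return names.get(fallback_locale, {}).get(code, code)


for locale in ("en", "fr", "ja"):
    print(locale, localised_name(REGION_NAMES, locale, "RU"))
```

Because every locale uses the same key, you can store only the code (here `RU`) in your own data and resolve the display name at render time.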
Filtered lists
To get a list of currency names, you first need to filter the list to exclude what you don’t consider proper currencies.
You can include most of the ISO 4217 currencies whose three-letter currency codes follow the pattern of a two-letter country code followed by the currency name’s initial, such as USD (US Dollar).
However, you should exclude the X currencies such as XAU (gold), deprecated currencies such as RUR (Russian Ruble 1991-1998), and the unknown currency XXX.
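The currency rule above could be sketched as follows. The sample data, the `DEPRECATED` set and the function name are hypothetical; in practice you would derive the deprecated codes from CLDR rather than hard-code them.

```python
# Illustrative sample: a real currency list is much longer.
CURRENCY_NAMES = {
    "USD": "US Dollar",
    "RUB": "Russian Ruble",
    "RUR": "Russian Ruble (1991-1998)",
    "XAU": "Gold",
    "XXX": "Unknown Currency",
}

# Assumption: you maintain or derive the set of deprecated codes yourself.
DEPRECATED = {"RUR"}


def is_proper_currency(code, deprecated=DEPRECATED):
    """Apply the simple rule from the text: drop X codes and deprecated codes."""
    return not code.startswith("X") and code not in deprecated


proper = {
    code: name
    for code, name in CURRENCY_NAMES.items()
    if is_proper_currency(code)
}
print(sorted(proper))  # ['RUB', 'USD']
```

Note that a blanket prefix rule also drops X codes that are in everyday use, such as XOF (West African CFA franc), so you may prefer an explicit exclusion list.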
Similarly, you also need to filter the CLDR territories to get a countries list.
You can exclude large regions with three-digit codes such as 151 (Eastern Europe), and the non-country regions with two-letter codes: EU (European Union), EZ (Eurozone), QO (Outlying Oceania), and UN (United Nations).
After that, it gets complicated.
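The simple part of that territory filter, before the complications, might look like this. It is a first pass under stated assumptions: the sample data and the names `NON_COUNTRY_CODES` and `looks_like_country` are invented for illustration, and it makes no attempt to decide contested cases.

```python
# Illustrative sample of territory codes and English names.
TERRITORY_NAMES = {
    "151": "Eastern Europe",
    "EU": "European Union",
    "EZ": "Eurozone",
    "QO": "Outlying Oceania",
    "UN": "United Nations",
    "RU": "Russia",
    "FR": "France",
}

# Two-letter codes named in the text that are not countries.
NON_COUNTRY_CODES = {"EU", "EZ", "QO", "UN"}


def looks_like_country(code):
    """First-pass filter only: no numeric regions, no known groupings."""
    return not code.isdigit() and code not in NON_COUNTRY_CODES


countries = {
    code: name
    for code, name in TERRITORY_NAMES.items()
    if looks_like_country(code)
}
print(countries)  # {'RU': 'Russia', 'FR': 'France'}
```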
In general, depending on which list you want, you may need to filter its contents. CLDR helps with this by including validity data that divides these lists’ entries into categories:
- regular
- special
- deprecated
- reserved
- private use
- unknown
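One way to use those categories is to keep only the entries whose status you accept. The sketch below assumes the validity data has already been loaded into a plain mapping from category to codes. `REGION_VALIDITY` is a tiny hypothetical sample, not a copy of the CLDR files, which may need parsing or expanding first.

```python
# Hypothetical, already-expanded validity data for a few region codes.
REGION_VALIDITY = {
    "regular": {"FR", "JP", "RU", "TH"},
    "deprecated": {"SU"},
    "unknown": {"ZZ"},
}


def codes_with_status(validity, wanted=("regular",)):
    """Return the union of codes whose category is in `wanted`."""
    selected = set()
    for status in wanted:
        selected |= validity.get(status, set())
    return selected


def status_of(validity, code):
    """Return the category a code belongs to, or None if it is not listed."""
    for status, codes in validity.items():
        if code in codes:
            return status
    return None


print(sorted(codes_with_status(REGION_VALIDITY)))  # ['FR', 'JP', 'RU', 'TH']
print(status_of(REGION_VALIDITY, "SU"))  # deprecated
```

Filtering by category instead of by hand-written code lists means your selection can follow new CLDR releases with less maintenance.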
Published data
The CLDR releases page publishes the data in XML format, whose source resides in the cldr GitHub Project.
The source includes one XML file per locale, e.g. en.xml.
These XML files use the Unicode Locale Data Markup Language (LDML).
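To give a feel for reading LDML, here is a minimal sketch that extracts territory names from a hand-written fragment shaped like a per-locale file. The fragment is an assumption about the relevant part of en.xml, heavily trimmed. The function name is hypothetical, and skipping entries with an `alt` attribute is a simplification.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the shape of a per-locale LDML file;
# the real file contains far more elements.
SAMPLE_LDML = """
<ldml>
  <localeDisplayNames>
    <territories>
      <territory type="RU">Russia</territory>
      <territory type="US">United States</territory>
      <territory type="US" alt="short">US</territory>
    </territories>
  </localeDisplayNames>
</ldml>
"""


def territory_names(ldml_text):
    """Map each territory code to its name in one locale's LDML document."""
    root = ET.fromstring(ldml_text.strip())
    names = {}
    for element in root.iterfind("localeDisplayNames/territories/territory"):
        # Skip alternative forms (such as alt="short") in this sketch.
        if element.get("alt") is None:
            names[element.get("type")] = element.text
    return names


print(territory_names(SAMPLE_LDML))
```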
The cldr-json GitHub Project generates JSON representations from the XML source. The JSON is also available via npm. Finally, various software libraries make CLDR available directly via their own APIs.
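If you work from the JSON instead, a lookup could be as small as the sketch below. The nesting shown (`main`, then the locale, then `localeDisplayNames` and `territories`) is an assumed layout for a territories file, reproduced by hand. Check it against the files you actually download.

```python
import json

# Hand-written fragment with an assumed layout for a French territories file.
SAMPLE_JSON = """
{
  "main": {
    "fr": {
      "localeDisplayNames": {
        "territories": {"RU": "Russie", "FR": "France"}
      }
    }
  }
}
"""


def territories_from_json(text, locale):
    """Return the code-to-name mapping for one locale from the JSON text."""
    data = json.loads(text)
    return data["main"][locale]["localeDisplayNames"]["territories"]


names = territories_from_json(SAMPLE_JSON, "fr")
print(names["RU"])  # Russie
```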
Essential complexity
You can easily get lost in the CLDR’s complexity, which reflects the world’s messiness. However, when you internationalise and localise software, you will find accurate locale data both valuable and satisfying.