LibreOffice Localization Guide/How To Submit New Locale Data

General Information
To fully support a new language or locale, or an existing language/country combination that is not yet fully supported as a locale, LibreOffice needs a locale data file. This file covers separators, number formats, calendar data and currencies, and makes the locale selectable as the default document language.

Locale data files could formerly be generated quite easily with the generator at http://www.it46.se/localegen/ and its documentation. Unfortunately the generator has been down since at least May 2018, and not even the documentation is available in archive.org's Wayback Machine; attempts to contact the developers were unsuccessful. A locale data file therefore now has to be created manually.

For technical details and the semantics of elements, please see the '''documentation comments in the locale data DTD file''' and a sample locale data file, for example the en_US locale. If you want to adapt an existing locale data file, the full list of them is available here.

Note that several locale data elements may be inherited from another locale's data by means of the ref="..." attribute if they share identical data. This may come in handy when locales are created for the same language but different countries and differ only in a few elements such as currency symbols. It has to be done manually, though. Doing so also reduces the memory footprint at runtime when the data libraries are loaded.
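To see at a glance which sections of a file are inherited and which are defined locally, a small script can help. The following is a minimal sketch, not part of the LibreOffice tooling: the embedded sample is a simplified, hypothetical fragment (the root element, the section names and the placeholder codes XX and XXD are assumptions for illustration), and the real element set is defined by the locale data DTD.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical locale data fragment for illustration only.
# A real file has more sections and must follow the locale data DTD.
SAMPLE = """\
<Locale>
  <LC_INFO>
    <Language><LangID>en</LangID></Language>
    <Country><CountryID>XX</CountryID></Country>
  </LC_INFO>
  <LC_CTYPE ref="en_US"/>
  <LC_COLLATION ref="en_US"/>
  <LC_CURRENCY>
    <Currency default="true">
      <CurrencyID>XXD</CurrencyID>
    </Currency>
  </LC_CURRENCY>
</Locale>
"""


def inherited_sections(xml_text):
    """Map each top-level section carrying a ref attribute to the locale it refers to."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.attrib["ref"] for child in root if "ref" in child.attrib}


def own_sections(xml_text):
    """List the top-level sections that carry their own data (no ref attribute)."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root if "ref" not in child.attrib]


if __name__ == "__main__":
    print("inherited:", inherited_sections(SAMPLE))
    print("own data: ", own_sections(SAMPLE))
```

In this sample only the currency section differs from the referenced locale, which is the situation described above for locales of the same language in different countries.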

Please note that we will need your license statement to be able to include your work. Don't be afraid, it's just a two-liner you send to the mailing list; see Development/Developers.

Pitfalls
There are a few pitfalls or things to think about when generating locale data:

Markers
The typographic single quotation marks QuotationStart and QuotationEnd and the double quotation marks DoubleQuotationStart and DoubleQuotationEnd have to be defined. The single quotation marks are apparently often mistaken for double quotation marks, so please check.

Common single quotation marks

 *  ‘  U+2018 LEFT SINGLE QUOTATION MARK
 *  ’  U+2019 RIGHT SINGLE QUOTATION MARK
 *  ‚  U+201A SINGLE LOW-9 QUOTATION MARK
 *  ‛  U+201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
 *  ‹  U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
 *  ›  U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK

Common double quotation marks

 *  «  U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
 *  »  U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
 *  “  U+201C LEFT DOUBLE QUOTATION MARK
 *  ”  U+201D RIGHT DOUBLE QUOTATION MARK
 *  „  U+201E DOUBLE LOW-9 QUOTATION MARK
 *  ‟  U+201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
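The check suggested above can be sketched in a few lines. This is only an illustration: the function name and the dictionary input are hypothetical, and the two character sets are exactly the "common" marks listed above, so a reported finding is a prompt to double-check the value, not proof of an error.

```python
# The common marks listed above, written as escapes to avoid look-alike mix-ups.
SINGLE_QUOTES = set("\u2018\u2019\u201A\u201B\u2039\u203A")
DOUBLE_QUOTES = set("\u00AB\u00BB\u201C\u201D\u201E\u201F")

# Which set each element is expected to draw its value from.
EXPECTED = {
    "QuotationStart": SINGLE_QUOTES,
    "QuotationEnd": SINGLE_QUOTES,
    "DoubleQuotationStart": DOUBLE_QUOTES,
    "DoubleQuotationEnd": DOUBLE_QUOTES,
}


def check_quotation_marks(markers):
    """Return a list of human-readable findings; an empty list means nothing suspicious."""
    problems = []
    for name, allowed in EXPECTED.items():
        value = markers.get(name)
        if value is None:
            problems.append(f"{name} is missing")
        elif value not in allowed:
            code_points = " ".join(f"U+{ord(c):04X}" for c in value)
            problems.append(f"{name} has unexpected value {code_points}")
    return problems


if __name__ == "__main__":
    # Hypothetical values with the typical mistake: a double mark in QuotationStart.
    candidate = {
        "QuotationStart": "\u201C",
        "QuotationEnd": "\u2019",
        "DoubleQuotationStart": "\u201C",
        "DoubleQuotationEnd": "\u201D",
    }
    for finding in check_quotation_marks(candidate):
        print(finding)
```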

Character Range
In section H. Enumeration and Scripts, the default ''H1. Character range for indexes'' entry is A-Z. If your language uses other characters, e.g. additional accented characters or completely different characters, you probably want to add them at the proper position. See the generator's documentation and/or the documentation in the locale data DTD file for the LC_INDEX section.

Unicode Script
The preselected BasicLatin and Latin1Supplement scripts are usually sufficient for Western European languages. If your language uses additional or completely different characters, please select the appropriate Unicode script(s). For the distribution of characters across the Unicode scripts, see The Unicode Character Code Charts By Script.
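One quick way to find out whether the two default scripts are enough is to test sample text in your language against their code point ranges. The sketch below assumes the standard Unicode ranges for Basic Latin (U+0000 to U+007F) and Latin-1 Supplement (U+0080 to U+00FF); the function name is hypothetical, and the text should be in composed (NFC) form, since combining marks fall outside both ranges.

```python
# Code point ranges of the two Unicode blocks named in the text.
BASIC_LATIN = range(0x0000, 0x0080)
LATIN_1_SUPPLEMENT = range(0x0080, 0x0100)


def chars_outside_default_scripts(text):
    """Return the distinct characters of text that are in neither block, sorted by code point."""
    return sorted(
        {
            c
            for c in text
            if ord(c) not in BASIC_LATIN and ord(c) not in LATIN_1_SUPPLEMENT
        }
    )


if __name__ == "__main__":
    # Any representative text works; these two samples are arbitrary.
    for sample in ("Gr\u00F6\u00DFe caf\u00E9", "\u0141\u00F3d\u017A"):
        extra = chars_outside_default_scripts(sample)
        listing = ", ".join(f"U+{ord(c):04X}" for c in extra) or "none"
        print(f"{sample}: characters needing another script: {listing}")
```

Any character the function reports belongs to a script that is not preselected and would have to be looked up in the code charts.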

Currency Formats
The generator, in step 3 section I. Currency, offers two list boxes, ''I6. Currency format for positive values'' and ''I7. Currency format for negative values''. While the default format for positive values, $1 (currency symbol immediately preceding the amount), may be correct, the default format for negative values, ($1) (parentheses around symbol and amount, but no minus sign), is almost certainly not correct for countries other than the US. Please take the time to choose the correct entries in both list boxes.
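To make the difference between the negative formats concrete, the following sketch renders the same amount in three layouts. It is purely illustrative: the style names are hypothetical, the symbol is always placed before the amount as in the default described above, and the comma and period separators are hard-coded even though they are locale-dependent as well.

```python
def format_currency(amount, symbol, negative_style):
    """Render amount with a leading currency symbol; negative_style picks the negative layout."""
    digits = f"{abs(amount):,.2f}"  # separators hard-coded for this illustration
    if amount >= 0:
        return f"{symbol}{digits}"
    styles = {
        "parentheses": f"({symbol}{digits})",         # the default named in the text
        "minus_before_symbol": f"-{symbol}{digits}",
        "minus_after_symbol": f"{symbol}-{digits}",
    }
    return styles[negative_style]


if __name__ == "__main__":
    for style in ("parentheses", "minus_before_symbol", "minus_after_symbol"):
        print(f"{style:>20}: {format_currency(-1234.5, '$', style)}")
```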

Submitting the Locale Data File
To contribute a locale data file for your locale, first create the locale data file, then log in to Bugzilla, submit a new issue using this URL, and attach the file to that issue.

If you arrived at this page because an existing issue pointed you here to submit new locale data, you don't have to create a new issue, of course; attach the locale data file to the existing issue instead.

Building your own changes
If you have already successfully built LibreOffice from source AND the language/locale mapping is already known to LibreOffice (see LibreOffice Localization Guide/Adding a New Language or Locale), you can test your new locale data by making the following modifications, which make it known to the build system and link it into one of the locale data libraries:


 * place your locale data file into the  directory and name it after the lower case ISO 639 alpha code followed by an underscore followed by the upper case ISO 3166 country code followed by .xml, for example,   for the English-US locale data. Here in this description the example name your_DATA.xml will be used for clarification
 * so have a file
 * edit the file
 * starting at around line 55 there's a struct, go to the end of that struct and append your locale as a   entry, for example
 * make sure you append a comma to the previously last entry, now the second last
 * edit the file
 * add a line
 * make sure you include the trailing \ backslash
 * we prefer alphabetical sorting of that list, so please pick the correct location
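The file naming rule from the first step above can be sketched as a small helper. The function name is hypothetical, and the validation assumes a two- or three-letter ISO 639 language code and a two-letter ISO 3166 country code; it only checks the shape of the codes, not whether they are actually assigned.

```python
import re


def locale_data_filename(language, country):
    """Build '<lang>_<COUNTRY>.xml' from an ISO 639 language code and an ISO 3166 country code."""
    lang = language.lower()
    ctry = country.upper()
    if not re.fullmatch(r"[a-z]{2,3}", lang):
        raise ValueError(f"not a 2- or 3-letter language code: {language!r}")
    if not re.fullmatch(r"[A-Z]{2}", ctry):
        raise ValueError(f"not a 2-letter country code: {country!r}")
    return f"{lang}_{ctry}.xml"


if __name__ == "__main__":
    print(locale_data_filename("EN", "us"))
```

Normalizing the case in one place avoids a file name that differs from the expected one only in capitalization.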

You're now set to let the build generate code from your_DATA.xml and add it to the library by executing in your build environment

The build might break if you have errors in your locale data file. Even if it doesn't, it is worth inspecting the warnings that the generator emits but that are not displayed by the default build command. To see them execute the following sequence:

After the build has finished you may have to scroll up in your terminal to see the emitted warnings. Judge whether they are to be taken seriously or not; the generator tries to analyze some semantics, which may or may not be applicable to your locale.

Once everything seems OK, you can start LibreOffice Calc with  and take a look at the number formats for your locale using  and then select Numbers; there, pick your locale from the Language list and inspect the number formats displayed. Go through all categories to check that they are correct.

You could now create a git commit from your changes and submit it to gerrit for review if you are familiar with that process (see Development/gerrit), or simply attach the final your_DATA.xml file to a bug submitted as described above under Submitting the Locale Data File.