LibreOffice Localization Guide/Advanced Source Code Modifications

Some languages may require patches for some or all of the following files. If you know how to prepare the patches yourself, or want to give it a go, this will probably speed up the process, as it means less work for whoever has to do the integration. Before preparing the patches, you should make sure that your language is NOT already in those files for the version that you want to localize (look into How to prepare a patch for more information). See, for example, how the Aragonese (an-ES) language was added.

Define the default fonts for the locale
LibreOffice allows defining what fonts should be used by default for each locale. Here we can define, for example, which font will be used by LibreOffice Writer as default when you use your own language or script, or which font will be used by the user interface for your localized version of LibreOffice.

Defining the default font table for your language requires modifying the file officecfg/registry/data/org/openoffice/VCL.xcu in the source code. The individual VCL.xcu file cannot be found in the installed product; its contents are merged into share/registry/main.xcd under the directory in which LibreOffice is installed.

VCL.xcu is divided into blocks (called nodes), one for each language. The first line of the node for a language includes the standard ISO code for the language. There are two different cases you have to consider here.

If your language uses Latin/Cyrillic/Greek characters, then probably you will be happy with the default fonts for English
In this case do nothing. If LibreOffice cannot find a node for a language, then it will fall back to English settings.

If your language uses a different script
Find in the file a language that is somehow similar to yours, at least in the same group: Latin, Asian (Chinese, Japanese, Korean) or CTL (complex text layout, including Indian and some Southeast Asian languages). For example, for Khmer we have taken Thai as the model.

Copy the node for that language (that is, the block of lines that refers to that language, from the opening <node> tag to the closing </node> tag) and change the language code of the new node to the code of your own language. Then, in each one of the sections, change the font list to a set of fonts that support your language (fonts to which LibreOffice will change if it finds a character from your language).

A node looks like this:

&lt;node oor:name="km" oor:op="replace"&gt; &lt;prop oor:name="UI_SANS" oor:op="replace" oor:type="xs:string"&gt; &lt;value&gt;Khmer OS System;Khmer OS;UniKhm&lt;/value&gt; &lt;/prop&gt; &lt;prop oor:name="CTL_DISPLAY" oor:op="replace" oor:type="xs:string"&gt; &lt;value&gt;Khmer OS Syste;Khmer OS;UniKhm&lt;/value&gt; &lt;/prop&gt; &lt;prop oor:name="CTL_HEADING" oor:op="replace" oor:type="xs:string"&gt; &lt;value&gt;Khmer OS System;Khmer OS;UniKhm&lt;/value&gt; &lt;/prop&gt; &lt;prop oor:name="CTL_PRESENTATION" oor:op="replace" oor:type="xs:string"&gt; &lt;value&gt;Khmer OS System;Khmer OS;UniKhm&lt;/value&gt; &lt;/prop&gt; &lt;prop oor:name="CTL_SPREADSHEET" oor:op="replace" oor:type="xs:string"&gt; &lt;value&gt;Khmer OS System;Khmer OS;UniKhm&lt;/value&gt; &lt;/prop&gt; &lt;prop oor:name="CTL_TEXT" oor:op="replace" oor:type="xs:string"&gt; &lt;value&gt;Khmer OS System;Khmer OS;UniKhm&lt;/value&gt; &lt;/prop&gt; &lt;/node&gt;

You can see that fonts are separated by semicolons. You have to use the internal name of the font, the one that appears in the font menu when you select fonts in LibreOffice or any other program (not the name of the file that contains the font). These names may contain spaces; this is not a problem, just include them. Spaces are significant, so do not put spaces before or after the semicolons, nor at the beginning or the end of the font list.
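The spacing rules above are easy to get wrong when editing by hand. The following is a minimal sketch of a check you could run on a font list before preparing a patch; the function name check_font_list is made up for this illustration and is not part of LibreOffice.

```python
def check_font_list(value):
    """Return a list of problems found in a VCL.xcu font list value.

    Hypothetical helper: it only checks the rules stated in this guide
    (semicolon-separated names, no stray spaces, no empty entries).
    """
    problems = []
    if value != value.strip():
        problems.append("leading or trailing whitespace in the list")
    for name in value.split(";"):
        if not name:
            problems.append("empty entry (doubled or dangling semicolon)")
        elif name != name.strip():
            problems.append("whitespace around entry %r" % name)
    return problems


# A clean list produces no problems; spaces inside a name are fine.
print(check_font_list("Khmer OS System;Khmer OS;UniKhm"))
# Spaces next to a semicolon or at the end of the list are reported.
print(check_font_list("Khmer OS System; Khmer OS;UniKhm "))
```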

You may select different substitution fonts for different applications; each of the prop statements refers to a different tool or situation.

The first font that is listed in UI_SANS will be used by LibreOffice to display the applications' menus.

Don't forget that this file will be used on several platforms and by people who might have different fonts installed. Include fonts for these platforms in your list, if you know them (Macintosh typically has different fonts). Try to ensure that any user will have at least one of the fonts that you have included in the list.

Under Unix, where fontconfig is used for font fallback, if the first entry in the list is not available, fontconfig is consulted for the best replacement font based on that font name and the desired locale. On other platforms LibreOffice loops through the list until an available font is found.
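The behaviour described for non-Unix platforms (walk the list, take the first font that is installed) can be pictured with a small sketch. This is only an illustration of the idea, not LibreOffice's code; the function name pick_font and the case-insensitive comparison are assumptions made for the example.

```python
def pick_font(font_list, installed):
    """Return the first font from a semicolon-separated list that is installed.

    Hypothetical illustration: names are compared case-insensitively here,
    which is an assumption, not documented LibreOffice behaviour.
    """
    installed_lower = {name.lower() for name in installed}
    for name in font_list.split(";"):
        if name.lower() in installed_lower:
            return name
    return None  # nothing in the list is available


# "Khmer OS System" is missing on this imaginary machine, so the second
# entry of the list is chosen.
print(pick_font("Khmer OS System;Khmer OS;UniKhm", {"Khmer OS", "Arial"}))
```

This is why the list should contain at least one font that every user is likely to have.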

Warning: VCL.xcu is a UTF-8 file; it can be damaged if edited with an editor that does not support UTF-8. Patches should always be in UTF-8.

When you have finished with this file and made sure that it is what you really want, you should file a bug for the LibreOffice product, Localisation component, and attach a patch to it. To submit a bug, first log in to Bugzilla and click on File a Bug. On the next page choose the LibreOffice product, and on the page after that select:
 * Component: Localisation
 * Version: LibO Master
 * Severity: enhancement
 * Summary: Patch for VCL.xcu for language xxxx

In the Description field, describe your patch in a few words, then click on Add an Attachment to attach your patch (don't forget to write its description). When you are done, click on Submit Bug.

Font Fallback
This section needs updating!

In case a font used in a document is not available in a computer in which it is opened, LibreOffice has to use another font to represent the text.

Under Unix, fontconfig is used for font fallback: the original font name is sent to fontconfig, and fontconfig determines the best replacement font to use depending on the locale and other information.

For other platforms, a large number of these 'fallback fonts' are defined in LibreOffice for the most common Latin and CJK fonts. You can nevertheless add new font substitutions (font fallbacks) for the particular fonts of your script, or change the existing fallback fonts in your own private build. Using fallback fonts is important, as you do not want a font to be automatically replaced with some other font that has a different width, size or other characteristics that turn a document into a mess.

Font fallbacks should be defined for at least the fonts defined in the default font lists (see the previous chapter).

The font fallback table that establishes the relationships between the fonts is defined in the file officecfg/registry/data/org/openoffice/VCL.xcu. As an example, take this node, in which the fallback fonts for the Korean font Kodig are defined:

&lt;node oor:name="kodig" oor:op="replace"&gt; &lt;prop oor:name="SubstFonts"&gt; &lt;value&gt;gulim;gulimche;sundotum;baekmukgulim;dotum;andalesansui&lt;/value&gt; &lt;/prop&gt; &lt;prop oor:name="SubstFontsMS"&gt;&lt;value/&gt;&lt;/prop&gt; &lt;prop oor:name="SubstFontsPS"&gt;&lt;value/&gt;&lt;/prop&gt; &lt;prop oor:name="SubstFontsHTML"&gt;&lt;value/&gt;&lt;/prop&gt; &lt;prop oor:name="FontWeight"&gt;&lt;value&gt;Normal&lt;/value&gt;&lt;/prop&gt; &lt;prop oor:name="FontWidth"&gt;&lt;value&gt;Normal&lt;/value&gt;&lt;/prop&gt; &lt;prop oor:name="FontType"&gt;&lt;value&gt;CJK,CJK_KR&lt;/value&gt;&lt;/prop&gt; &lt;/node>

If you want to define your own font fallback nodes, you should include them in this list, and send a patch for them.

Note: the example contains lines for several different properties (such as SubstFontsMS) that are empty. They have been included here to make you aware of the existence of the properties, but if you create your own substitution node, you should only include property lines for which you actually have some data. In our example, the following lines could be eliminated without affecting the node:

&lt;prop oor:name="SubstFontsMS"&gt;&lt;value/&gt;&lt;/prop&gt; &lt;prop oor:name="SubstFontsPS"&gt;&lt;value/&gt;&lt;/prop&gt; &lt;prop oor:name="SubstFontsHTML"&gt;&lt;value/&gt;&lt;/prop&gt;

Names of fonts in this section must be normalised. All characters must be in lower-case letters and spaces, numbers or other non-letter characters should be eliminated (ignored) in the normalised version of the name. For example, if you have a font called "Kt BT 3", in this node you would have to include its normalised name, which is: "ktbt".
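The normalisation rule can be written down in a couple of lines. The sketch below simply follows the rule as stated in this guide (lower-case everything, drop anything that is not a letter); the function name normalise_font_name is invented for the example, and the code is not taken from LibreOffice.

```python
def normalise_font_name(name):
    """Lower-case a font name and keep only its letters.

    Hypothetical helper implementing the rule described in this guide.
    """
    return "".join(ch for ch in name.lower() if ch.isalpha())


print(normalise_font_name("Kt BT 3"))          # ktbt
print(normalise_font_name("Khmer OS System"))  # khmerossystem
```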

When adding a new font like Kodig, the substituted fonts should also get a fallback to it. That means that, for example, gulim should fall back to kodig, sungulim should fall back to kodig, and so on. This method is clearly unfriendly, as it adds quadratic complexity. Promises to look into it in OOo 3.0 have been made.
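To see why the reciprocal entries add up quickly, here is a rough sketch that keeps a fallback table as a plain dictionary and adds the back-references whenever a new font is introduced. The dictionary layout and the function name add_font are assumptions made for this illustration; the real data lives in VCL.xcu nodes.

```python
def add_font(table, new_font, substitutes):
    """Add new_font with its substitutes, plus the reverse fallbacks.

    Hypothetical model: `table` maps a normalised font name to the list
    of normalised names it may fall back to.
    """
    forward = table.setdefault(new_font, [])
    for sub in substitutes:
        if sub not in forward:
            forward.append(sub)
        # Each substitute must also learn about the new font.
        back = table.setdefault(sub, [])
        if new_font not in back:
            back.append(new_font)
    return table


table = {"gulim": ["dotum"]}
add_font(table, "kodig", ["gulim", "dotum"])
print(table["gulim"])  # now also lists kodig
print(table["dotum"])  # a new entry pointing back to kodig
```

Every substitute needs its own node updated, which is where the extra work comes from.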

Another issue that you need to keep in mind is that a font might sometimes have more than one name, such as having an English name and a localized name (some CJK fonts even have three names, in English, Japanese and Chinese). If you use the wrong name, the font might not be found, so a "localized font-name table" is now used to relate the different names. In OOo 2.0.0 the ImplLocalizedFontName array in vcl/source/gdi/outdev3.cxx is currently used for this, but in the future OOo plans to provide a mechanism to specify font name translations in VCL.xcu instead.

Glyph Fallback
This section needs updating!

There are situations in which a script is used (e.g. in a document or in the UI) that the selected font does not support. For example, Times does not support Khmer script, yet Khmer text may be entered while Times is the active font. LibreOffice detects that the font does not cover Khmer and tries to do something reasonable by temporarily selecting an alternative font which supports the script.

The difference between "font fallback" and "glyph fallback" is, that in the "font fallback" case the selected font is not available. In the "glyph fallback" case the font is available, but some of the text rendered in the original font is not supported by the font.

Under Unix, fontconfig is used for glyph fallback: the original font name and the missing glyphs are sent to fontconfig, and fontconfig determines the best replacement font to use for those glyphs.

On other platforms the current list of glyph fallback fonts is hard-coded in file vcl/source/gdi/outdev3.cxx, but this will probably change in later versions of LibO.

For most localization projects you will not need to do anything about glyph fallback, in which case you can simply ignore this section. You will only need to include a font for your script in the glyph fallback font list if, after having defined font defaults in VCL.xcu, you are still having problems with the display of your script in the UI or in some documents.

To add a fallback font for your script in this file, look for aGlyphFallbackList[] and add your font to it. Note that fonts of different types are ordered in lines, with an empty string at the end of each line. Add a line with similar characteristics to this list. The name of the font must be normalised, that is, it must be entirely in lower-case letters and without spaces, numbers or other non-letter characters. For example, if you have a font called "Kt BT 3", here you would have to include its normalised name, that is: "ktbt". Your line in the file would be: "ktbt", "",

Prepare the LibreOffice Locale Data file
Please refer to LibreOffice Localization Guide/How To Submit New Locale Data for pointers to locale data documentation.

Prepare the patches that will be needed when including the locale data file
When including the locale data file in LibreOffice, it is also necessary to include information about this locale in a couple of other files. You should prepare the necessary patches for these files:
 * i18npool/source/localedata/localedata.cxx
 * One (and only one) of the following:
   * If the language is some form of English (en_XX), modify i18npool/Library_localedata_en.mk
   * If it is some form of Spanish (es_XX), modify i18npool/Library_localedata_es.mk
   * If it is neither English nor Spanish and it belongs to a country in the Euro zone, modify i18npool/Library_localedata_euro.mk
   * Otherwise, modify i18npool/Library_localedata_others.mk
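The choice between the four makefiles is a simple decision chain. As a sketch only, it could be expressed as follows; the function name localedata_makefile and the in_euro_zone flag are invented for the example, and you have to decide yourself whether your locale counts as a Euro zone one.

```python
def localedata_makefile(locale, in_euro_zone):
    """Pick the makefile to patch for a locale such as "en_ZA" or "km_KH".

    Hypothetical helper following the rules listed in this guide;
    `in_euro_zone` must be supplied by the caller.
    """
    language = locale.replace("-", "_").split("_")[0].lower()
    if language == "en":
        return "i18npool/Library_localedata_en.mk"
    if language == "es":
        return "i18npool/Library_localedata_es.mk"
    if in_euro_zone:
        return "i18npool/Library_localedata_euro.mk"
    return "i18npool/Library_localedata_others.mk"


print(localedata_makefile("km_KH", in_euro_zone=False))
```

Note that the English and Spanish checks come first, so a Spanish locale goes to the Spanish makefile even when its country uses the Euro.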

Assure that your script is correctly classified
All the scripts that have been defined in Unicode so far have been classified in OpenOffice as Latin, CTL (indic, left-to-right) or Asian (Chinese, Japanese, Korean). If your script is CTL or CJK and has only recently been added to Unicode, you might want to make sure that it is listed in the file offapi/com/sun/star/i18n/UnicodeScript.idl. If it is not, that unfortunately creates a larger problem: the published enum cannot be extended, so another constant has to be created and supported in the API and core code wherever UnicodeScript was supported before, and then its range has to be added to the file i18npool/source/breakiterator/breakiteratorImpl.cxx.

Tough things to do if your script is not supported yet
Language support has several levels, including collation (alphabetic sorting), cursor movement within your script, hyphenation... If you want to go further into these, you should consult this document at OpenOffice.org.
 * On Linux, LibreOffice is supported by version 4.4.2 of IBM's ICU (International Components for Unicode) library. ICU does accept code supporting new languages. You should write such code so that it works with the current version of ICU as well as with version 4.4.2. Then you need to add your changes to LibO's current patch for ICU. This patch file MUST NOT be created manually. There is a dmake target that creates it from the sources available under ${INPATH}/misc/build/icu/ after the module has been built once and the files have been modified: dmake create_patch. You have to create the patch on a Unix or Linux platform; using Windows will create the wrong patch.
 * On MS Windows, LibreOffice receives script support for CTL languages from Microsoft's Uniscribe engine, contained in the file usp10.dll. If you want to try to get support for your language in that engine, you should get in touch with Microsoft (don't tell them that it is for LibreOffice, though).
 * On Mac... HELP WANTED. Somebody please write this section.

Include your locale in the installation set
In order to have installation sets work correctly for your language, modifications are needed in the files listed below.

instsetoo_native/inc_openoffice/windows/msi_languages/Langpack.ulf
You need to add a block of the style:

[OOO_LANGPACK_NAME_1107]
en-US = "Khmer"
de = "Khmer"
[OOO_LANGPACK_DESC_1107]
en-US = "Installs Khmer support in %PRODUCTNAME %PRODUCTVERSION"
de = "Khmer"

Please note that in this case 1107 is the Microsoft locale ID for Khmer in decimal format. You should use the one for your language, which you should already have obtained at this point; just in case, you can find it at www.microsoft.com/globaldev/reference/lcid-all.mspx.
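Locale IDs are often listed in hexadecimal, while Langpack.ulf needs the decimal form (1107 is 0x0453). The following sketch converts a hexadecimal ID and prints the en-US lines of such a block. The function name langpack_block is hypothetical, and the other language lines (such as de in the example above) still have to be added by hand.

```python
def langpack_block(lcid_hex, english_name):
    """Build the en-US part of a Langpack.ulf block from a hex locale ID.

    Hypothetical helper; the layout follows the Khmer example in this guide.
    """
    lcid = int(lcid_hex, 16)  # e.g. "0453" -> 1107
    return "\n".join([
        "[OOO_LANGPACK_NAME_%d]" % lcid,
        'en-US = "%s"' % english_name,
        "[OOO_LANGPACK_DESC_%d]" % lcid,
        'en-US = "Installs %s support in %%PRODUCTNAME %%PRODUCTVERSION"'
        % english_name,
    ])


print(langpack_block("0453", "Khmer"))
```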

setup_native/source/win32/msi-encodinglist.txt
You need a line similar to:

km   0  1107   # Khmer

that includes your ISO language code, the ANSI code-page number for the script, the Microsoft locale ID number (same as in the prior file)... and, if you want, a comment to say which language it is. Warning: ANSI code page number 0 does not seem to work presently, for Unicode languages. If this is the case, please consult issue 47857.

scp2/source/ooo/file_ooo.scp
In this file you need to make sure that, if your language is a CTL language, the file Common-ctl.xcu is installed, and, if it is a CJK language, that Common-cjk.xcu and Writer-cjk.xcu get installed. Look for these file names within file_ooo.scp and add a new line for your language. As an example, for Khmer (km), a CTL language, you should add the Name (km) line shown at the end of the block below:

File gid_File_Registry_Spool_Oo_Common_Ctl_Xcu
    TXT_FILE_BODY;
    Styles = (PACKED,MAKE_LANG_SPECIFIC);
    Dir = gid_Dir_Share_Registry_Modules_Oo_Office_Common;
    Name (th) = "/registry/spool/org/openoffice/Office/Common-ctl.xcu";
    Name (hi-IN) = "/registry/spool/org/openoffice/Office/Common-ctl.xcu";
    Name (ar) = "/registry/spool/org/openoffice/Office/Common-ctl.xcu";
    Name (he) = "/registry/spool/org/openoffice/Office/Common-ctl.xcu";
    Name (km) = "/registry/spool/org/openoffice/Office/Common-ctl.xcu";
End

Configuration options for installation
If you would like to change some configuration options, such as having a multi-language installer start by default in a language different from the language used during installation, you should look into this page. This is especially interesting for languages that are not supported by MS Windows and need multi-lingual builds, so that English (or another language) can be used as the installation language while the program itself is installed directly in your language.

When your translation is finished, include the language in the build environment
Your language code must be included in the list specified in the file solenv/inc/langlist.mk. Languages are usually not included in this file until they are ready for building. In private builds you do not have to modify it if you run configure with e.g. --with-lang="en-US cs xy", where xy is the ISO code for your language.

Collation (correct alphabetic order for a given locale)
If you believe that LibreOffice does not collate (sort alphabetically) correctly for your language, you can patch it to sort in a different way when your locale is used. If this is the case please look into the Collation in OpenOffice 2.0 document (last version stored by Archive.org, original pages are gone).

"Simple" (there's no such thing...) collator tailoring can be done directly in the locale data file without having to adapt other code. You'll need to read about ICU Collation Customization (sometimes, i.e. for tables, the old documentation is easier to read); unfortunately it is not very intuitive, though very powerful. A few locales use it; you'll find examples in this OpenGrok code search, where you can click on the numbered lines to see the full code.

Number Transliteration
If your script is new to LibreOffice, you need to indicate what are the Unicode code-points of the digits in your script, as well as separation characters, etc, so that the numbers in your script can be used in Calc, or for numbered lists or schemes.

This information is included in file i18npool/source/nativenumber/data/numberchar.h. You should check to make sure that information about the numbers in your language (not script, every language) is included here, as well as separation characters, etc.

You also need to include your script in both the natnum1Locales[] and natnum1[] arrays of file i18npool/source/nativenumber/nativenumbersupplier.cxx.

You should send a single patch that covers changes to these two files.

If this is already done, then you can - for example - format cells in Calc by using:

[NatNum1]

Numbering styles of paragraphs in local script numbers and letters
It is possible in LibreOffice Writer to number paragraphs using local script numbers instead of Latin numbers. You can see the defined styles in LibreOffice Writer in Format→bullets and Numbering→Numbering type tab. In order to change these styles, you need to make the necessary changes in this part of your locale data file.

If you want to use the letters of your script instead of numbers (equivalent to using A, B, C... in English), then it is a little more complicated, as you have to define the style, and then the letters that you have included in the locale data file will be used. Using letters requires patching the following files:

offapi/com/sun/star/style/NumberingType.idl
In this file, you need to add a new constant with a new number, like this:

/** Numbering in Khmer alphabet letters */
const short CHARS_KHMER = 34;

where the number is the next one after the last that you find in the file (you should also place this code at the very end of the block of constants).

i18npool/source/defaultnumberingprovider/defaultnumberingprovider.cxx
Here, you need to include an entry like this:

case CHARS_KHMER:
    lcl_formatChars(table_Alphabet_km, sizeof(table_Alphabet_km) / sizeof(sal_Unicode), number - 1, result);
    break;

and then include a line like this, in the correct position:

{style::NumberingType::CHARS_KHMER, NULL, LANG_CTL},

Note that the last element is LANG_CTL, defining the language as CTL (this will appear in the menus only if CTL is activated), here you can also use LANG_CJK or LANG_ALL.

i18npool/inc/bullet.h
Finally, you have to define the table of permitted characters in this file, using a block like this:

static sal_Unicode table_Alphabet_km[] = {
    0x1780, 0x1781, 0x1782, 0x1783, 0x1784, 0x1785, 0x1786, 0x1787,
    0x1788, 0x1789, 0x178A, 0x178B, 0x178C, 0x178D, 0x178E, 0x178F,
    0x1790, 0x1791, 0x1792, 0x1793, 0x1794, 0x1795, 0x1796, 0x1797,
    0x1798, 0x1799, 0x179A, 0x179B, 0x179C,
    0x179F, 0x17A0, 0x17A1, 0x17A2
};
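Typing such a table by hand is error-prone. As a sketch, assuming your letters form a Unicode range with a few code points to leave out, a small script can generate the initialiser text for you. The function name alphabet_table is made up for the example; the Khmer range and the two skipped code points are taken from the table above.

```python
def alphabet_table(name, first, last, skip=()):
    """Return (C source text, number of entries) for a sal_Unicode table.

    Hypothetical generator: emits every code point from `first` to `last`
    inclusive, except those listed in `skip`, eight per line.
    """
    points = [cp for cp in range(first, last + 1) if cp not in skip]
    rows = []
    for i in range(0, len(points), 8):
        rows.append("    " + ", ".join("0x%04X" % cp for cp in points[i:i + 8]))
    body = ",\n".join(rows)
    text = "static sal_Unicode %s[] = {\n%s\n};" % (name, body)
    return text, len(points)


text, count = alphabet_table("table_Alphabet_km", 0x1780, 0x17A2,
                             skip={0x179D, 0x179E})
print(text)
print(count)
```

Always compare the generated list against the alphabet as it is actually used for numbering in your language before submitting it.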

Outline numbering customization
In LibreOffice Writer you can number the different levels of headings in a structured document. The styles in which these outlines are numbered can be seen in Format→bullets and Numbering→Outline tab. In order to change these Outline styles, you need to make the necessary changes in this part of your locale data file.

The Autocorrect Dictionary
A typical acor_lang.dat file is a zip archive containing a few XML files. Perhaps the easiest way to create a localized autocorrect dictionary is to use another dictionary as a template, so let's take a look at extras/source/autotext/lang/en-US/acor_en-US.dat. Here are its contents:

$ unzip -l acor_en-US.dat
Archive:  acor_en-US.dat
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2010-01-26 11:20   mimetype
        0  2010-01-26 11:20   META-INF/
      673  2010-01-26 11:20   WordExceptList.xml
     6803  2010-01-26 11:20   SentenceExceptList.xml
      652  2010-01-26 11:20   BlockList.xml
      604  2010-01-26 11:20   META-INF/manifest.xml
    68972  2010-01-26 11:20   DocumentList.xml
---------                     -------
    77704                     7 files

The files that you will care about are the four XML files in the top level of this archive. Each one of them performs a separate function.
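If you prefer to inspect the archive from a script rather than with unzip, the Python standard library can read it, since it is an ordinary zip file. The sketch below lists the XML files at the top level of an archive; the function name top_level_xml is invented, and the demo builds a tiny stand-in archive in memory instead of opening a real acor_en-US.dat.

```python
import io
import zipfile


def top_level_xml(archive):
    """Return the sorted names of the XML files at the top level of a zip.

    `archive` may be a path or a file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sorted(n for n in zf.namelist()
                      if n.endswith(".xml") and "/" not in n)


# Stand-in archive with the same member names as the listing above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ("mimetype", "META-INF/manifest.xml", "WordExceptList.xml",
                 "SentenceExceptList.xml", "BlockList.xml", "DocumentList.xml"):
        zf.writestr(name, "")
buf.seek(0)
print(top_level_xml(buf))
```

With a real file you would call top_level_xml("acor_en-US.dat") instead.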

BlockList.xml
     

Until someone writes a better explanation about what this file does, you should be fine with just changing the value of  attribute of the top level element to your language code.

In case a replacement item is not "text only", a kind of document is created which holds the style information. That is a folder in the acor_nn-NN.dat archive. The file BlockList.xml collects references to these folders. That is the attribute. If such a folder does not exist, the node is useless and you should remove it.

In case your language needs such styled auto-correction entries, you should generate them yourself and copy the generated parts from the archive that is created in your 'user' directory.

DocumentList.xml
<block-list:block block-list:abbreviated-name="..." block-list:name="…"/>
<block-list:block block-list:abbreviated-name="(C)" block-list:name="©"/>
...
<block-list:block block-list:abbreviated-name="abotu" block-list:name="about"/>
<block-list:block block-list:abbreviated-name="abouta" block-list:name="about a"/>
<block-list:block block-list:abbreviated-name="aboutit" block-list:name="about it"/>
<block-list:block block-list:abbreviated-name="abscence" block-list:name="absence"/>
...
</block-list:block-list>

This will most probably be the biggest file in the archive. It will contain the actual strings that you want LibO to correct automatically, and their corrections. As you can see from the example above, these don't necessarily have to be mistakes. Some of the automatic corrections are actually a convenience feature.
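When adapting an existing DocumentList.xml it can help to load the pairs into a table and review them. The following is a minimal sketch using the Python standard library. The namespace URI in the sample is a placeholder chosen for this example (check the real file for the actual one), and the function name read_replacements is hypothetical; the code only relies on the attribute names shown above.

```python
import xml.etree.ElementTree as ET

# Sample input; the namespace URI here is a made-up placeholder.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<block-list:block-list xmlns:block-list="urn:example:block-list">
  <block-list:block block-list:abbreviated-name="abotu" block-list:name="about"/>
  <block-list:block block-list:abbreviated-name="(C)" block-list:name="\u00a9"/>
</block-list:block-list>"""


def local(name):
    """Strip the "{namespace}" prefix that ElementTree puts on names."""
    return name.rsplit("}", 1)[-1]


def read_replacements(xml_text):
    """Map each abbreviated-name to its replacement text."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    table = {}
    for element in root.iter():
        attrs = {local(key): value for key, value in element.attrib.items()}
        if "abbreviated-name" in attrs and "name" in attrs:
            table[attrs["abbreviated-name"]] = attrs["name"]
    return table


print(read_replacements(SAMPLE))
```

Matching on local attribute names keeps the sketch independent of the exact namespace used in the real file.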

SentenceExceptList.xml
<?xml version="1.0" encoding="UTF-8"?>
<block-list:block block-list:abbreviated-name="a."/>
<block-list:block block-list:abbreviated-name="acct."/>
<block-list:block block-list:abbreviated-name="approx."/>
...
</block-list:block-list>

This file lists the common abbreviations ending with a full stop character. Its purpose is to inform LibreOffice when the full stop does not indicate an end of a sentence.

WordExceptList.xml
<?xml version="1.0" encoding="UTF-8"?>
<block-list:block block-list:abbreviated-name="GHz"/>
<block-list:block block-list:abbreviated-name="MHz"/>
<block-list:block block-list:abbreviated-name="OOo"/>
<block-list:block block-list:abbreviated-name="THz"/>
</block-list:block-list>

WordExceptList.xml contains (only three-letter?) abbreviations that start with two uppercase letters, but end with a lowercase one, such as MHz for MegaHertz and OOo for OpenOffice.org. Typically, two uppercase letters followed by lowercase-only letters is considered a typo in LibreOffice; this file tells it when it's not the case.
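The effect of the exception list can be pictured with a small sketch. It is a simplified model of the rule described above, restricted to ASCII letters, and not LibreOffice's actual implementation; the function name looks_like_typo is invented for the example.

```python
import re

# Two upper-case letters followed only by lower-case letters, e.g. "THe".
TWO_CAPITALS = re.compile(r"^[A-Z]{2}[a-z]+$")


def looks_like_typo(word, exceptions):
    """Return True if the word matches the pattern and is not an exception.

    Simplified, ASCII-only illustration of the "two initial capitals" rule.
    """
    return bool(TWO_CAPITALS.match(word)) and word not in exceptions


exceptions = {"GHz", "MHz", "OOo", "THz"}
print(looks_like_typo("THe", exceptions))  # True: would be corrected
print(looks_like_typo("MHz", exceptions))  # False: listed as an exception
```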

Localization on the Help System
When a new language is added to the help system, a new directory for that language should be created in helpcontent2/source/auxiliary. CSS files are localizable, fonts and other options can be set. In .cfg files Title is localizable, and Language should be set to the appropriate language code.

There are a very few localizable images for the help in language directories of icon-themes/colibre/res/helpimg. It is not required to localize them, if an image is not present, it will fall back to English.

Localization of other images
A few toolbar icon images are localizable: those that have Latin letters on them, e.g. the bold, italic and underline buttons, or the sort buttons. You can find examples in the language directories of icon-themes/galaxy/res/. However, for full coverage those localized images must be present in all icon themes, not only in Galaxy.

Special case of languages that use Complex Text Layout (CTL) scripts
The MS-LangID needs to be included in i18npool/source/isolang/mslangid.cxx method

Special case of languages that use a Right To Left (RTL) writing system
The MS-LangID needs to be included in i18npool/source/isolang/mslangid.cxx method

Translation
See Translating LibreOffice.