User:Hossein/Blog/string

Strings are among the most important data types used in LibreOffice. A string is essentially a sequence of characters, and it is used for storing textual data. Because LibreOffice has many modules that depend on various libraries and languages, several different string types are in use in its source code. Here, we discuss some of them.

Characters and Strings in C++
In C++, the standard std::string is available alongside the internal LibreOffice data types.

However, std::string is not the data type of choice for storing textual values and passing them between classes and methods, because LibreOffice has its own set of data types for this purpose. See the next sections for more information.

Please note that the usual functions for working with C and C++ strings may give unexpected results when the programmer does not account for multi-byte encodings such as UTF-8. For example, for a UTF-8 string, std::string::length reports the correct number of bytes (code units), but not the number of Unicode code points or "characters".
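To make the difference between bytes and code points concrete, here is a minimal sketch. The helper name count_code_points is an assumption made for this illustration and is not part of LibreOffice or the standard library; it also assumes the input is already valid UTF-8 and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 encoded
// std::string by skipping continuation bytes, which always have the bit
// pattern 10xxxxxx. Assumes valid UTF-8; no validation is performed.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
    {
        // Every byte that is not a continuation byte starts a code point.
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the UTF-8 string "h\xC3\xA9llo" (the word "héllo"), length() reports 6 bytes, while count_code_points reports 5 code points. Note that even code points are not always what a user perceives as characters, because combining sequences can span several code points.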

Characters and Strings in LibreOffice C++ Source Code
In addition to the standard std::string discussed above, LibreOffice provides its own string classes in C++. OString is the 8-bit string data type, and it does not keep information about its encoding. OUString, on the other hand, uses UTF-16 encoding and is more widely used.

Code sample
Here is a sample code snippet for working with these LibreOffice string classes in C++:

Characters and Strings in C
Some small (but important) parts of LibreOffice are written in the C programming language. There, the main string type is char[] (which behaves much like char *, with slight differences). Essentially, it is an array of 8-bit (1-byte) characters that ends with the NUL byte: '\0', the character with code zero. The char data type itself is used to store individual 8-bit characters. It is also possible to store UTF-8 encoded Unicode strings in C strings.
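The role of the terminating NUL byte can be sketched as follows. To stay consistent with the other examples in this post the sketch is written so that it compiles as C++, but it only uses C library calls from <cstring>. The function names manual_strlen and copy_c_string are hypothetical and chosen just for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the number of bytes before the terminating
// NUL byte, written out by hand to show what std::strlen does.
std::size_t manual_strlen(const char* text)
{
    std::size_t length = 0;
    while (text[length] != '\0')
        ++length;
    return length;
}

// Hypothetical helper: copies `source` into `destination`, which can hold
// `capacity` bytes, always leaving room for the NUL terminator.
// Returns false if the text had to be truncated.
bool copy_c_string(char* destination, std::size_t capacity, const char* source)
{
    if (capacity == 0)
        return false;
    std::size_t length = std::strlen(source);
    bool fits = length < capacity;
    std::size_t to_copy = fits ? length : capacity - 1;
    std::memcpy(destination, source, to_copy);
    destination[to_copy] = '\0';
    return fits;
}
```

For const char greeting[] = "Hello", sizeof(greeting) is 6 because the array includes the terminator, while manual_strlen(greeting) is 5. Copying it into a char[4] buffer with copy_c_string yields "Hel" and returns false.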

Code sample
Here is a sample code snippet for working with these data types in C:

Characters and Strings in LibreOffice C Source Code
The underlying Unicode character data type in LibreOffice is sal_Unicode, and the underlying string types are rtl_String and rtl_uString. They are used in the C source code.

Code sample
Here is a sample code snippet for working with these data types in C:

Characters and Strings in Windows
Windows uses wide characters for handling Unicode text. The wide character type is wchar_t, and wchar_t[] strings are based on it. The C++ counterpart of this string type is std::wstring.

This code is Windows-specific:

Please note that this code snippet is the continuation of the above code.
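Separately from the Windows-specific code, the types named in this section can also be illustrated with a small portable sketch that uses only the C++ standard library. The function names wide_length and with_suffix are hypothetical. Keep in mind that wchar_t is 16 bits wide on Windows (holding UTF-16 code units) and typically 32 bits wide on Linux, so lengths can differ between platforms for characters outside the Basic Multilingual Plane.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: number of wchar_t code units before the
// terminating L'\0' of a wide C string.
std::size_t wide_length(const wchar_t* text)
{
    return std::wcslen(text);
}

// Hypothetical helper: builds a std::wstring from a wide C string and
// appends a suffix to it.
std::wstring with_suffix(const wchar_t* text, const std::wstring& suffix)
{
    std::wstring result(text);
    result += suffix;
    return result;
}
```

Unlike the UTF-8 case, wide_length(L"h\u00e9llo") is 5, because "é" fits into a single wchar_t on both kinds of platforms.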

Characters and Strings in Qt
As LibreOffice provides a Qt-based UI, there is a need to work with Qt data types. Specifically, QString is the string data type provided by the Qt library. The QString class provides a rich set of functions for storing and manipulating textual data in C++ applications that use the Qt library.

For more information, refer to the QString page in the Qt 6 documentation:

https://doc.qt.io/qt-6/qstring.html

Characters and Strings in GTK/GLib
Additionally, LibreOffice provides a GTK-based UI, so there is also a need to work with GTK data types in the relevant source files. Specifically, the character data type used there is gchar, and the string data type is gchar *.

Also, GString (from GLib) is a struct suitable for storing and manipulating textual data. You can see its structure and utility functions in the GLib manual:

https://docs.gtk.org/glib/struct.String.html

Code Sample
Here is a sample code, gchar.c:

You can compile it with:

Refactoring String Types
Not every possible string data type is desirable. These are some of the refactorings that have been done:

It is now converted to the std::string:

There are situations where you have to pass a C string buffer to a C function so that the function can write some textual data into it. In such cases, where the data has to be modified, you can use std::vector instead of a raw array. For example:
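A minimal sketch of this pattern follows. The function fill_greeting is a hypothetical stand-in for a C API that writes into a caller-supplied buffer, and make_greeting is likewise an assumed name; neither comes from the LibreOffice code base. The sketch relies on the standard behaviour of std::snprintf, which returns the length the full output needs even when the buffer is too small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical C-style function: writes a NUL-terminated message into
// `buffer` (at most `size` bytes) and returns the number of characters the
// full message needs, excluding the terminator (snprintf semantics).
int fill_greeting(char* buffer, std::size_t size, const char* name)
{
    return std::snprintf(buffer, size, "Hello, %s!", name);
}

// Hypothetical wrapper: uses std::vector<char> as the writable buffer and
// returns the result as a std::string.
std::string make_greeting(const char* name)
{
    // The first call, with an empty buffer, only asks for the required length.
    int needed = fill_greeting(nullptr, 0, name);
    if (needed < 0)
        return std::string();

    // std::vector owns the writable storage; +1 for the NUL terminator.
    std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
    fill_greeting(buffer.data(), buffer.size(), name);

    return std::string(buffer.data(), static_cast<std::size_t>(needed));
}
```

The vector releases its memory automatically, so there is no manually managed char array to free, and buffer.data() provides the writable char * that the C function expects.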

String Literals, Streams, Buffers and String View Types
These are the classes for the literals used in LibreOffice:

These are the stream and buffer classes useful for creating temporary objects for string manipulation:

Finally, these are some of the string view types used in LibreOffice:

We will discuss these types in upcoming blog posts.