User:Santhab

Final report for Balázs Sántha, GSoC 2021, Improving table styles

My project aimed to improve table style support in LibreOffice. This is a long-running effort that is still far from a stable solution.

My work over the summer can be split into several major parts, which I worked on more or less in parallel.

1.: Fixed the most annoying problems in LibreOffice's current table style handling.

-tdf#109083 sw table styles: fix missing format update of tables https://git.libreoffice.org/core/commit/840095f417af6619977f688f421e449273c26cae

-tdf#131771 sw: fix missing table style after copying and pasting the table https://git.libreoffice.org/core/commit/ec277dfdda8acd08694b03a6b1fb88c5fede35d2

-sw: test fix of tdf#131771 https://git.libreoffice.org/core/+/913ea7e18af9e04c82c9b33ddd34b4b5a0917f29%5E%21

-tdf#134452 sw: fix page break disappears after applying table style https://git.libreoffice.org/core/+/554c3692b7e3b51ce6ce7772509ba7a2e8777d3a%5E%21

-tdf#143244 sw: fix redo of adding table rows breaks table style https://git.libreoffice.org/core/+/689b5a4862ead541e54e83cb14067cfaa691e2ab%5E%21

''Most of the commits have been backported to 7.2 too. All of my fixes include unit tests to avoid regressions in the future.''

2.: Made several notes and partial results on how DOCX import/export currently works, and on the difficulties we face when it comes to implementing a real table style.

2.1.: Built Alex Ivan's feature/table-style branch

In 2013, Alex Ivan implemented real table styles, but in the end the work did not get merged into master. After a lot of work, I successfully built and ran the 8-year-old branch. It contains the UNO API for table styles, so it could be used in writerfilter, which is based on UNO. Building such old code was quite hard, as C++ compilers are not very good at backward compatibility. To solve this, I needed an older GCC (gcc-4.8). As this version of gcc was not available from the Ubuntu 20.04 repos, I tried to build it from source, unfortunately without success. I found out that the needed package is in the Ubuntu 18.04 repos, so I decided to switch to that OS. After this, there were further problems with Java, GStreamer and ODK.

'''I finally solved them by running the autogen.sh script with the following flags: --disable-gstreamer-0-10 --with-jdk-home=/usr/lib/jvm/java-8-openjdk-amd64 --disable-odk --without-java --without-junit'''

As one of the unit tests still failed, I had to build without tests: make build-nocheck

The build was successful, but I still had to install the client, because running soffice.bin directly produced linking errors. The command make dev-install should do this, but it also runs the unit tests, one of which fails, so I had to edit the Makefile:

-From the line containing "dev-install: build", we need to remove "build". This way the installation will not try to build the unit tests that were left out.

The installation was successful, but launching the client with solver/unxlngx6.pro/installation/opt/program/soffice still ran into problems with Java. So you need to launch it as follows: SAL_USE_VCLPLUGIN=gen solver/unxlngx6.pro/installation/opt/program/soffice

After a lot of work, I could finally run and test the branch in practice. As going through the build and installation process is very tricky, I made the build available in the following virtual machine: https://drive.google.com/drive/folders/1NUvZCKnGzILB05csQ-0TIhYus3-9gWGA?usp=sharing

Password: osboxes.org

Opening the terminal will give you the details of how to run the client to test the branch.

2.2.: Examined a possible solution to optimize the import of DOCX tables

The idea was to replace direct formatting with custom paragraph styles. This could be less fragile (e.g. removing direct formatting wouldn't break the text styles in tables), and the import could be faster (there is no need to process the text of the table paragraphs).

We can use a custom paragraph style inherited from the paragraph style of the table, adding the docDefault, the table paragraph style(s) and then, again, the original paragraph style, i.e. emulating MSO style inheritance.

It's possible to flatten these custom paragraph styles into a single one.

At the DOCX export, this custom inherited paragraph style is replaced with ParagraphStyle1 for the sake of the round-trip (if the custom style wasn't modified by the user during the session).

Creating Paragraph_in_table_like_in_MSO custom styles is optional, i.e. it depends on docDefaults and table paragraph styles.
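The layering described in 2.2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not LibreOffice code: the dict representation, the property names and the helper `flatten_styles` are all made up for the example, and the layer order simply follows the order given in the text above (docDefault, then the table paragraph style, then the original paragraph style).

```python
from typing import Dict, List

Props = Dict[str, object]


def flatten_styles(layers: List[Props]) -> Props:
    """Merge style layers into one set; later layers override earlier ones."""
    flat: Props = {}
    for layer in layers:
        flat.update(layer)
    return flat


# Hypothetical property sets, lowest priority first.
doc_default = {"font_size": 10, "spacing_after": 8}
table_para_style = {"font_size": 9, "bold": True}
paragraph_style = {"spacing_after": 0}

flat = flatten_styles([doc_default, table_para_style, paragraph_style])
print(flat)  # {'font_size': 9, 'spacing_after': 0, 'bold': True}
```

The flattened result is a single property set that no longer depends on the other layers, which is what makes it usable as one custom paragraph style.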

2.3.: Could we replace docDefaults and table styles with custom styles as described in 2.2? Yes, a proof-of-concept document file can be found at: https://drive.google.com/file/d/1P8F7olVLdd7_AwVQwVa8JpjcRxZkIWgl/view?usp=sharing

2.4.: How docDefaults and table styles are imported and exported (with OOXML details)

Importing docDefault:

-added to all paragraph styles

Importing table styles:

-via GrabBag (no real import)

-imported under the “tableStyles” property of the InteropGrabBag, see their export in DocxTableStyleExport::TableStyles.

Exporting docDefault:

-all the docDefault run properties are still there

-the paragraph properties are deleted and migrated into the "Normal" paragraph style during the DOCX round-trip.

Exporting table styles:

-run properties seem to be deleted.

-paragraph properties are exported, except shading.

Test documents can be found at: https://drive.google.com/drive/folders/1b5mz5JmsnjAV8HOQDPaXwjvIMXAse2l6?usp=sharing

2.5.: Where docDefaults and table styles are imported in writerfilter

These patches remove the docDefaults and table style imports, and in this way they answer the question: https://drive.google.com/drive/folders/16VHLr8J3H-hgeYUV2szccbdyp7lvaAL0?usp=sharing

2.6.: What problems do we face if we try to map ODT's table-template to DOCX's table-style?

Currently LibreOffice does not use a real table style, but a template. In ODF its name is table-template. It would obviously be very useful if we could map DOCX's table styles to these table-templates, as a lot of support for them is already in place. Unfortunately, there are several difficulties which seem to make this idea impossible to implement:

-When a template is applied, all its properties are set as hard formatting on the table cells. That means it overrides pre-existing paragraph styles and hard formatting. Also, when you apply a real style, you can access and modify the style from the object it is applied to, which automatically changes every other paragraph that uses the same style. Sadly, this is not the case with table-templates.

-Paragraph styles use inheritance, which is not supported by table-templates (not even in the specification). Paragraph style inheritance is implemented by SfxItemSets, which have parent pointers to a parent SfxItemSet. This way, items can be looked up along the chain of parents. The problem with table-templates is that they have a tricky structure with optional parts that need different sets of items (first-row, first-column, even-row, odd-row). For example, we can have a style that has different even/odd rows, and then a derived one that doesn't have different even/odd rows but has a different first-row. In this case, how should we set our parent pointers?
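The parent-pointer lookup mentioned above, and the reason it does not carry over easily to table-templates, can be sketched as follows. This is a toy Python model, not LibreOffice code: the `ItemSet` class, its `get` method and the property names are assumptions made for illustration, and the real SfxItemSet API differs.

```python
from typing import Dict, Optional


class ItemSet:
    """Toy stand-in for an item set with a parent pointer (hypothetical API)."""

    def __init__(self, items: Dict[str, object],
                 parent: Optional["ItemSet"] = None) -> None:
        self.items = items
        self.parent = parent

    def get(self, key: str) -> Optional[object]:
        # Walk up the chain of parents until some set defines the item.
        node: Optional[ItemSet] = self
        while node is not None:
            if key in node.items:
                return node.items[key]
            node = node.parent
        return None


# Paragraph-style-like inheritance: one set per style, one parent per set.
base_style = ItemSet({"font": "Liberation Serif", "bold": False})
derived_style = ItemSet({"bold": True}, parent=base_style)
print(derived_style.get("bold"))  # True (own item)
print(derived_style.get("font"))  # Liberation Serif (found in the parent)

# A table-template has one item set per optional part instead.
base_template = {
    "even-row": ItemSet({"background": "grey"}),
    "odd-row": ItemSet({"background": "white"}),
}
derived_template = {"first-row": ItemSet({"bold": True})}

# Parts that exist in only one of the two templates have no counterpart
# that could serve as their parent (or child).
unmatched = sorted(set(base_template) ^ set(derived_template))
print(unmatched)  # ['even-row', 'first-row', 'odd-row']
```

In this example none of the parts lines up between the base and the derived template, so there is no obvious item set for any parent pointer to refer to.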

3.: Optimized the DOCX tables' opening time

The original bug report about the regression can be found at: https://bugs.documentfoundation.org/show_bug.cgi?id=131546

The reported document now opens about 20% faster: the opening time went from ~3 min to ~2 min 30 s.
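Since both timings are approximate, the following small Python check only shows how the figures relate: going from roughly 3 minutes to roughly 2 minutes 30 seconds is a 20% gain in speed, or equivalently about 17% less waiting time. The variable names are just for illustration.

```python
before_s = 3 * 60        # ~3 min, as reported above
after_s = 2 * 60 + 30    # ~2 min 30 s, as reported above

speedup = before_s / after_s - 1      # how much faster the document opens
time_saved = 1 - after_s / before_s   # how much shorter the wait is

print(f"speed-up: {speedup:.0%}, time saved: {time_saved:.0%}")
# prints "speed-up: 20%, time saved: 17%"
```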

The fix can be found at:

-DOCX import: fix performance regression at tables https://bugs.documentfoundation.org/show_bug.cgi?id=131546

The optimization caused a regression, whose review can be found here:

-fix direct character formatting copied into paragraph level https://bugs.documentfoundation.org/show_bug.cgi?id=143904

What has to be done:

-finish the fix for the regression caused by the optimization (Edit: fixed and merged)

-test and document what is done and what is not in the 2013 feature/table-style branch

-- All of my commits can be found at: https://cgit.freedesktop.org/libreoffice/core/log/?qt=author&q=Santha

You can read my weekly reports in the archives:

-https://lists.freedesktop.org/archives/libreoffice/2021-June/087471.html

-https://lists.freedesktop.org/archives/libreoffice/2021-June/087501.html

-https://lists.freedesktop.org/archives/libreoffice/2021-June/087530.html

-https://lists.freedesktop.org/archives/libreoffice/2021-July/087560.html

-https://lists.freedesktop.org/archives/libreoffice/2021-July/087592.html

-https://lists.freedesktop.org/archives/libreoffice/2021-July/087637.html

-https://lists.freedesktop.org/archives/libreoffice/2021-July/087661.html

-https://lists.freedesktop.org/archives/libreoffice/2021-August/087685.html

-https://lists.freedesktop.org/archives/libreoffice/2021-August/087717.html

-https://lists.freedesktop.org/archives/libreoffice/2021-August/087753.html