Development/ISCAS Summer of Open Source Promotion Plan/Ideas

This page lists the project ideas for the Institute of Software, Chinese Academy of Sciences (ISCAS) Summer of Open Source Promotion Plan. All applicants are required to complete at least one easy hack from the difficultyInteresting category.

Note that the LibreOffice project selects projects that are well researched and show a good understanding of the scope of the problem. It is also possible to create a project proposal not based on the ideas given below, if the student's application shows a good understanding of the problem. In fact, if you apply with one of the prepared ideas below, we expect even more that you show research beyond the abstracts given here.

When doing that, please use this template:

Title of the task
Some detailed description of the things to accomplish. Don't hesitate to provide details if you have some like code pointers, links to specifications, etc.


 * Required skills / knowledge: C++, reading others' code, and any other useful skills go here.


 * Difficulty: Range among easy, medium, hard


 * Potential mentors
 * Joe Devel, IRC: jdevel, mail: joe@devel.org

Feasibility study: building LibreOffice using meson
Currently the LibreOffice build system uses autoconf and make. Most of the complex stuff is organized in the LibreOffice-internal, high-level build system written in make, called gbuild. It offers a lot of functionality to make it easy for developers to build LibreOffice or external libraries on various platforms with a multitude of configuration options. Generally this works well for LibreOffice, until you need to extend gbuild or fix a bug in it. gbuild itself is ~18,000 lines of code and lives in solenv/gbuild/. Additionally, the configure.ac file adds ~13,000 lines of mainly shell code and m4 macro calls. And since the whole make code is ~150,000 lines, it takes a lot of time to start a new build. At first glance, a lot of the gbuild functionality can be mapped to meson functionality in a straightforward way, but as always the first 80% is the easiest.
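As a rough illustration of what such a mapping could look like, the following Python sketch translates a heavily simplified gbuild library definition into a meson `shared_library()` call. The input snippet, the library name `demo`, the file paths and the helper `gbuild_to_meson` are all hypothetical, and the sketch assumes that every exception object corresponds to a `.cxx` source file. Real gbuild makefiles carry many more calls (include paths, linked libraries, generated sources), which is where the remaining effort lies.

```python
import re

# Hypothetical, heavily simplified gbuild input for a library named "demo".
GBUILD_SNIPPET = r"""
$(eval $(call gb_Library_Library,demo))
$(eval $(call gb_Library_add_exception_objects,demo,\
    demo/source/alpha \
    demo/source/beta \
))
"""


def gbuild_to_meson(text):
    """Translate one simplified gb_Library block into a meson call (sketch)."""
    name_match = re.search(r"gb_Library_Library,(\w+)\)", text)
    if name_match is None:
        raise ValueError("no gb_Library_Library call found")
    name = name_match.group(1)

    objects_match = re.search(
        r"gb_Library_add_exception_objects,\w+,(.*?)\)\)", text, re.DOTALL)
    body = objects_match.group(1) if objects_match else ""
    # Drop the make line continuations, then split on whitespace.
    objects = body.replace("\\", " ").split()

    # Assumption: each object maps to a .cxx source file of the same name.
    lines = [f"shared_library('{name}',"]
    lines += [f"  '{obj}.cxx'," for obj in objects]
    lines.append(")")
    return "\n".join(lines)


print(gbuild_to_meson(GBUILD_SNIPPET))
```

The existing playground branch does not necessarily work this way; the sketch only shows why the simple cases look easy to convert.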

There is already some kind of playground branch as private/jmux/meson, based on the initial work done by the meson author Jussi Pakkanen in https://github.com/jpakkane/core/commits/master. It currently builds ~80 of the ~200 internal LibreOffice libraries and completely depends on a Linux system to provide the external libraries. The branch currently builds inside an Ubuntu 20.04 / focal chroot (schroot setup can be provided). There is a rebased and fixed version available as private/jmux/meson-gsoc-2021. It's a rebase of the old branch to current master (as of 2021-04-05) and it still builds with Ubuntu 20.04.

This is currently a Linux-only idea. Getting LibreOffice building and running on Linux within the project timeframe seems a realistic goal, without depending on the ability to build all the external libraries too.

For some more information:
 * https://lists.freedesktop.org/archives/libreoffice/2020-February/084575.html
 * https://mesonbuild.com/
 * https://conan.io/
 * https://gerrit.libreoffice.org/plugins/gitiles/core/+log/refs/heads/private/jmux/meson
 * https://nibblestew.blogspot.com/2020/02/building-very-small-subset-of.html
 * https://nibblestew.blogspot.com/2020/02/trying-to-build-slightly-larger-slice.html
 * https://nibblestew.blogspot.com/2020/02/building-even-more-of-libreoffice-with.html
 * https://nibblestew.blogspot.com/2020/02/unity-build-test-with-meson-libreoffice.html
 * https://github.com/jpakkane/core/commits/master

Develop some meson code to build external projects
The scope of this task is the externals that use configure and MSBuild. Start migrating externals. Evaluate Conan.


 * Required skills / knowledge: Python, reading makefiles, reading shell code, some C / C++ to understand compiler errors.


 * Difficulty: Hard


 * Potential mentors
 * Jan-Marek Glogowski, IRC: jmux, mail: glogow AT fbihome.de
 * Jussi Pakkanen, IRC: jpakkane
 * Luboš Luňák, IRC: llunak

Continue migrating LibreOffice libraries to Meson
Focus on converting the code generation bits that are still missing.


 * Required skills / knowledge: Python, reading makefiles, reading shell code, some C / C++ to understand compiler errors.


 * Difficulty: Hard


 * Potential mentors
 * Jan-Marek Glogowski, IRC: jmux, mail: glogow AT fbihome.de
 * Jussi Pakkanen, IRC: jpakkane
 * Luboš Luňák, IRC: llunak

Develop an individual target to build the LibreOffice help tools and help files with Meson

 * Required skills / knowledge: Python, reading makefiles, reading shell code, some C / C++ to understand compiler errors.


 * Difficulty: Hard


 * Potential mentors
 * Jan-Marek Glogowski, IRC: jmux, mail: glogow AT fbihome.de
 * Jussi Pakkanen, IRC: jpakkane
 * Olivier Hallot, IRC: ohallot, mail: olivier.hallot@libreoffice.org
 * Luboš Luňák, IRC: llunak

Select tests (or flag patches with missing tests) to run on gerrit patches based on machine learning
Inspired by Mozilla's work in this area (https://hacks.mozilla.org/2020/07/testing-firefox-more-efficiently-with-machine-learning/), it would be great to:
 * extract a training set out of git, gerrit, CI and bugzilla (might need some _very early_ preparations to perhaps retain CI failure logs) by finding regressions that could have been caught by running a proper test on the respective platform, in the right configuration
 * extract a training set out of git, gerrit, CI and bugzilla by finding regressions for which a test was later added for the feature - thus correlating code areas, and the locus for needed tests
 * and training a suitable AI algorithm with the above, such that:
 * CI can smartly choose the right tests to run, based on the gerrit patch (instead of running _all_ tests on _all_ configurations, which is prohibitive)
 * CI can -1 patches that don't come with test loci touched, if the gerrit patch suggests it
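To make the idea more concrete, here is a minimal Python sketch of test selection from historical data. It deliberately swaps in a plain frequency count for a real machine learning model, so it is not the approach used by Mozilla or a proposal for the final design. The history records, file paths and test names are illustrative assumptions; a real training set would have to be extracted from git, gerrit and CI logs as described above.

```python
from collections import Counter, defaultdict

# Hypothetical training records: (files touched by a patch, tests that failed).
HISTORY = [
    ({"sc/source/core/data/column.cxx"}, {"sc_ucalc"}),
    ({"sc/source/core/data/column.cxx", "sc/inc/column.hxx"},
     {"sc_ucalc", "sc_filters_test"}),
    ({"sw/source/core/doc/doc.cxx"}, {"sw_uwriter"}),
]


def train(history):
    """Count how often each test failed when a given file was touched."""
    failures = defaultdict(Counter)
    for files, failed_tests in history:
        for path in files:
            failures[path].update(failed_tests)
    return failures


def select_tests(model, changed_files, limit=2):
    """Rank tests by summed failure counts over the changed files."""
    scores = Counter()
    for path in changed_files:
        scores.update(model.get(path, {}))
    # Sort by descending score, then by name, so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [test for test, _ in ranked[:limit]]


model = train(HISTORY)
print(select_tests(model, {"sc/source/core/data/column.cxx"}))
```

A patch touching only files that never appear in the history gets an empty selection here, which a real system would need to treat as "run everything" rather than "run nothing".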


 * Required skills / knowledge: active, basic, hands-on knowledge of data science/machine learning/AI. Ability to read and write C++ code and scripting languages, and some basic knowledge of the tools used in our CI chain (Jenkins, gerrit, cppunit, ASAN/UBSAN) - many, but not all, of the above might be acquired before starting on the project, while joining the community prior to the application phase


 * Difficulty: Hard


 * Potential mentors
 * Thorsten Behrens, IRC: thorsten, mail: thb@documentfoundation.org

Implement table styles
Calc so far lacks real table styles, which for example can be seen in Impress tables, with formatting of header row, header column, banded rows and columns, total row, rightmost column, ...

The existing old technology Format → AutoFormat Styles are visually similar, but very inefficient in that they apply individual attributes to individual cells, which a) is slow, and b) bloats the document size, and also can't cope with hidden rows or columns to keep a visually stable table style layout.
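The difference can be sketched in a few lines of Python: instead of storing attributes on every cell, a table style derives each row's role when needed, so banding stays stable when rows are hidden. The function `row_roles`, the role names and the rule that the first and last rows of the range are header and total row are assumptions made for illustration, not existing Calc code or a proposed design.

```python
def row_roles(row_count, hidden_rows=frozenset(), has_header=True, has_total=False):
    """Derive a style role for every visible row of a table range (sketch)."""
    roles = {}
    band = 0  # counts visible data rows only, so hidden rows do not shift bands
    for row in range(row_count):
        if row in hidden_rows:
            continue
        if has_header and row == 0:
            roles[row] = "header"
        elif has_total and row == row_count - 1:
            roles[row] = "total"
        else:
            roles[row] = "band1" if band % 2 == 0 else "band2"
            band += 1
    return roles


# Row 2 is hidden; rows 1 and 3 still alternate visually.
print(row_roles(6, hidden_rows={2}))
```

With per-cell attributes, hiding row 2 would leave two rows of the same band next to each other. Here the roles are recomputed from a single table-level setting, which hints at how one attribute with one value could carry the whole style.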

See also


 * Required skills / knowledge: C++, Reading others' code, Debugging, Understanding the concept of different layers of cell attribution, Having an idea of how a single attribute with one value could be used for this


 * Difficulty: Hard


 * Potential mentors
 * Eike Rathke, IRC: erAck, mail:

Improve Zoner Draw import filter
libzmf, a library for import of Zoner Draw documents, was implemented as a GSoC 2016 project. Currently the library only supports documents created by versions 4-5. There are two goals in this task:
 * handle some of the missing features (blending, warping of the bounding rectangle);
 * add support for the file format produced by versions 2-3.

Both formats have been almost completely reverse-engineered.


 * Required skills / knowledge: C++, Reading others' C++ code (to understand the existing libzmf code). Python (for OLEToy) is welcome, but probably won't be needed


 * Difficulty: Hard


 * Potential mentors
 * David Tardon, IRC: dtardon, mail:

Improve Adobe Pagemaker import filter
libpagemaker, a library for import of Adobe Pagemaker documents, was implemented as a GSoC 2013 project. In the current state, the library only supports documents created by versions 6.5-7.0 on Windows. Documents of version 6.0 should be opened more-or-less correctly, but the support has never been specifically tested. Documents created on Mac are never opened correctly because of many oversights in the code.

The goals of this task are to fix import of Mac 6.5-7.0 files, implement some missing features and add support for older versions, both Mac and Windows (starting at 6.0 and proceeding to older ones until we run out of time :-).

We will use OLEToy for any necessary reverse-engineering. We will also create documents for regression testing for all versions we add support for.


 * Required skills / knowledge: C++, Reading others' C++ code (to understand the existing libpagemaker code). Python (for OLEToy)


 * Difficulty: Hard


 * Potential mentors
 * David Tardon, IRC: dtardon, mail:
 * Fridrich Strba, IRC: Fridrich, mail: