Development/GSoC/Ideas

This page lists project ideas for Google Summer of Code; see the general info about LibreOffice and GSoC. All applicants are required to complete at least one easy hack.

All tasks on this page already indicate mentors for the task. New tasks on this page should be added only by those with the experience and time to invest in mentoring new developers.

Note that the LibreOffice project selects GSoC projects that are well researched and show a good understanding of the scope of the problem. It is also possible to create a project proposal not based on the ideas given below, if the contributor's application shows a good understanding of the problem. In fact, if you apply with one of the prepared ideas below, we expect even more that you show research beyond the abstracts given here.

When adding a new task, please use this template:

Title of the task
Some detailed description of the things to accomplish. Don't hesitate to provide details if you have some, such as code pointers, links to specifications, etc.


 * Required skills / knowledge: C++, reading others' code, and any other useful skills required go here.


 * Size: 175 hours or 350 hours


 * Difficulty: One of easy, medium, hard


 * Potential mentors
 * Joe Devel, IRC: jdevel, mail: joe@devel.org

Please move successfully completed projects to Development/GSoC/Successfully Implemented Ideas.

Bring Bugzilla Harmony to releasable state
Bugzilla is the powerful bug tracker we use. Harmony brings together Mozilla's fork and upstream Bugzilla. The remaining tasks preventing a release include ensuring MySQL and PostgreSQL database compatibility. There is a detailed list of release blockers in priority order.

Bugzilla developers can be found in the chat room.


 * Required skills / knowledge: Perl, SQL databases, Docker


 * Size: 350 hours


 * Difficulty: Hard


 * Potential mentors
 * Vladimir Panteleev, IRC: CyberShadow

Feasibility study: building LibreOffice using meson
Currently the LibreOffice build system uses autoconf and make. Most of the complex stuff is organized in the LibreOffice-internal, high-level build system written in make, called gbuild. It offers a lot of functionality to make it easy for developers to build LibreOffice or external libraries on various platforms with a multitude of configuration options. Generally this works well for LibreOffice, until you need to either extend gbuild or fix a bug in it. gbuild itself is ~18,000 lines of code and lives in solenv/gbuild/. Additionally, the configure.ac file adds ~13,000 lines of mainly shell code and m4 macro calls. And since the whole make code is ~150,000 lines, it takes a lot of time to start a new build. At first glance, a lot of the gbuild functionality can be mapped to meson functionality in a straightforward way, but as always the first 80% are the easiest.

There is already a playground branch, private/jmux/meson, based on the initial work done by the meson author Jussi Pakkanen in https://github.com/jpakkane/core/commits/master. It currently builds ~80 of the ~200 internal LibreOffice libraries and completely depends on a Linux system to provide the external libraries. The branch currently builds inside an Ubuntu 20.04 / focal chroot (schroot setup can be provided). There is a rebased and fixed version available as private/jmux/meson-gsoc-2021. It's a rebase of the old branch to current master (as of 2021-04-05) and it still builds with Ubuntu 20.04.

This is currently a Linux-only idea. It seems a realistic goal to get LibreOffice built and running on Linux in the GSoC timeframe, without depending on being able to build all the external libraries too.

One of the subtasks is to make Help content build independently from core.

It's possible to work with three contributors in parallel. The tasks are:
 * Add LO dependencies to WrapDB
 * Continue migrating LO libraries to Meson, especially converting still missing code generation bits
 * Develop an individual target to build the LO help tools and help files

For some more information:
 * https://lists.freedesktop.org/archives/libreoffice/2020-February/084575.html
 * https://mesonbuild.com/
 * https://mesonbuild.com/Adding-new-projects-to-wrapdb.html
 * https://gerrit.libreoffice.org/plugins/gitiles/core/+log/refs/heads/private/jmux/meson
 * https://nibblestew.blogspot.com/2020/02/building-very-small-subset-of.html
 * https://nibblestew.blogspot.com/2020/02/trying-to-build-slightly-larger-slice.html
 * https://nibblestew.blogspot.com/2020/02/building-even-more-of-libreoffice-with.html
 * https://nibblestew.blogspot.com/2020/02/unity-build-test-with-meson-libreoffice.html
 * https://nibblestew.blogspot.com/2022/01/building-part-of-libreoffice-on-windows.html
 * https://nibblestew.blogspot.com/2022/01/porting-lo-windows-build-to-macos.html
 * https://nibblestew.blogspot.com/2022/02/compiling-libreoffice-with-meson-even.html
 * https://github.com/jpakkane/core/commits/master


 * Required skills / knowledge: Python, reading makefiles, reading shell code, some C / C++ to understand compiler errors.


 * Size: 350 hours


 * Difficulty: Medium


 * Potential mentors
 * Jan-Marek Glogowski, IRC: jmux, mail: glogow AT fbihome.de
 * Jussi Pakkanen, IRC: jpakkane
 * Olivier Hallot, IRC: ohallot, mail: olivier.hallot@libreoffice.org
 * Luboš Luňák, IRC: llunak

Select tests (or flag patches with missing tests) to run on gerrit patches based on machine learning
Inspired by Mozilla's work in this area (https://hacks.mozilla.org/2020/07/testing-firefox-more-efficiently-with-machine-learning/), it would be great to:
 * extract a training set out of git, gerrit, CI and bugzilla (might need some _very early_ preparations to perhaps retain CI failure logs) by finding regressions that could have been caught by running a proper test on the respective platform, in the right configuration
 * extract a training set out of git, gerrit, CI and bugzilla by finding regressions for which a test was later added, thus correlating code areas with the locus for needed tests
 * and training a suitable AI algorithm with the above, such that:
 * CI can smartly choose the right tests to run, based on the gerrit patch (instead of running _all_ tests on _all_ configurations, which is prohibitive)
 * CI can -1 patches that don't come with test loci touched, if the gerrit patch suggests it
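
To make the idea concrete, here is a deliberately naive sketch of test selection: a frequency baseline that counts which tests failed when a given top-level module was touched, then ranks tests for a new patch. It is not the Mozilla approach and not a trained model; the function names, the history format and the test names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def module_of(path):
    """Top-level directory of a changed file, e.g. 'sw' for 'sw/source/a.cxx'."""
    return path.split("/", 1)[0]


def train(history):
    """Count, per module, how often each test failed when that module changed.

    `history` is a hypothetical list of (changed_files, failed_tests) pairs
    that would have to be mined from git, gerrit and CI logs.
    """
    counts = defaultdict(Counter)
    for changed_files, failed_tests in history:
        for module in {module_of(p) for p in changed_files}:
            counts[module].update(failed_tests)
    return counts


def select_tests(counts, changed_files, limit=3):
    """Rank tests by how often they failed for the modules a patch touches."""
    scores = Counter()
    for module in {module_of(p) for p in changed_files}:
        scores.update(counts.get(module, {}))
    return [test for test, _ in scores.most_common(limit)]


# Hypothetical history: test names are illustrative only.
history = [
    (["sw/source/a.cxx"], ["writer_ui_test"]),
    (["sw/source/b.cxx"], ["writer_ui_test", "writer_layout_test"]),
    (["sc/source/c.cxx"], ["calc_core_test"]),
]
model = train(history)
selected = select_tests(model, ["sw/inc/x.hxx"])
```

A real project would replace the counting step with a proper classifier and richer features, but the data flow (mine history, fit, rank tests per patch) would stay the same.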


 * Required skills / knowledge: active, basic, hands-on knowledge of data science / machine learning / AI. Ability to read and write C++ code and scripting languages, and some basic knowledge of the tools used in our CI chain (Jenkins, gerrit, cppunit, ASAN/UBSAN). Many, but not all, of these can be acquired before GSoC while joining the community prior to the application phase.


 * Size: 350 hours


 * Difficulty: Medium - provided the applicant has good machine learning/data science skills


 * Potential mentors
 * Thorsten Behrens, IRC: thorsten, mail: thb@documentfoundation.org

Z compressed graphic formats support (EMZ, WMZ)
Some graphic formats are compressed with ZIP (deflate) to make them smaller, because the formats themselves don't support compression. In LibreOffice we already support the SVGZ format, but not other formats. The goal of this idea is to look at how SVGZ is implemented and extend that to other formats (EMF, WMF). Generally, if we detect Z compression, we uncompress and try to detect the file type again, then keep the uncompressed version.
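
The detect, uncompress, detect-again flow can be sketched in a few lines. This is only an illustration in Python, not the LibreOffice C++ code: it assumes the compressed variants use the gzip container (magic bytes 1f 8b), and the function names are hypothetical. The format checks use the placeable WMF key, the EMF header record type plus the " EMF" signature at offset 40, and a plain search for an `<svg` tag.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def sniff_format(data):
    """Very rough format detection on uncompressed bytes."""
    if data[:4] == b"\xd7\xcd\xc6\x9a":  # placeable WMF key
        return "wmf"
    if data[:4] == b"\x01\x00\x00\x00" and data[40:44] == b" EMF":
        return "emf"
    if b"<svg" in data[:1024]:
        return "svg"
    return "unknown"


def load_graphic(data):
    """Uncompress if needed, detect again, and keep the uncompressed bytes."""
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return sniff_format(data), data


# Build a fake, header-only EMF and compress it to simulate an EMZ file.
fake_emf = b"\x01\x00\x00\x00" + b"\x00" * 36 + b" EMF"
kind, payload = load_graphic(gzip.compress(fake_emf))
```

The same entry point handles both compressed and uncompressed input, which mirrors the "detect the file type again" step described above.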

Reference bug report:

SVG decompression:

Graphic format detection (SVG):

The extended goal (350 hours) is to also implement Z compression on export for the SVG, EMF and WMF formats, which is currently missing. This requires extending the export filters so that they Z compress the stream before writing the graphic into the file. It needs a bit of research into what the best way to do it would be: either adding a completely new entry in the save dialog selection, or deciding depending on the filename extension (for example svgz means compress, svg means don't compress).
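
The second option, deciding by filename extension, could look roughly like the following sketch. It is an illustration only: the function name and the extension set are assumptions, and gzip is assumed as the container.

```python
import gzip
from pathlib import Path

# Assumed mapping: these extensions mean "compress the stream on export".
COMPRESSED_EXTENSIONS = {".svgz", ".emz", ".wmz"}


def encode_for_export(filename, payload):
    """Return the bytes to write: compressed only for *.svgz/*.emz/*.wmz."""
    if Path(filename).suffix.lower() in COMPRESSED_EXTENSIONS:
        return gzip.compress(payload)
    return payload


svg = b"<svg xmlns='http://www.w3.org/2000/svg'/>"
compressed = encode_for_export("logo.SVGZ", svg)
plain = encode_for_export("logo.svg", svg)
```

The appeal of this variant is that no new save dialog entry is needed; the drawback is that the behavior is implicit in the file name.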


 * Required skills / knowledge: (C++, reading others' code)?


 * Size: 175 or 350 hours


 * Difficulty: Medium


 * Potential mentors
 * Tomaž Vajngerl, IRC: quikee, mail: quikee @ gmail.com

Implement qt5 / kf5 theming using native Qt widgets (Weld interface)
Over the past few years, LibreOffice has slowly created a new theming API (include/vcl/weld.hxx) while implementing it in the gtk3 backend to use native Gtk widgets. Qt5, like all other backends, just implements the drawing of QStyle themed LO widgets, which mostly look like Qt widgets, but their behavior is completely controlled / implemented by LO. This results in some bugs that simply aren't fixable inside LO. Some Qt style engines implement animation and color handling that isn't even exposed by the QStyle API, like the default button text color in Breeze (black in LibreOffice, instead of white in KDE). The current state is documented in a "meta bug" (tdf#130857), which has some closed duplicates of backend-specific problems, mainly based on KDE settings that change the specific behavior of a widget.

The current Weld API is Gtk specific, so some things might need more abstraction, but otherwise implementing the needed interfaces should be a straightforward job. This task isn't very creative, but might be interesting for people who like to work on lower-level platform code, which never fails to surprise with unexpected behavior. That can become a bit tedious to implement correctly.

For some more information:
 * Qt5 Weld bug: https://bugs.documentfoundation.org/show_bug.cgi?id=130857
 * Latest post on Caolan's blog about Weld progress: https://caolanm.blogspot.com/2019/10/native-gtk-dialogs-in-libreoffice.html


 * Required skills / knowledge: C++, knowledge of one of the involved toolkit APIs, either Gtk or Qt will help, but this probably goes for any non-ancient API used to build GUI applications.


 * Size: 350 hours


 * Difficulty: Medium


 * Potential mentors
 * Jan-Marek Glogowski, IRC: jmux, mail: glogow AT fbihome.de
 * Michael Weghorn, IRC: michaelweghorn, mail: m.weghorn AT posteo.de

User Experience
The design team collected a number of ideas in the pad http://pad.documentfoundation.org/p/UX-GSoC_Ideas. Mentors have not yet committed to all of these ideas. If you are interested in a task that has no mentor, you will need to find one. If the topic needs further refinement, feel free to contact the UX team in order to prioritize the usability engineering. Some examples of full-featured topics:

Block diagrams
The most expected feature in Draw is a better implementation of block diagrams. That includes compatibility with Visio but also simplified workflows where entered data gets formatted automatically. A couple of predefined styles/presets should allow quickly switching from one diagram style to another.


 * Required skills / knowledge: (C++, reading others' code)?


 * Size: 350 hours


 * Difficulty: (Medium)?


 * Potential mentors
 * Thorsten Behrens, IRC: thorsten mail:
 * UX team, IRC: channel #libreoffice-design, mail: design @ global.libreoffice . org

Chart wizard
For experienced users, the current wizard distracts from the workflow and its steps are not in a natural order; for instance, the range is selected first and then the type of chart. Across the modules we have different ways to insert charts, and the wizard design is inconsistent with other dialogs, which are tabbed. Mockups


 * Required skills / knowledge: C++, UI, UNO, reading others' code


 * Size: 350 hours


 * Difficulty: Hard


 * Potential mentors
 * Katarina Behrens, IRC: bubli mail: bubli @ bubli . org
 * UX team, IRC: channel #libreoffice-design, mail: design @ global.libreoffice . org

Multi-color gradient
While the area fill style dialog has been reworked nicely, the ability to deal with more than two steps for gradients is missing. Furthermore, gallery items with gradients that consist of more than two colors or non-linear gradients are interpreted as a collection of lines when ungrouped, making modification impossible. A first analysis showed that multi-stop gradients are loaded well but not visualized as such. The project would need to implement the draw routines as well as the UI control in the area dialog as suggested.


 * Required skills / knowledge: (C++, reading others' code)?


 * Size: 350 hours


 * Difficulty: (Medium)?


 * Potential mentors
 * Tomaž Vajngerl, IRC: quikee, mail: quikee @ gmail.com
 * UX team, IRC: channel #libreoffice-design, mail: design @ global.libreoffice . org

Improve OpenPGP encryption experience
With LibreOffice 6.0, there's experimental PGP/GPG encryption support. A number of shortcomings remain that make using this cool feature unnecessarily hard, e.g. selecting recipients anew for every save, or finding the right keys. Keyrings can potentially grow very large, so smart searching / traversing / filtering is needed. Additionally, some asynchronous querying would be great, as currently it takes several minutes to retrieve all keys for large keyrings.


 * Required skills / knowledge: C++, reading others' code, some idea about public key crypto


 * Size: 350 hours


 * Difficulty: Varying


 * Potential mentors
 * Thorsten Behrens, IRC: thorsten mail: thb@libreoffice.org
 * UX team, IRC: channel #libreoffice-design, mail: design@global.libreoffice.org

VBA Macros - Tests and missing API functions
We support VBA (Visual Basic for Applications) macros in LibreOffice, but the implemented API isn't complete and the API functions are largely untested. As a consequence, VBA macros in OOXML documents don't run as intended in LibreOffice, which causes compatibility problems. The goal of this idea would be to add tests for the functions already implemented, and then look for functions that are missing from a method or module and add them.

The VBA API in LibreOffice is located in: Common implementation, Calc specifics, Writer specifics.


 * Required skills / knowledge: C++, Basic, reading others' code, some experience with VBA macros would be beneficial


 * Size: 175 hours or 350 hours


 * Difficulty: Medium to Hard


 * Potential mentors
 * Tomaž Vajngerl, IRC: quikee, mail: quikee @ gmail.com

Attach animations to styles
Currently, Impress styles control most of the visual shape appearance, but not the slideshow animation effect. This is a pity, as the styles concept is pretty powerful inside LibreOffice, and provides a nice way to change animation settings and type for a great number of objects simultaneously. For a slightly different view onto the same problem, see bug report, and  from the LibreOffice side.

Original patch from GSoC 2010: https://cgit.freedesktop.org/libreoffice/build/tree/patches/dev300/sd_effects_styles.diff?h=master-backup


 * Required skills / knowledge: C++, reading others' code


 * Size: 350 hours


 * Difficulty: Hard


 * Potential mentors
 * Thorsten Behrens, IRC: thorsten, mail: thb @ documentfoundation . org
 * Katarina Behrens, IRC: bubli, mail: bubli @ bubli . org

Rework Impress slideshow to use DrawingLayer primitives
The Impress slideshow, while designed to interact with Impress only via interfaces, had to resort to an ugly hack to be able to render all Impress content. That was OK back in the day, but is becoming a liability. Nowadays, what one wants to use is the DrawingLayer Primitives (https://wiki.openoffice.org/wiki/DrawingPrimitives), which means porting slideshow/source/engine/shapes/* over to this kind of abstraction, instead of the StarView Metafile previously in use.


 * Required skills / knowledge: C++, reading others' code.


 * Size: 350 hours


 * Difficulty: Hard


 * Potential mentors
 * Thorsten Behrens, IRC: thorsten, mail: thb @ documentfoundation . org

Implement table styles
Calc so far lacks real table styles, which for example can be seen in Impress tables, with formatting of header row, header column, banded rows and columns, total row, rightmost column, ...

The existing, older Format → AutoFormat Styles are visually similar, but very inefficient in that they apply individual attributes to individual cells, which a) is slow and b) bloats the document size. They also can't cope with hidden rows or columns to keep a visually stable table style layout.
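
To illustrate why a real table style copes with hidden rows where per-cell attributes cannot, the sketch below derives each row's role (header, band, total) from its position among the visible rows at render time, instead of storing formatting on every cell. It is a conceptual illustration only; the function name and role labels are assumptions and do not describe Calc's internals.

```python
def row_roles(row_count, hidden=frozenset(), header=True, total=False):
    """Map each visible row index to a style role, skipping hidden rows."""
    visible = [row for row in range(row_count) if row not in hidden]
    roles = {}
    band = 0
    for position, row in enumerate(visible):
        if header and position == 0:
            roles[row] = "header"
        elif total and position == len(visible) - 1:
            roles[row] = "total"
        else:
            # Banding alternates over visible rows only, so it stays stable
            # when rows are hidden or shown.
            roles[row] = "band1" if band % 2 == 0 else "band2"
            band += 1
    return roles


all_visible = row_roles(5)
with_hidden = row_roles(5, hidden={2})
```

With row 2 hidden, rows 1 and 3 still alternate correctly, which is exactly what attributes baked into individual cells cannot achieve.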

See also


 * Required skills / knowledge: C++, reading others' code, debugging, understanding the concept of different layers of cell attribution, having an idea of how a single attribute with one value could be used for this


 * Size: 350 hours


 * Difficulty: Hard


 * Potential mentors
 * Eike Rathke, IRC: erAck, mail:

Document Themes
Document themes are a simple way to allow users to quickly & consistently restyle the entire document. There is an implementation of this in MSO, and Calligra also plans to do something in this regard, so this task actually consists of 2 parts:


 * Implementing user interface for the feature
 * Implement the ODF support for that

You can get more ideas about the feature by reading http://blogs.msdn.com/b/jensenh/archive/2006/02/22/537054.aspx and https://blogs.kde.org/2011/12/14/fruits-css2-shared-themes.

The import code for DOCX is here: https://cgit.freedesktop.org/libreoffice/core/tree/writerfilter/source/ooxml. You might also need to change https://cgit.freedesktop.org/libreoffice/core/tree/writerfilter/source/dmapper

For XLSX and PPTX, it resides in https://cgit.freedesktop.org/libreoffice/core/tree/oox/source


 * Required skills / knowledge: C++, reading others' code, reading specifications :-)


 * Size: 350 hours


 * Difficulty: Hard


 * Potential mentors
 * Tomaž Vajngerl, IRC: quikee, mail: quikee @ gmail.com

Improve Zoner Draw import filter
libzmf, a library for import of Zoner Draw documents, was implemented as a GSoC 2016 project. Currently the library only supports documents created by versions 4-5. There are two goals in this task:
 * handle some of the missing features (blending, warping of the bounding rectangle);
 * add support for the file format produced by versions 2-3.

Both formats have been almost completely reverse-engineered.


 * Required skills / knowledge: C++, reading others' C++ code (to understand the existing libzmf code). Python (for OLEToy) is welcome, but probably won't be needed


 * Size: 350 hours


 * Difficulty: Medium


 * Potential mentors
 * David Tardon, IRC: dtardon, mail:

Improve Adobe Pagemaker import filter
libpagemaker, a library for import of Adobe Pagemaker documents, was implemented as a GSoC 2013 project. In the current state, the library only supports documents created by versions 6.5-7.0 on Windows. Documents of version 6.0 should be opened more-or-less correctly, but the support has never been specifically tested. Documents created on Mac are never opened correctly because of many oversights in the code.

The goals of this task are to fix import of Mac 6.5-7.0 files, implement some missing features and add support for older versions, both Mac and Windows (starting at 6.0 and proceeding to older ones until we run out of time :-)

We will use OLEToy for any necessary reverse-engineering. We will also create documents for regression testing for all versions we add support for.


 * Required skills / knowledge: C++, reading others' C++ code (to understand the existing libpagemaker code). Python (for OLEToy)


 * Size: 350 hours


 * Difficulty: Medium


 * Potential mentors
 * David Tardon, IRC: dtardon, mail:
 * Fridrich Strba?, IRC: Fridrich, mail:

UNO
UNO is the LibreOffice component model. It works across languages, and both within a process and between processes. It is somewhat similar to CORBA and COM. It is used to extend LibreOffice via document-related scripts and more general extension packages, as well as to use LibreOffice functionality remotely from another process.

Rust UNO Language Binding
UNO's cross-language abilities are implemented by bridging between various language-specific environments and a binary runtime representation (with a C API). Next to C++, Java, and Python, it would be great to have such a bridge also for a great language like Rust.

One aspect is to use Rust FFI to interface with the binary UNO C API. Another is to design the Rust representations of the various UNO constructs (its data types; objects with their multiple interfaces and methods; services and singletons), so that this language binding can not only be used to interact with existing LibreOffice services written in other languages, but also to create new services in Rust. A third aspect could be to create a pure Rust implementation of the UNO remote bridge protocol.

Some documentation pointers are:
 * UNO Type System
 * UNO Object Life Cycle Model
 * UNO Remote Protocol

Some code pointers are:
 * as an example of an in-process UNO bridge (for Java, via JNI)
 * as an example of a remote UNO bridge (in C++)


 * Required skills/knowledge: Rust, interfacing to low-level C/C++, working against formal specifications


 * Size: 350 hours


 * Difficulty: Medium to hard


 * Potential mentors
 * Stephan Bergmann, IRC: sberg, mail: sbergman @ redhat . com
 * Michael Stahl, IRC: mst___, mail: mst AT libreoffice DOT org
 * Bjoern Michaelsen, IRC: Sweetshark, mail: bjoern.michaelsen AT canonical DOT com

Lua UNO Language Binding
UNO's cross-language abilities are implemented by bridging between various language-specific environments and a binary runtime representation (with a C API). Next to C++, Java, and Python, it would be great to have such a bridge also for a small, lightweight scripting language like Lua.

Lua is small (few K lines of C), and written with the goal of embedding in mind:
 * https://en.wikipedia.org/wiki/Lua_(programming_language)
 * https://www.lua.org/ddj.html

Besides hooking Lua up with LibreOffice's UNO subsystem, a taker for this project would need to map the various UNO constructs (its data types; objects with their multiple interfaces and methods; services and singletons) to suitable Lua types, so that this language binding can not only be used to interact with existing LibreOffice services written in other languages, but also to create new services. Lua should be expressive enough to permit that (its dynamic typing helps a lot there); only exceptions and error handling might need some extra thought:
 * Basic might be a good starting point: https://wiki.openoffice.org/wiki/Documentation/DevGuide/ProUNO/Basic/Exception_Handling
 * but see this article for Lua error handling concepts: http://lua-users.org/wiki/ErrorHandling

Some code pointers are:
 * as an example of an in-process UNO bridge (for Java, via JNI)
 * as an example of a remote UNO bridge (in C++)


 * Required skills/knowledge: Lua, interfacing to low-level C/C++, working against formal specifications


 * Size: 350 hours


 * Difficulty: Medium to hard


 * Potential mentors
 * Michael Stahl, IRC: mst[_]*, mail: mst AT libreoffice DOT org
 * Thorsten Behrens, IRC: thorsten, mail: thb AT libreoffice DOT org
 * (Bjoern Michaelsen, IRC: Sweetshark)

Convert Writer's Java UNO API tests to C++
A big chunk of UNO API tests are still implemented in Java, but to minimise the dependency on Java during the build, we have a long-term plan to move them to C++.

Currently a small set of tests has already been converted and can be found here. The Java tests are located here.


 * Required skills/knowledge: Java, C++


 * Size: 175 hours or 350 hours


 * Difficulty: Medium


 * Potential mentors
 * Tomaž Vajngerl, IRC: quikee, mail: quikee @ gmail.com

Ideas without a mentor
A number of ideas from previous years can be found at the Development/GSoC/Ideas without a mentor page. Please note that you need to find a mentor willing to mentor the task. There is no guarantee that anyone in the community is going to mentor one of these tasks this year.