Documentation/ODF Markup

This article is a collection of tips, tricks and hints for working with the markup of an ODF file. People want to do this for very different reasons, for example:
 * You want to learn what the markup looks like.
 * You want to be sure the file doesn’t contain any hidden confidential information.
 * You want to compare the markup of other producers with the markup generated by LibreOffice.
 * You want to investigate whether a bug is a file format error.
 * You want to manually repair an error in the markup.
 * You want to manually do things for which LibreOffice has no good UI.
 * You want to directly generate ODF files without using LibreOffice.
 * You want to apply an XSLT to the document without using LibreOffice.

XML Markup
An ODF file is either a single XML file or a zip archive containing XML files. You do not need in-depth knowledge of XML to understand the structure of the file text. Some knowledge about elements and attributes in XML is sufficient.

You will find introductions to XML on the internet, for example at https://www.w3schools.com/xml/default.asp. A drawn line, for example, is represented in the file text by a single “line” element whose attributes hold its start and end coordinates and refer to its style.

Elements and attributes have namespace prefixes in ODF. The standard allows producers to use their own namespaces to implement and mature new features before they become part of the standard. LibreOffice uses the prefix “loext:” for this in most cases. The page LibreOffice ODF extensions lists such features.
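To make elements, attributes and namespace prefixes more concrete, here is a minimal sketch in Python. The fragment is invented for illustration (the style name and the coordinates are assumptions), and the helper `describe` is hypothetical. Only the element name, the attribute names and the namespace URIs are taken from ODF itself.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in ODF files; the prefixes are only shorthand for them.
NS = {
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
}

# Illustrative fragment (values invented for this sketch): one drawn line.
FRAGMENT = (
    '<draw:line xmlns:draw="{draw}" xmlns:svg="{svg}" '
    'draw:style-name="gr1" '
    'svg:x1="2cm" svg:y1="3cm" svg:x2="8cm" svg:y2="5cm"/>'
).format(**NS)


def split_name(qualified_name):
    """Split ElementTree's "{uri}local" notation into (uri, local)."""
    if qualified_name.startswith("{"):
        uri, _, local = qualified_name[1:].partition("}")
        return uri, local
    return "", qualified_name


def describe(element):
    """Return the element name and its attributes, keyed by (uri, local name)."""
    tag = split_name(element.tag)
    attrs = {split_name(key): value for key, value in element.attrib.items()}
    return tag, attrs


line = ET.fromstring(FRAGMENT)
tag, attrs = describe(line)
print(tag[1], {local: value for (_, local), value in attrs.items()})
```

The parser resolves each prefix to its namespace URI, so the same lookup works even if another producer chooses different prefixes.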

‘Flat XML ODF’-File
An ODF file can be either a zip archive or a single ‘Flat XML ODF’ file. The latter has the file extension fodt, fods, fodp, or fodg by default. To generate a ‘Flat XML ODF’ file, save your document in ‘Flat XML ODF’ format. You find it in the file type drop-down list of the Save dialog.



The structure of a ‘Flat XML ODF’-file looks like this:

The “document” root element lists the namespaces, the ODF version and the MIME type of the document, for example that of a Draw document.
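As a rough sketch of this overall shape, the following Python snippet parses a hand-written skeleton of a flat file for a Draw document and reports the version, the MIME type and the top-level sections. The skeleton is an assumption made for illustration: a real file declares many more namespaces and has content in every section. The function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Hand-written skeleton for illustration only; real files are far larger.
SKELETON = f"""<office:document xmlns:office="{OFFICE}"
    office:version="1.3"
    office:mimetype="application/vnd.oasis.opendocument.graphics">
  <office:meta/>
  <office:settings/>
  <office:scripts/>
  <office:font-face-decls/>
  <office:styles/>
  <office:automatic-styles/>
  <office:master-styles/>
  <office:body/>
</office:document>"""


def summarize(xml_text):
    """Return version, MIME type and top-level section names of a flat ODF text."""
    root = ET.fromstring(xml_text)
    return {
        "version": root.get(f"{{{OFFICE}}}version"),
        "mimetype": root.get(f"{{{OFFICE}}}mimetype"),
        # Strip the "{uri}" part that ElementTree puts in front of each tag.
        "sections": [child.tag.split("}")[-1] for child in root],
    }


summary = summarize(SKELETON)
print(summary["mimetype"], summary["version"])
print(summary["sections"])
```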

The “meta”-element contains the information that you see in File > Properties.

The “settings”-element contains information that you have set in Tools > Options. It also contains information about the last view state of the document and the printer used.

The “scripts”-element contains the macros that are embedded in the document.

The “font-face-decls”-element has information about the fonts used in the document.

The “styles”-element contains the named styles that you see in the pane “Styles” of the Sidebar, as well as list definitions, line end definitions, dash patterns, fill information such as gradient and hatch pattern definitions, and table templates.

The “automatic-styles” element contains information similar to the “styles” element, but its styles are not directly visible to the user. Settings from direct formatting are stored here, for example.

The “master-styles”-element has information about master slides and page styles.

The “body”-element contains the content. The content elements have no style information themselves, but use attributes like “style-name” or “text-style-name” to refer to a style definition in the “styles” or “automatic-styles” element.
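The indirection from content to style can be sketched in Python as follows. The sample document is invented for illustration (style name “P1” and its properties are assumptions), and the helpers `qn` and `find_style` are hypothetical. The lookup is simplified: it matches on the style name only, whereas in real documents a style is identified by its name together with its family.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
XMLNS = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())

# Invented sample: one paragraph that refers to an automatic style.
SAMPLE = f"""<office:document {XMLNS}>
  <office:styles/>
  <office:automatic-styles>
    <style:style style:name="P1" style:family="paragraph">
      <style:paragraph-properties fo:text-align="center"/>
    </style:style>
  </office:automatic-styles>
  <office:body>
    <office:text>
      <text:p text:style-name="P1">Hello</text:p>
    </office:text>
  </office:body>
</office:document>"""


def qn(prefix, local):
    """Build ElementTree's "{uri}local" name from a prefix and a local name."""
    return f"{{{NS[prefix]}}}{local}"


def find_style(root, style_name):
    """Look for a style by name, first in automatic-styles, then in styles."""
    for container in ("automatic-styles", "styles"):
        section = root.find(qn("office", container))
        if section is None:
            continue
        for style in section.iter(qn("style", "style")):
            if style.get(qn("style", "name")) == style_name:
                return container, style
    return None


root = ET.fromstring(SAMPLE)
paragraph = root.find(f".//{qn('text', 'p')}")
container, style = find_style(root, paragraph.get(qn("text", "style-name")))
print(container, style.get(qn("style", "family")))
```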

Embedded images are base64 encoded.

When inspecting a ‘Flat XML ODF’ file, it is highly recommended to use an editor that can collapse elements and present an indented view of them. You should at least enable Pretty Printing.

Working with zip-Format
All other ODF formats, those that are not ‘flat’, are zip archives. You will find tips that tell you to change the file name extension to ‘zip’ before using an unpacker. That is not necessary in most cases. Most modern packers recognize zip formats even if the file does not have the extension ‘zip’.

7-Zip on Windows 10 lets you open an archive without actually unpacking it. It shows the content in a way similar to a file manager.

The elements mentioned in section ‘Flat XML ODF’ are distributed over several files. In addition, you see some folders. Apart from META-INF, the names of the folders are not standardized. MS Office uses ‘media’ instead of ‘Pictures’, for example.

The folders ‘Object 1’, ‘Object 2’ … contain embedded OLE objects, for example Math equations and charts. The folder ‘ObjectReplacements’ contains images of these OLE objects, which allow other applications to display something in case they are not able to handle an OLE object.

The folder META-INF contains a file ‘manifest.xml’. It lists all files contained in the archive and provides the information needed in case a file is encrypted.
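If you prefer a script to a packer, the manifest can be read with a few lines of Python. This is a minimal sketch: `manifest_entries` and `build_sample` are hypothetical helper names, and the sample package built in memory is only a stand-in, not a complete ODF document. For a real document you would pass its file name instead.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def manifest_entries(odf_file):
    """Return {full-path: media-type} from META-INF/manifest.xml of a zipped ODF file."""
    with zipfile.ZipFile(odf_file) as package:
        root = ET.fromstring(package.read("META-INF/manifest.xml"))
    entries = {}
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        entries[path] = entry.get(f"{{{MANIFEST_NS}}}media-type")
    return entries


def build_sample():
    """Build a tiny in-memory stand-in for a package (not a complete ODF document)."""
    manifest = (
        f'<manifest:manifest xmlns:manifest="{MANIFEST_NS}">'
        '<manifest:file-entry manifest:full-path="/" '
        'manifest:media-type="application/vnd.oasis.opendocument.text"/>'
        '<manifest:file-entry manifest:full-path="content.xml" '
        'manifest:media-type="text/xml"/>'
        "</manifest:manifest>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", "<placeholder/>")
        package.writestr("META-INF/manifest.xml", manifest)
    buffer.seek(0)
    return buffer


for path, media_type in manifest_entries(build_sample()).items():
    print(path, media_type)
```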

If you unpack the archive and pack it again later, keep the following in mind:
 * Do not pack the folder that you got when unpacking. Instead, go into that folder, select its contents and create the zip archive from them.
 * The Open Document Format places some restrictions on the structure of the zip archive. In particular, the file ‘mimetype’ needs to be the first one in the package and must be uncompressed. You find the details in the part “Packages” of the ODF specification.
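Both points can be handled by a small script. The following Python sketch packs the contents of an unpacked folder, writing ‘mimetype’ first and without compression. The function name `pack_odf` is hypothetical, and the sketch assumes that the target file lies outside the source folder. It covers only the two rules named above; check the “Packages” part of the specification for the remaining details.

```python
import os
import zipfile


def pack_odf(source_dir, target_path):
    """Zip the contents of source_dir, with 'mimetype' first and uncompressed."""
    with zipfile.ZipFile(target_path, "w") as package:
        # The mimetype file goes in first and is stored, not deflated.
        package.write(
            os.path.join(source_dir, "mimetype"),
            "mimetype",
            compress_type=zipfile.ZIP_STORED,
        )
        for folder, _, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                # Archive names are relative to source_dir, so the folder
                # itself does not become part of the package.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                package.write(full_path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

A call could look like `pack_odf("unpacked", "repaired.odt")`, where both names are placeholders.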

In case you have newly packed the archive, you can open the resulting file in LibreOffice and save it. LibreOffice is often tolerant of structural errors and will create a valid structure when saving.

7-Zip on Windows 10 allows this workflow:
 * 1) ‘Open’ the archive without changing the file name extension. Do not ‘Extract’ it.
 * 2) Use ‘Edit’ or ‘View’ from the context menu of a file therein. That opens the file in an editor. The editor used can be configured in 7-Zip.
 * 3) Make your changes in the editor. Save the file in the editor and close the editor.
 * 4) Now 7-Zip asks you whether to update the file in the archive. Agree and close 7-Zip. That’s it.

In this way, the structure of the archive is preserved. Similar workflows exist for other packers and other operating systems.
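A script can achieve a comparable result. Python's zipfile module cannot change a member in place, so the sketch below copies every entry into a new archive in the original order and with the original compression method, swapping in new data for one member only. The function name `replace_member` is hypothetical, and encrypted packages are not handled.

```python
import copy
import zipfile


def replace_member(source, target, member, new_data):
    """Copy a zip archive to target, replacing the bytes of one member.

    Entry order and per-entry compression method are kept, so a leading,
    uncompressed 'mimetype' entry stays where it was.
    """
    with zipfile.ZipFile(source) as old, zipfile.ZipFile(target, "w") as new:
        for info in old.infolist():
            data = new_data if info.filename == member else old.read(info)
            # Work on a copy so the entry list of the source archive is not modified.
            new.writestr(copy.copy(info), data, compress_type=info.compress_type)
```

A call could look like `replace_member("original.odt", "edited.odt", "content.xml", edited_bytes)`, where the file names and `edited_bytes` are placeholders. Writing to a new file also leaves the original untouched.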

As always, it is highly recommended to back up a file before changing it manually.

By the way, files in OOXML formats (e.g. '.pptx') are zip archives too and can be handled in the same way.

Pretty Printing
LibreOffice saves files in a compact form by default, removing all unnecessary line breaks and spaces. This is inconvenient if you want to view the file with a simple editor: you get very long lines, which simple editors often cannot handle. To avoid this, set LibreOffice to save with ‘pretty printing’. Open the Options dialog (Tools > Options), open the section LibreOffice > Advanced and click the ‘Open Expert Configuration’ button. Search for ‘pretty’. You get one result. Double-click it to toggle its value to ‘true’.
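For a file that has already been saved in compact form, an indented copy for reading can also be produced outside LibreOffice. The following Python sketch uses the standard library for this. It is not the LibreOffice setting described above, and the function name `pretty` is hypothetical. Use the result for viewing only: the added line breaks and indentation may end up inside text content if you save them back into a document.

```python
from xml.dom import minidom


def pretty(xml_text, indent="  "):
    """Return an indented copy of xml_text, intended for reading only."""
    return minidom.parseString(xml_text).toprettyxml(indent=indent)


compact = "<a><b><c/></b></a>"
print(pretty(compact))
```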



Special Editors
Tools that can recognize and work on the XML structure are more convenient than simple editors. For example, such tools provide syntax highlighting, add line breaks and indentation for better readability, offer collapse and expand options for the XML elements, or work directly with the node tree of the document. Ask in a forum or on a mailing list for suitable tools for your operating system.

For Windows 10 these are suitable:
 * Notepad++ with plugin ‘XML Tools’
 * Microsoft XML Notepad


Avoid Noisy Markup in Writer
Writer can insert additional information to get better results for document comparison. This is enabled by default, but it clutters the document with markup that is not relevant to the actual content. To disable it, open the Options dialog, go to the section LibreOffice Writer > Comparison and uncheck the option ‘Store it when changing the document’.

Valid ODF Files
Errors in the XML structure should be detected by the editor. To detect errors with regard to ODF, use the validator at https://odfvalidator.org/. You need to select the ODF version there yourself, because ‘auto-detect’ does not work reliably.
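If your editor does not report XML errors, a well-formedness check is easy to script. The Python sketch below reports the position of the first XML syntax error. It checks XML well-formedness only and is no substitute for the ODF validator. The function name `well_formedness_error` is hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return a description of the first XML syntax error, or None if there is none."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        line, column = error.position
        return f"line {line}, column {column}: {error}"
    return None


print(well_formedness_error("<a><b/></a>"))
print(well_formedness_error("<a><b></a>"))
```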

Saving in ODF format version “1.3” in LibreOffice should result in valid files, whereas saving in “1.3 extended” sometimes writes things in a way that is not valid but is needed for compatibility with older versions. To change the file format, go to the section Load/Save > General in the Options dialog and select the format in the ‘ODF format version’ drop-down list. Be aware that some features of LibreOffice are only available in the “1.3 extended” format and not in strict ODF “1.3”. On the other hand, such extended features are unknown to other applications.

Other Tools for ODF Files
The ODF Toolkit provides some Java tools to work with ODF files without starting LibreOffice or manually opening the zip-archive.

Analyzing Documents lists some tools for Linux.

ODF documents generation tools has a large commented list of tools.

ODF Specification
The ODF file format is specified by the standard “Open Document Format for Office Applications”. You can get it from OASIS at https://www.oasis-open.org/standards/

ODF version 1.3 is available from https://docs.oasis-open.org/office/OpenDocument/v1.3/os/ [as of Aug 2021].

Version 1.2 of the ODF standard is also available as ISO/IEC 26300 and can be downloaded free of charge from https://standards.iso.org/ittf/PubliclyAvailableStandards/index.html