Faq/General/045

LibreOffice and XML
You have no doubt often heard of XML, and heard that the LibreOffice file format is XML. But what exactly is XML, and what does that mean in practice?

What is XML?
This acronym stands for eXtensible Markup Language.

XML belongs to the class of metalanguages and is in fact a subset of SGML (Standard Generalized Markup Language). So what is SGML? It is a language for creating structured, modular documents. Concretely, this means a document can be assembled from data as diverse as sound, database records, text and images. SGML was defined for a wide range of applications and for documents that must be preserved over the very long term. Because SGML is extensive and complex, XML was created as a simplification of it.

What can be confusing at first is that HTML looks very similar to XML: both are languages with tags that open and close. There is a big difference, though: HTML was defined with a fixed set of tags that largely describe presentation, while XML tags describe the content. HTML is also derived from SGML, but it is intended only for presenting data on the Web.

To summarize, XML is a clear way of structuring, describing and exchanging data; it is neither a programming language nor a new HTML. Content and layout are kept separate to make exchange possible, so the content can be reused in other forms or on other media. The data can be of any kind: chemical formulas, financial data, music, text ...

An example of the XML content of a LibreOffice document:

The tags, written between < and >, delimit the document content. An opening tag must always be closed by a matching closing tag. To add information to the document, an element can be given attributes, each always followed by a value; for example, a page-count attribute with the value 17 means that the document has 17 pages. Comments can also be added, written between <!-- and -->.

A LibreOffice file is a zip archive containing several files:
 * meta.xml contains information about the document (author, date of last save)
 * styles.xml contains the styles used in the document
 * content.xml contains the main document content (text, pictures, graphics ...)
 * settings.xml, usually specific to an application, contains some parameters such as the selected printer...
 * META-INF/manifest.xml contains additional information about the other files (such as the MIME type or encryption).

The images are saved in their native format inside the zip. Now you understand why LibreOffice files take up so little space on your hard disk!
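Because the file is a zip package, its parts can be inspected with any zip library. The following is a minimal sketch in Python, not a description of how LibreOffice itself reads files. It builds a tiny package in memory so that it runs on its own; the member contents are hypothetical and heavily simplified (real files use namespaced elements and are much larger).

```python
import io
import zipfile

# Hypothetical, simplified stand-ins for the members of a real document.
members = {
    "mimetype": "application/vnd.oasis.opendocument.text",
    "meta.xml": "<meta><author>Jane</author></meta>",
    "styles.xml": "<styles/>",
    "content.xml": "<content><p>Hello</p></content>",
    "settings.xml": "<settings/>",
    "META-INF/manifest.xml": "<manifest/>",
}

# Write the package into an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
    for name, data in members.items():
        package.writestr(name, data)

# Reopen the package and look inside, as one would with a real document.
buffer.seek(0)
with zipfile.ZipFile(buffer) as package:
    names = package.namelist()
    content = package.read("content.xml").decode("utf-8")

print(names)
print(content)
```

To try this on a real document, the in-memory buffer could be replaced by a path such as zipfile.ZipFile("document.odt"), where the file name is only an example.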

To learn more about LibreOffice file formats, see the http://xml.openoffice.org website.

Note that an XML document cannot be displayed as it is. It needs a transformation language (XSLT) and a formatting language for objects (XSL-FO) to display all the information correctly, for example if your document is made up of various parts organized separately, or if it contains a table with formatting.

DTD : Document Type Definition
The DTD specifies which elements and attributes may be used in the XML document and describes its structure and content. The DTD can be internal (the definitions are included in the document itself) or external (it is located in a local file or referenced by URL). The XML standard does not require a DTD: a document is called valid if it conforms to a DTD, and merely well-formed if it has no DTD but still respects the syntax rules of the XML standard.
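The notion of a well-formed document can be illustrated with a short Python sketch. It assumes only the standard library parser xml.etree.ElementTree, which checks well-formedness but does not validate against a DTD; checking validity would need a validating parser, which is not shown here. The helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Every opening tag has a matching closing tag, attributes are quoted.
good = '<authors><author id="a1">Jane</author></authors>'

# The <author> element is never closed, so this is not well-formed.
bad = '<authors><author id="a1">Jane</authors>'

print(is_well_formed(good))
print(is_well_formed(bad))
```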

Example of the content of a LibreOffice DTD:

<!-- List the initial writers in this form (with initial="true"). Initial Writers is a term from the PDL (License). If one of these authors assigned copyright to somebody else, e.g. the company they are working for, use the attribute copyright="..." and name the copyright holder. -->
<!ELEMENT authors ( author+ ) >
<!ELEMENT author (#PCDATA)>
<!ATTLIST author id      ID    #REQUIRED
                 initial CDATA #IMPLIED
                 email   CDATA #REQUIRED>

A schema also defines the structure of an XML document, but it is more flexible than a DTD for defining element types. It builds on what DTDs established in order to define content models. Schemas are the subject of a W3C specification.

XSLT : eXtensible Stylesheet Language Transformations
XSLT is a language for transforming XML documents into other forms, such as HTML, RTF or (via XSL-FO) PDF. The language is declarative rather than procedural: you describe the desired result instead of writing an algorithm step by step, which makes it more accessible to non-developers. An XSLT stylesheet is itself written in XML, so it can be processed with the same tools as any other XML document. A stylesheet consists of template rules that are matched against the XML tree. It is often compared to CSS because, like CSS, it is made of rules whose order of appearance in the stylesheet generally does not matter, and it has a priority mechanism for when several rules could apply. The difference is that an empty CSS stylesheet does not affect the HTML document, which is simply displayed without layout, whereas an empty XSLT stylesheet generates an empty document. (Actually, this is not entirely true: the specification defines built-in rules that are always present, so the document is never really empty. :)
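The idea of rules matched against the tree can be sketched in plain Python. This is only a rough analogy, not XSLT: Python's standard library has no XSLT processor, and the transform function and the rules dictionary below are invented for illustration. It shows rules keyed by element name (so their order does not matter) and a default rule that, when nothing matches, keeps only the text, similar in spirit to the built-in rules mentioned above.

```python
import xml.etree.ElementTree as ET

def transform(node, rules):
    """Apply the rule registered for node.tag, or a default rule if none exists."""
    def children(n):
        # Process text and child elements in document order.
        parts = [n.text or ""]
        for child in n:
            parts.append(transform(child, rules))
            parts.append(child.tail or "")
        return "".join(parts)

    rule = rules.get(node.tag)
    if rule is None:
        # Default rule: drop the tag, keep the text of the element.
        return children(node)
    return rule(node, children)

# Rules are looked up by element name, so their order here is irrelevant.
rules = {
    "para": lambda n, kids: "<p>" + kids(n) + "</p>",
    "title": lambda n, kids: "<h1>" + kids(n) + "</h1>",
}

doc = ET.fromstring("<doc><title>XML</title><para>Hello <b>world</b></para></doc>")

html = transform(doc, rules)   # uses the two rules above
plain = transform(doc, {})     # no rules at all: only the default applies

print(html)
print(plain)
```

With an empty set of rules the result is not empty: the text of the document still comes through, which mirrors the remark about built-in rules.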

XPath
XPath is the declarative language associated with XSLT that is used to define location paths within the XML tree.
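A location path can be tried out in Python with xml.etree.ElementTree, which supports only a limited subset of XPath. The sketch below uses a small made-up document; the element names and values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<authors>"
    '<author id="a1" initial="true">Jane</author>'
    '<author id="a2">Ravi</author>'
    "</authors>"
)

# Path selecting every <author> child of the root element.
all_names = [a.text for a in doc.findall("./author")]

# Path with a predicate: only authors whose initial attribute is "true".
initial_names = [a.text for a in doc.findall("./author[@initial='true']")]

print(all_names)
print(initial_names)
```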

XSL-FO : eXtensible Stylesheet Language Formatting Objects
A language for defining the presentation of generic objects (such as lists, tables ...), often used for PDF output. It is applied after an XSLT transformation and provides the visual rendering of the processed objects.

OASIS
OASIS (Organization for the Advancement of Structured Information Standards) is a global consortium that drives the development, convergence and adoption of standards for e-business. Among these, it defines the OpenDocument standard, the XML file format used by LibreOffice. This definition was based on the OpenOffice.org file format; the formats are open and conform to a published schema. More information can be found here: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office

The LibreOffice file format is the same as that of OpenOffice.org, and it is also supported by Calligra, Google Docs, and Zoho.