Documentation/Calc Functions/FILTERXML

Function name:
FILTERXML

Category:
Text

Summary:
Applies an XPath expression to a string that contains structured data encoded in XML format, enabling you to easily extract data of interest from within the XML markup.

Extensible Markup Language (XML) is a widely used markup language that can support the storage and transport of data using a syntax that both humans and computers can read. It supports the exchange of a wide variety of data on the Internet and in many other environments.

XML Path Language (XPath) is a query language that uses path expressions (with a non-XML syntax) to select parts of an XML document.

The power of the FILTERXML function is increased when used in an array formula, either by using the key combination Shift+Control+ Enter (instead of just Enter) or by ticking the Array checkbox when using the Function Wizard. This aspect of FILTERXML’s behavior is similar to that of the functions in Calc’s Array category.

Syntax:
FILTERXML(XML document; XPath expression)

Returns:
Returns a text string (or multiple text strings if used in an array formula) which contain(s) the extracted data of interest.

Arguments:
XML document is a text string (in quotation marks), or a reference to a cell containing a text string, that contains the XML markup to be filtered. The term XML document is used because it appears in Calc’s Function Wizard and is not intended to refer to a separate document file – the entire XML document is contained in the string argument or referenced cell.

XPath expression is a text string (in quotation marks), or a reference to a cell containing a text string, that contains the XPath expression to be used.


 * If either XML document or XPath expression is an empty string, then FILTERXML reports a #VALUE! error.
 * If XML document does not contain valid XML, then FILTERXML reports a #VALUE! error.
 * If XPath expression does not contain a valid XPath sequence, then FILTERXML reports a #VALUE! error.
 * If both XML document and XPath expression are valid but no matching elements are found, then FILTERXML reports a value not available error (#N/A).

Additional details:
To use FILTERXML effectively may require a little further information about XML and XPath. The function can be used without in-depth knowledge of these enabling technologies but improving your knowledge in these areas will help you get the best out of FILTERXML.

We recommend that you start with the following links:
 * Wikipedia’s XML page.
 * The World Wide Web Consortium's (W3C) XML pages.
 * Wikipedia’s XPath page.
 * The World Wide Web Consortium's (W3C) XPath pages.

In addition, there are several sites offering free and useful tutorials on both XML and XPath.

Example of basic operation
Consider the following XML markup, which contains the fictitious names and email addresses of the three members of some imaginary club:


 * John Doe
 * sbriacc2@nasigoreng.buzz
 * Jane Doe
 * 7jobelso.57c@bacharg.com
 * Ann Other
 * 2kingmuslehh@elhida.com
 * Jane Doe
 * 7jobelso.57c@bacharg.com
 * Ann Other
 * 2kingmuslehh@elhida.com
 * Ann Other
 * 2kingmuslehh@elhida.com

The XML above is laid out nicely, for ease of understanding. Copy the whole of this code and insert it as text into cell A1 of an empty Calc worksheet. This process will probably lose some of the code's formatting and leave some white space between the XML tags. This should not make any difference to FILTERXML's behavior.

Enter the formula  into an empty cell on the same sheet. FILTERXML returns the string "John Doe", the name of the first member in the list of members. The leading "/" of this XPath string indicates an absolute location path, starting from the root of the XML document.

Enter the formula  into an empty cell on the same sheet. FILTERXML returns the string "Jane Doe", the name of the second member in the list of members. The leading "//" of this Xpath string indicates a relative location path, so that the required nodes may lie anywhere in the XML document.

The sub-string "[2]" in the XPath string of the previous example is an example of a predicate. XPath allows you to use predicates to find a specific node or a node containing a specific value. The following are some further examples, with results.

Enter the formula  into an empty cell on the same sheet. FILTERXML returns the string "7jobelso.57c@bacharg.com", the email address of the second member in the list of members. The sub-string "[last-1]" is a predicate that selects the penultimate member element that is the child of the members element.

Finally enter the formula  into an empty cell on the same sheet. FILTERXML returns the string "Ann Other 2kingmuslehh@elhida.com", giving all available information (name and email address) about the third member in the list. The string "[position>2]" is a predicate that will not match the first and second member elements, but will match the third and subsequent member elements.

Array example
Cell A1 should contain the same XML markup as it did in the sub-section above.

Select three empty cells, which are adjacent and lie in the same column. For the purposes of discussion in this example, we will assume that these are cells A3, A4, and A5.

Go to the Input line and enter the formula  but instead of simply pressing Enter on completing entry of the formula, instead press Shift+Control+Enter. This enters an array formula and FILTERXML puts the name of the first member in cell A3, the name of the second member in cell A4, and the name of the third member in cell A5.

Since this XML code contains information about only three members, if you initially selected four cells by mistake (A3:A6) then cell A6 would end up showing a value not available error (#N/A).

Using XML data from Internet sources via the WEBSERVICE function
The Documentation Foundation's wiki provides an area for press releases of LibreOffice information. This blog has an associated RSS (Really Simple Syndication) feed which can be used to demonstrate the use of the FILTERXML because RSS is only plain text structured using XML. Calc's WEBSERVICE function can be used to access this feed and by combing the two functions, the data retrieval becomes very powerful.

At the time of writing this example, the formula  returns the string "Press Releases – The Document Foundation Blog" (although the content of this string could vary in future).

Store the URL "https://blog.documentfoundation.org/blog/category/press-releases/feed/" in an empty cell, for example cell D1. Then the formula  entered in a different empty cell returns a string giving the publication date of the most recent press release (for example, "Thu, 13 May 2021 12:00:27 +0000").

In a similar way, it is also possible to access information on recent changes to The Document Foundation's wiki. For example, the formula  returns a string giving the last build date of the wiki (for example, "Sat, 05 Jun 2021 10:01:26 GMT").

The WEBSERVICE function returns the #VALUE! error when it cannot retrieve / fetch any data for the given URL. Consider using the ISERROR function to check the response from WEBSERVICE for errors before passing it on to FILTERXML.

Related LibreOffice functions:
ENCODEURL

WEBSERVICE

ODF standard:
None.

Equivalent Excel functions:
FILTERXML