Talk:Documentation/Calc Functions/REGEX

Updated to address lessons learned during GSOD 2020 and address my previous comments. --Stevefanning (talk) 2021-07-10T19:34:24 (UTC)

SF Comments

 * (1) Summary. For consistency with other pages, the words “The function” could probably be deleted.
 * (2) Returns. “on the Text using some Expression” is probably redundant and could be deleted.
 * (3) Arguments, Text. Suggest replacing with the following – “Text is a string or a reference to a cell containing that string, which is to be processed using the specified regular expression.”.
 * (4) Arguments, Expression. Suggest replacing with the following text – “Expression is a string or a reference to a cell containing that string, which is the required search pattern in the form of a regular expression.”. I recommend putting the (revised) ICU reference in the Additional details section; see comment below. Also, I think it would be more consistent with other wiki pages to group the “no match” and error conditions in bullets at the end of this section.
 * (5) Arguments, Replacement. Suggest replacing with the following text – “Replacement is a string or a reference to a cell containing that string, which is the text to be used to replace matched text.”.
 * (6) Arguments, Flags|Occurrence. I suspect it would be slightly easier to describe these as one entity, as follows – “Flags|Occurrence is a string in quotation marks or a positive integer, or a reference to a cell containing one of those types. If equal to the string "g" (which indicates global), then REGEX replaces all matches of Expression in Text with Replacement. If equal to a positive integer n, then REGEX replaces only the n-th match of Expression in Text with Replacement. If this argument is not supplied, the default value of 1 is applied.”
 * (7) Arguments, proposed new 1st bullet (Replaces existing 1st bullet). “If only the first two arguments are supplied, the function returns either the value of the first sub-string that has matched the search pattern or the #N/A error if no match is found.”.
 * (8) Arguments, proposed new 2nd bullet. “If only the first three arguments are supplied, the function replaces the first sub-string that has matched the search pattern with the specified replacement text and returns the modified version of the entire original string. If no match is found, the function returns the original string unchanged.”
 * (9) Arguments, proposed new 3rd bullet. “If all four arguments are supplied, the function replaces the selected sub-string(s) that have matched the search pattern with the specified replacement text and returns the modified version of the original string. If no match is found, the function returns the original string unchanged.”
 * (10) Arguments, proposed new 4th bullet. “If the Text, Expression and Flags|Occurrence arguments are supplied (with Replacement omitted), then Flags|Occurrence should be a positive integer. In this case the function extracts the n-th match of Expression in Text”.
 * (11) Arguments, proposed new 5th bullet. “If the Flags|Occurrence argument is a text string other than “g” then the function returns an invalid argument error (currently no other flags are implemented)”.
 * (12) Arguments, proposed new 6th bullet. “If the Flags|Occurrence argument is a negative number then the function again returns an invalid argument error”.
 * (13) Arguments, proposed new 7th bullet. If the Flags|Occurrence argument is a non-negative real number, then the value is truncated to an integer value”.
 * (14) Arguments, proposed new 8th bullet. If after truncation the Flags|Occurrence argument is set to 0 or set to a value that is greater than the number of matches found, the function returns an unmodified copy of the text that was in the Text argument.”
 * (15) Additional details. I think it would be worth replacing the existing text with three links. (1) A link that provides general background information on regular expressions, such as the Wikipedia page at https://en.wikipedia.org/wiki/Regular_expression. (2) A statement that the implementation of regular expressions in Calc follows the ICU's (International Components for Unicode) Regular Expressions and more information can be found at https://unicode-org.github.io/icu/userguide/strings/regexp.html. (3) A statement that more general information about regular expressions in Calc can be found by visiting https://help.libreoffice.org and entering the term “regular expressions” into the Index field.
 * (16) Additional details. Would it be worth pointing out that the behaviour of the REGEX function is not affected by settings in the Formulas wildcards area of the Tools > Options > LibreOffice Calc > Calculate dialog?
 * (17) Additional details. Would it be worth pointing out that the original string passed in the Text argument is not affected by any replacements made be REGEX – changes are reflected only in the returned string.
 * (18) Additional details. Please include mention of capture groups. Maybe the following text – “Calc’s regular expression engine supports numbered capture groups, which allow sub-ranges within a match to be identified and used within replacement text. Parentheses are used to group components of a regular expression together and create a numbered capture group. To insert a capture group into a replacement text, use the “$n” form, where n is the number of the capture group. An example is given in the Examples section below.”.
 * (19) Additional details. We should maybe say something about case sensitivity. As far as I can tell, REGEX matches are by default case sensitive. It is possible to put an appropriate flag “(?i)” inside the regular expression string to make an individual match case insensitive. An example is given in the Examples section below.
 * (20) Examples, general. It would be worth including an example using just the first two arguments but providing a match. This form may be useful for checking that a string conforms to some predefined pattern (e.g., telephone number, email address, etc).
 * (21) Examples, general. It would be worth including an example of using a case insensitive match by including the appropriate flag in the regular expression string. For example - =REGEX("LibreOffice Calc","(?i)libreoffice") gives a match and returns “LibreOffice” as the matched sub-string.
 * (22) Examples, 5th example. While this example is correct it might cause some confusion because the third argument is a number rather than a string. Please include some words that reflect that Calc will interpret a value like this to be equivalent to the same value in quotation marks.
 * (23) Examples, final two examples. Should “Expression” be in bold?

--Stevefanning (talk) 2020-12-25T18:10:26 (UTC)