Following the short docs on regexp_matches:
Return all captured substrings resulting from matching a POSIX regular expression against the string.
Example: regexp_matches('foobarbequebaz', '(bar)(beque)') returns {bar,beque}
With that in mind, I'd expect the result of regexp_matches('barbarbar', '(bar)') to be {bar,bar,bar}. However, only {bar} is returned.
Is this the expected behavior? Am I missing something?
Note: calling regexp_matches('barbarbar', '(bar)', 'g') does return all 3 bars, but in table form:
regexp_matches text[] |
---|
{bar} |
{bar} |
{bar} |
CodePudding user response:
This behavior is described in more detail in section 9.7.3, POSIX Regular Expressions:
The regexp_matches function returns a set of text arrays of captured substring(s) resulting from matching a POSIX regular expression pattern to a string. It has the same syntax as regexp_match. This function returns no rows if there is no match, one row if there is a match and the g flag is not given, or N rows if there are N matches and the g flag is given. Each returned row is a text array containing the whole matched substring or the substrings matching parenthesized subexpressions of the pattern, just as described above for regexp_match. regexp_matches accepts all the flags shown in Table 9.24, plus the g flag which commands it to return all matches, not just the first one.
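To make the three cases from that paragraph concrete, here is a small sketch using the string from the question. The '(foo)' pattern is only an illustrative non-matching pattern and is not taken from the docs; the row counts in the comments follow from the description quoted above.

```sql
-- No match at all: the function returns zero rows.
SELECT regexp_matches('barbarbar', '(foo)');

-- A match, no g flag: exactly one row, holding the first match only: {bar}
SELECT regexp_matches('barbarbar', '(bar)');

-- g flag: one row per match, so three rows, each {bar}
SELECT regexp_matches('barbarbar', '(bar)', 'g');
```

Each row is an array because a pattern may contain several parenthesized groups; with a single group the array simply has one element.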
CodePudding user response:
This is expected behavior. The function returns a set of text[], which means that multiple matches are presented as multiple rows. Why is it organized this way? A single match can capture more than one token (one per parenthesized group), so each match is presented as an array of its captured substrings. The documentation gives a telling example:
SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
regexp_matches
----------------
{bar,beque}
{bazil,barf}
(2 rows)
The query returns two matches (two rows), each containing the two tokens captured by the two groups.
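If what you actually want is the single array {bar,bar,bar} from the question, one possible approach (a sketch, not something the quoted documentation prescribes) is to call the function in the FROM clause and aggregate the first captured group of every row with array_agg. The aliases m, groups and all_matches are names chosen here for illustration.

```sql
-- Each row produced by regexp_matches is a text[] of captured groups.
-- groups[1] is the first (here: only) group of each match;
-- array_agg collects those values into one array.
SELECT array_agg(groups[1]) AS all_matches
FROM regexp_matches('barbarbar', '(bar)', 'g') AS m(groups);
-- all_matches: {bar,bar,bar}
```

Note that array_agg without an ORDER BY does not promise any particular element order. That makes no difference here because all elements are identical, but for patterns that capture different substrings you may want to add an explicit ordering.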