postgres regexp_matches strange behavior

Time:11-09

According to the short description of regexp_matches in the docs:

Return all captured substrings resulting from matching a POSIX regular expression against the string.
Example: regexp_matches('foobarbequebaz', '(bar)(beque)') returns {bar,beque}

With that in mind, I'd expect regexp_matches('barbarbar', '(bar)') to return {bar,bar,bar}.

However, only {bar} is returned.

Is this the expected behavior? Am I missing something?


Note: calling regexp_matches('barbarbar', '(bar)', 'g') does return all three bars, but as three separate text[] rows rather than as one array:

 regexp_matches
----------------
 {bar}
 {bar}
 {bar}
(3 rows)

Answer:

This behavior is described in more detail in section 9.7.3, POSIX Regular Expressions, of the PostgreSQL documentation:

The regexp_matches function returns a set of text arrays of captured substring(s) resulting from matching a POSIX regular expression pattern to a string. It has the same syntax as regexp_match. This function returns no rows if there is no match, one row if there is a match and the g flag is not given, or N rows if there are N matches and the g flag is given. Each returned row is a text array containing the whole matched substring or the substrings matching parenthesized subexpressions of the pattern, just as described above for regexp_match. regexp_matches accepts all the flags shown in Table 9.24, plus the g flag which commands it to return all matches, not just the first one.
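A small sketch of the three cases the quoted paragraph describes, using the string from the question. The pattern '(baz)' in the last query is just an arbitrary non-matching pattern chosen for illustration, and the alias r(m) only names the result column so the rows can be counted.

```sql
-- Without the 'g' flag: only the first match is returned, as a single row.
SELECT regexp_matches('barbarbar', '(bar)');
-- one row: {bar}

-- With the 'g' flag: one row per match.
SELECT regexp_matches('barbarbar', '(bar)', 'g');
-- three rows: {bar}, {bar}, {bar}

-- No match: the function returns zero rows, not a row containing NULL.
SELECT count(*) FROM regexp_matches('barbarbar', '(baz)', 'g') AS r(m);
-- count = 0
```

Because "no match" means "no rows", calling regexp_matches in a SELECT list can make input rows silently disappear, which is worth keeping in mind when the function is applied to a table column.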

Answer:

This is expected behavior. The function returns a set of text[] values: each match becomes one row, and each row is an array holding the substrings captured by that match's parenthesized subexpressions. Why is it organized this way? A single match can capture more than one token, so the tokens from one match are grouped into an array, while separate matches go into separate rows. The documentation gives a telling example:

SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
 regexp_matches
----------------
 {bar,beque}
 {bazil,barf}
(2 rows)

The query finds two matches, and each row contains the two tokens captured by that match.
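If what you actually want is the single array {bar,bar,bar} from the question, one way (a sketch, not the only option) is to aggregate the rows produced by the 'g' flag. The aliases r, m and n are arbitrary names chosen for illustration; WITH ORDINALITY (PostgreSQL 9.4 or later) numbers the rows so that array_agg keeps the matches in the order they were found.

```sql
-- Each row from regexp_matches is a text[]; m[1] is the first (here, only) capture group.
SELECT array_agg(m[1] ORDER BY n) AS bars
FROM regexp_matches('barbarbar', '(bar)', 'g') WITH ORDINALITY AS r(m, n);
-- bars: {bar,bar,bar}

-- Same idea on the documentation example: pick out each token by its position in the array.
SELECT m[1] AS first_token, m[2] AS second_token
FROM regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g')
     WITH ORDINALITY AS r(m, n)
ORDER BY n;
-- bar   | beque
-- bazil | barf
```

Indexing into m works because, for a function returning a base type such as text[], the column alias in r(m, n) names the function's result directly.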
