How does an empty regular expression evaluate?

Time:10-03

Suppose I run something like the following:

select regexp_matches('X', '');

Is matching against an empty regular expression defined behavior? If so, how does it normally work?

In other words, which of the following is the base production (ignoring some of the advanced constructs such as repetition, grouping, etc.)?

regex
    : atom 
    ;

Or:

regex
    : atom*
    ;

As an example:

regex101 shows no match in any of its 7 flavors, but Postgres reports a match for select regexp_matches('X', '');.

CodePudding user response:

The empty regex, by definition, matches the empty string. In a substring match (which is what PostgreSQL's regexp_matches performs), the match always succeeds, since the empty string is a substring of every string, including itself. So it's not a very useful query, but it should work with any regex implementation. (It might be more useful as a full-string match, but string equality would work just as well, and probably with less overhead.)
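To make the substring-versus-full-match distinction concrete, here is a minimal sketch using Python's re module rather than PostgreSQL, so it only illustrates the general behavior, not Postgres itself. The helper names substring_match and full_match are made up for this example.

```python
import re


def substring_match(pattern: str, text: str) -> bool:
    """Return True if pattern matches anywhere inside text."""
    return re.search(pattern, text) is not None


def full_match(pattern: str, text: str) -> bool:
    """Return True only if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


# The empty pattern matches the empty substring at position 0.
print(re.search("", "X").span())   # (0, 0)
print(substring_match("", "X"))    # True
print(substring_match("", ""))     # True

# As a full-string match it is just an equality test against ''.
print(full_match("", "X"))         # False
print(full_match("", ""))          # True
```

In this sketch the full-match case behaves exactly like comparing the input to the empty string, which is why plain string equality is the simpler choice there.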

One aspect of empty matches which does vary between regex implementations is how they interact with the "global" (repeated application) flag or its equivalent. Most regex engines will advance one character after a successful zero-length substring match, but there are exceptions. As a general rule, nullable regexes (including the empty regex) should not be used with a repeated-application flag unless the regex library explicitly documents the result. For what it's worth, I couldn't find such documentation for PostgreSQL, but that doesn't mean it doesn't exist somewhere.
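As an illustration of the "advance one character after a zero-length match" behavior, here is a small sketch using Python's re module. It shows what one engine does with repeated application and says nothing about PostgreSQL's 'g' flag. The helper name empty_match_spans is made up for this example.

```python
import re


def empty_match_spans(text: str) -> list[tuple[int, int]]:
    """Collect the (start, end) span of every match of the empty regex in text."""
    return [m.span() for m in re.finditer("", text)]


# One zero-length match before each character, plus one at the end.
print(empty_match_spans("abc"))   # [(0, 0), (1, 1), (2, 2), (3, 3)]

# Repeated substitution inserts the replacement at each of those positions.
print(re.sub("", "-", "abc"))     # -a-b-c-
```

Python's engine yields one empty match per position, including the end of the string. Other engines may count or place these matches differently, which is the reason for the caution above.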
