What does a regex that begins with "*" actually do?

Time:03-25

I'm trying to explain to someone why a regex that begins with the character "*" doesn't produce any results (and as far as I know never will). A simple example of the behavior I'm trying to explain:

mburr@mint-19:~/temp/grep-test$ touch foobar.jpg

#
# match using glob wildcard works as expected
#
mburr@mint-19:~/temp/grep-test$ ls *.jpg
foobar.jpg

#
# using regex doesn't match
#
mburr@mint-19:~/temp/grep-test$ ls | grep "*.jpg"
mburr@mint-19:~/temp/grep-test$ 

In one sense I understand and can explain what's going on: the "*" character in a regex is a quantifier that quantifies the pattern expression just before it. In this case there's no pattern before the "*" so there's nothing for it to quantify, therefore no match.

But what I can't explain is how it works in this regex. Why doesn't it throw an error if it's a senseless pattern? If it's not a senseless pattern, what could it match (if anything)?

All the searches I've done are answered with something that boils down to one or more of the following:

  • regexes aren't globs
  • a "*" isn't a wildcard like it is to the shell
  • just let the shell do the match

But none of them explains why it isn't an error, or what it could match, if anything.

CodePudding user response:

Well, some regex engines do throw an error. For example, the JavaScript developer console in my browser complains if I enter /*/ but accepts /a*/ (JS has regex literals). The grep regex engine, by contrast, recognizes that the * cannot possibly be the Kleene star and just treats it as an inert character (i.e. as if it were escaped).

$ echo "a" | grep "*"
# nothing
$ echo "*" | grep "*"
*
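
One way to check the "as if it were escaped" claim is to run the bare pattern, the escaped pattern, and a fixed-string search over the same input. This is only a sketch, assuming a POSIX-conforming grep; the `sample` helper and its three input lines are made up for illustration.

```shell
# Hypothetical sample input: one test string per line.
sample() { printf '%s\n' 'a' '*' 'a*b'; }

# Leading "*" in a BRE: treated as a literal asterisk.
sample | grep '*'

# Explicitly escaped asterisk: selects the same lines.
sample | grep '\*'

# -F turns off regex interpretation entirely: same lines again.
sample | grep -F '*'
```

All three commands should select the two lines that contain an asterisk (`*` and `a*b`) and skip the line `a`.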

This is as specified in POSIX:

9.3.3 BRE Special Characters

A BRE [basic regular expression, as used by flagless grep] special character has special properties in certain contexts. Outside those contexts, or when preceded by a <backslash>, such a character is a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are as follows:

...

*

The <asterisk> shall be special except when used:

  • ...
  • As the first character of an entire BRE...

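So in grep "*", since the asterisk appears as the first character of the BRE, it is not considered special and simply matches one literal asterisk. The same applies to the "*.jpg" pattern in the question: it is a valid regex that matches a literal asterisk, followed by any single character, followed by jpg. The name foobar.jpg contains no asterisk, so grep prints nothing.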
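
Reading `*.jpg` as a BRE (a literal asterisk, then any one character, then `jpg`) can be checked directly. The sketch below pipes a few made-up file names through a hypothetical `names` helper, so nothing is created on disk. The second command shows what was probably the intended regex, which is an assumption about the asker's goal.

```shell
# Hypothetical file names, one per line; only the first is from the question.
names() { printf '%s\n' 'foobar.jpg' '*.jpg' 'a*xjpg'; }

# Literal "*", any single character, then "jpg".
names | grep '*.jpg'

# Probable intent: a literal dot followed by "jpg" at the end of the line.
names | grep '\.jpg$'
```

The first command should select `*.jpg` and `a*xjpg` but not `foobar.jpg`. The second should select `foobar.jpg` and `*.jpg`.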
