I'm trying to explain to someone why a regex that begins with the character "*" doesn't produce any results (and as far as I know never will).
A simple example of the behavior I'm trying to explain:
mburr@mint-19:~/temp/grep-test$ touch foobar.jpg
#
# match using glob wildcard works as expected
#
mburr@mint-19:~/temp/grep-test$ ls *.jpg
foobar.jpg
#
# using regex doesn't match
#
mburr@mint-19:~/temp/grep-test$ ls | grep "*.jpg"
mburr@mint-19:~/temp/grep-test$
In one sense I understand and can explain what's going on: the "*" character in a regex is a quantifier that applies to the pattern expression just before it. In this case there's no pattern before the "*", so there's nothing for it to quantify, and therefore no match.
But what I can't explain is how it works in this regex. Why doesn't it throw an error if it's a senseless pattern? If it's not a senseless pattern, what could it match (if anything)?
All the searches I've done are answered with something that boils down to one or more of the following:
- regexes aren't globs
- a "*" isn't a wildcard like it is to the shell
- just let the shell do the match
But none that explain why it isn't an error or what it could match, if anything.
CodePudding user response:
Well, some regex engines do throw an error. E.g. the JavaScript developer console in my browser complains if I enter /*/ but accepts /a*/ (JS has regex literals). The grep regex engine also recognizes that the * cannot possibly be the Kleene star, but instead of complaining it just treats it as an inert character (i.e. as if it were escaped).
$ echo "a" | grep "*"
# nothing
$ echo "*" | grep "*"
*
This is as specified in POSIX, section 9.3.3 (BRE Special Characters):
A BRE [basic regular expression, as used by flagless grep] special character has special properties in certain contexts. Outside those contexts, or when preceded by a <backslash>, such a character is a BRE that matches the special character itself. The BRE special characters and the contexts in which they have their special meaning are as follows: ...
*
The <asterisk> shall be special except when used:
- ...
- As the first character of an entire BRE...
So in grep "*", since the asterisk appears as the first character, it is not considered special and simply matches one occurrence of itself.
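To make this concrete for the "*.jpg" pattern from the question, here is a small sketch. It assumes a POSIX-conforming grep in its default BRE mode (for example GNU grep); the sample names and the variable names are made up for illustration. Under that rule, "*.jpg" means a literal asterisk, then any one character, then "jpg", so it can only match lines that actually contain a "*". The second pattern is one way to write what the glob *.jpg expresses as a regex.

```shell
# Sample "filenames", one per line; these names are hypothetical.
names='foobar.jpg
star*.jpg
a*xjpg'

# Leading * is literal in a BRE: this pattern means "*", any one character, "jpg".
literal_star=$(printf '%s\n' "$names" | grep '*.jpg')
printf '%s\n' "$literal_star"

# Rough regex counterpart of the glob *.jpg:
# any characters, a literal dot, then "jpg" at the end of the line.
glob_like=$(printf '%s\n' "$names" | grep '.*\.jpg$')
printf '%s\n' "$glob_like"
```

The first grep should print star*.jpg and a*xjpg but not foobar.jpg, which is why the command in the question printed nothing. The second should print foobar.jpg and star*.jpg.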