Why does the second grep command not work?

Time:08-18

I have a folder named "components" and within that folder a file named "apple".

If I cd to "components" folder and execute the following command:

ls | grep -G a*e

It works and returns apple correctly.

However, if I do not cd to components folder and execute the following command:

ls components | grep -G a*e

It does not work and returns blank. What could be the reason?

A third grep command below works fine.

ls components | grep ap

The actual filename I am grepping for is complex, so I need the grep -G option to work.

CodePudding user response:

a*e is a glob, not a regex. It's important to understand the difference.

The shell expands globs in unquoted arguments by matching the argument with available files. The * in a*e means "any sequence of characters not containing a directory separator", so it will match the filename apple (or accolade.node) as long as that file is present in the current directory. Glob matches are complete, not substring matches.

So when you execute grep a*e in a directory which contains the file apple, the shell will replace a*e with the word apple before invoking grep, making the command grep apple. If the directory also contained the file accolade.node, the shell would have put that into the command line as well; grep accolade.node apple. That's very rarely what you want to happen to grep arguments (other than filename arguments), so it's highly recommended to get into the habit of quoting arguments.
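To see the expansion for yourself, a small sketch like the one below may help. It uses echo in place of grep so that the arguments the shell actually passes become visible. The scratch directory and the variable names are only illustrative, and the behaviour shown assumes a bash-like shell.

```shell
# Scratch directory so the demo does not touch real files (names are illustrative).
demo=$(mktemp -d)
touch "$demo/apple" "$demo/accolade.node"

# Unquoted: the shell replaces a*e with the matching filenames, in sorted order.
unquoted=$(cd "$demo" && echo grep a*e)

# Quoted: the pattern reaches the command untouched.
quoted=$(cd "$demo" && echo grep 'a*e')

echo "unquoted: $unquoted"   # unquoted: grep accolade.node apple
echo "quoted:   $quoted"     # quoted:   grep a*e

rm -rf "$demo"
```

Putting echo in front of a command line is a cheap way to check what a glob turned into before running the real thing.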

Unlike the shell, grep is based on regular expression matching. In a regular expression, * means "any number of repetitions of the previous element", so the regular expression a*e will match e, ae, aae, aaae, and so on. Since grep does substring matching (by default), those strings could be anywhere in the line being matched. That will match the e in apple, for example, but it will also match any other line which contains an e, such as electronics. (That makes it a bit surprising that ls components | grep "a*e" did not match apple. Perhaps there was some typing problem.)
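As a quick check of that claim, the following sketch feeds three made-up lines to grep with the pattern quoted, so that it arrives as the regular expression a*e. The sample words are assumptions chosen for illustration.

```shell
# Quoted, so grep receives a*e as a regular expression rather than a filename.
matches=$(printf '%s\n' apple electronics banana | grep 'a*e')

# Every line containing an "e" is printed; "banana" has none and is dropped.
echo "$matches"
# apple
# electronics
```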

In order to match a followed by a sequence of arbitrary characters followed by an e, you could use the regular expression a.*e (i.e. grep "a.*e"; note the quotes, which stop the shell from trying to expand that argument as a glob). But that will probably match too much, if you're expecting it to do the same thing as the glob a*e. You might want to add some restrictions. For example, grep -w forces the match to consist of complete words. And (with GNU grep, at least) you can use grep -w "a\S*e" to match a complete word which starts with a and ends with e, using the \S shortcut (any character other than whitespace).
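The difference between "matches too much" and a restricted match can be sketched as follows. Note that this example swaps in grep -x (whole-line match, a POSIX option) instead of the -w and \S combination mentioned above, on the assumption that the input has one filename per line, as ls produces when piped. The sample names are made up.

```shell
# Hypothetical list of names, one per line, as "ls | ..." would produce.
names=$(printf '%s\n' apple applesauce pineapple grape)

# a.*e as a substring match: every name has an "a" with an "e" somewhere after it.
loose=$(printf '%s\n' "$names" | grep 'a.*e')

# -x requires the whole line to match, which is closer to what the glob a*e does.
strict=$(printf '%s\n' "$names" | grep -x 'a.*e')

echo "loose:";  echo "$loose"    # all four names
echo "strict:"; echo "$strict"   # apple and applesauce only
```

Writing the anchors explicitly, as in grep '^a.*e$', should give the same result as -x here.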

You very rarely want to use -G, by the way, particularly since it's the default (unfortunately). Most of the time, you'll want to use grep -E in order to not have to insert backslashes throughout your pattern. Please read man 7 regex for a quick overview of regex syntax and the difference between basic and extended POSIX regexes. man grep is also useful, of course.
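To illustrate the backslash difference, here is a minimal sketch using an interval expression ("exactly two p"), written once in basic syntax and once in extended syntax. The sample words are assumptions for the demo.

```shell
# Basic syntax (grep -G, the default): interval braces need backslashes.
bre=$(printf '%s\n' apple aple appple | grep 'ap\{2\}le')

# Extended syntax (grep -E): the same pattern without backslashes.
ere=$(printf '%s\n' apple aple appple | grep -E 'ap{2}le')

echo "BRE: $bre"   # BRE: apple
echo "ERE: $ere"   # ERE: apple
```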

CodePudding user response:

Unquoted, a*e is a shell glob pattern that is expanded by the shell before grep runs.

When you are in the directory, this:

ls | grep -G a*e

becomes

ls | grep -G apple

As you have a file named 'apple' this matches.

When you are not in the folder, and you run:

ls components | grep -G a*e

the shell again attempts to expand the glob pattern.

If there is any file in your current directory that matches (for example, "abalone"), then the glob will expand to that. It may expand to multiple strings if there is more than one such filename (for example, "abalone", "algae"). The command becomes something like:

ls components | grep -G abalone
ls components | grep -G abalone algae

In the first case, you will get blank output unless the components directory also contains that filename.

In the second case, grep will ignore the directory entirely and attempt to find the string "abalone" inside the file "algae".

There is a third possibility: the glob fails to match anything. In this case (with the default settings of bash and other POSIX-style shells), the pattern is left as-is and grep receives the regexp a*e. The -G option tells grep to use BRE-style regexps. With these, a*e means "zero or more a followed by e", which is equivalent to "contains e".

In that case, you should see apple in your results regardless of whether you are in components or not. In a comment, you say that ls components | grep "a*e" returned nothing. As quoting should force precisely the same result as this third case, this is surprising.
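The cases above can be reproduced in a scratch directory with a sketch like the following. It assumes a bash-like shell with default glob settings (an unmatched glob stays literal), leaves out -G because it is the default, and uses "abalone" as a hypothetical stray file in the parent directory.

```shell
# Scratch layout mirroring the question; all paths are created under a temp dir.
top=$(mktemp -d)
mkdir "$top/components"
touch "$top/components/apple"

# Inside components: a*e expands to apple, so grep searches for "apple".
inside=$(cd "$top/components" && ls | grep a*e)

# In the parent with nothing matching a*e: the pattern stays literal and acts
# as the regex a*e, which matches any line containing an "e".
literal=$(cd "$top" && ls components | grep a*e)

# A matching file in the parent takes over the pattern: grep now looks for
# "abalone". grep exits non-zero when nothing matches, hence the "|| true".
touch "$top/abalone"
hijacked=$(cd "$top" && ls components | grep a*e || true)

echo "inside:   [$inside]"     # inside:   [apple]
echo "literal:  [$literal]"    # literal:  [apple]
echo "hijacked: [$hijacked]"   # hijacked: []

rm -rf "$top"
```

Quoting the pattern ('a*e') makes all three runs behave like the "literal" one, which is why quoting is the safer habit.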


Note that if you are intending to use globs you don't need grep at all:

cd components
ls a*e

or, without changing directory:

ls components/a*e