Metacharacters in bash commands: Can't find a clear pattern on how it works and how to protect

Time:11-19

I'm trying to find a "clear pattern" of the use of metacharacters in bash commands. Let me be clear!

In the following cases, we get the same results: [] is always interpreted as a metacharacter. But why? " " or ' ' should protect [] from interpretation.

grep \[Aa\] filename # it will print all lines with A or a characters
grep [Aa] filename   # the same
grep "[Aa]" filename # the same
grep '[Aa]' filename # the same

On the other hand, when being "redundant" as follows, the metacharacters [] are protected, i.e., they are not interpreted and are treated as literal characters (fixed strings).

grep "\[Aa\]" filename  # it will print all lines containing the literal characters [Aa]
grep '\[Aa\]' filename  # the same
grep -F "[Aa]" filename # the same

Why does this happen? " ", \ and ' ' should protect any metacharacter from interpretation, but with the grep command it doesn't work! So do the "rules" change depending on the command? I'm confused about this.

For example, using the find command to find a file with the name filename:

find "$DIRECTORY_PATH" -name filename

All metacharacters in filename must be protected, which doesn't make sense to me: do we need to protect the metacharacters so that they are interpreted as metacharacters rather than as literal characters?

Answer:

I think the biggest thing that's confusing you is that the argument goes through two distinct processing steps:

  1. bash applies its various parsing rules (applying and removing quotes and escapes, expanding filename wildcards, etc.), and then passes the result to grep as arguments.
  2. grep interprets whatever it received according to its own rules, which are quite different: backslash escapes serve a similar purpose, but quotes are just ordinary characters, and while some of the characters you might want to escape are the same as in bash, many aren't.

When you use quotes and/or escapes, some get applied (and removed) by bash, and some by grep, but not the same ones, so it's important to understand which apply at which step.

Also, the two steps are done by different programs (bash vs grep), and they don't know anything about each other. bash has no idea that the first argument to grep will be treated as a regular expression (unless you passed -F), and grep doesn't know what processing the argument went through before it received it.

To help differentiate what's happening in the two steps, you can create a command that just shows the arguments it got, and use that instead of grep. Here's an example using a bash function:

$ printargs() {
> if (( $# > 0 ))
> then printf '   «%s»\n' "$@"
> else echo "No arguments"
> fi
> }
$ printargs \[Aa\] filename
   «[Aa]»
   «filename»
$ printargs "[Aa]" filename
   «[Aa]»
   «filename»
$ printargs '[Aa]' filename
   «[Aa]»
   «filename»

Note that in each case bash applied and removed the quotes/escapes, rather than passing them on to the command. If the command had been grep rather than printargs, it similarly would've just received [Aa] with no quotes or escapes.

So what did they do? Let's try it with no quotes or escapes:

$ printargs [Aa] filename
   «[Aa]»
   «filename»
$ touch a A
$ printargs [Aa] filename
   «a»
   «A»
   «filename»

Without quotes or escapes, bash treats [Aa] as a filename wildcard, and tries to expand it to a list of matching filenames. This isn't a filename argument, but bash doesn't know or care about that; it always does this to arguments that contain wildcard characters (including [...]) that aren't disabled by quotes or escapes. You didn't have any matching files, so bash just passed it on as it was, but if matching files exist... it'll get replaced. This is why it's a good idea to quote any argument that contains any funny characters that you don't want bash messing with.
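
To see the wildcard expansion in isolation, here's a small sketch that runs in a scratch directory. It uses set -- (which replaces the shell's positional parameters) as a stand-in for a command receiving arguments; the variable names and the *.txt files are made up for illustration, and it assumes bash's default globbing options (no nullglob or failglob). The last part applies the same idea to the find command from the question: find's -name takes a pattern that find interprets itself, so quoting it keeps bash from expanding it first.

```shell
# Work in a scratch directory so the demonstration cannot touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# No matching files yet: bash passes the unmatched pattern through unchanged.
set -- [Aa] filename
first_before=$1            # [Aa]

# Create files whose names match the pattern, then try again.
touch a A
set -- [Aa] filename
first_unquoted=$1          # now a file name (a or A), not the pattern

# Quoting disables the wildcard, so the pattern survives intact.
set -- '[Aa]' filename
first_quoted=$1            # [Aa]

printf 'before: %s, unquoted: %s, quoted: %s\n' \
  "$first_before" "$first_unquoted" "$first_quoted"

# find interprets -name patterns itself, so the pattern has to reach it
# unexpanded; the single quotes keep bash from touching it.
touch one.txt two.txt
txt_count=$(find . -name '*.txt' | wc -l | tr -d ' ')
printf 'find matched %s files\n' "$txt_count"
```

So the quotes around a find pattern aren't there to make the characters literal for find; they only stop bash from interpreting them before find gets its turn.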

Now with "redundant" quoting and escaping:

$ printargs "\[Aa\]" filename
   «\[Aa\]»
   «filename»
$ printargs '\[Aa\]' filename
   «\[Aa\]»
   «filename»

Here, you can see that bash applied and removed the quotes, but since the escapes were inside (and therefore protected by) them, it didn't apply or remove them. The escapes are passed on to the command, and if the command had been grep it would've treated them as disabling the special meaning of [ and ] in the regular expression.
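
If you'd like to check this against grep itself, something like the following sketch should do. The sample file and its four lines are invented for the demonstration, and grep -c is used only to count matching lines.

```shell
# Build a small sample file (contents are made up for this demonstration).
sample=$(mktemp)
printf '%s\n' 'Apple' 'banana' 'literal [Aa] here' 'xyz' > "$sample"

# bash removes the quotes; grep receives [Aa], a bracket expression that
# matches any line containing A or a.
any_a=$(grep -c '[Aa]' "$sample")

# The backslashes are protected by the quotes, so grep receives \[Aa\] and
# looks for the literal text [Aa].
literal_single=$(grep -c '\[Aa\]' "$sample")
literal_double=$(grep -c "\[Aa\]" "$sample")

# With -F, grep treats the pattern as a fixed string, so no escapes are needed.
fixed=$(grep -cF '[Aa]' "$sample")

printf 'bracket: %s, escaped: %s and %s, fixed: %s\n' \
  "$any_a" "$literal_single" "$literal_double" "$fixed"
rm -f "$sample"
```

Three of the four sample lines contain an A or an a, but only one contains the literal text [Aa], so the first count differs from the other three.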

BTW, escapes inside double-quotes aren't always passed on to the command; there are some characters that still have special meanings inside double-quotes, and if those are escaped bash will apply and remove the escape characters:

$ printargs "quote: \", backtick: \`, dollar: \$, etc"
   «quote: ", backtick: `, dollar: $, etc»
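
As a rough illustration of which escapes survive: inside double quotes, bash only removes a backslash that comes before $, `, ", \ or a newline, and leaves any other backslash in place. The variable names below are just for the example.

```shell
# Backslashes before characters that are NOT special inside double quotes
# are kept, so the command receives them.
kept="\[ \* \a"

# Backslashes before $, `, " and \ are applied and removed by bash.
removed="\$ \` \" \\"

printf '%s\n' "$kept" "$removed"
```

The first line printed still contains its backslashes; the second contains only the bare characters.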

When passing regular expressions (or anything similarly complex), it's usually best to put single-quotes around them, so bash doesn't mess with the contents (aside from removing the single-quotes). The main exception is if you actually want to pass single-quotes as part of the argument (rather than as protectors around it); that's more complicated.
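
For the single-quote case, here's a sketch of two common workarounds (the variable names and the sample text are made up). Nothing can be escaped inside single quotes, so the usual trick is to end the quoted section, add an escaped quote, and start a new quoted section.

```shell
# Option 1: close the single-quoted string, add an escaped single quote,
# then reopen the single-quoted string. bash joins the pieces into one word.
opt1='it'\''s'

# Option 2: use double quotes instead, which works well as long as the text
# contains no $, `, " or \ that bash would act on.
opt2="it's"

printf '%s\n' "$opt1" "$opt2"
```

Both variables end up holding the same four characters, including the single quote.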
