Regular expressions, capture groups, and the dollar sign

Time:11-20

I'm reading a book about bash, and it introduces regular expressions (I'm pretty new to them) with this example:

rename -n 's/(.*)(.*)/new$1$2/' *

'file1' would be renamed to 'newfile1'
'file2' would be renamed to 'newfile2'
'file3' would be renamed to 'newfile3'

There wasn't really a breakdown provided with this example, unfortunately. I kind of get what capture groups are and that .* is greedy and will match all characters but I'm uncertain as to why two capture groups are needed. Also, I get that $ represents the end of the line but am unsure of what $1$2 is actually doing here. Appreciate any insight provided.

Attempted to research capture groups and the $ for some similar examples with explanations but came up short.

CodePudding user response:

You are correct. (.*)(.*) makes no sense. The second .* will always match the empty string.

For example, matching against file,

  • the first .* will match the 4 character string starting at position 0 (file), and
  • the second .* will match the 0 character string starting at position 4 (empty string).

You could simplify the command to any of the following:

rename -n 's/(.*)/new$1/' *
rename -n 's/.*/new$&/' *
rename -n 's/^/new/' *
rename -n '$_ = "new$_"' *
rename -n '$_ = "new" . $_' *

CodePudding user response:

I don't know that rename command. The regular expression looks like sed syntax. If that is the case (as in many other regex forms), it has 3 parts:

  • s for substitute
  • everything between the first two slashes (.*)(.*) to specify what to match
  • everything between the 2nd and 3rd slash new$1$2 is the replacement

$ only means end of the line in the first part (the pattern). In the second part (the replacement), $ followed by a number refers to a capture group: $1 is the first group, $2 the second, and so on. In some tools $0 stands for the whole matched text; in Perl that is $&.

You are right that .* is greedy and that it's pointless to repeat it. Maybe there was a \. in between and it was an attempt to capture the file name and extension separately. There are better ways to parse file names, like basename. So you could simplify the command to rename -n 's/(.*)/new$1/' *.
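If the book did mean to have a \. between the groups (that is only a guess), the two captures would finally hold different things. A sketch of that variant, assuming perl is installed; split_ext and the sample file name are made up for illustration:

```shell
# split_ext is a hypothetical helper showing what (.*)\.(.*) captures.
# The first .* is greedy, so the split happens at the LAST dot in the name.
split_ext() {
  perl -e 'if ($ARGV[0] =~ /(.*)\.(.*)/) { print "name=[$1] ext=[$2]\n" }' -- "$1"
}

split_ext archive.tar.gz    # prints: name=[archive.tar] ext=[gz]
```

With two distinct groups the replacement could then treat them differently, for example putting something between the name and the extension.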
