I am trying to rename some files and am pretty new to regular expressions. I know how to do this the long way, but am trying some code golf to shorten it up.
my file:
abc4800_12_S200_R1_001.fastq.gz
my goal:
abc4800_12_R1.fastq.gz
right now I have a two step process for renaming it:
rename 's/_S[0-9]+//g' *gz
rename 's/_001//g' *gz
But I was trying to shorten this into one single line to clean it up in one go.
I was trying to use a regular expression to skip over the parts in between, but don't know if that is actually possible with this command.
rename 's/_S[0-9]+_*?_001//g' *gz
Thanks for any help
CodePudding user response:
Use a capture group to preserve the middle part of the segment you're replacing.
rename 's/_S\d+_(.*)_001/_$1/' *gz
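To check the capture-group idea without touching any files, the same substitution can be tried on the file name as a plain string. This is only a minimal sketch: it assumes sed with -E (extended regex) is available and uses it in place of rename, so \d is written as [0-9] and $1 as \1. The function name preview_capture is made up for illustration.

```shell
# Hypothetical helper: apply the capture-group substitution to a string
# rather than a real file, with sed -E standing in for rename.
preview_capture() {
    printf '%s\n' "$1" | sed -E 's/_S[0-9]+_(.*)_001/_\1/'
}

preview_capture 'abc4800_12_S200_R1_001.fastq.gz'
# prints: abc4800_12_R1.fastq.gz
```

A name that does not contain the _S<digits>_ ... _001 pattern passes through unchanged.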
CodePudding user response:
With your shown samples, please try the following rename command. I am using the -n option here, which makes it a dry run; once you are happy with the output (i.e. how the files would be renamed), remove -n from the command.
rename -n 's/(^[^_]*_[^_]*)_[^_]*(_[^_]*)[^.]*(\..*$)/$1$2$3/' *.gz
Output will be as follows:
rename(abc4800_12_S200_R1_001.fastq.gz, abc4800_12_R1.fastq.gz)
Explanation of the regex above:
(^[^_]*_[^_]*) ## 1st capturing group: everything from the start of the name up to (but not including) the 2nd _.
_[^_]* ## Matches (without capturing) the 2nd _ and everything up to the next _.
(_[^_]*) ## 2nd capturing group: the 3rd _ and everything up to the next _.
[^.]* ## Matches (without capturing) everything up to the first dot.
(\..*$) ## 3rd capturing group: from the first dot to the end of the name.
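To see what each capturing group actually holds for the sample name, the groups can be printed instead of being glued back together. This is a rough sketch that assumes sed -E as a stand-in for rename; the anchors ^ and $ are moved outside the parentheses, which should match the same text, and show_groups is a made-up name.

```shell
# Hypothetical helper: print the three captured groups for a file name.
# Same pattern as the rename command, with ^ and $ written outside the groups.
show_groups() {
    printf '%s\n' "$1" |
        sed -E 's/^([^_]*_[^_]*)_[^_]*(_[^_]*)[^.]*(\..*)$/1=\1 2=\2 3=\3/'
}

show_groups 'abc4800_12_S200_R1_001.fastq.gz'
# prints: 1=abc4800_12 2=_R1 3=.fastq.gz
```

The _S200 and _001 pieces are matched outside any group, which is why they disappear when the replacement is $1$2$3.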
CodePudding user response:
You are trying to replace two parts of the string with nothing. Use the alternation operator |: it matches either the left or the right side, and every match is replaced with the same replacement string (i.e. nothing):
rename 's/_S[0-9]+|_001//g' *gz
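The effect of the alternation together with the g flag can be previewed on plain strings. This is a small sketch, again assuming sed -E in place of rename; strip_parts is a made-up name and the second file name is invented to show that every occurrence of either alternative is removed, wherever it appears.

```shell
# Hypothetical helper: delete every "_S<digits>" or "_001" from a string,
# mirroring the alternation-based expression with sed -E.
strip_parts() {
    printf '%s\n' "$1" | sed -E 's/_S[0-9]+|_001//g'
}

strip_parts 'abc4800_12_S200_R1_001.fastq.gz'
# prints: abc4800_12_R1.fastq.gz

strip_parts 'x_001_S5_R2_001.fq'
# prints: x_R2.fq
```

Because neither alternative is tied to a position in the name, a dry run with -n is worth doing first if other files might contain _001 or _S followed by digits elsewhere.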