I feel like this is a lame question, but after a lot of attempts, I'm stuck. I have a large number of files like this:
S2EC1_DKDL220005480-2a-AK13554-7UDI265_HHJ2MCCX2_L8_1.fq.gz
S2EC1_DKDL220005480-2a-AK13554-7UDI265_HHJ2MCCX2_L8_2.fq.gz
S2EC2_DKDL220005480-2a-5UDI249-7UDI265_HHJ2MCCX2_L8_1.fq.gz
S2EC2_DKDL220005480-2a-5UDI249-7UDI265_HHJ2MCCX2_L8_2.fq.gz
S2EC11_DKDL220005480-2a-5UDI251-5UDI1063_HHJ2MCCX2_L8_1.fq.gz
S2EC11_DKDL220005480-2a-5UDI251-5UDI1063_HHJ2MCCX2_L8_2.fq.gz
and I'm trying to get them renamed to look like this:
S2EC1_R1.fastq.gz
S2EC1_R2.fastq.gz
S2EC2_R1.fastq.gz
S2EC2_R2.fastq.gz
S2EC11_R1.fastq.gz
S2EC11_R2.fastq.gz
The filenames are variable length. Two parts are identical in every filename, DKDL220005480-2a- and _HHJ2MCCX2_L8, but they are separated by a middle part that varies in both composition and length.
From my bash shell I can make some progress in a kind of a step-wise fashion by doing this to get rid of the constant text:
for x in *; do mv $x ${x/DKDL220005480-2a-/}; done
for x in *; do mv $x ${x/_HHJ2MCCX2_L8_/_R}; done
Which yields file names like this:
S2EC1_AK13554-7UDI265_R1.fq.gz
S2EC1_AK13554-7UDI265_R2.fq.gz
S2EC2_5UDI249-7UDI265_R1.fq.gz
S2EC2_5UDI249-7UDI265_R2.fq.gz
S2EC11_5UDI251-5UDI1063_R1.fq.gz
S2EC11_5UDI251-5UDI1063_R2.fq.gz
But now I'm failing to find and replace the variable parts in the middle. Of course it would also be much more elegant to do it all in one go.
Here is what I consider my most promising code for matching that variable middle bit:
for x in *; do mv $x ${x/_(.+)_/}; done
But I get this error:
mv: 'S2EC1_AK13554-7UDI265_R1.fq.gz' and 'S2EC1_AK13554-7UDI265_R1.fq.gz' are the same file
mv: 'S2EC1_AK13554-7UDI265_R2.fq.gz' and 'S2EC1_AK13554-7UDI265_R2.fq.gz' are the same file
mv: 'S2EC2_5UDI249-7UDI265_R1.fq.gz' and 'S2EC2_5UDI249-7UDI265_R1.fq.gz' are the same file
mv: 'S2EC2_5UDI249-7UDI265_R2.fq.gz' and 'S2EC2_5UDI249-7UDI265_R2.fq.gz' are the same file
mv: 'S2EC11_5UDI251-5UDI1063_R1.fq.gz' and 'S2EC11_5UDI251-5UDI1063_R1.fq.gz' are the same file
mv: 'S2EC11_5UDI251-5UDI1063_R2.fq.gz' and 'S2EC11_5UDI251-5UDI1063_R2.fq.gz' are the same file
Not sure if it's something wrong with my regular expression or my mv code (or both or even possibly something else, ha ha).
Thanks
CodePudding user response:
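Pattern matching and regular expressions are two different things. In pattern matching, * means any string; in regular expressions it means zero or more of what precedes it. In pattern matching, (.+) means the literal string (.+); in regular expressions it represents a capture group matching at least one character. Since that literal string does not occur in your filenames, the substitution left each name unchanged and mv was asked to move every file onto itself, hence the "are the same file" errors.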
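To make the difference concrete, here is a small sketch run against one of the intermediate names from the question. The variable names are only illustrative, and the regular expression assumes that the sample ID contains no underscore.

```shell
#!/usr/bin/env bash
name='S2EC1_AK13554-7UDI265_R1.fq.gz'

# Pattern matching in parameter expansion: * matches any string,
# so _*_ covers everything from the first to the last underscore.
glob_result="${name/_*_/_}"

# The regex-looking text is taken literally here. Nothing matches,
# so the name comes back unchanged.
literal_result="${name/_(.+)_/_}"

# A real regular expression needs [[ =~ ]]; capture groups land in BASH_REMATCH.
re='^([^_]+)_.+_(R[12]\.fq\.gz)$'
if [[ $name =~ $re ]]; then
    regex_result="${BASH_REMATCH[1]}_${BASH_REMATCH[2]}"
fi

printf '%s\n' "$glob_result" "$literal_result" "$regex_result"
```

The first and third lines printed are the shortened name; the second is the untouched input.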
For your simple renaming scheme you can try:
for f in *.fq.gz; do
    g="${f/_DKDL220005480-2a-*_HHJ2MCCX2_L8_/_R}"
    printf 'mv "%s" "%s"\n' "$f" "${g%.fq.gz}.fastq.gz"
    # mv "$f" "${g%.fq.gz}.fastq.gz"
done
Once you are satisfied with the output, uncomment the mv line.
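If the constant text might differ between sequencing runs, a variant that does not hard-code it could look like the sketch below. It assumes that the sample ID is everything before the first underscore and that the read number is the last underscore-separated field before .fq.gz. short_name is a hypothetical helper, not something from the question, and the loop is again a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the new name from the first and last "_"-separated fields.
short_name() {
    local f=$1
    local sample=${f%%_*}        # everything before the first underscore, e.g. S2EC1
    local tail=${f##*_}          # everything after the last underscore, e.g. 1.fq.gz
    printf '%s_R%s.fastq.gz\n' "$sample" "${tail%.fq.gz}"
}

for f in *_[12].fq.gz; do
    [[ -e $f ]] || continue      # the glob matched nothing, skip the literal pattern
    printf 'mv -- "%s" "%s"\n' "$f" "$(short_name "$f")"
    # mv -- "$f" "$(short_name "$f")"
done
```

The glob *_[12].fq.gz only picks up files that still end in _1.fq.gz or _2.fq.gz, so names that were already shortened are left alone.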