I have a problem with “basename” command as follow: In my host directory I have two samples’ fastq.gz files, named as:
A29_WES_S3_R1_001.fastq.gz
A29_WES_S3_R2_001.fastq.gz
A30_WES_S1_R1_001.fastq.gz
A30_WES_S1_R2_001.fastq.gz
Now I need to have their basename without suffix like:
A29_WES_S3_R1_001
A29_WES_S3_R2_001
A30_WES_S1_R1_001
A30_WES_S1_R2_001
I used the bash pipeline as follow:
#!/bin/bash
FILES1=(*R1_001.fastq.gz)
FILES2=(*R2_001.fastq.gz)
read1="${FILES1[@]}"
read2="${FILES2[@]}"
Ffile=$read1
Ffileprevix=$(basename "$Ffile" .fastq.gz)
Mfile=$read2
Mfileprevix=$(basename "$Mfile" .fastq.gz)
echo $Ffileprevix
echo $Mfileprevix
exit;
But every time I just get this output:
A29_WES_S3_R1_001.fastq.gz A30_WES_S1_R1_001
A29_WES_S3_R2_001.fastq.gz A30_WES_S1_R2_001
Only the last file (A30) would be included in the command!
I checked my pipeline in this way:
echo $read1
echo $read2
The result:
A29_WES_S3_R1_001.fastq.gz A30_WES_S1_R1_001.fastq.gz
A29_WES_S3_R2_001.fastq.gz A30_WES_S1_R2_001.fastq.gz
Then I did:
echo $Ffile
echo $Mfile
The result:
A29_WES_S3_R1_001.fastq.gz A30_WES_S1_R1_001.fastq.gz
A29_WES_S3_R2_001.fastq.gz A30_WES_S1_R2_001.fastq.gz
So $read1, $read2, $Ffile, and $Mfile work well.
Then I put “-a” in my basename command as it will take multiple files:
Ffileprevix=$(basename -a "$Ffile" .fastq.gz)
Mfileprevix=$(basename -a "$Mfile" .fastq.gz)
But it got worse! The result was like:
A29_WES_S3_R1_001.fastq.gz A30_WES_S1_R1_001.fastq.gz .fastq.gz
A29_WES_S3_R2_001.fastq.gz A30_WES_S1_R2_001.fastq.gz .fastq.gz
Finally, I tried “for ..... do ....” command to make a loop for basename command. Again, nothing changed!!
Is there anybody can help me to obtain what I want:
A29_WES_S3_R1_001
A29_WES_S3_R2_001
A30_WES_S1_R1_001
A30_WES_S1_R2_001
CodePudding user response:
I'd leave basename
out of this entirely, but that's entirely personal preference. You could do something more like:
FILES_PATTERN_1=".*R1_001.fastq.gz"
FILES_PATTERN_2=".*R2_001.fastq.gz"
# Get FILE PATTERN 1
echo "Pattern 1:"
for FILE in $(find . | grep "${FILES_PATTERN_1}" | cut -d. -f2 | tr -d /); do
echo $FILE
done
# Get FILE PATTERN 2
echo "Pattern 2:"
for FILE in $(find . | grep "${FILES_PATTERN_2}" | cut -d. -f2 | tr -d /); do
echo $FILE
done
Output should be:
Pattern 1:
A30_WES_S1_R1_001
A29_WES_S3_R1_001
Pattern 2:
A29_WES_S3_R2_001
A30_WES_S1_R2_001
You could also play with awk
to parse things instead:
# Get FILE PATTERN 1
echo "Pattern 1:"
for FILE in $(find . | grep "${FILES_PATTERN_1}" | awk -F '[/.]' '{print $3}'); do
echo $FILE
done
There are a number of ways to approach this. If you had a lot more patterns to test you could make more use of functions here to reduce code duplication.
Also note, I'm doing this from a shell on Mac OSX, so if you're doing this from a Linux box some of these commands may need to be tweaked due to differences in output for some commands, like find
. (ex: print $1 instead of print $3)