How do I concatenate the paired files for each sample, all in one command?
This is what my data looks like:
S10_L001_R1_001.1.fq.gz
S10_L001_R1_001.2.fq.gz
S10_L001_R1_001.rem.1.fq.gz
S10_L001_R1_001.rem.2.fq.gz
S11_L001_R1_001.1.fq.gz
S11_L001_R1_001.2.fq.gz
S11_L001_R1_001.rem.1.fq.gz
S11_L001_R1_001.rem.2.fq.gz
And so on. The idea is to concatenate "S10 ... .1" with "S10 ... rem.1", and "S10 ... .2" with "S10 ... rem.2", and then the same for sample 11, etc.
Thanks!
I tried doing it manually but that takes a lot of time considering I have 600 samples.
CodePudding user response:
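You can try something like:
# The [0-9] before ".1.fq.gz" stops the first glob from also matching the ".rem.1" file,
# which a plain "*1.fq.gz" would pick up and concatenate twice.
for prefix in $(ls *.fq.gz | cut -d_ -f1 | sort -u)
do
    cat "${prefix}"_*[0-9].1.fq.gz "${prefix}"_*.rem.1.fq.gz > "${prefix}_concat_1.fq.gz"
    cat "${prefix}"_*[0-9].2.fq.gz "${prefix}"_*.rem.2.fq.gz > "${prefix}_concat_2.fq.gz"
done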
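If you would rather pair files by exact name than by wildcard, here is a minimal sketch that starts from each ".rem.1" file and derives its partners from it. The function name concat_pairs and the ".merged." output naming are my own choices for illustration, not something from the question. It assumes every sample follows the naming shown above. Concatenating gzip files with cat yields a valid multi-member gzip stream, so nothing needs to be decompressed first.

```shell
# Hypothetical helper: merge NAME.N.fq.gz with NAME.rem.N.fq.gz for N = 1, 2.
# Output goes to NAME.merged.N.fq.gz (naming is an assumption; adjust as needed).
concat_pairs() {
    dir=${1:-.}
    for rem in "$dir"/*.rem.1.fq.gz; do
        [ -e "$rem" ] || continue          # the glob matched nothing
        base=${rem%.rem.1.fq.gz}           # e.g. S10_L001_R1_001
        for r in 1 2; do
            # Only write output when both inputs exist, so no partial files appear.
            if [ -f "$base.$r.fq.gz" ] && [ -f "$base.rem.$r.fq.gz" ]; then
                cat "$base.$r.fq.gz" "$base.rem.$r.fq.gz" > "$base.merged.$r.fq.gz"
            else
                echo "skipping $base read $r: missing input" >&2
            fi
        done
    done
}
```

Paste the function into your shell, then run concat_pairs /path/to/fastq_dir (or just concat_pairs for the current directory). Because the output names contain ".merged." they never match the ".rem.1" glob, so running it a second time does not feed the results back in.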