I have a list of files:
MD5.txt
8530362a272d04efef64b7f1ae0d1069 NGS35_FKDN210261811-1A_H3YNKDSX2_L4_1.fq.gz
9b34bdbbb17cdb1205035e24124fed1a NGS35_FKDN210261811-1A_H3YNKDSX2_L4_2.fq.gz
00f8f992334458383fc1a5c7b06d403e NGS35_FKDN210261811-1A_H5JY3DSX2_L4_1.fq.gz
cca0e17b3dcc1e644ec1a9a4a60a851e NGS35_FKDN210261811-1A_H5JY3DSX2_L4_2.fq.gz
I want to concatenate the 1st and 3rd files and the 2nd and 4th like this:
cat NGS35_FKDN210261811-1A_H3YNKDSX2_L4_1.fq.gz NGS35_FKDN210261811-1A_H5JY3DSX2_L4_1.fq.gz >NGS35_L4_1.fq.gz
cat NGS35_FKDN210261811-1A_H3YNKDSX2_L4_2.fq.gz NGS35_FKDN210261811-1A_H5JY3DSX2_L4_2.fq.gz >NGS35_L4_2.fq.gz
I've got this far, but I can't concatenate the actual files together; I'm just concatenating the file names...
#get file names
awk -v OFS="\t" '$1=$1' MD5.txt | cut -f2 > tmp
#get sample ID
ID=$(sed 's/_.*//' tmp | head -1)
#get lines needed
sed '1q;d' tmp > file1
sed '3q;d' tmp > file2
sed '2q;d' tmp > file3
sed '4q;d' tmp > file4
#Cat files together - this doesn't work!!
cat file1 file2 > ${ID}_L4_1.fq.gz
cat file3 file4 > ${ID}_L4_2.fq.gz
rm file* tmp
I'm sure there is probably a one liner... go easy, I'm a clinician not a bioinformatician!
CodePudding user response:
The contents of file1, file2, etc. are just the filenames you want to concatenate together, so you need another layer of cat in order to get to the contents.
#!/bin/bash
#get file names
awk -v OFS="\t" '$1=$1' MD5.txt | cut -f2 > tmp
#get sample ID
ID=$(sed 's/_.*//' tmp | head -1)
#concatenate the files named on lines 1 and 3, then those on lines 2 and 4
cat "$(sed '1q;d' tmp)" "$(sed '3q;d' tmp)" > "${ID}_L4_1.fq.gz"
cat "$(sed '2q;d' tmp)" "$(sed '4q;d' tmp)" > "${ID}_L4_2.fq.gz"
rm tmp
As an aside, I'm surprised that you're able to cat together .gz files and have the resulting file still "work", but that does work for me, so I've learned something, too.
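On the .gz point: the gzip format allows a file to consist of several compressed members back to back, and gunzip decompresses them in order as if they were one stream. Here is a small demonstration, only a sketch: the two tiny files a.gz and b.gz are made-up stand-ins for the real .fq.gz files, and everything happens in a throwaway directory.

```bash
#!/bin/bash
# Demonstration only: a.gz and b.gz are hypothetical stand-ins for real .fq.gz files.
workdir=$(mktemp -d)

# Create two small, independently compressed files.
printf 'read A\n' | gzip > "$workdir/a.gz"
printf 'read B\n' | gzip > "$workdir/b.gz"

# Concatenate the compressed files byte for byte.
cat "$workdir/a.gz" "$workdir/b.gz" > "$workdir/ab.gz"

# Decompress the combined file; both payloads come out, in order.
result=$(gunzip -c "$workdir/ab.gz")
printf '%s\n' "$result"

# Clean up the throwaway directory.
rm -r "$workdir"
```

The combined file decompresses to the first file's contents followed by the second's, which is why plain cat is enough for merging the lanes here.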
CodePudding user response:
I have a solution, but I'm sure there is a more elegant one!
#!/bin/bash
#get file names
awk -v OFS="\t" '$1=$1' MD5.txt | cut -f2 > tmp
#get sample ID
ID=$(sed 's/_.*//' tmp | head -1)
#get lines needed
file1=$(sed '1q;d' tmp)
file2=$(sed '3q;d' tmp)
file3=$(sed '2q;d' tmp)
file4=$(sed '4q;d' tmp)
#cat files together
cat $file1 $file2 > ${ID}_L4_1.fq.gz
cat $file3 $file4 > ${ID}_L4_2.fq.gz
rm tmp
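If you would rather not depend on line positions or a temp file, one possible variation is to pick the files by their _1/_2 suffix instead. This is only a sketch: the function name concat_reads is made up for illustration, it keeps the _L4_ part of the output name hard-coded as in the question, and it assumes MD5.txt has the checksum in column 1 and a file name without spaces in column 2.

```bash
#!/bin/bash
# concat_reads is a hypothetical helper name, not part of any tool.
# Assumes: column 2 of the list holds file names without spaces,
# and every name ends in _1.fq.gz or _2.fq.gz.
concat_reads() {
    local list=$1 id pair name

    # Sample ID: everything before the first underscore of the first file name.
    id=$(awk 'NR==1 { sub(/_.*/, "", $2); print $2 }' "$list")

    for pair in 1 2; do
        # Print the names ending in _<pair>.fq.gz, in the order listed,
        # and write the contents of each one to a single output file.
        awk -v r="$pair" '$2 ~ ("_" r "[.]fq[.]gz$") { print $2 }' "$list" |
        while IFS= read -r name; do
            cat "$name"
        done > "${id}_L4_${pair}.fq.gz"
    done
}
```

Run it from the directory that holds the files, as in concat_reads MD5.txt, and it should leave NGS35_L4_1.fq.gz and NGS35_L4_2.fq.gz next to the originals. It also keeps working if more lanes are added to the list later.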