Concatenating every other file from a list of files

Time:09-17

I have a list of files:

MD5.txt

8530362a272d04efef64b7f1ae0d1069  NGS35_FKDN210261811-1A_H3YNKDSX2_L4_1.fq.gz
9b34bdbbb17cdb1205035e24124fed1a  NGS35_FKDN210261811-1A_H3YNKDSX2_L4_2.fq.gz
00f8f992334458383fc1a5c7b06d403e  NGS35_FKDN210261811-1A_H5JY3DSX2_L4_1.fq.gz
cca0e17b3dcc1e644ec1a9a4a60a851e  NGS35_FKDN210261811-1A_H5JY3DSX2_L4_2.fq.gz

I want to concatenate the 1st and 3rd files and the 2nd and 4th like this:

cat NGS35_FKDN210261811-1A_H3YNKDSX2_L4_1.fq.gz NGS35_FKDN210261811-1A_H5JY3DSX2_L4_1.fq.gz >NGS35_L4_1.fq.gz

cat NGS35_FKDN210261811-1A_H3YNKDSX2_L4_2.fq.gz NGS35_FKDN210261811-1A_H5JY3DSX2_L4_2.fq.gz >NGS35_L4_2.fq.gz

I've got this far, but I can't concatenate the actual files; I'm only concatenating the file names:

#get file names
awk -v OFS="\t" '$1=$1' MD5.txt | cut -f2 > tmp

#get sample ID
ID=$(sed 's/_.*//' tmp | head -1)

#get lines needed
sed '1q;d' tmp > file1
sed '3q;d' tmp > file2

sed '2q;d' tmp > file3
sed '4q;d' tmp > file4

#Cat files together - this doesn't work!!
cat file1 file2 > ${ID}_L4_1.fq.gz
cat file3 file4 > ${ID}_L4_2.fq.gz

rm file* tmp

I'm sure there is probably a one liner... go easy, I'm a clinician not a bioinformatician!

CodePudding user response:

The contents of file1, file2 etc. are just the filenames you want to concatenate together, so you need another layer of cat in order to get to the contents.

#!/bin/bash

#get file names
awk -v OFS="\t" '$1=$1' MD5.txt | cut -f2 > tmp

#get sample ID
ID=$(sed 's/_.*//' tmp | head -1)

#concatenate the files named on lines 1 and 3, then those on lines 2 and 4
cat "$(sed '1q;d' tmp)" "$(sed '3q;d' tmp)" > "${ID}_L4_1.fq.gz"
cat "$(sed '2q;d' tmp)" "$(sed '4q;d' tmp)" > "${ID}_L4_2.fq.gz"

rm tmp
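
A variant that avoids the temporary file and the repeated sed calls, offered as a sketch rather than the definitive way to do it: load the second column of MD5.txt into the shell's positional parameters and pick out $1 to $4. It assumes the file names contain no whitespace or wildcard characters. The DEMO_* file names, their contents and the dummy checksums are made up so the sketch runs on its own; with real data only the part from `set --` onward would be needed.

```shell
#!/bin/bash

# Scratch directory with small stand-in files (hypothetical names and contents).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for name in DEMO_runA_L4_1 DEMO_runA_L4_2 DEMO_runB_L4_1 DEMO_runB_L4_2; do
    printf '%s\n' "$name" | gzip > "$name.fq.gz"
done

# Checksum list in the same two-column layout as MD5.txt (dummy checksums).
printf '%s  %s\n' \
    0000 DEMO_runA_L4_1.fq.gz \
    1111 DEMO_runA_L4_2.fq.gz \
    2222 DEMO_runB_L4_1.fq.gz \
    3333 DEMO_runB_L4_2.fq.gz > MD5.txt

# Load the file names (column 2) into $1, $2, $3 and $4.
# The substitution is left unquoted on purpose so it splits on whitespace.
set -- $(awk '{ print $2 }' MD5.txt)

# Sample ID: everything before the first underscore of the first name.
ID=${1%%_*}

# Lines 1 and 3 are the _1 files, lines 2 and 4 are the _2 files.
cat "$1" "$3" > "${ID}_L4_1.fq.gz"
cat "$2" "$4" > "${ID}_L4_2.fq.gz"
```

Quoting "$1", "$3" and so on in the cat commands keeps each name intact once it has been split out.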

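As an aside, I was surprised that concatenating .gz files with cat produces a file that still decompresses correctly, but it works for me, so I've learned something too. The gzip format allows a file to consist of several compressed members back to back, which is why this works.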
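
For anyone who wants to check the .gz behaviour on something small first, here is a minimal sketch. The file names a.gz, b.gz and both.gz and their contents are invented for the demo. It only shows that gzip itself reads the concatenated file; whether a particular downstream tool does the same is worth checking separately.

```shell
#!/bin/bash

# Scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two independent gzip files (hypothetical contents).
printf 'first\n'  | gzip > a.gz
printf 'second\n' | gzip > b.gz

# Plain byte-level concatenation, exactly what cat does with the .fq.gz files.
cat a.gz b.gz > both.gz

# Test the integrity of the result, then decompress it to stdout.
# Both members come out, in the order they were concatenated.
gzip -t both.gz && gzip -dc both.gz
```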

CodePudding user response:

I have a solution, but I am sure there is a more elegant one!

#!/bin/bash

#get file names
awk -v OFS="\t" '$1=$1' MD5.txt | cut -f2 > tmp

#get sample ID
ID=$(sed 's/_.*//' tmp | head -1)

#get lines needed
file1=$(sed '1q;d' tmp)
file2=$(sed '3q;d' tmp)

file3=$(sed '2q;d' tmp)
file4=$(sed '4q;d' tmp)

#cat files together
cat "$file1" "$file2" > "${ID}_L4_1.fq.gz"
cat "$file3" "$file4" > "${ID}_L4_2.fq.gz"

rm tmp
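
Closer to the one-liner asked for in the question, and again only a sketch: if the list strictly alternates _1 and _2 files, awk can select the odd-numbered or even-numbered lines and xargs can hand those names to cat. This also works when the list has more than four entries. It assumes the file names contain no whitespace or quote characters (xargs would split or misread them). The DEMO_* names, contents and dummy checksums are invented so the example is self-contained.

```shell
#!/bin/bash

# Scratch directory with stand-in files for three runs (hypothetical data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for run in runA runB runC; do
    for r in 1 2; do
        name="DEMO_${run}_L4_${r}.fq.gz"
        printf '%s read %s\n' "$run" "$r" | gzip > "$name"
        # Same two-column layout as MD5.txt, with a dummy checksum.
        printf '%s  %s\n' 0000 "$name" >> MD5.txt
    done
done

# Sample ID: the first file name with everything from the first underscore removed.
ID=$(awk 'NR == 1 { sub(/_.*/, "", $2); print $2 }' MD5.txt)

# Odd-numbered lines hold the _1 files, even-numbered lines the _2 files.
# xargs passes the selected names to cat as arguments.
awk 'NR % 2 == 1 { print $2 }' MD5.txt | xargs cat > "${ID}_L4_1.fq.gz"
awk 'NR % 2 == 0 { print $2 }' MD5.txt | xargs cat > "${ID}_L4_2.fq.gz"
```

Selecting by line parity relies on the order of the list; if the order is not guaranteed, matching on the `_1.fq.gz` and `_2.fq.gz` suffixes in the awk pattern would be a safer choice.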