for loop for multiple files: combine two fasta files with similar ID to one file

Time:05-09

I have 100 fasta files in one dir:

A_n.fasta
A_t.fasta
B_n.fasta
B_t.fasta
C_n.fasta
C_t.fasta 
...

The FASTA files contain different numbers of sequences, and I need to concatenate each pair into one file:

Input: A_n.fasta, A_t.fasta; output: A_nt.fasta, and so on for the other pairs.

I am doing it for each pair like this:

cat A_n.fasta A_t.fasta > A_nt.fasta

but I don't want to do it manually for each id pair.

so I tried something like this:

for f in *_n.fasta ; do cat $f *_t.fasta > $f*_prot_all.fasta; done

but it's not working

For loops are new to me, so the solution is probably easy, but I cannot work it out.

I will also need this for pal2nal, which takes two input files (a protein multiple alignment and the DNA sequences). It is the same task: two files that share part of their name, producing one output file per pair.

CodePudding user response:

Would you please try:

#!/bin/bash

for f in *_n.fasta; do
    t=${f/_n./_t.}
    out=${f/_n./_prot_all.}
    cat "$f" "$t" > "$out"
done

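Here t=${f/_n./_t.} replaces the first occurrence of the substring _n. in the value of f with _t. and assigns the result to a new variable t; out is built the same way. Your original loop did not work because the unquoted *_t.fasta expands to every _t file in the directory rather than only the matching one, and the redirect target $f*_prot_all.fasta still contains the full input name (for example A_n.fasta) followed by a literal *, so it never yields the intended output name. For the pal2nal command, you can use similar filename replacements.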
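A slightly more defensive variant is sketched below, in case some files have no partner or the directory contains no matches at all. It strips the suffix with ${n%_n.fasta} instead of replacing the first _n., which avoids surprises if _n. also appears earlier in a file name. The function name combine_pairs is made up for illustration, and the _nt.fasta output suffix follows the naming in the question rather than the _prot_all name used above.

```shell
#!/bin/bash
# combine_pairs DIR
# For every X_n.fasta in DIR that has a matching X_t.fasta, write X_nt.fasta
# containing both files. The function name and the _nt suffix are illustrative
# assumptions, not part of the original answer.
combine_pairs() {
    local dir=${1:-.}
    local n base t
    for n in "$dir"/*_n.fasta; do
        # If nothing matches, the pattern is left as a literal string; skip it.
        [ -e "$n" ] || continue
        base=${n%_n.fasta}      # remove the trailing "_n.fasta"
        t=${base}_t.fasta       # name of the expected partner file
        if [ ! -f "$t" ]; then
            echo "skipping $n: no matching $t" >&2
            continue
        fi
        cat "$n" "$t" > "${base}_nt.fasta"
    done
}
```

Calling combine_pairs . processes the current directory. The same base variable could be used to build both input names and the output name for a pal2nal call, once you know how those files are named.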
