for loop writes only last result to file

Time:08-19

Hi there, I've been playing a bit with for loops in Bash to edit a FASTA file.

The file has 24 headers that start with the '>' character, as follows:

>CP068277.2
>CP068276.2
>CP068275.2
>CP068274.2
>CP068273.2
>CP068272.2
>CP068271.2
>CP068270.2
>CP068269.2
>CP068268.2
>CP068267.2
>CP068266.2
>CP068265.2
>CP068264.2
>CP068263.2
>CP068262.2
>CP068261.2
>CP068260.2
>CP068259.2
>CP068258.2
>CP068257.2
>CP068256.2
>CP068255.2
>CP086569.2

These are actually chromosomes, and I need them to be in the form >chr1, >chr2, etc.

I wrote the following for loop:

for ((c=1; c<=24; c++));
  do 
    sed 's/>/>chr'"$c"' /' CHM13v2.0_no-mito.fna > CHM13v2.0_no-mito_trial.fna;
done

However, the output shows >chr24 on every header, as if the counter were ignored (see below). Does anyone have any idea why?

>chr24 CP068277.2
>chr24 CP068276.2
>chr24 CP068275.2
>chr24 CP068274.2
>chr24 CP068273.2
>chr24 CP068272.2
>chr24 CP068271.2
>chr24 CP068270.2
>chr24 CP068269.2
>chr24 CP068268.2
>chr24 CP068267.2
>chr24 CP068266.2
>chr24 CP068265.2
>chr24 CP068264.2
>chr24 CP068263.2
>chr24 CP068262.2
>chr24 CP068261.2
>chr24 CP068260.2
>chr24 CP068259.2
>chr24 CP068258.2
>chr24 CP068257.2
>chr24 CP068256.2
>chr24 CP068255.2
>chr24 CP086569.2

P.S. Don't worry about the sequence IDs following >chr24; I have a way to remove them with sed. Nonetheless, it would be nice to have everything done in one step.
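If the goal is to drop the ID entirely so each header becomes just `>chr1`, `>chr2`, etc., one way to do it in a single pass is to have awk replace the whole header line. This is a minimal sketch, assuming every header line starts with `>` and that the numbering should follow the order in which headers appear in the file; the sample file and its contents are made up for illustration.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# Hypothetical sample input: two headers, each followed by a sequence line.
printf '>CP068277.2\nACGT\n>CP068276.2\nTTGA\n' > "$tmp/sample.fna"

# On header lines, print ">chr" plus a running counter and skip the original
# text (next); every other line (the sequence) is printed unchanged by "1".
awk '/^>/ { print ">chr" (++c); next } 1' "$tmp/sample.fna" > "$tmp/sample_renamed.fna"

cat "$tmp/sample_renamed.fna"
```

For the sample above this prints `>chr1`, `ACGT`, `>chr2`, `TTGA`, one per line.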

Thanks in advance!

Answer:

Your loop overwrites the output file on each iteration. To keep the output of every iteration, redirect the whole loop instead:

for ((c=1; c<=24; c++));
  do 
    sed 's/>/>chr'"$c"' /' CHM13v2.0_no-mito.fna
done  > CHM13v2.0_no-mito_trial.fna
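Moving the redirect after `done` keeps every iteration's output, but each iteration still runs sed over the whole file. The result is one full copy of the input per iteration, with every header in a given copy carrying the same number. A small sketch with a made-up two-header file and only three iterations shows this:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# Hypothetical two-header FASTA file.
printf '>A\nACGT\n>B\nTTGA\n' > "$tmp/in.fna"

# Same pattern as the loop above, but with 3 iterations instead of 24.
for ((c=1; c<=3; c++)); do
    sed 's/>/>chr'"$c"' /' "$tmp/in.fna"
done > "$tmp/out.fna"

# Count header lines: 2 headers x 3 iterations = 6.
grep -c '^>' "$tmp/out.fna"
# List the headers to see the repeated numbering (>chr1 A, >chr1 B, >chr2 A, ...).
grep '^>' "$tmp/out.fna"
```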

but this would be orders of magnitude more efficient and doesn't hard-code how many header lines you hope the file contains:

awk 'sub(/>/,""){$0=">chr" (++c) " " $0} 1' CHM13v2.0_no-mito.fna > CHM13v2.0_no-mito_trial.fna
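To check what the awk one-liner does, it can be run on a tiny made-up file. `sub(/>/,"")` removes the `>` and returns the number of substitutions made, which is 1 on header lines and 0 on sequence lines, so only headers are rewritten; the trailing `1` then prints every line. The file names here are hypothetical.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# Hypothetical sample: two headers with sequence lines between them.
printf '>CP068277.2\nACGT\n>CP068276.2\nTTGA\n' > "$tmp/in.fna"

# sub() strips the ">" and returns 1 on header lines (0 elsewhere),
# so the action runs only for headers; "1" then prints every line.
awk 'sub(/>/,""){$0=">chr" (++c) " " $0} 1' "$tmp/in.fna" > "$tmp/out.fna"

cat "$tmp/out.fna"
```

This prints `>chr1 CP068277.2`, `ACGT`, `>chr2 CP068276.2`, `TTGA`, one per line, which matches the header format the question's sed was aiming for.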

Answer:

In each iteration of the loop, you write the output to CHM13v2.0_no-mito_trial.fna using the > redirection, which truncates the file before writing. So the file only ends up holding the last iteration's output.
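The difference between `>` and `>>` inside a loop can be seen with a toy example that has nothing FASTA-specific in it; the file names are hypothetical.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)

# ">" truncates the file every time it runs, so only the last write survives.
for i in 1 2 3; do
    echo "iteration $i" > "$tmp/overwrite.txt"
done

# ">>" appends, so every iteration's output is kept.
for i in 1 2 3; do
    echo "iteration $i" >> "$tmp/append.txt"
done

cat "$tmp/overwrite.txt"   # only "iteration 3"
cat "$tmp/append.txt"      # iterations 1, 2 and 3
```

Because `>>` appends to whatever the file already contains, the output file should be removed or emptied before rerunning such a loop.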

If you want the output of all iterations, try replacing that line with:

sed 's/>/>chr'"$c"' /' CHM13v2.0_no-mito.fna >> CHM13v2.0_no-mito_trial.fna;

If you want $c placed only on line c, try changing the sed command so that it edits only that line, e.g.:

 sed "${c}s/>/>chr${c} /" CHM13v2.0_no-mito.fna >> CHM13v2.0_no-mito_trial.fna

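But you will then need to avoid appending the unmatched lines to the output file on every iteration. Also note that the address ${c} refers to line number c, so this only lines up with the c-th header if the headers sit on lines 1 to 24, which is not the case once sequence lines appear between them.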
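If you'd rather keep a Bash loop, a different approach from the line-address sed above is to loop over the lines of the file instead of over the numbers 1 to 24, incrementing a counter only when a header is seen. Each line is written exactly once, so unmatched lines need no special handling. This is a sketch on a hypothetical sample file; a read loop like this is much slower than the awk one-liner on a full genome.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# Hypothetical sample FASTA file.
printf '>CP068277.2\nACGT\n>CP068276.2\nTTGA\n' > "$tmp/in.fna"

c=0
# IFS= and -r keep each line exactly as written (no trimming, no backslash escapes).
while IFS= read -r line; do
    if [[ $line == '>'* ]]; then
        c=$((c + 1))
        # Replace the leading ">" with ">chr<c> ", like the sed in the question.
        printf '>chr%d %s\n' "$c" "${line#'>'}"
    else
        printf '%s\n' "$line"
    fi
done < "$tmp/in.fna" > "$tmp/out.fna"   # one redirect for the whole loop

cat "$tmp/out.fna"
```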
