Appending filename at the end of certain lines in a text file

Time:05-26

I am trying to append a file name at the end of certain lines in many files which I am concatenating.

short example:

INPUTS:

filename (1): 1234_contigs.fasta
>NODE_STUFF
GATTACA

filename (2): 5678_contigs.fasta
>NODE_TUFF
TGTAATC

OUTPUT:

>NODE_STUFF-1234
GATTACA
>NODE_TUFF-5678
TGTAATC

I adapted the code I am using as a scaffold from another post. My two most successful attempts, each followed by the output it produces, are:

for i in ./*/*contigs.fasta; do sed '/^>NODE.*/ s/$/-(basename $i _contigs.fasta)/' /g $i; done

>NODE_STUFF-(basename $i _contigs.fasta)
GATTACA
>NODE_TUFF-(basename $i _contigs.fasta)
TGTAATC


for i in ./*/*contigs.fasta; do sed s/'^>NODE.*'$/$(basename $i _contigs.fasta)\ /g $i; done
1234 
GATTACA
5678 
TGTAATC

While I see many similar questions, I am unable to find a way to do this for only certain lines in these files (which are functionally equivalent to .txt for this example). I believe my confused results are due to errors in quoting, but after several dozen poorly recorded attempts at pushing quotation marks around I feel more lost than found. Note that each file can contain many lines starting with >NODE, and I wish to append the filename to every one of them.

CodePudding user response:

With your shown samples, please try the following awk code. There is no need for a for loop to traverse the files; awk can read all of them by itself. In brief: if a line starts with >, print it followed by - and the part of the current file name before the first _; otherwise print the line unchanged.

awk '/^>/{file=FILENAME;sub(/_.*/,"",file);print $0"-"file;next} 1' *.fasta

Or, more concisely:

awk '/^>/{file=FILENAME;sub(/_.*/,"",file);$0=$0"-"file} 1' *.fasta
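One caveat: FILENAME holds the path exactly as it was passed on the command line, so with the question's ./*/*contigs.fasta glob the label would still include the directory, and an underscore in a directory name would cut it short. A possible variant, sketched here on a throwaway sample (the directory name run_1 and the workdir variable are made up for illustration), strips the directory and the known suffix once per file:

```shell
# Throwaway sample in a temporary directory; "run_1" is a hypothetical name
# chosen because its underscore would confuse sub(/_.*/, "").
workdir=$(mktemp -d)
mkdir "$workdir/run_1"
printf '%s\n' '>NODE_STUFF' 'GATTACA' > "$workdir/run_1/1234_contigs.fasta"

out=$(awk '
    FNR == 1 {                              # first line of each input file
        label = FILENAME
        sub(/.*\//, "", label)              # strip the directory part
        sub(/_contigs\.fasta$/, "", label)  # strip the known suffix
    }
    /^>/ { $0 = $0 "-" label }              # append the label to header lines
    1                                       # print every line
' "$workdir"/*/*_contigs.fasta)

printf '%s\n' "$out"
```

Computing the label on FNR == 1 rather than on every header line also avoids repeating the substitutions in files with many >NODE lines.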

CodePudding user response:

With bash and sed, I'd propose:

for i in ./*/*contigs.fasta; do
   n=$(basename -s _contigs.fasta "$i")
   sed "s/^\(>NODE.*\)/\1-$n/" "$i"
done
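The double quotes around the sed script are what fix the original attempt: inside single quotes the shell expands nothing, and (basename ...) without a leading $ is not a command substitution in any case. A small sketch of the difference, using a hypothetical path (basename only manipulates the string, so the file need not exist):

```shell
# Hypothetical path for illustration only.
i=./sample/1234_contigs.fasta

# Single quotes: the text reaches the command untouched.
literal='-(basename $i _contigs.fasta)'

# Double quotes with $( ... ): the command runs and its output is substituted.
expanded="-$(basename "$i" _contigs.fasta)"

printf '%s\n' "$literal" "$expanded"
```

The first line printed is the literal text seen in the question's output; the second is the intended suffix, -1234.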

CodePudding user response:

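Try this, which derives the number with parameter expansion and uses & in the replacement to keep the matched line: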

for file in */*_contigs.fasta; do
    filenum=${file%_contigs.fasta}
    filenum=${filenum##*/}

    sed -- "s/^>NODE.*\$/&-${filenum}/" "$file"
done
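Since the goal is to concatenate the files, the loop's output can be redirected once, after done. A rough end-to-end sketch on copies of the question's two samples; the temporary directory layout and the combined.fasta name are assumptions made for illustration:

```shell
# Build throwaway copies of the question's two sample files.
workdir=$(mktemp -d)
mkdir "$workdir/a" "$workdir/b"
printf '%s\n' '>NODE_STUFF' 'GATTACA' > "$workdir/a/1234_contigs.fasta"
printf '%s\n' '>NODE_TUFF' 'TGTAATC' > "$workdir/b/5678_contigs.fasta"

# Label every header line and collect all output in a single file.
for file in "$workdir"/*/*_contigs.fasta; do
    filenum=${file%_contigs.fasta}   # drop the suffix
    filenum=${filenum##*/}           # drop the directory part
    sed -- "s/^>NODE.*\$/&-${filenum}/" "$file"
done > "$workdir/combined.fasta"

cat "$workdir/combined.fasta"
```

This relies on the file names containing no / or & characters, which would be interpreted by sed in the replacement text.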