I am trying to append a file name at the end of certain lines in many files which I am concatenating.
Short example:
INPUTS:
filename (1): 1234_contigs.fasta
>NODE_STUFF
GATTACA
filename (2): 5678_contigs.fasta
>NODE_TUFF
TGTAATC
OUTPUT:
>NODE_STUFF-1234
GATTACA
>NODE_TUFF-5678
TGTAATC
The code that I am using as a scaffold for this was commandeered from another post and my most successful iterations upon it are:
for i in ./*/*contigs.fasta; do sed '/^>NODE.*/ s/$/-(basename $i _contigs.fasta)/' /g $i; done
>NODE_STUFF-(basename $i _contigs.fasta)
GATTACA
>NODE_TUFF-(basename $i _contigs.fasta)
TGTAATC
for i in ./*/*contigs.fasta; do sed s/'^>NODE.*'$/$(basename $i _contigs.fasta)\ /g $i; done
1234
GATTACA
5678
TGTAATC
While I see many similar questions, I am unable to find a way to do this for only certain lines in these files (which are functionally equivalent to .txt for this example). I believe my confusing results are due to errors in handling literals, but after several dozen poorly recorded attempts at pushing quotation marks around I feel more lost than found. Note that each file can contain many lines starting with >NODE, and I wish to append the file name to each of them.
CodePudding user response:
With your shown samples, please try the following awk code. There is no need for a for loop to traverse the files; awk can read all of them by itself. A simple explanation: for each line that starts with >, print the current line followed by - and the part of the current file name before the first _; otherwise (the line does not start with >), print the line unchanged.
awk '/^>/{file=FILENAME;sub(/_.*/,"",file);print $0"-"file;next} 1' *.fasta
Or, more concisely:
awk '/^>/{file=FILENAME;sub(/_.*/,"",file);$0=$0"-"file} 1' *.fasta
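One caveat with the commands above: FILENAME holds the path exactly as it was passed to awk, so with the question's ./*/*contigs.fasta layout the tag would also carry the directory part. A minimal sketch of a variant that strips the directories first, assuming the files sit one directory deep; the run_a and run_b directory names are made up for the demo:

```shell
# Build a throwaway sample tree mirroring the question's layout
# (run_a and run_b are hypothetical directory names).
tmp=$(mktemp -d)
mkdir -p "$tmp/run_a" "$tmp/run_b"
printf '>NODE_STUFF\nGATTACA\n' > "$tmp/run_a/1234_contigs.fasta"
printf '>NODE_TUFF\nTGTAATC\n' > "$tmp/run_b/5678_contigs.fasta"

result=$(awk '
  FNR == 1 {                          # first line of each input file
    tag = FILENAME
    sub(/.*\//, "", tag)              # drop any leading directories
    sub(/_contigs\.fasta$/, "", tag)  # drop the fixed suffix
  }
  /^>NODE/ { $0 = $0 "-" tag }        # append the tag to header lines only
  { print }
' "$tmp"/*/*_contigs.fasta)

printf '%s\n' "$result"
rm -rf "$tmp"
```

Computing the tag once per file (on FNR == 1) rather than on every header line is only a small tidy-up; the output is the same either way.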
CodePudding user response:
With bash and sed, I'd propose:
for i in ./*/*contigs.fasta; do
n=$(basename -s _contigs.fasta "$i")
sed "s/^\(>NODE.*\)/\1-$n/" "$i"
done
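The reason this works where the question's first attempt did not is most likely the quoting: inside single quotes the shell passes everything to sed literally, while inside double quotes variables are expanded before sed runs. A small sketch to illustrate, using a hypothetical path and the two-argument form of basename (the -s option is supported by GNU and BSD basename but is not required by POSIX):

```shell
i=./some_dir/1234_contigs.fasta       # hypothetical path for the demo
n=$(basename "$i" _contigs.fasta)     # two-argument form: strip dir and suffix

single='s/$/-$n/'    # single quotes: the shell leaves $n untouched
double="s/\$/-$n/"   # double quotes: $n is expanded, \$ stays a literal $ for sed

printf '%s\n' "$single" "$double"

# Apply the double-quoted expression to header lines only.
tagged=$(printf '>NODE_STUFF\nGATTACA\n' | sed "/^>NODE/ $double")
printf '%s\n' "$tagged"
```

Note that this assumes the file name prefix contains no characters that are special to sed, such as / or &.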
CodePudding user response:
Try
for file in */*_contigs.fasta; do
filenum=${file%_contigs.fasta}
filenum=${filenum##*/}
sed -- "s/^>NODE.*\$/&-${filenum}/" "$file"
done
- See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${file%_contigs.fasta} and ${filenum##*/}.
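To see what the two expansions do, here is a small sketch that applies them step by step to a hypothetical path (step1 and step2 are illustrative names, not from the answer):

```shell
file=run_a/1234_contigs.fasta    # hypothetical path for the demo
step1=${file%_contigs.fasta}     # % removes the shortest matching suffix
step2=${step1##*/}               # ## removes the longest prefix matching */
printf '%s\n' "$step1" "$step2"
```

This prints run_a/1234 followed by 1234. Both expansions are done by the shell itself, so no external basename process is started for each file.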