Rearranging specific lines of text, based on a pattern

Time:08-27

I have a FASTA file in which the headers (the lines starting with ">") are not in the right order. I want to take the part of each header that matches a certain pattern (ctg.*,) and move it to the start of the header. For example:

head seq.fasta -n 3
>JACEFZ010000001.1 Cepaea nemoralis isolate C981 ctg35418, whole genome shotgun sequence
cctcctcctccctcctcccctttttCCCTccttcccctttcccccctcctcttcctccccccctcctcccccccctcctc
cttcctccgccctctcctcctcctcactcctcctcctccctcctcctcctccctctacctcctacccCCTCCTCCCGTCA

I want to move the ctg35418 string to the start of the header, so that the new file looks like this:

>ctg35418 JACEFZ010000001.1 Cepaea nemoralis isolate C981, whole genome shotgun sequence
cctcctcctccctcctcccctttttCCCTccttcccctttcccccctcctcttcctccccccctcctcccccccctcctc
cttcctccgccctctcctcctcctcactcctcctcctccctcctcctcctccctctacctcctacccCCTCCTCCCGTCA

I'm fairly new to shell scripting, so I did something like this:

while IFS= read -r line; do
  if [[ $line =~ ">" ]]; then
    id=$(echo "$line" | grep -oe "ctg.*," | sed 's/,//g')
    line2=$(echo "$line" | sed 's/>//g' | sed "s| ${id}||g")
    sed -i "s|$line|>${id} ${line2}|g" seq.fasta
  fi
done < seq.fasta

I would love to get your input on reducing the complexity of the, let's call it, code.

CodePudding user response:

This sed command should do the job:

sed 's/>\(.*\)[[:blank:]]\(ctg[^,]*\)/>\2 \1/' seq.fasta > newseq.fasta
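To see what the substitution does, here is a minimal sketch that wraps the same expression in a function and runs it on a two-line sample instead of the real seq.fasta. The function name move_ctg_first and the shortened sequence line are made up for illustration; the leading ^ is an optional addition that pins the match to the start of the line.

```shell
#!/bin/sh
# Hypothetical helper (name not from the original answer): reads FASTA text
# on stdin and writes it to stdout with the ctg ID moved to the front of
# each header line.
move_ctg_first() {
    # \1 captures everything between ">" and the blank before the ID.
    # \2 captures the ID itself: "ctg" followed by everything up to the comma.
    # The pattern starts with ">", so sequence lines never match and pass
    # through unchanged.
    sed 's/^>\(.*\)[[:blank:]]\(ctg[^,]*\)/>\2 \1/'
}

# Sample input standing in for seq.fasta (sequence line shortened).
printf '%s\n' \
    '>JACEFZ010000001.1 Cepaea nemoralis isolate C981 ctg35418, whole genome shotgun sequence' \
    'cctcctcctccctcc' | move_ctg_first
```

Because \(.*\) is greedy, the ID that gets moved is the last blank-preceded "ctg..." token in the header, which is what you want as long as each header contains only one such token. The whole file is processed in a single pass, so there is no need for the while loop or for running sed -i once per header.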