I need to modify a .fasta file that looks like this:
>Contig_1;2
AGATC...
>Contig_2;345
AaGGC...
>Contig_3;22
GGAGA...
And transform it into something like:
>Contig_1
AGATC...
>Contig_2
AaGGC...
>Contig_3
GGAGA...
I tried doing the following, but it did not work as intended.
sed -i 's/;*\n/\n/g' file.fasta
Could someone give me some advice? Thanks!
CodePudding user response:
You can use
sed -i 's/;[^;]*$//' file.fasta
See the online demo:
s='>Contig_1;2
AGATC...
>Contig_2;345
AaGGC...
>Contig_3;22
GGAGA...'
sed 's/;[^;]*$//' <<< "$s"
Output:
>Contig_1
AGATC...
>Contig_2
AaGGC...
>Contig_3
GGAGA...
Note that sed does not place the newline into the pattern space (since you are using GNU sed, you could force it to do so with -z, but it is not necessary here), so you can't match a newline with \n in your sed command.
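As a rough illustration of the point above, the following sketch runs the original attempt and the corrected command on sample data adapted from the question (the shortened sequence lines and the variable names are assumptions made for the demo). Note also that `;*` means "zero or more semicolons", not "a semicolon followed by anything".

```shell
# Sample input adapted from the question (sequence lines shortened).
input='>Contig_1;2
AGATC
>Contig_2;345
AaGGC'

# The original attempt: the pattern requires a newline, but sed hands
# each line to the script without its trailing newline, so nothing matches.
unchanged=$(printf '%s\n' "$input" | sed 's/;*\n/\n/g')

# Anchoring at the end of the line with $ does what was intended.
fixed=$(printf '%s\n' "$input" | sed 's/;[^;]*$//')

printf '%s\n' "$unchanged"
echo '---'
printf '%s\n' "$fixed"
```

The first block of output should be identical to the input, while the second has the `;number` suffixes removed.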
The ;[^;]*$ pattern matches
; - a semi-colon
[^;]* - any zero or more chars other than ; (if you need to make sure you match digits, replace with [0-9]* or [[:digit:]]*)
$ - end of string.
Note you need no g flag here: the pattern is anchored with $, so it can match at most once per line.
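If the suffix should only be removed from header lines, and only when it is numeric, the command can be narrowed. This is a minimal sketch, assuming headers always start with `>`; the `>Contig_3;abc` line is a made-up case added to show what is left untouched.

```shell
# Sample input; the ";abc" header is hypothetical, added for contrast.
input='>Contig_1;2
AGATC
>Contig_2;345
AaGGC
>Contig_3;abc
GGAGA'

# /^>/ limits the substitution to header lines; [0-9]* strips only a
# numeric suffix, so ">Contig_3;abc" is not modified.
result=$(printf '%s\n' "$input" | sed '/^>/ s/;[0-9]*$//')

printf '%s\n' "$result"
```

Restricting the command to `/^>/` lines is a defensive choice: sequence lines normally contain no `;`, but the address guarantees they are never touched.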
CodePudding user response:
1st solution: You can keep it very simple with awk. Written and tested with your shown samples in GNU awk.
awk -F';' '/^>/{NF=1} 1' Input_file
Explanation: Set the field separator to ; for all lines. In the main program, if a line starts with >, set NF (number of fields) to 1, which keeps only the text before the first occurrence of ;. The trailing 1 prints every line, modified or not.
2nd solution: If you want to handle all lines (whether they start with > or not) and keep only the text before the first occurrence of ;, try the following:
awk -F';' '/;/{NF=1} 1' Input_file
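To make the difference between the two awk commands concrete, here is a small sketch on made-up input (a header with two `;` and a sequence line that happens to contain a `;`; neither appears in the question). It also runs the sed command from the other answer, which removes only the last `;` segment rather than everything after the first `;`.

```shell
# Hypothetical input chosen to expose the differences.
input='>Contig_1;2;x
AG;TC'

# 1st solution: only header lines are truncated at the first ";".
headers_only=$(printf '%s\n' "$input" | awk -F';' '/^>/{NF=1} 1')

# 2nd solution: every line containing ";" is truncated at the first ";".
any_line=$(printf '%s\n' "$input" | awk -F';' '/;/{NF=1} 1')

# sed from the other answer: strips only the text after the last ";".
last_only=$(printf '%s\n' "$input" | sed 's/;[^;]*$//')

printf '%s\n---\n%s\n---\n%s\n' "$headers_only" "$any_line" "$last_only"
```

For the data shown in the question, where each header has exactly one `;` and sequence lines have none, all three commands should produce the same result.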