I have a FASTA dataset that looks like this:
13_seq2344_ATCGACGGAACTGA
1342_seq2134_AGCTGTGGCAT
130_SEQ2289_TCGAATCGAGGAAC
I want to remove only the line that contains "13" as a number on its own (not as part of a longer number such as 1342 or 130),
so my output should look like this:
1342_seq2134_AGCTGTGGCAT
130_SEQ2289_TCGAATCGAGGAAC
I have tried grep -w, grep -o, and grep -E (for example, grep -o "13" filename), but none of them work for me.
Please suggest a command that works.
CodePudding user response:
With your shown samples, please try the following awk code. A simple explanation: the awk program sets the field separator to _ and checks whether $1 (the first field) is NOT 13; if so, it prints that line.
awk -F'_' '$1!="13"' Input_file
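As a quick check, the command above can be run against the three sample lines from the question. This is only a sketch: the temporary file and the variable names input_file and awk_result are made up for illustration.

```shell
# Recreate the sample input from the question in a temporary file
# (input_file is a hypothetical name used only for this demo).
input_file=$(mktemp)
printf '%s\n' \
  '13_seq2344_ATCGACGGAACTGA' \
  '1342_seq2134_AGCTGTGGCAT' \
  '130_SEQ2289_TCGAATCGAGGAAC' > "$input_file"

# Keep every line whose first underscore-separated field is not exactly "13".
awk_result=$(awk -F'_' '$1 != "13"' "$input_file")
printf '%s\n' "$awk_result"

# Clean up the temporary file.
rm -f "$input_file"
```

Note that this only looks at the first field, so a 13 appearing later in the line does not cause the line to be dropped.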
CodePudding user response:
If the file should not contain the number 13 anywhere in the string, you can match 13 with no digit directly to its left or right and use -v to invert the match.
The -P option enables Perl-compatible regular expressions, which are needed for the lookarounds.
grep -vP '(?<!\d)13(?!\d)' file
Or use a negative lookahead to assert that the line contains no standalone 13 (a 13 with no digit directly before or after it):
grep -P '^(?!.*(?<!\d)13(?!\d))' file
Output
1342_seq2134_AGCTGTGGCAT
130_SEQ2289_TCGAATCGAGGAAC
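If -P is not available in your grep, the same idea can probably be expressed with an extended regular expression, using "start of line or a non-digit" and "a non-digit or end of line" in place of the lookarounds. The sketch below is an assumption-laden demo rather than part of the original answer: the file name sample_file, the variable grep_result, and the fourth input line are hypothetical. It also hints at why grep -w did not help: the underscore counts as a word character, so 13_ is not seen as a whole word.

```shell
# Sample input from the question plus one hypothetical extra line
# ('77_seq13_ACGT') that contains a standalone 13 outside the first field.
sample_file=$(mktemp)
printf '%s\n' \
  '13_seq2344_ATCGACGGAACTGA' \
  '1342_seq2134_AGCTGTGGCAT' \
  '130_SEQ2289_TCGAATCGAGGAAC' \
  '77_seq13_ACGT' > "$sample_file"

# Drop lines where 13 appears with no digit directly before or after it.
# (^|[^0-9]) stands in for the lookbehind, ([^0-9]|$) for the lookahead.
grep_result=$(grep -vE '(^|[^0-9])13([^0-9]|$)' "$sample_file")
printf '%s\n' "$grep_result"

# Clean up the temporary file.
rm -f "$sample_file"
```

Unlike the awk approach, this also removes the hypothetical fourth line, because it checks for a standalone 13 anywhere in the line rather than only in the first field.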