substring in R with stringr

Time:11-13

I have a string that looks like this:

my_sting="AC=1;AN=249706;AF=4.00471e-06;rf_tp_probability=8.55653e-01;"

It is based on a column in my data:


  REF ALT    QUAL FILTER  INFO
1   C   A 3817.77   PASS  AN=2;AF=4.00471e06;rf_tp_probability=8.55653
2   C   G 3817.77   PASS  AN=3;AF=5;rf_tp_probability=8.55653

I want to extract only the part that starts with AF= and ends with the value AF is equal to, for example: AF=4.00471e-06

I tried this:

print(str_extract_all(my_sting, "AF=.+;"))
[[1]]
[1] "AF=4.00471e-06;rf_tp_probability=8.55653e-01;"

But it returned everything up to the end of the string instead of just AF=4.00471e-06. Is there any way to fix this? Thank you.
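The attempt matches too much because `.+` is greedy: it consumes as many characters as it can and then backtracks only as far as the last `;` in the string. A minimal sketch comparing that greedy pattern with a lazy `.+?`, which stops at the first `;` after `AF=`, using the same `my_sting` value from the question:

```r
library(stringr)

my_sting <- "AC=1;AN=249706;AF=4.00471e-06;rf_tp_probability=8.55653e-01;"

# Greedy: .+ runs to the end of the string, then backtracks to the LAST ";"
greedy <- str_extract(my_sting, "AF=.+;")

# Lazy: .+? expands one character at a time and stops at the FIRST ";"
lazy <- str_extract(my_sting, "AF=.+?;")

print(greedy)
#> [1] "AF=4.00471e-06;rf_tp_probability=8.55653e-01;"
print(lazy)
#> [1] "AF=4.00471e-06;"
```

The lazy version still keeps the trailing `;`, so dropping it needs a pattern that either excludes `;` from the match or stops just before it.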

Answer:

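You can write the pattern using a negated character class, [^;], which matches any character except a semicolon; adding + makes it match one or more of them, so the match stops right before the next ;: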

library(stringr)
my_sting="AC=1;AN=249706;AF=4.00471e-06;rf_tp_probability=8.55653e-01;"
print(str_extract_all(my_sting, "AF=[^;]+"))

Output

[[1]]
[1] "AF=4.00471e-06"
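Since the strings come from the INFO column of a data frame, the same pattern can be applied to the whole column at once, because `str_extract()` is vectorised. A minimal sketch, assuming a hypothetical data frame `vcf` rebuilt by hand from the two rows printed in the question (values copied as shown); `str_match()` with a capture group additionally pulls out just the value after `AF=`:

```r
library(stringr)

# Hypothetical data frame `vcf`, rebuilt from the two rows shown in the question
vcf <- data.frame(
  REF    = c("C", "C"),
  ALT    = c("A", "G"),
  QUAL   = c(3817.77, 3817.77),
  FILTER = c("PASS", "PASS"),
  INFO   = c("AN=2;AF=4.00471e06;rf_tp_probability=8.55653",
             "AN=3;AF=5;rf_tp_probability=8.55653"),
  stringsAsFactors = FALSE
)

# Whole "AF=..." field for every row (str_extract is vectorised over INFO)
vcf$AF_field <- str_extract(vcf$INFO, "AF=[^;]+")

# Column 1 of str_match() is the full match; column 2 is capture group 1 (the value)
vcf$AF <- as.numeric(str_match(vcf$INFO, "AF=([^;]+)")[, 2])

print(vcf[, c("INFO", "AF_field", "AF")])
```

Rows without an AF entry would simply get NA in both new columns.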

Answer:

Another option is a lookahead: match lazily up to a position that is followed by ; (i.e., (?=;)), so the ; itself is not part of the match.

library(stringr)
my_sting="AC=1;AN=249706;AF=4.00471e-06;rf_tp_probability=8.55653e-01;"

str_extract(my_sting, "AF=.*?(?=;)")
#> [1] "AF=4.00471e-06"
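The lookahead `(?=;)` only succeeds if a `;` actually follows the value, so it returns NA when AF happens to be the last field without a trailing semicolon. A minimal sketch, assuming a hypothetical second input string where AF is last, that relaxes the lookahead to "a `;` or the end of the string":

```r
library(stringr)

# Hypothetical inputs: one with a ";" after the AF value, one where AF is the last field
x <- c("AC=1;AN=249706;AF=4.00471e-06;rf_tp_probability=8.55653e-01;",
       "AN=2;AF=0.5")

# Original lookahead: needs a ";" after the value, so the second element is NA
strict <- str_extract(x, "AF=.*?(?=;)")

# Relaxed lookahead: stop at the first ";" OR at the end of the string ($)
relaxed <- str_extract(x, "AF=.*?(?=;|$)")

print(strict)
print(relaxed)
```

The negated-class pattern `AF=[^;]+` from the first answer already handles both cases, since it never requires a `;` after the value.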