I'm struggling with a regex.
I'd like to do something like this:
Dummy data:
strings <- c('asr#2ldf;wwABC=0.0732sss;63d!;',
'ggggABC=0.0001#$Gsxxaaafo',
'zzzdd$rfr67333dsass',
'ABC=0.9882ssssFGJJTRRREWE!!ddww',
'ABC=0.0921',
'sshdasljhg[aois^*3342222346677777ABC=0.752164sssdds33')
df <- data.frame(strings)
And the desired result is:
strings result
1 asr#2ldf;wwABC=0.0732sss;63d!; ABC=0.0732
2 ggggABC=0.0001#$Gsxxaaafo ABC=0.0001
3 zzzdd$rfr67333dsass <NA>
4 ABC=0.9882ssssFGJJTRRREWE!!ddww ABC=0.9882
5 ABC=0.0921 ABC=0.0921
6 sshdasljhg[aois^*3342222346677777ABC=0.752164sssdds33 ABC=0.7521
I'd like to extract ABC with = and the number rounded to four decimal places. If there's no ABC, then return NA. Strings may have different lengths and can contain any symbol; nevertheless, ABC occurs only once per string. Moreover, ABC is located at a different position in each string.
How can I do it?
CodePudding user response:
A possible solution:
library(tidyverse)
df %>%
mutate(result = str_extract(strings, "ABC\\=\\d+\\.\\d{4}"))
#> strings result
#> 1 asr#2ldf;wwABC=0.0732sss;63d!; ABC=0.0732
#> 2 ggggABC=0.0001#$Gsxxaaafo ABC=0.0001
#> 3 zzzdd$rfr67333dsass <NA>
#> 4 ABC=0.9882ssssFGJJTRRREWE!!ddww ABC=0.9882
#> 5 ABC=0.0921 ABC=0.0921
#> 6 sshdasljhg[aois^*3342222346677777ABC=0.752164sssdds33 ABC=0.7521
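One thing worth noting: \\d{4} simply keeps the first four decimals, so the value is truncated rather than rounded. That agrees with the desired output for row 6 (0.7521). If actual rounding were wanted instead, a minimal sketch could capture the whole number and format it afterwards. The names x, truncated, num and rounded are only for illustration, and the sketch assumes the number always has a decimal part:

```r
library(stringr)

x <- "sshdasljhg[aois^*3342222346677777ABC=0.752164sssdds33"

# Truncation: the pattern stops after exactly four decimals.
truncated <- str_extract(x, "ABC=\\d+\\.\\d{4}")
truncated
#> [1] "ABC=0.7521"

# Rounding: capture the full number, convert it, then format to four decimals.
num <- as.numeric(str_match(x, "ABC=(\\d+\\.\\d+)")[, 2])
rounded <- ifelse(is.na(num), NA_character_, sprintf("ABC=%.4f", num))
rounded
#> [1] "ABC=0.7522"
```

Strings without ABC still give NA in both variants, because str_match returns NA for the capture group when nothing matches.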
CodePudding user response:
One solution using dplyr and stringr:
df %>% mutate(a = str_extract(strings, "ABC=[0-9]\\.[0-9]{4}"))
strings a
1 asr#2ldf;wwABC=0.0732sss;63d!; ABC=0.0732
2 ggggABC=0.0001#$Gsxxaaafo ABC=0.0001
3 zzzdd$rfr67333dsass <NA>
4 ABC=0.9882ssssFGJJTRRREWE!!ddww ABC=0.9882
5 ABC=0.0921 ABC=0.0921
6 sshdasljhg[aois^*3342222346677777ABC=0.752164sssdds33 ABC=0.7521
CodePudding user response:
The regex could look like this in case the numbers do not always have a decimal part:
"ABC=\\d+\\.?\\d{0,4}"
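As a quick check of how this pattern behaves, here is a small sketch on made-up inputs (the vector x and the name res are hypothetical and not part of the question's data):

```r
library(stringr)

# Made-up inputs: a long decimal, an integer, a short decimal, and no ABC at all.
x <- c("xxABC=0.752164yy", "xxABC=3yy", "xxABC=12.5yy", "no match here")

res <- str_extract(x, "ABC=\\d+\\.?\\d{0,4}")
res
# expected elements: "ABC=0.7521", "ABC=3", "ABC=12.5", NA
```

Because the dot and the decimals are optional independently of each other, an input such as "ABC=3." would be matched together with its trailing dot. If that matters, grouping them as "ABC=\\d+(\\.\\d{1,4})?" is one possible variation.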