My data:
dfx=structure(list(V1 = c("(Description and Operation, 100-00 General Information) <a data-searchnum=G2107576 data-procuid=G1620638>Acceleration Control - Overview",
"(Description and Operation, 310-02 Acceleration Control) <a data-searchnum=G2232632 data-procuid=G2210282>Acceleration Control - System Operation and Component Description",
"(Description and Operation, 310-02 Acceleration Control) <a data-searchnum=G2232633 data-procuid=G2210283>Acceleration Control",
"(Diagnosis and Testing, 310-02 Acceleration Control) <a data-searchnum=G2118147 data-procuid=G2118148>Accelerator Pedal ")), class = "data.frame", row.names = c(NA,
-4L))
I need to extract the data-searchnum value from each row and store it in a new data frame:
G2107576
G2232632
G2232633
G2118147
G2110035
CodePudding user response:
Use str_extract with a capture group ((...)) after the data-searchnum= substring:
library(stringr)  # the group argument of str_extract requires stringr >= 1.5.0
str_extract(dfx$V1, 'data-searchnum=(\\S+)', group = 1)
[1] "G2107576" "G2232632" "G2232633" "G2118147"
Or use str_replace to capture the non-whitespace characters after data-searchnum= and replace the whole string with the backreference (\\1):
str_replace(dfx$V1, ".*data-searchnum=(\\S+)\\s+.*", "\\1")
[1] "G2107576" "G2232632" "G2232633" "G2118147"
If we are creating a new data frame:
library(dplyr)
df2 <- dfx %>%
  mutate(V1 = str_extract(V1, 'data-searchnum=(\\S+)', group = 1))
> df2
V1
1 G2107576
2 G2232632
3 G2232633
4 G2118147
Or in base R, use the same methodology as in str_replace:
sub(".*data-searchnum=(\\S+)\\s+.*", "\\1", dfx$V1)
[1] "G2107576" "G2232632" "G2232633" "G2118147"
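One caveat with the sub approach: when a row does not contain data-searchnum= at all, sub returns the input string unchanged rather than NA. If that matters, a base R variant along the following lines may help. This is only a sketch: the vector x, the shortened sample strings, the third "no match" row, and the names searchnum and df_new are made up for illustration and are not part of the original data.

```r
# Hypothetical sample in the same shape as dfx$V1;
# the last element deliberately has no data-searchnum attribute
x <- c(
  "(Description and Operation) <a data-searchnum=G2107576 data-procuid=G1620638>Overview",
  "(Diagnosis and Testing) <a data-searchnum=G2118147 data-procuid=G2118148>Accelerator Pedal ",
  "no attribute here"
)

# regexec() records the full match and each capture group;
# regmatches() then yields character(0) for elements with no match
m <- regmatches(x, regexec("data-searchnum=(\\S+)", x))

# Take the first capture group (element 2) or NA when nothing matched
searchnum <- vapply(
  m,
  function(hit) if (length(hit) >= 2) hit[2] else NA_character_,
  character(1)
)

# Store the result in a new data frame
df_new <- data.frame(searchnum = searchnum)
df_new
```

With the real data, replacing x with dfx$V1 should give the same four IDs as above, since every row there contains the attribute.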