I have a data frame with a string column; it looks like this:
```r
structure(list(variables = c("data$Ageee[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
"data$var[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
"data$variable_test[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]"
), values = c(0, 0, 0)), class = "data.frame", row.names = c(NA,
-3L))
```
I would like to extract the text after the first `$` and before the first `[`, so that I get:
```r
structure(list(variables = c("Ageee", "var", "variable_test"
), values = c(0, 0, 0)), class = "data.frame", row.names = c(NA,
-3L))
```
I appreciate any help.
Answer:
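We may use `sub` to capture the word (`(\\w+)`) that follows the first `$`. Because `$` is a regex metacharacter denoting the end of the string, it has to be escaped (`\\$`):

```r
# \\w+ matches "data", \\$ the literal $, (\\w+) captures the name,
# and .* consumes the rest; "\\1" keeps only the captured name
df1$variables <- sub("\\w+\\$(\\w+).*", "\\1", df1$variables)
```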
Output:

```
> df1
      variables values
1         Ageee      0
2           var      0
3 variable_test      0
```
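Since `\\w` matches only letters, digits and underscores, a `\\w+`-based pattern stops at the first other character, so a name containing a dot would be truncated. A minimal sketch that follows the question's wording more literally, keeping everything between the first `$` and the first `[`, could look like this. The fourth test string, the name `extracted`, and the choice of `perl = TRUE` are my own assumptions for illustration:

```r
# The first three strings come from the question; the fourth is a
# hypothetical column name containing a dot.
x <- c(
  "data$Ageee[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
  "data$var[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
  "data$variable_test[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
  "data$var.name[data$Beneficiary == 1]"
)

# ^[^$]*     everything before the first $
# \\$        the first $ itself (escaped, as $ is a regex anchor)
# ([^\\[]*)  capture group: everything up to the first [
# \\[.*      the first [ and the rest of the string, which is discarded
extracted <- sub("^[^$]*\\$([^\\[]*)\\[.*", "\\1", x, perl = TRUE)
# extracted is c("Ageee", "var", "variable_test", "var.name")

# For comparison, the \\w+ pattern stops at the dot in the fourth string
word_only <- sub("\\w+\\$(\\w+).*", "\\1", x[4])
# word_only is "var"
```

As with any `sub()` call, a string that contains no `$` followed later by a `[` is returned unchanged rather than as `NA`.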
Answer:
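We can use `str_extract` from `stringr` with a positive lookbehind, `(?<=\\$)`, which makes the match start right after the first `$` without including the `$` itself:

```r
library(dplyr)
library(stringr)

# (?<=\\$) requires a $ immediately before the match but does not consume it;
# str_extract() returns the first match in each string
df %>% mutate(variables = str_extract(variables, "(?<=\\$)\\w+"))
```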
```
      variables values
1         Ageee      0
2           var      0
3 variable_test      0
```
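The difference between consuming the `$` and using a lookbehind can be checked directly. A plain `\\$\\w+` pattern includes the `$` in the match (which is why it would need a follow-up `str_remove()`), whereas `(?<=\\$)` only requires a `$` before the match. The sketch below assumes `stringr` is installed; the base R line is an alternative I am adding, using `regexpr()` with `perl = TRUE`, since PCRE also supports lookbehind:

```r
library(stringr)

x <- c(
  "data$Ageee[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
  "data$var[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
  "data$variable_test[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]"
)

# Consuming pattern: the $ is part of the match
with_dollar <- str_extract(x, "\\$\\w+")
# with_dollar is c("$Ageee", "$var", "$variable_test")

# Lookbehind: the $ must precede the match but is not included in it
no_dollar <- str_extract(x, "(?<=\\$)\\w+")
# no_dollar is c("Ageee", "var", "variable_test")

# Base R: regexpr() locates the first match in each string and regmatches()
# extracts it; note that strings without a match are dropped from the result
base_r <- regmatches(x, regexpr("(?<=\\$)\\w+", x, perl = TRUE))
```

One behavioural difference from the `sub()` answer: `str_extract()` returns `NA` for a string that contains no match, while `sub()` returns such a string unchanged.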