Getting a specific string pattern

Time:12-03

I have a data frame with a string column; it looks like this:

structure(list(variables = c("data$Ageee[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
"data$var[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
"data$variable_test[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]"
), values = c(0, 0, 0)), class = "data.frame", row.names = c(NA,
-3L))

I would like to extract the text after the first $ and before the first [ (into a new column or in place of variables), so that I get:

structure(list(variables = c("Ageee", "var", "variable_test"
), values = c(0, 0, 0)), class = "data.frame", row.names = c(NA,
-3L))

I appreciate any help.

Answer:

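We can use sub to capture the word (\\w+) after the first $. Because $ is a regex metacharacter that denotes the end of the string, it has to be escaped (\\$). The pattern matches the leading word (data), the escaped $, the captured word and the rest of the string, and sub replaces all of it with the captured group (\\1):

df1$variables <- sub("\\w+\\$(\\w+).*", "\\1", df1$variables)

Output: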

> df1
      variables values
1         Ageee      0
2           var      0
3 variable_test      0
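The \\w+ pattern assumes the column name contains only letters, digits and underscores, so a name such as my.var would be cut down to my. Below is a minimal sketch that follows the question's rule literally (everything after the first $ and before the first [). The my.var row is a hypothetical addition for illustration, and perl = TRUE is an assumption made so that [ can be escaped inside the character class. Strings with no [ after the first $ are returned unchanged by sub.

```r
# Sample data from the question, plus one hypothetical row whose
# variable name contains a dot (not matched by \\w)
df1 <- data.frame(
  variables = c(
    "data$Ageee[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
    "data$var[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
    "data$variable_test[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
    "data$my.var[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]"
  ),
  values = c(0, 0, 0, 0)
)

# ^[^$]*     : everything up to (not including) the first $
# \\$        : the first $ itself
# ([^\\[]*)  : capture everything up to the first [
# \\[.*$     : the first [ and the rest of the string
df1$variables <- sub("^[^$]*\\$([^\\[]*)\\[.*$", "\\1", df1$variables, perl = TRUE)

print(df1)
```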

Answer:

We can use stringr's str_extract with a positive lookbehind, (?<=\\$), which requires the match to come right after a $ without including the $ itself. Because str_extract returns only the first match in each string, this gives the word after the first $.

library(dplyr)
library(stringr)

df %>%
  mutate(variables = str_extract(variables, "(?<=\\$)\\w+"))

      variables values
1         Ageee      0
2           var      0
3 variable_test      0
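If stringr is not available, the same lookbehind can be used in base R. This is a sketch assuming PCRE via perl = TRUE, since lookbehinds are not supported by R's default regex engine. regexpr() finds the first match in each string and regmatches() extracts it. Note that regmatches() silently drops strings with no match, so the sketch checks that every row matched before assigning the result back.

```r
df <- data.frame(
  variables = c(
    "data$Ageee[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
    "data$var[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]",
    "data$variable_test[data$Beneficiary == 1] and data$Age[data$Beneficiary == 0]"
  ),
  values = c(0, 0, 0)
)

# (?<=\\$) : the match must be preceded by $, but the $ is not part of it
# \\w+     : one or more word characters (letters, digits, underscore)
# regexpr() returns only the first match per string, i.e. after the first $
m <- regexpr("(?<=\\$)\\w+", df$variables, perl = TRUE)

# regexpr() reports -1 for no match; regmatches() would drop those rows
stopifnot(all(m != -1))
df$variables <- regmatches(df$variables, m)

print(df)
```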