I need to create a categorical variable (new_var) based on some conditions. The variable containing those conditions (var) is stored as text, for example:
var | new_var |
---|---|
L: 06:00-22:00 (A) | A |
L: 00:00-07:59 (D), 07:59-23:59 (A) | MIXED |
L-V: 08:00-21:00 (A), 07:59-23:59 (A) | A |
S: 08:00-19:00 (D) | D |
So, the conditions for creating the new variable are the letters in parentheses. new_var can be A, D, or MIXED (both A and D).
I tried the following code:
library(dplyr)

var = as.character(c('L: 06:00-22:00 (A)', 'L: 00:00-07:59 (D), 07:59-23:59 (A)', 'L-V: 08:00-21:00 (A), 07:59-23:59 (A)', 'S: 08:00-19:00 (D)'))
df <- as.data.frame(var)
df <- df %>%
  mutate(new_var = case_when(
    grepl("(D).*(A)", df$var) ~ "MIXED",
    grepl("(A)", df$var) ~ "A",
    grepl("(D)", df$var) ~ "D",
    T ~ "N/A"))
However, this does not create new_var reliably; some values are classified incorrectly.
CodePudding user response:
Does it have to use tidyverse piping?
If not, here is a base R approach you could try:
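x <- c('L: 06:00-22:00 (A)', 'L: 00:00-07:59 (D), 07:59-23:59 (A)', 'L-V: 08:00-21:00 (A), 07:59-23:59 (A)', 'S: 08:00-19:00 (D)')
# Locate every A or D that sits directly between literal parentheses
m <- gregexpr('(?<=\\()[AD](?=\\))', x, perl = TRUE)
# Extract the matched letters, one character vector per element of x
regmatches(x, m)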
[[1]]
[1] "A"

[[2]]
[1] "D" "A"

[[3]]
[1] "A" "A"

[[4]]
[1] "D"
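
A note on the pattern: in a regular expression, unescaped parentheses form a group rather than matching the characters "(" and ")". The sketch below illustrates the difference on a hypothetical input (the string `s` is made up for illustration and is not from the data above), where a day label "D" is followed by a single (A) block:

```r
# Hypothetical input, not part of the original data
s <- "D: 08:00-19:00 (A)"

# Unescaped parentheses are groups, so this pattern is effectively "D.*A"
unescaped <- grepl("(D).*(A)", s)          # TRUE: the day label "D" matches

# Escaped parentheses match the literal characters "(" and ")"
escaped <- grepl("\\(D\\).*\\(A\\)", s)    # FALSE: there is no literal "(D)"

# Lookbehind (?<=\\() and lookahead (?=\\)) require the parentheses
# to be present but keep only the letter between them
letters_found <- regmatches(s, gregexpr("(?<=\\()[AD](?=\\))", s, perl = TRUE))[[1]]
letters_found                              # "A"
```

This is why the lookaround pattern only picks up letters that are actually wrapped in parentheses.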
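# Reduce each element to its distinct letters
new_var <- lapply(regmatches(x, m), unique)
# Elements with more than one distinct letter contain both A and D
new_var[sapply(new_var, length) > 1] <- 'MIXED'
unlist(new_var)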
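
If you would rather stay with dplyr, here is a minimal sketch of one possible variant, assuming the dplyr package is installed. It checks for the literal strings "(A)" and "(D)" separately with `fixed = TRUE`, so the parentheses are not treated as regex groups and the order of A and D does not matter. The helper columns `has_A` and `has_D` are names introduced here for illustration only:

```r
library(dplyr)

df <- data.frame(var = c('L: 06:00-22:00 (A)',
                         'L: 00:00-07:59 (D), 07:59-23:59 (A)',
                         'L-V: 08:00-21:00 (A), 07:59-23:59 (A)',
                         'S: 08:00-19:00 (D)'))

df <- df %>%
  mutate(
    # fixed = TRUE matches the pattern as a plain string, parentheses included
    has_A = grepl("(A)", var, fixed = TRUE),
    has_D = grepl("(D)", var, fixed = TRUE),
    new_var = case_when(
      has_A & has_D ~ "MIXED",
      has_A ~ "A",
      has_D ~ "D",
      TRUE ~ "N/A"
    )
  ) %>%
  select(-has_A, -has_D)   # drop the temporary helper columns

df$new_var
```

For the four example rows this gives "A", "MIXED", "A", "D".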