Intricate variable generation with conditionals against multiple factor variables in R

I'm trying to generate a new variable using multiple conditionals that evaluate against factor variables.

Say I have this data.frame of factor variables:

x <- c("1", "2", "1", "NA", "1", "2", "NA", "1", "2", "2", "NA")
y <- c("1", "NA", "2", "1", "1", "NA", "2", "1", "2", "1", "1")
z <- c("1", "2", "3", "4", "1", "2", "3", "4", "1", "2", "3")
w <- c("01", "02", "03", "04", "05", "06", "07", "01", "02", "03", "04")

df <- data.frame(x, y, z, w)

df$x <- as.factor(df$x)
df$y <- as.factor(df$y)
df$z <- as.factor(df$z)
df$w <- as.factor(df$w)

str(df)

I need to add a new column v to my data frame that takes the value 1, 0, or NA according to the following conditions:

Takes value 1 if all of these hold: x is "1", y is "1", z is "1" or "2", and w is between "01" and "06".

Takes value 0 if at least one of those conditions is not met.

Takes value NA if any of x, y, z, or w is NA.

I tried using the pipe %>% together with mutate and case_when, but have been unable to make it work.

My desired result is a new column v in df that looks like this:

[1] 1 NA 0 NA 1 NA NA 0 0 0 NA

CodePudding user response：

Here I also use mutate with case_when. Since the NA values in your dataset are the literal string "NA" rather than real missing values, functions like is.na() cannot identify them. I would recommend changing them to real NA (by removing the double quotes around NA in your input).
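If you can't change the input itself, a rough sketch of converting the "NA" strings after the fact might look like the following. It uses only base R; the helper na_string_to_missing and the names df_chr and df_na are made up for illustration, and the data is just the question's data rebuilt in one call.

```r
# Rebuild the question's data: literal "NA" strings, every column a factor
df_chr <- data.frame(
  x = c("1", "2", "1", "NA", "1", "2", "NA", "1", "2", "2", "NA"),
  y = c("1", "NA", "2", "1", "1", "NA", "2", "1", "2", "1", "1"),
  z = c("1", "2", "3", "4", "1", "2", "3", "4", "1", "2", "3"),
  w = c("01", "02", "03", "04", "05", "06", "07", "01", "02", "03", "04"),
  stringsAsFactors = TRUE
)

# is.na() finds nothing here, because "NA" is just another factor level
sum(is.na(df_chr))

# Hypothetical helper: replace the string "NA" with a real missing value
# and rebuild the factor so that "NA" is no longer one of its levels
na_string_to_missing <- function(col) {
  col <- as.character(col)
  col[col == "NA"] <- NA
  factor(col)
}

# Apply the helper to every column while keeping the data.frame structure
df_na <- df_chr
df_na[] <- lapply(df_chr, na_string_to_missing)

# Now is.na() sees the missing entries, column by column
colSums(is.na(df_na))
```

After this step, is.na(df_na$x) and similar calls behave as you would expect for missing data.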

As I pointed out in a comment, I'm not sure why the eighth entry would be "1", since the corresponding z is "4" rather than "1" or "2". The code below gives 0 for that row.

library(dplyr)

df %>% mutate(v = case_when(
  # every condition is met
  x == "1" & y == "1" & z %in% c("1", "2") & w %in% paste0("0", 1:6) ~ "1",
  # any variable holds the literal string "NA"
  x == "NA" | y == "NA" | z == "NA" | w == "NA" ~ NA_character_,
  TRUE ~ "0"))

    x  y z  w    v
1   1  1 1 01    1
2   2 NA 2 02 <NA>
3   1  2 3 03    0
4  NA  1 4 04 <NA>
5   1  1 1 05    1
6   2 NA 2 06 <NA>
7  NA  2 3 07 <NA>
8   1  1 4 01    0
9   2  2 1 02    0
10  2  1 2 03    0
11 NA  1 3 04 <NA>
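
If you do switch to real NA values, the missing-value check no longer needs string comparisons. Here is a rough sketch of that variant in base R (ifelse and complete.cases instead of case_when), which also returns v as a number rather than a character string. The names df2, meets_all and any_missing are made up for illustration, and the input assumes the quotes around NA have been removed.

```r
# Same data as in the question, but with real NA instead of the string "NA"
x <- c("1", "2", "1", NA, "1", "2", NA, "1", "2", "2", NA)
y <- c("1", NA, "2", "1", "1", NA, "2", "1", "2", "1", "1")
z <- c("1", "2", "3", "4", "1", "2", "3", "4", "1", "2", "3")
w <- c("01", "02", "03", "04", "05", "06", "07", "01", "02", "03", "04")
df2 <- data.frame(x, y, z, w, stringsAsFactors = TRUE)

# TRUE where all four conditions hold (may be NA on rows with missing values)
meets_all <- df2$x == "1" & df2$y == "1" &
  df2$z %in% c("1", "2") & df2$w %in% paste0("0", 1:6)

# TRUE where at least one of x, y, z, w is missing
any_missing <- !complete.cases(df2)

# NA takes priority; otherwise 1 if all conditions hold, else 0
df2$v <- ifelse(any_missing, NA_integer_, as.integer(meets_all))

df2$v
```

With dplyr, the same idea would be to write the second case_when branch as is.na(x) | is.na(y) | is.na(z) | is.na(w) and put it first, so that missing rows are handled before the other comparisons.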