Ifelse for Multiple Columns in DataFrame

Time:10-13

I have a dataset like the following:

ID Winter Spring Summer Fall
1 high NA high low
2 low high NA low
3 low NA NA low
4 low high NA low

I would like to add a calculated column that contains 1 if any of the Winter, Spring, Summer, or Fall columns in that row contains "high", and 0 otherwise, as shown below.

ID Winter Spring Summer Fall calculated_column
1 high NA high low 1
2 low high NA low 1
3 low NA NA low 0
4 low high NA low 1

So far I have something like the following, which I know is incorrect. I'm not sure how to specify multiple columns rather than just one:

df$calculated_column <- ifelse(c(2:5)=="High",1,0)

CodePudding user response:

We may use if_any() from dplyr; the leading + converts the logical result to 1/0:

library(dplyr)
df1 <- df1 %>%
     mutate(calculated_column = +(if_any(-ID, ~ . %in% 'high')))

-output

df1
  ID Winter Spring Summer Fall calculated_column
1  1   high   <NA>   high  low                 1
2  2    low   high   <NA>  low                 1
3  3    low   <NA>   <NA>  low                 0
4  4    low   high   <NA>  low                 1
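
The choice of %in% over == in the call above matters because of the NA cells: == returns NA when one side is NA, while %in% returns FALSE. Below is a minimal base R sketch of that difference, not a replacement for the dplyr code. It rebuilds the sample data inline so that it runs on its own; the names seasons, any_eq, and any_in are made up for illustration.

```r
# Sample data from the question, rebuilt so this block is self-contained
df1 <- data.frame(
  ID = 1:4,
  Winter = c("high", "low", "low", "low"),
  Spring = c(NA, "high", NA, "high"),
  Summer = c("high", NA, NA, NA),
  Fall = c("low", "low", "low", "low"),
  stringsAsFactors = FALSE
)

seasons <- df1[-1]  # illustrative name: every column except ID

# `==` propagates NA, so a row with no "high" but at least one NA becomes NA
any_eq <- Reduce(`|`, lapply(seasons, function(x) x == "high"))

# `%in%` never returns NA, so the same row becomes FALSE
any_in <- Reduce(`|`, lapply(seasons, function(x) x %in% "high"))

print(any_eq)  # row 3 is NA
print(any_in)  # row 3 is FALSE
```

With ==, row 3 would end up as NA rather than 0, which is why the answer tests membership with %in%.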

Or, if we want to use base R, create the logical condition with rowSums on a logical matrix:

df1$calculated_column <- +(rowSums(df1[-1] == "high", na.rm = TRUE) > 0)
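
To make the rowSums one-liner easier to follow, here is the same idea split into steps, as a sketch only. The intermediate names is_high and n_high are made up for illustration, and as.integer() is used here as an explicit alternative for turning TRUE/FALSE into 1/0.

```r
# Sample data from the question, rebuilt so this block is self-contained
df1 <- data.frame(
  ID = 1:4,
  Winter = c("high", "low", "low", "low"),
  Spring = c(NA, "high", NA, "high"),
  Summer = c("high", NA, NA, NA),
  Fall = c("low", "low", "low", "low"),
  stringsAsFactors = FALSE
)

# Step 1: compare every season column with "high".
# The result is a logical matrix with NA wherever the cell was NA.
is_high <- df1[-1] == "high"

# Step 2: count the TRUE values per row, ignoring the NA cells.
n_high <- rowSums(is_high, na.rm = TRUE)

# Step 3: a row qualifies if it has at least one "high"; store it as 1/0.
df1$calculated_column <- as.integer(n_high > 0)

print(df1)
```

Dropping na.rm = TRUE would make the row count NA for every row that contains an NA cell, so the flag would be NA for all four rows of this sample.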

data

df1 <- structure(list(ID = 1:4, Winter = c("high", "low", "low", "low"
), Spring = c(NA, "high", NA, "high"), Summer = c("high", NA, 
NA, NA), Fall = c("low", "low", "low", "low")), 
class = "data.frame", row.names = c(NA, 
-4L))

CodePudding user response:

You could also do:

df1$calculated_column = +grepl('high', do.call(paste, df1))
df1
  ID Winter Spring Summer Fall calculated_column
1  1   high   <NA>   high  low                 1
2  2    low   high   <NA>  low                 1
3  3    low   <NA>   <NA>  low                 0
4  4    low   high   <NA>  low                 1
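
One thing to keep in mind with the paste/grepl approach: it searches a single string per row, so it matches "high" anywhere in that string, including inside longer words. The sketch below shows what the pasted strings look like and how a word-boundary pattern narrows the match. The value "highest" is invented purely to illustrate the point and does not occur in the question's data.

```r
# Sample data from the question, rebuilt so this block is self-contained
df1 <- data.frame(
  ID = 1:4,
  Winter = c("high", "low", "low", "low"),
  Spring = c(NA, "high", NA, "high"),
  Summer = c("high", NA, NA, NA),
  Fall = c("low", "low", "low", "low"),
  stringsAsFactors = FALSE
)

# Each row is collapsed into one space-separated string; NA becomes "NA"
row_text <- do.call(paste, df1)
print(row_text)

# Hypothetical row containing "highest" instead of "high"
tricky <- "3 highest NA NA low"

substring_match <- grepl("high", tricky)         # matches inside "highest"
whole_word_match <- grepl("\\bhigh\\b", tricky)  # requires "high" as a whole word

print(c(substring_match, whole_word_match))
```

For the data in the question both patterns give the same answer, so this only matters if the columns can hold other values that contain "high".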

CodePudding user response:

Here is a base R solution:

calculated_column = (apply(df1, 1, function(x) sum(grepl("high", x))) > 0) * 1

cbind(df1, calculated_column) 
  ID Winter Spring Summer Fall calculated_column
1  1   high   <NA>   high  low                 1
2  2    low   high   <NA>  low                 1
3  3    low   <NA>   <NA>  low                 0
4  4    low   high   <NA>  low                 1