I have a dataset like the following:
ID | Winter | Spring | Summer | Fall |
---|---|---|---|---|
1 | high | NA | high | low |
2 | low | high | NA | low |
3 | low | NA | NA | low |
4 | low | high | NA | low |
I would like to add a calculated column that contains 1 if any of the Winter, Spring, Summer, or Fall columns contains "high" in that row, and 0 otherwise, as shown below.
ID | Winter | Spring | Summer | Fall | calculated_column |
---|---|---|---|---|---|
1 | high | NA | high | low | 1 |
2 | low | high | NA | low | 1 |
3 | low | NA | NA | low | 0 |
4 | low | high | NA | low | 1 |
So far I have something like this, which I know is incorrect. I'm not sure how to specify multiple columns rather than just one:
df$calculated_column <- ifelse(c(2:5)=="High",1,0)
CodePudding user response:
We may use if_any from dplyr. It returns a logical vector, so the unary + coerces it to 1/0:
library(dplyr)
df1 <- df1 %>%
  mutate(calculated_column = +(if_any(-ID, ~ . %in% 'high')))
-output
df1
  ID Winter Spring Summer Fall calculated_column
1  1   high   <NA>   high  low                 1
2  2    low   high   <NA>  low                 1
3  3    low   <NA>   <NA>  low                 0
4  4    low   high   <NA>  low                 1
Or, if we want to use base R, create the logical condition with rowSums on a logical matrix and coerce it to 1/0 with +:
df1$calculated_column <- +(rowSums(df1[-1] == "high", na.rm = TRUE) > 0)
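To see why the rowSums approach works, its intermediate steps can be unpacked as below. This is only a sketch on the same example data; the names `is_high` and `n_high` are illustrative and not part of the answer above.

```r
# Example data (same values as the df1 used in this answer)
df1 <- data.frame(
  ID = 1:4,
  Winter = c("high", "low", "low", "low"),
  Spring = c(NA, "high", NA, "high"),
  Summer = c("high", NA, NA, NA),
  Fall = c("low", "low", "low", "low")
)

# Comparing the season columns to "high" gives a logical matrix;
# cells that were NA stay NA
is_high <- df1[-1] == "high"

# TRUE counts as 1, and na.rm = TRUE drops the NA cells from each row sum
n_high <- rowSums(is_high, na.rm = TRUE)

# A row gets 1 if it has at least one "high", otherwise 0
df1$calculated_column <- as.integer(n_high > 0)
df1
```

Dropping the ID column with `df1[-1]` keeps the comparison to the season columns only. Without `na.rm = TRUE`, any row containing an NA would get an NA sum instead of a count.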
data
df1 <- structure(list(ID = 1:4, Winter = c("high", "low", "low", "low"
), Spring = c(NA, "high", NA, "high"), Summer = c("high", NA,
NA, NA), Fall = c("low", "low", "low", "low")),
class = "data.frame", row.names = c(NA,
-4L))
CodePudding user response:
You could also do:
df1$calculated_column = +grepl('high', do.call(paste, df1))
df1
  ID Winter Spring Summer Fall calculated_column
1  1   high   <NA>   high  low                 1
2  2    low   high   <NA>  low                 1
3  3    low   <NA>   <NA>  low                 0
4  4    low   high   <NA>  low                 1
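It may help to look at what the paste step produces before grepl runs. The sketch below uses the same example data; `row_strings` is an illustrative name, not part of the answer above.

```r
# Example data (same values as the df1 used in this thread)
df1 <- data.frame(
  ID = 1:4,
  Winter = c("high", "low", "low", "low"),
  Spring = c(NA, "high", NA, "high"),
  Summer = c("high", NA, NA, NA),
  Fall = c("low", "low", "low", "low")
)

# do.call() passes each column to paste() as a separate argument, so every
# row collapses into one space-separated string; NA becomes the text "NA"
row_strings <- do.call(paste, df1)
row_strings
# "1 high NA high low" "2 low high NA low" "3 low NA NA low" "4 low high NA low"

# grepl() searches each string for the substring "high";
# the unary + converts TRUE/FALSE to 1/0
df1$calculated_column <- +grepl("high", row_strings)
df1
```

Two things to keep in mind with this approach: the ID column is pasted in along with the seasons, and grepl matches substrings, so a value such as "higher" would also count as a hit. For data like the example, where the only values are "high", "low", and NA, neither matters.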
CodePudding user response:
Here is a base R solution:
calculated_column = (apply(df1,1,function(x) sum(grepl("high",x)))>0)*1
cbind(df1, calculated_column)
  ID Winter Spring Summer Fall calculated_column
1  1   high   <NA>   high  low                 1
2  2    low   high   <NA>  low                 1
3  3    low   <NA>   <NA>  low                 0
4  4    low   high   <NA>  low                 1
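A possible variant of the same row-wise idea, assuming only exact matches of "high" in the season columns should count: drop the ID column and test equality instead of using a pattern match. This is a sketch rather than a replacement for the answer above, and `has_high` is an illustrative name.

```r
# Example data (same values as the df1 used in this thread)
df1 <- data.frame(
  ID = 1:4,
  Winter = c("high", "low", "low", "low"),
  Spring = c(NA, "high", NA, "high"),
  Summer = c("high", NA, NA, NA),
  Fall = c("low", "low", "low", "low")
)

# apply() converts df1[-1] to a character matrix and hands each row to the
# function; any(..., na.rm = TRUE) ignores the NA comparisons
has_high <- apply(df1[-1], 1, function(x) any(x == "high", na.rm = TRUE))

# Convert TRUE/FALSE to 1/0 and attach the result as a new column
df1$calculated_column <- as.integer(has_high)
df1
```

Because the comparison uses `==`, a value like "higher" would not be counted here, unlike with grepl.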