Simple question: what don't I understand about how case_when works? In the example below, I expected 4 levels in season, but I only get two.
Thanks
library(dplyr)

data <- tibble(day = 1:366) %>%
  mutate(
    season = case_when(
      day <= 60 | day > 335 ~ "winter",
      day > 60 | day <= 151 ~ "spring",
      day > 151 | day <= 242 ~ "summer",
      day > 242 | day <= 335 ~ "autumn"
    )
  )
Answer:
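Conditions 2 to 4 should use & instead of |. With |, a condition is TRUE whenever either side is TRUE, so day > 60 | day <= 151 is TRUE for every day from 1 to 366. case_when gives each day the value of the first condition that is TRUE for it, so every day not already caught by the winter condition becomes "spring", and the summer and autumn conditions never get a chance to match: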
library(dplyr)

data <- tibble(day = 1:366) %>%
  mutate(
    season = case_when(
      day <= 60 | day > 335 ~ "winter",
      day > 60 & day <= 151 ~ "spring",
      day > 151 & day <= 242 ~ "summer",
      day > 242 & day <= 335 ~ "autumn"
    )
  )
Checking the number of distinct seasons:
> n_distinct(data$season)
[1] 4
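To see why the original | version collapses to two levels, it helps to evaluate the spring condition on its own and to count days per season in both versions. A small check, assuming dplyr is installed; the variable names spring_or, buggy and fixed are just illustrative:

```r
library(dplyr)

day <- 1:366

# The question's "spring" condition joined with | is TRUE for every day,
# because each day is either > 60 or <= 151 (or both).
spring_or <- day > 60 | day <= 151
all(spring_or)  # TRUE

# The question's version (|) and the corrected version (&) side by side.
buggy <- case_when(
  day <= 60 | day > 335 ~ "winter",
  day > 60 | day <= 151 ~ "spring",
  day > 151 | day <= 242 ~ "summer",
  day > 242 | day <= 335 ~ "autumn"
)
fixed <- case_when(
  day <= 60 | day > 335 ~ "winter",
  day > 60 & day <= 151 ~ "spring",
  day > 151 & day <= 242 ~ "summer",
  day > 242 & day <= 335 ~ "autumn"
)

# Days per season: the | version only produces winter (91) and spring (275);
# the & version gives winter 91, spring 91, summer 91, autumn 93.
table(buggy)
table(fixed)
```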
Answer:
Actually, you can simplify this case_when() call a bit, because case_when gives each value the result of the first condition that is TRUE for it. Any day that reaches the second condition has already failed the winter test (it is greater than 60 and at most 335), so checking day <= 151 is sufficient for spring, and likewise for the later seasons:
library(dplyr)

data <- tibble(day = 1:366) %>%
  mutate(
    season = case_when(
      day <= 60 | day > 335 ~ "winter",
      day <= 151 ~ "spring",
      day <= 242 ~ "summer",
      day <= 335 ~ "autumn"
    )
  )
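A quick way to confirm that the shortened conditions give the same result, and that this relies on the order of the clauses, is to build the vectors directly and compare them. A sketch assuming dplyr is installed; explicit, ordered_conds and wrong_order are illustrative names:

```r
library(dplyr)

day <- 1:366

# Explicit, non-overlapping ranges.
explicit <- case_when(
  day <= 60 | day > 335 ~ "winter",
  day > 60 & day <= 151 ~ "spring",
  day > 151 & day <= 242 ~ "summer",
  day > 242 & day <= 335 ~ "autumn"
)

# Shortened conditions: each clause only needs an upper bound, because any
# day reaching it has already failed every earlier clause.
ordered_conds <- case_when(
  day <= 60 | day > 335 ~ "winter",
  day <= 151 ~ "spring",
  day <= 242 ~ "summer",
  day <= 335 ~ "autumn"
)

identical(explicit, ordered_conds)  # TRUE

# Order matters: with the clauses reversed, "autumn" claims every day <= 335,
# leaving only autumn (335 days) and winter (31 days).
wrong_order <- case_when(
  day <= 335 ~ "autumn",
  day <= 242 ~ "summer",
  day <= 151 ~ "spring",
  day <= 60 | day > 335 ~ "winter"
)
table(wrong_order)
```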
You can also add a final TRUE clause as a catch-all; it applies to every value for which none of the earlier conditions matched:
data <- tibble(day = 1:366) %>%
  mutate(
    season = case_when(
      day <= 60 ~ "winter",
      day <= 151 ~ "spring",
      day <= 242 ~ "summer",
      day <= 335 ~ "autumn",
      TRUE ~ "winter"
    )
  )
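Without a catch-all, values that match no condition come back as NA, which is what the TRUE clause prevents here for days 336 to 366. The sketch below, assuming dplyr is installed, shows both versions; with_default and no_default are illustrative names. Newer dplyr releases (1.1.0 and later) also provide a .default argument to case_when() for the same purpose, but the sketch sticks to TRUE so it also runs on older versions.

```r
library(dplyr)

day <- 1:366

with_default <- case_when(
  day <= 60 ~ "winter",
  day <= 151 ~ "spring",
  day <= 242 ~ "summer",
  day <= 335 ~ "autumn",
  TRUE ~ "winter"  # catch-all: here only days 336 to 366 reach it
)

# Same clauses without the catch-all: unmatched days become NA.
no_default <- case_when(
  day <= 60 ~ "winter",
  day <= 151 ~ "spring",
  day <= 242 ~ "summer",
  day <= 335 ~ "autumn"
)

sum(is.na(no_default))               # 31
all(with_default[336:366] == "winter")  # TRUE
```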
Answer:
For binning a numeric variable into ranges like this, you can skip case_when and use cut instead:
library(dplyr)

tibble(day = 1:366) |>
  mutate(
    season = cut(day,
      breaks = c(0, 60, 151, 242, 335, 366),
      labels = c("winter", "spring", "summer", "autumn", "winter")
    )
  )
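By default (right = TRUE) cut() builds right-closed intervals, here (0,60], (60,151], (151,242], (242,335] and (335,366], so day 60 is still winter and day 61 is spring. One variant, sketched below in base R, asks cut() only for the interval number (labels = FALSE) and maps it to a season name through a lookup vector. This sidesteps the repeated "winter" label and returns a plain character vector like the case_when versions, whereas passing labels gives a factor. The names breaks, seasons and interval_id are illustrative.

```r
day <- 1:366

# Right-closed intervals (default right = TRUE):
# (0,60], (60,151], (151,242], (242,335], (335,366]
breaks  <- c(0, 60, 151, 242, 335, 366)
seasons <- c("winter", "spring", "summer", "autumn", "winter")

# labels = FALSE makes cut() return the interval number (1 to 5) for each day,
# which is then used to index the lookup vector of season names.
interval_id <- cut(day, breaks = breaks, labels = FALSE)
season <- seasons[interval_id]

# Boundary days fall into the interval that is closed on their side.
season[c(60, 61, 335, 336)]  # "winter" "spring" "autumn" "winter"
table(season)
```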