I often create a factor variable whose levels should follow the order of the numeric variable it is derived from. I feel like I should be able to do this by taking the average within each group of the new categorical variable, then using that as the ordering variable in fct_reorder, but it doesn't seem to work. Here is a simple example:
library(dplyr)
library(forcats)
test_data <- mtcars %>%
  mutate(mpg_cat=case_when(mpg>20 ~ "More than 20",
                           mpg<=20 & mpg>=15 ~ "15-20",
                           mpg<15 ~ "Less than 15")) %>%
  group_by(mpg_cat) %>%
  mutate(avg_mpg=mean(mpg),
         mpg_cat=fct_reorder(mpg_cat,avg_mpg))
levels(test_data$mpg_cat) #Want the order to be less than 15, 15-20, More than 20
CodePudding user response:
You can amend your pipe to order it by the mean variable you created and set the ordered factor levels based on that:
library(dplyr)
test_data <- mtcars |>
  mutate(
    mpg_cat = case_when(
      mpg > 20 ~ "More than 20",
      mpg <= 20 & mpg >= 15 ~ "15-20",
      mpg < 15 ~ "Less than 15"
    )
  ) |>
  group_by(mpg_cat) |>
  mutate(avg_mpg = mean(mpg)) |>
  ungroup() |>
  arrange(avg_mpg) |>
  mutate(
    mpg_cat = factor(
      mpg_cat,
      levels = unique(mpg_cat),
      ordered = TRUE
    )
  )
head(test_data$mpg_cat)
# [1] Less than 15 Less than 15 Less than 15 Less than 15 Less than 15 15-20
# Levels: Less than 15 < 15-20 < More than 20
Alternatively, if you created the mean purely for the ordering, you can skip the grouping, the extra variable, and the ungrouping by sorting on mpg at the beginning, which ensures that unique(mpg_cat) returns the levels in the right order.
mtcars |>
  arrange(mpg) |>
  mutate(
    mpg_cat = case_when(
      mpg > 20 ~ "More than 20",
      mpg <= 20 & mpg >= 15 ~ "15-20",
      mpg < 15 ~ "Less than 15"
    )
  ) |>
  mutate(
    mpg_cat = factor(
      mpg_cat,
      levels = unique(mpg_cat),
      ordered = TRUE
    )
  )
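To see why the sort matters here, a small base R sketch may help. unique() returns values in order of first appearance, so the levels only come out right once the rows are sorted by mpg. The helper bin_mpg is a hypothetical name introduced for this illustration; it is meant to reproduce the same three bins as the case_when above.

```r
# Hypothetical helper: label each mpg value with the same three bins as above.
bin_mpg <- function(mpg) {
  ifelse(mpg > 20, "More than 20",
         ifelse(mpg >= 15, "15-20", "Less than 15"))
}

# Unsorted: labels come back in order of first appearance in mtcars.
unsorted_levels <- unique(bin_mpg(mtcars$mpg))

# Sorted by mpg first: first appearance now follows the numeric order.
sorted_levels <- unique(bin_mpg(sort(mtcars$mpg)))

print(unsorted_levels)
# [1] "More than 20" "15-20"        "Less than 15"
print(sorted_levels)
# [1] "Less than 15" "15-20"        "More than 20"
```

This shortcut assumes the bins are monotone in the sorting variable, which holds for these mpg cut-offs but would not hold for arbitrary category labels.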
CodePudding user response:
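Inside a grouped mutate, fct_reorder is evaluated once per group, so it only ever sees a single level and has nothing to reorder. Just ungroup after the mutate that computes the mean, and then use fct_reorder. Using your code: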
test_data <- mtcars %>%
  mutate(mpg_cat=case_when(mpg>20 ~ "More than 20",
                           mpg<=20 & mpg>=15 ~ "15-20",
                           mpg<15 ~ "Less than 15")) %>%
  group_by(mpg_cat) %>%
  mutate(avg_mpg=mean(mpg)) %>%
  ungroup() %>%
  mutate(mpg_cat=fct_reorder(mpg_cat,avg_mpg))
levels(test_data$mpg_cat) #Want the order to be less than 15, 15-20, More than 20
[1] "Less than 15" "15-20" "More than 20"
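As a rough check of what the grouping does, the sketch below counts how many levels fct_reorder produces when it is called per group, then repeats the call after ungroup(). The names binned, per_group, fixed and n_levels_seen are made up for this illustration.

```r
library(dplyr)
library(forcats)

binned <- mtcars %>%
  mutate(mpg_cat = case_when(mpg > 20 ~ "More than 20",
                             mpg <= 20 & mpg >= 15 ~ "15-20",
                             mpg < 15 ~ "Less than 15")) %>%
  group_by(mpg_cat) %>%
  mutate(avg_mpg = mean(mpg))

# While grouped, each call to fct_reorder() only receives one category,
# so the factor it builds has a single level.
per_group <- binned %>%
  summarise(n_levels_seen = nlevels(fct_reorder(mpg_cat, avg_mpg)))
print(per_group$n_levels_seen)

# After ungroup(), one call sees all three categories and can order them.
fixed <- binned %>%
  ungroup() %>%
  mutate(mpg_cat = fct_reorder(mpg_cat, avg_mpg))
print(levels(fixed$mpg_cat))
```

Each group should report exactly one level, while the ungrouped call should give the three levels in increasing order of their mean mpg.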
CodePudding user response:
fct_reorder has a .fun argument, which defaults to median. Setting it to mean and passing mpg as the ordering variable lets us reorder the levels directly, without grouping:
library(dplyr)
library(forcats)
out <- mtcars %>%
  mutate(mpg_cat = fct_reorder(
    case_when(mpg > 20 ~ "More than 20",
              mpg <= 20 & mpg >= 15 ~ "15-20",
              mpg < 15 ~ "Less than 15"),
    mpg,
    .fun = mean
  ))
Output:
> levels(out$mpg_cat)
[1] "Less than 15" "15-20" "More than 20"
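For the mpg bins the median and the mean give the same order, but the choice of .fun can matter. Here is a minimal sketch on made-up toy data (the vectors grp and val are invented for illustration) where the two summaries disagree, along with the .desc argument for reversing the order:

```r
library(forcats)

# Toy data: "skewed" has a low median (1) but a high mean (4);
# "flat" has median and mean both equal to 2.
grp <- c("skewed", "skewed", "skewed", "flat", "flat", "flat")
val <- c(1, 1, 10, 2, 2, 2)

by_median <- levels(fct_reorder(grp, val))                 # default .fun = median
by_mean   <- levels(fct_reorder(grp, val, .fun = mean))
by_mean_desc <- levels(fct_reorder(grp, val, .fun = mean, .desc = TRUE))

print(by_median)     # [1] "skewed" "flat"
print(by_mean)       # [1] "flat"   "skewed"
print(by_mean_desc)  # [1] "skewed" "flat"
```

So if the order is meant to follow the group average, it seems safer to pass .fun = mean explicitly rather than rely on the default.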