Reorder factor level based on group averages

Time:10-14

I often create a factor variable from a numeric variable and want its levels to follow the order of the variable it comes from. I thought I could do this by taking the average within each group of the new categorical variable and then using that as the ordering variable in fct_reorder, but it doesn't seem to work. Here is a simple example:

library(dplyr)
library(forcats)

test_data <- mtcars %>% 
  mutate(mpg_cat=case_when(mpg>20 ~ "More than 20",
                           mpg<=20 & mpg>=15 ~ "15-20",
                           mpg<15 ~ "Less than 15")) %>% 
  group_by(mpg_cat) %>% 
  mutate(avg_mpg=mean(mpg),
         mpg_cat=fct_reorder(mpg_cat,avg_mpg))
levels(test_data$mpg_cat) #Want the order to be less than 15, 15-20, More than 20

CodePudding user response:

You can amend your pipe to order it by the mean variable you created and set the ordered factor levels based on that:

library(dplyr)

test_data <- mtcars |>
    mutate(
        mpg_cat = case_when(
            mpg > 20 ~ "More than 20",
            mpg <= 20 & mpg >= 15 ~ "15-20",
            mpg < 15 ~ "Less than 15"
        )
    ) |>
    group_by(mpg_cat) |>
    mutate(avg_mpg = mean(mpg)) |>
    ungroup() |>
    arrange(avg_mpg) |>
    mutate(
        mpg_cat = factor(
            mpg_cat,
            levels = unique(mpg_cat),
            ordered = TRUE
        )
    )

head(test_data$mpg_cat)
# [1] Less than 15 Less than 15 Less than 15 Less than 15 Less than 15 15-20       
# Levels: Less than 15 < 15-20 < More than 20

Alternatively, if you created the mean only to set the order, you can skip the grouping, the helper variable, and the ungrouping. Sort by mpg at the beginning instead, which ensures that unique(mpg_cat) returns the categories in the right order.

mtcars |>
    arrange(mpg) |>
    mutate(
        mpg_cat = case_when(
            mpg > 20 ~ "More than 20",
            mpg <= 20 & mpg >= 15 ~ "15-20",
            mpg < 15 ~ "Less than 15"
        )
    ) |>
    mutate(
        mpg_cat = factor(
            mpg_cat,
            levels = unique(mpg_cat),
            ordered = TRUE
        )
    ) 

CodePudding user response:

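Just ungroup after the grouped mutate and then use fct_reorder. Inside a grouped mutate, fct_reorder runs once per group, so each call sees only a single level and has nothing to reorder. Using your code: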

test_data <- mtcars %>% 
  mutate(mpg_cat=case_when(mpg>20 ~ "More than 20",
                           mpg<=20 & mpg>=15 ~ "15-20",
                           mpg<15 ~ "Less than 15")) %>% 
  group_by(mpg_cat) %>% 
  mutate(avg_mpg=mean(mpg)) %>% 
  ungroup() %>% 
  mutate(mpg_cat=fct_reorder(mpg_cat,avg_mpg))
  
levels(test_data$mpg_cat) #Want the order to be less than 15, 15-20, More than 20
[1] "Less than 15" "15-20"        "More than 20"
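
To see why the grouped version cannot work, here is a small sketch (the names levels_seen and n_levels_seen are mine, not from the question) that counts how many distinct categories each group contains. Within a group there is only ever one, so a per-group fct_reorder has no ordering to establish:

```r
library(dplyr)

# Count the distinct mpg_cat values visible inside each group.
# levels_seen and n_levels_seen are illustrative names only.
levels_seen <- mtcars %>%
  mutate(mpg_cat = case_when(mpg > 20 ~ "More than 20",
                             mpg <= 20 & mpg >= 15 ~ "15-20",
                             mpg < 15 ~ "Less than 15")) %>%
  group_by(mpg_cat) %>%
  summarise(n_levels_seen = n_distinct(mpg_cat))

levels_seen
```

Every row should report a single level, which is why the reordering has to happen after ungroup(), when all three categories are visible at once.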

CodePudding user response:

There is a .fun argument in fct_reorder, which by default is median. So we can reorder the levels directly, without grouping:

library(dplyr)
library(forcats)
out <- mtcars %>% 
  mutate(mpg_cat=fct_reorder(case_when(mpg>20 ~ "More than 20",
                                       mpg<=20 & mpg>=15 ~ "15-20",
                                       mpg<15 ~ "Less than 15"),
                             mpg, .fun = mean))

-output

> levels(out$mpg_cat)
[1] "Less than 15" "15-20"        "More than 20"
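
If you want to confirm the result rather than eyeball it, one possible check (a sketch only; group_means is a name I made up) is to compute the mean mpg per level and verify that the means increase in level order:

```r
library(dplyr)
library(forcats)

out <- mtcars %>%
  mutate(mpg_cat = fct_reorder(
    case_when(mpg > 20 ~ "More than 20",
              mpg <= 20 & mpg >= 15 ~ "15-20",
              mpg < 15 ~ "Less than 15"),
    mpg, .fun = mean))

# tapply() returns one mean per factor level, in level order.
# group_means is an illustrative name, not part of the answer above.
group_means <- tapply(out$mpg, out$mpg_cat, mean)

# TRUE when each level's mean is larger than the previous one.
all(diff(as.numeric(group_means)) > 0)
```

This relies on tapply() following the factor's level order, so it only tells you something after the reordering has been applied.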