Home > OS >  Re-ordering a factor (in a table) with hierarchical groups in R
Re-ordering a factor (in a table) with hierarchical groups in R

Time:09-12

Suppose the following table with two factor variabels and one numerical variable:

df <- tibble(
 x = as_factor(c("a", "a", "a", "b", "b", "b")),
 y = as_factor(1:6),
 val = c(10, 3, 8, 2, 6, 1)
)

> df
# A tibble: 6 x 3
  x     y       val
  <fct> <fct> <dbl>
1 a     1        10
2 a     2         3
3 a     3         8
4 b     4         2
5 b     5         6
6 b     6         1

I would like to re-order y such that the sum of val, when grouped by x, takes precedent, but y is still ordered by val. To illustrate the goal:

# A tibble: 6 x 4
# Groups:   x [2]
  x     y       val   sum
  <fct> <fct> <dbl> <dbl>
1 a     1        10    21 # all y for which x=="a" come first, because
2 a     3         8    21 # the sum of val for x=="a" is greater than
3 a     2         3    21 # for x=="b"
4 b     5         6     9 # within each group, y is ordered by val
5 b     4         2     9
6 b     6         1     9

But how do I get there? Within tidyverse, I tried to solve it with forcats::fct_reorder(), thinking that grouping might help (df |> group_by(x) |> mutate(y = fct_reorder(y, val))), but it doesn't.

Can fct_reorder() do that at all? What other approaches could work?


Edit: I have found a solution, but it feels rather hacky:

df |> 
 group_by(x) |> 
 mutate(sum = sum(val)) |> 
 arrange(desc(sum), desc(val)) |> ungroup() |> 
 tibble::rowid_to_column() |> 
 mutate(across(c(x, y), \(x) fct_reorder(x, rowid)))

CodePudding user response:

Perhaps, we need to arrange

library(dplyr)
library(forcats)
df %>%
  arrange(desc(ave(val, x, FUN = sum)), desc(val)) %>% 
  mutate(across(where(is.factor), fct_inorder))

-output

# A tibble: 6 × 3
  x     y       val
  <fct> <fct> <dbl>
1 a     1        10
2 a     3         8
3 a     2         3
4 b     5         6
5 b     4         2
6 b     6         1

Or use fct_reorder/reorder in arrange

df %>% 
  arrange(desc(fct_reorder(x, val, .fun = sum)), desc(val)) %>%  
  mutate(across(where(is.factor), fct_inorder)

CodePudding user response:

Probably we can use the following data.table option along with fct_inorder

setorder(
  setDT(df)[
    ,
    sum := sum(val), x
  ],
  -sum, -val
)[
  ,
  lapply(
    .SD,
    function(x) ifelse(is.factor(x), fct_inorder, c)(x)
  )
] 

and you will obtain

   x y val sum
1: a 1  10  21
2: a 3   8  21
3: a 2   3  21
4: b 5   6   9
5: b 4   2   9
6: b 6   1   9
  • Related