Home > OS >  Reshape a dataframe in order to slice min / max for each group
Reshape a dataframe in order to slice min / max for each group

Time:01-03

I have a dataframe looking like:

library(tidyverse)

df <- tibble::tribble(
        ~id,  ~name,       ~type, ~value,
          1, "aaaa",       "car",   2.67,
          1, "aaaa",      "moto",      2,
          2, "bbbb",      "bike",   3.67,
          4, "cccc",     "skate",   2.75,
          6, "dddd", "snowboard",   4.33,
          7, "eeee",   "scooter",    2.8,
          7, "eeee",       "car",   2.33,
          7, "eeee",       "ski",    2.5,
          8, "ffff",      "boat",      3,
          8, "ffff",   "scooter",    2.8
        )

I would like to get for each id, name:

  1. the best type based on the value aka max(value)
  2. the worst type based on the value aka min(value)
  3. whenever there is only one row for each combination of id/name, I would like the value to be placed into the ~best column, while - should be appearing in the ~worst

Resulting in:

df_final <- tibble::tribble(
              ~id,  ~name,       ~best,    ~worst,
                1, "aaaa",       "car",    "moto",
                2, "bbbb",      "bike",       "-",
                4, "cccc",     "skate",       "-",
                6, "dddd", "snowboard",       "-",
                7, "eeee",   "scooter",     "car",
                8, "ffff",      "boat", "scooter"
              )

I tried to use slice_max() but I can't figure out how to place the type in the ~best / ~worst columns!

CodePudding user response:

One alternative is to arrange by value from highest to lowest value, and then using summarise take the first and last values as "best" and "worst."

library(tidyverse)

df %>%
  group_by(id, name) %>%
  arrange(id, name, desc(value)) %>%
  summarise(best = first(type), 
            worst = ifelse(n() == 1, "-", last(type)))

Output

     id name  best      worst  
  <dbl> <chr> <chr>     <chr>  
1     1 aaaa  car       moto   
2     2 bbbb  bike      -      
3     4 cccc  skate     -      
4     6 dddd  snowboard -      
5     7 eeee  scooter   car    
6     8 ffff  boat      scooter

CodePudding user response:

You can do:

df %>%
  group_by(id, name) %>%
  summarize(best = type[value == max(value)],
            worst = type[value == min(value)],
            n = n()) %>%
  mutate(worst = if_else(n == 1, '-', worst)) %>%
  select(-n)

which gives:

# A tibble: 6 x 4
# Groups:   id [6]
     id name  best      worst  
  <dbl> <chr> <chr>     <chr>  
1     1 aaaa  car       moto   
2     2 bbbb  bike      -      
3     4 cccc  skate     -      
4     6 dddd  snowboard -      
5     7 eeee  scooter   car    
6     8 ffff  boat      scooter

Alternative solution as suggested in the comment:

df %>%
  group_by(id, name) %>%
  summarize(best = type[value == max(value)],
            worst = if_else(n() == 1, "-", type[value == min(value)]),
            .groups = "drop")

which gives:

# A tibble: 6 x 4
     id name  best      worst  
  <dbl> <chr> <chr>     <chr>  
1     1 aaaa  car       moto   
2     2 bbbb  bike      -      
3     4 cccc  skate     -      
4     6 dddd  snowboard -      
5     7 eeee  scooter   car    
6     8 ffff  boat      scooter
  • Related