Home > Back-end >  Create a new column in tibble to summarise largest row of selected elements
Create a new column in tibble to summarise largest row of selected elements

Time:07-23

I've a dataframe football_supporters that looks like this:

Mun_Code   Code   Municipality   Club       Coca_Cola    Fanta    Sprite
   3        11        NYC       Sharks        152          92       148
   3        12        NYC       Wolfs          93          72        54
   3        13        NYC       Dogs          143          67        38
   3        14        NYC       Cats           32          99       156
#With 200 more rows

I would like to create a new column through the mutate() function and summarise what each club's favourite Drink is like down below:

Acode     Code     Clubname    Drink
 11       0111      Sharks     Coca_cola
 12       0112      Wolfs      Coca_cola
 13       0113      Dogs       Coca_cola
 14       0114      Cats       Sprite
#With 200 more rows

I've tried

colMax <- function(football_supporters) sapply(football_supporters, max, na.rm = TRUE)

and it doesn't work.

CodePudding user response:

You could gather the drinks columns to one column and slice the max value per group like this:

football_supporters <- read.table(text = "Mun_Code   Code   Municipality   Club       Coca_Cola    Fanta    Sprite
   3        11        NYC       Sharks        152          92       148
   3        12        NYC       Wolfs          93          72        54
   3        13        NYC       Dogs          143          67        38
   3        14        NYC       Cats          32           99       156", header = TRUE)

library(dplyr)
library(tidyr)
football_supporters %>%
  gather(Drink, cnt, Coca_Cola:Sprite) %>% 
  group_by(Club) %>% 
  slice(which.max(cnt)) %>%
  select(-cnt)
#> # A tibble: 4 × 5
#> # Groups:   Club [4]
#>   Mun_Code  Code Municipality Club   Drink    
#>      <int> <int> <chr>        <chr>  <chr>    
#> 1        3    14 NYC          Cats   Sprite   
#> 2        3    13 NYC          Dogs   Coca_Cola
#> 3        3    11 NYC          Sharks Coca_Cola
#> 4        3    12 NYC          Wolfs  Coca_Cola

Created on 2022-07-23 by the reprex package (v2.0.1)

CodePudding user response:

You can select the indices of the desired columns, and then use max.col, to get the highest column.

Note that you can use .keep = "unused" to remove all the columns that were used in the mutate call.

library(dplyr)

which_drink <- which(colnames(football_supporters) %in% c("Coca_Cola", "Fanta", "Sprite"))

football_supporters %>% 
  mutate(Drink = colnames(.[which_drink])[max.col(across(which_drink))], 
         .keep = "unused")

  Mun_Code Code Municipality   Club     Drink
1        3   11          NYC Sharks Coca_Cola
2        3   12          NYC  Wolfs Coca_Cola
3        3   13          NYC   Dogs Coca_Cola
4        3   14          NYC   Cats    Sprite

CodePudding user response:

Updated code thanks to @Darren Tsai:

We could do it this way:

  1. get long format

  2. arrange and finally select the first:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(c(Coca_Cola, Fanta, Sprite)) %>% 
  group_by(Mun_Code, Code, Club) %>% 
  arrange(-value, .by_group = TRUE) %>% 
  slice(1) %>% 
  select(-c(value, Municipality))
   Mun_Code  Code Club   name     
     <int> <int> <chr>  <chr>    
1        3    11 Sharks Coca_Cola
2        3    12 Wolfs  Coca_Cola
3        3    13 Dogs   Coca_Cola
4        3    14 Cats   Sprite 
  •  Tags:  
  • r
  • Related