I've a dataframe football_supporters
that looks like this:
Mun_Code Code Municipality Club Coca_Cola Fanta Sprite
3 11 NYC Sharks 152 92 148
3 12 NYC Wolfs 93 72 54
3 13 NYC Dogs 143 67 38
3 14 NYC Cats 32 99 156
#With 200 more rows
I would like to create a new column through the mutate()
function and summarise what each club's favourite Drink
is like down below:
Acode Code Clubname Drink
11 0111 Sharks Coca_cola
12 0112 Wolfs Coca_cola
13 0113 Dogs Coca_cola
14 0114 Cats Sprite
#With 200 more rows
I've tried
colMax <- function(football_supporters) sapply(football_supporters, max, na.rm = TRUE)
and it doesn't work.
CodePudding user response:
You could gather
the drinks columns to one column and slice
the max
value per group like this:
football_supporters <- read.table(text = "Mun_Code Code Municipality Club Coca_Cola Fanta Sprite
3 11 NYC Sharks 152 92 148
3 12 NYC Wolfs 93 72 54
3 13 NYC Dogs 143 67 38
3 14 NYC Cats 32 99 156", header = TRUE)
library(dplyr)
library(tidyr)
football_supporters %>%
gather(Drink, cnt, Coca_Cola:Sprite) %>%
group_by(Club) %>%
slice(which.max(cnt)) %>%
select(-cnt)
#> # A tibble: 4 × 5
#> # Groups: Club [4]
#> Mun_Code Code Municipality Club Drink
#> <int> <int> <chr> <chr> <chr>
#> 1 3 14 NYC Cats Sprite
#> 2 3 13 NYC Dogs Coca_Cola
#> 3 3 11 NYC Sharks Coca_Cola
#> 4 3 12 NYC Wolfs Coca_Cola
Created on 2022-07-23 by the reprex package (v2.0.1)
CodePudding user response:
You can select the indices of the desired columns, and then use max.col
, to get the highest column.
Note that you can use .keep = "unused"
to remove all the columns that were used in the mutate call.
library(dplyr)
which_drink <- which(colnames(football_supporters) %in% c("Coca_Cola", "Fanta", "Sprite"))
football_supporters %>%
mutate(Drink = colnames(.[which_drink])[max.col(across(which_drink))],
.keep = "unused")
Mun_Code Code Municipality Club Drink
1 3 11 NYC Sharks Coca_Cola
2 3 12 NYC Wolfs Coca_Cola
3 3 13 NYC Dogs Coca_Cola
4 3 14 NYC Cats Sprite
CodePudding user response:
Updated code thanks to @Darren Tsai:
We could do it this way:
get long format
arrange and finally select the first:
library(dplyr)
library(tidyr)
df %>%
pivot_longer(c(Coca_Cola, Fanta, Sprite)) %>%
group_by(Mun_Code, Code, Club) %>%
arrange(-value, .by_group = TRUE) %>%
slice(1) %>%
select(-c(value, Municipality))
Mun_Code Code Club name
<int> <int> <chr> <chr>
1 3 11 Sharks Coca_Cola
2 3 12 Wolfs Coca_Cola
3 3 13 Dogs Coca_Cola
4 3 14 Cats Sprite