I have the data about sales by years and by-products, let's say like this:
Year <- c(2010,2010,2010,2010,2010,2011,2011,2011,2011,2011,2012,2012,2012,2012,2012)
Model <- c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12")
df <- data.frame(Year, Model, Sale)
I want the code which identifies the TOP 2 products by years and summarises all the rest products as category "other".
CodePudding user response:
We could arrange
by 'Year' and 'Sale' in desc
ending order and then change the values of 'Model' based on the row_number after grouping by 'Year'
library(dplyr)
df %>%
arrange(Year, desc(Sale)) %>%
group_by(Year) %>%
mutate(Model = case_when(row_number() > 2~ 'other', TRUE ~ Model)) %>%
ungroup
Or another option is to use slice_max
(with_ties = TRUE
by default)
df %>%
group_by(Year) %>%
mutate(Model =case_when(Model %in% {cur_data() %>%
slice_max(n = 2, order_by = Sale) %>%
pull(Model)} ~ Model, TRUE ~ "other" )
)
CodePudding user response:
Similar to akrun's solution: Slighlty other strategy:
library(dplyr)
df %>%
type.convert(as.is = TRUE) %>%
group_by(Year) %>%
arrange(desc(Sale), .by_group = TRUE) %>%
mutate(Model = ifelse(Model == first(Model, 2), Model, "Other"))
# Groups: Year [3]
Year Model Sale
<int> <chr> <int>
1 2010 b 45
2 2010 d 33
3 2010 Other 30
4 2010 Other 24
5 2010 Other 23
6 2011 b 56
7 2011 e 56
8 2011 Other 45
9 2011 Other 19
10 2011 Other 11
11 2012 c 89
12 2012 a 33
13 2012 Other 33
14 2012 Other 32
15 2012 Other 12
CodePudding user response:
You can use fct_lump_n()
from forcats
to collapse levels, but first you need to uncount
your data.
library(dplyr)
library(tidyr)
library(forcats)
df |>
mutate(Sale = as.integer(Sale)) |>
uncount(Sale) |>
group_by(Year) |>
mutate(Model = fct_lump_n(Model, n = 2)) |>
count(Model)