Home > front end >  Calculate Top N products by sales with in each year
Calculate Top N products by sales with in each year

Time:04-20

I have the data about sales by years and by-products, let's say like this:

Year <- c(2010,2010,2010,2010,2010,2011,2011,2011,2011,2011,2012,2012,2012,2012,2012)
Model <- c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12")
df <- data.frame(Year, Model, Sale)

I want the code which identifies the TOP 2 products by years and summarises all the rest products as category "other".

CodePudding user response:

We could arrange by 'Year' and 'Sale' in descending order and then change the values of 'Model' based on the row_number after grouping by 'Year'

library(dplyr)
df %>% 
 arrange(Year, desc(Sale)) %>% 
 group_by(Year) %>%
 mutate(Model = case_when(row_number() > 2~ 'other', TRUE ~ Model)) %>%
 ungroup

Or another option is to use slice_max (with_ties = TRUE by default)

df %>%
  group_by(Year) %>% 
  mutate(Model =case_when(Model %in% {cur_data() %>% 
     slice_max(n = 2, order_by = Sale) %>%
     pull(Model)} ~ Model, TRUE ~ "other" )
)

CodePudding user response:

Similar to akrun's solution: Slighlty other strategy:

library(dplyr)

df %>% 
  type.convert(as.is = TRUE) %>% 
  group_by(Year) %>% 
  arrange(desc(Sale), .by_group = TRUE) %>% 
  mutate(Model = ifelse(Model == first(Model, 2), Model, "Other"))
 
# Groups:   Year [3]
    Year Model  Sale
   <int> <chr> <int>
 1  2010 b        45
 2  2010 d        33
 3  2010 Other    30
 4  2010 Other    24
 5  2010 Other    23
 6  2011 b        56
 7  2011 e        56
 8  2011 Other    45
 9  2011 Other    19
10  2011 Other    11
11  2012 c        89
12  2012 a        33
13  2012 Other    33
14  2012 Other    32
15  2012 Other    12

CodePudding user response:

You can use fct_lump_n() from forcats to collapse levels, but first you need to uncount your data.

library(dplyr)
library(tidyr)
library(forcats)
df |> 
  mutate(Sale = as.integer(Sale)) |>
  uncount(Sale) |> 
  group_by(Year) |> 
  mutate(Model = fct_lump_n(Model, n = 2)) |>
  count(Model)
  • Related