I have the data about sales by years and by-products, let's say like this:
Year <- c(2010,2010,2010,2010,2010,2011,2011,2011,2011,2011,2012,2012,2012,2012,2012)
Model <- c("a","b","c","d","e","a","b","c","d","e","a","b","c","d","e")
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12")
df <- data.frame(Year, Model, Sale)
product by years:
a= 30 11 33 = 74 b= 45 56 32 = 133 c= 23 19 89 = 131 d= 33 45 33 = 111 e= 12 56 24 = 92
Ranking by according to total sales within these 3 years:
1 2 3 4 5
b c d e a
I want the code which identifies the TOP 2 products (according to total sales within these 3 years) by years and summarises all the rest products as category "other". So the output should be like this:
year Model Sale 2010 b 45 2010 c 23 2010 other 30 33 24=92 2011 b 56 2011 c 19 2011 other 11 45 56=112 2012 b 32 2012 c 89 2012 other 33 33 12= 78
CodePudding user response:
A tidyverse solution. Your Sale
data appear to be stored as character, which means we'll have to use as.numeric
before summing them.
library(tidyverse)
df %>%
group_by(Model) %>%
mutate(
Sale = as.numeric(Sale),
total_sale = sum(Sale)
) %>%
ungroup %>%
mutate(
model_condensed = ifelse(total_sale %in% rev(sort(unique(total_sale)))[1:2], Model, 'other')
) %>%
group_by(Year, model_condensed) %>%
summarize(Sale = sum(Sale))
Year model_condensed Sale
<dbl> <chr> <dbl>
1 2010 b 45
2 2010 c 23
3 2010 other 87
4 2011 b 56
5 2011 c 19
6 2011 other 112
7 2012 b 32
8 2012 c 89
9 2012 other 78
The above solution creates the "other" category by matching on the values in Sale
. This could cause problems if those values have decimal places (see this question). Instead, we could use a two-step process to identify the top two Models by name, and use this to create the groupings for the total data:
totals <- df %>%
group_by(Model) %>%
summarize(total_sale = sum(as.numeric(Sale))) %>%
arrange(desc(total_sale)) %>%
slice_head(n = 2)
df %>%
group_by(Year, model_condensed = ifelse(Model %in% totals$Model, Model, 'other')) %>%
summarize(Sale = sum(as.numeric(Sale)))