So this seems like it should be simple but I haven't been able to sort it out. Basically I have a database where the item is on multiple rows because it is sold it multiple states, and I want to rank the items sales increase by the total sales increase of the item (aggregated across states).
Repro data:
df = data.frame(Item = c("Beer","Beer","Beer","Beer","Wine","Wine","Brandy","Brandy","Brandy"),
State = c("CA","WA","OR","CO","CA","OR","CA","OR","CO"),
SalesThisWeek = c(400, 300, 1000, 1200, 800, 200, 600, 100, 150),
SalesLastWeek = c(150, 300, 625, 1150, 700, 50, 200, 80, 125))
df<-df %>% group_by(Item) %>%
mutate(TotSalesChange=sum(SalesThisWeek-SalesLastWeek)) %>%
mutate(ChangeRank=order(TotSalesChange, decreasing = TRUE))
You end up with:
Item State SalesThisWeek SalesLastWeek TotSalesChange ChangeRank
Beer CA 400 150 675 1
Beer WA 300 300 675 2
Beer OR 1000 625 675 3
Beer CO 1200 1150 675 4
Wine CA 800 700 250 1
Wine OR 200 50 250 2
Brandy CA 600 200 445 1
Brandy OR 100 80 445 2
Brandy CO 150 125 445 3
But what I want would be to end up with the rankings applied to the Item rather than the item within the state... so something like:
Item State SalesThisWeek SalesLastWeek TotSalesChange ChangeRank
Beer CA 400 150 675 1
Beer WA 300 300 675 1
Beer OR 1000 625 675 1
Beer CO 1200 1150 675 1
Wine CA 800 700 250 3
Wine OR 200 50 250 3
Brandy CA 600 200 445 2
Brandy OR 100 80 445 2
Brandy CO 150 125 445 2
because this is actually ranking the total aggregated sales for the item. I'm not sure how to rank the TotSalesChange to allow for duplication of ranking number, I guess. Any help is much appreciated!
CodePudding user response:
This is a kludgy answer but it'll work. We just need to summarise by alcohol type and then join the summary back into the df.
df <- data.frame(
Item = c("Beer","Beer","Beer","Beer","Wine","Wine","Brandy","Brandy","Brandy"),
State = c("CA","WA","OR","CO","CA","OR","CA","OR","CO"),
SalesThisWeek = c(400, 300, 1000, 1200, 800, 200, 600, 100, 150),
SalesLastWeek = c(150, 300, 625, 1150, 700, 50, 200, 80, 125))
df_summarised <-
df %>%
group_by(Item) %>%
summarise(TotSalesChange = sum(SalesThisWeek-SalesLastWeek)) %>%
mutate(ChangeRank = order(TotSalesChange, decreasing = TRUE))
df <-
left_join(df, df_summarised)
Results
> df
Item State SalesThisWeek SalesLastWeek TotSalesChange ChangeRank
1 Beer CA 400 150 675 1
2 Beer WA 300 300 675 1
3 Beer OR 1000 625 675 1
4 Beer CO 1200 1150 675 1
5 Wine CA 800 700 250 3
6 Wine OR 200 50 250 3
7 Brandy CA 600 200 445 2
8 Brandy OR 100 80 445 2
9 Brandy CO 150 125 445 2
CodePudding user response:
You may use dense_rank
-
library(dplyr)
df %>%
group_by(Item) %>%
mutate(TotSalesChange = sum(SalesThisWeek-SalesLastWeek)) %>%
ungroup %>%
mutate(ChangeRank = dense_rank(-TotSalesChange))
# Item State SalesThisWeek SalesLastWeek TotSalesChange ChangeRank
# <chr> <chr> <dbl> <dbl> <dbl> <int>
#1 Beer CA 400 150 675 1
#2 Beer WA 300 300 675 1
#3 Beer OR 1000 625 675 1
#4 Beer CO 1200 1150 675 1
#5 Wine CA 800 700 250 3
#6 Wine OR 200 50 250 3
#7 Brandy CA 600 200 445 2
#8 Brandy OR 100 80 445 2
#9 Brandy CO 150 125 445 2