Home > Enterprise >  Ranking with dplyr, item on multiple rows due to certain columns, how do I give same rank to same it
Ranking with dplyr, item on multiple rows due to certain columns, how do I give same rank to same it

Time:07-28

So this seems like it should be simple but I haven't been able to sort it out. Basically I have a database where the item is on multiple rows because it is sold it multiple states, and I want to rank the items sales increase by the total sales increase of the item (aggregated across states).

Repro data:

df = data.frame(Item = c("Beer","Beer","Beer","Beer","Wine","Wine","Brandy","Brandy","Brandy"),
            State = c("CA","WA","OR","CO","CA","OR","CA","OR","CO"),
            SalesThisWeek = c(400, 300, 1000, 1200, 800, 200, 600, 100, 150),
            SalesLastWeek = c(150, 300, 625, 1150, 700, 50, 200, 80, 125))

df<-df %>% group_by(Item) %>% 
  mutate(TotSalesChange=sum(SalesThisWeek-SalesLastWeek)) %>%
  mutate(ChangeRank=order(TotSalesChange, decreasing = TRUE))

You end up with:

Item    State   SalesThisWeek   SalesLastWeek   TotSalesChange ChangeRank
Beer    CA           400             150            675         1
Beer    WA           300             300            675         2
Beer    OR          1000             625            675         3
Beer    CO          1200            1150            675         4
Wine    CA           800             700            250         1
Wine    OR           200              50            250         2
Brandy  CA           600             200            445         1
Brandy  OR           100              80            445         2
Brandy  CO           150             125            445         3

But what I want would be to end up with the rankings applied to the Item rather than the item within the state... so something like:

Item    State   SalesThisWeek   SalesLastWeek   TotSalesChange  ChangeRank
Beer    CA           400             150            675         1
Beer    WA           300             300            675         1
Beer    OR          1000             625            675         1
Beer    CO          1200            1150            675         1
Wine    CA           800             700            250         3
Wine    OR           200              50            250         3
Brandy  CA           600             200            445         2
Brandy  OR           100              80            445         2
Brandy  CO           150             125            445         2

because this is actually ranking the total aggregated sales for the item. I'm not sure how to rank the TotSalesChange to allow for duplication of ranking number, I guess. Any help is much appreciated!

CodePudding user response:

This is a kludgy answer but it'll work. We just need to summarise by alcohol type and then join the summary back into the df.

df <- data.frame(
  Item = c("Beer","Beer","Beer","Beer","Wine","Wine","Brandy","Brandy","Brandy"),
  State = c("CA","WA","OR","CO","CA","OR","CA","OR","CO"),
  SalesThisWeek = c(400, 300, 1000, 1200, 800, 200, 600, 100, 150),
  SalesLastWeek = c(150, 300, 625, 1150, 700, 50, 200, 80, 125))

df_summarised <-
  df %>% 
  group_by(Item) %>% 
  summarise(TotSalesChange = sum(SalesThisWeek-SalesLastWeek)) %>% 
  mutate(ChangeRank = order(TotSalesChange, decreasing = TRUE))

df <- 
  left_join(df, df_summarised)

Results

> df
    Item State SalesThisWeek SalesLastWeek TotSalesChange ChangeRank
1   Beer    CA           400           150            675          1
2   Beer    WA           300           300            675          1
3   Beer    OR          1000           625            675          1
4   Beer    CO          1200          1150            675          1
5   Wine    CA           800           700            250          3
6   Wine    OR           200            50            250          3
7 Brandy    CA           600           200            445          2
8 Brandy    OR           100            80            445          2
9 Brandy    CO           150           125            445          2

CodePudding user response:

You may use dense_rank -

library(dplyr)

df %>%
  group_by(Item) %>%
  mutate(TotSalesChange = sum(SalesThisWeek-SalesLastWeek)) %>%
  ungroup %>%
  mutate(ChangeRank = dense_rank(-TotSalesChange))

#  Item   State SalesThisWeek SalesLastWeek TotSalesChange ChangeRank
#  <chr>  <chr>         <dbl>         <dbl>          <dbl>      <int>
#1 Beer   CA              400           150            675          1
#2 Beer   WA              300           300            675          1
#3 Beer   OR             1000           625            675          1
#4 Beer   CO             1200          1150            675          1
#5 Wine   CA              800           700            250          3
#6 Wine   OR              200            50            250          3
#7 Brandy CA              600           200            445          2
#8 Brandy OR              100            80            445          2
#9 Brandy CO              150           125            445          2
  • Related