Home > Blockchain >  How to select customized TOP10 rows by variable?
How to select customized TOP10 rows by variable?

Time:06-20

I have the sales and cost data by models. The code below select TOP10 models by sales and all the rest are sum up in new category "Others" which is 11th row.

df <- data.frame (model  = c("A","B","C","D","E","F","G","H","I","J","K","L","M","N"),
                 sale = c(100,300,140,456,345,456,456,780,40,560,560,456,350,500),
                 cost = c(1340,330,440,443,365,437,478,700,30,460,160,456,650,100))


#TOP10 by sale
order <- df %>%
  type.convert(as.is = TRUE) %>%
  mutate(pos = row_number(desc(sale)),
         model = ifelse(pos>10, 'Others', model),
         pos = ifelse(pos>10, 11, pos)) %>%
  group_by(model, pos) %>%
  summarise(cost= sum(cost), sale= sum(sale), .groups = 'drop') %>%
  arrange(pos)

Output:

   model    pos  cost  sale
 1 H          1   700   780
 2 J          2   460   560
 3 K          3   160   560
 4 N          4   100   500
 5 D          5   443   456
 6 F          6   437   456
 7 G          7   478   456
 8 L          8   456   456
 9 M          9   650   350
10 E         10   365   345
11 Others    11  2140   580

The sale of category A is put in "other" since it has low a sale (100) and is not in TOP10. Now, I want to include A in this TOP 10 in any case, no matter how much sale it has. So output should be TOP9 'A' 'Others':

Expected output:

   model    pos  cost  sale
 1 H          1   700   780
 2 J          2   460   560
 3 K          3   160   560
 4 N          4   100   500
 5 D          5   443   456
 6 F          6   437   456
 7 G          7   478   456
 8 L          8   456   456
 9 M          9   650   350
10 A         10   100   1340
11 Others    11  2140   580

(Thus change must be done in the given code)

CodePudding user response:

One way could be making use of bind_rows after removing the 10th line and adding only where model == A:

library(tidyverse)
#TOP10 by sale
 df %>%
   type.convert(as.is = TRUE) %>%
   mutate(pos = row_number(desc(sale)),
         model = ifelse(pos>10, 'Others', model),
         pos = ifelse(pos>10, 11, pos)) %>%
   group_by(model, pos) %>%
   summarise(cost= sum(cost), sale= sum(sale), .groups = 'drop') %>%
   arrange(pos) %>% 
   slice(-10) %>% 
   bind_rows(df %>% 
               filter(model == "A")) %>% 
   mutate(pos = replace_na(pos, 10)) %>% 
   arrange(pos)
   model    pos  cost  sale
   <chr>  <dbl> <dbl> <dbl>
 1 H          1   700   780
 2 J          2   460   560
 3 K          3   160   560
 4 N          4   100   500
 5 D          5   443   456
 6 F          6   437   456
 7 G          7   478   456
 8 L          8   456   456
 9 M          9   650   350
10 A         10  1340   100
11 Others    11  2140   580

CodePudding user response:

You can mutate in 2 steps, before grouping and manipulate the pos variable to fix the order. This solves the problem in the comments to the other answer.

order <- df %>%
  type.convert(as.is = TRUE) %>%
  mutate(pos = row_number(desc(sale))) %>% 
  mutate(pos = ifelse(model == "A" & pos > 10, 11, ifelse(pos > 10, 12, pos)),
         model = ifelse(pos>11, 'Others', model)) %>%
  group_by(model, pos) %>%
  summarise(cost= sum(cost), sale= sum(sale), .groups = 'drop') %>%
  arrange(pos)

  • Related