In my data frame (as below) I want to extract the top 5 brands by sales. I want to make sure that if my brand appears in top 5 then 5 values get picked, else top 5 row for my brand gets picked (6 in total).
dff4 <- data.frame(stringsAsFactors = FALSE, check.names = FALSE,
Region = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
"C", "C", "C", "C", "C"), Brand = c("B1", "B2", "B3",
"B4", "B5", "B6", "B7", "B8", "B9", "B10", "B11", "B12",
"B13", "B14", "B15"), Sales = c(2923, 2458, 2812, 2286,
1683, 1085, 2805, 3214, 1059, 1866, 3280, 2481, 2016, 1230, 1763))
Then I sort my data on Sales using the below code:
dff4 <- dff4[order(-dff4$Sales), ]
Now I need to only keep the top 5 values by sales in my data frame. This I can do using the below lines.
dff4 <- head(dff4, 5)
However, I also want to make sure that my own brand's data remain in the table, irrespective of it is appearing in top 5 or not. For example when I sort the above table, I can see that Brand "B4" does not appear in top 5. If I want to make sure that my code pick only top 5 if "B4" appears there else pick the top 5 and brand "B4" as the 6th value.
How do I go about it.
CodePudding user response:
You could use slice_max()
to select rows with 5 highest values of Sales
, and then bind the row with Brand
B4 behind. If B4 has been in the top 5 rows, distinct()
will remove the duplicate one.
library(dplyr)
dff4 %>%
slice_max(Sales, n = 5) %>%
bind_rows(filter(dff4, Brand == "B4")) %>%
distinct()
# Region Brand Sales
# 1 C B11 3280
# 2 B B8 3214
# 3 A B1 2923
# 4 A B3 2812
# 5 B B7 2805
# 6 A B4 2286
CodePudding user response:
We could use
library(dplyr)
dff4 %>%
arrange(desc(Sales)) %>%
filter(row_number() <6 |Brand == "B4")
Region Brand Sales
1 C B11 3280
2 B B8 3214
3 A B1 2923
4 A B3 2812
5 B B7 2805
6 A B4 2286