Keep the top rows from a dataframe

I have a dataframe with a variable n that holds the frequency of each row. What command do I need to select the 10 rows with the highest frequency, based on n?

Here is an example of the dataframe's structure (unfortunately I can't provide the real data, as there is too much of it):

df <- data.frame(status = c("open", "close", "close", "open/close", "close"), 
                 stock = c("google", "amazon", "amazon", "yahoo", "amazon"), 
                 newspaper = c("times", "newyork", "london", "times", "times"))

library(dplyr)

# Count the number of occurrences of each combination (each alluvial)
df <- df %>% 
  group_by(stock, newspaper, status) %>% 
  summarise(n = n())

CodePudding user response：

If you want to keep the original rows, I recommend using add_count instead. It adds the count as a column n without collapsing the rows:

df %>% 
  add_count(status, stock, newspaper) %>% 
  slice_max(n, n = 10)
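
To illustrate the difference between count and add_count, here is a minimal sketch on made-up data. The `toy` data frame and the names `collapsed` and `kept` are hypothetical and only serve the example; they are not the asker's data.

```r
library(dplyr)

# Hypothetical data, not the asker's: "amazon" appears 3 times, "google" once
toy <- data.frame(stock = c("amazon", "google", "amazon", "amazon"))

# count() collapses the data to one row per distinct value
collapsed <- toy %>% count(stock)

# add_count() keeps every original row and appends the group size as column n
kept <- toy %>% add_count(stock)
```

Here `collapsed` has one row per stock, while `kept` still has all four rows, each carrying the frequency of its own stock in n.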

CodePudding user response：

Is that what you need?

library(dplyr)

df %>% 
  count(status, stock, newspaper, sort = TRUE) %>% 
  slice_head(n = 10)

Or, as suggested by @camille, using slice_max:

df %>% 
  count(status, stock, newspaper) %>% 
  slice_max(n, n = 10)
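
One thing that may matter when choosing between the two: slice_max keeps ties by default (with_ties = TRUE), so it can return more rows than requested, whereas slice_head after a sorted count returns at most the requested number. A small sketch on made-up data (the `toy` data frame and the `top_*` names are hypothetical, chosen only for this example):

```r
library(dplyr)

# Hypothetical data: amazon x3, google x2, yahoo x2, meta x1
toy <- data.frame(
  stock = c("amazon", "amazon", "amazon", "google", "google",
            "yahoo", "yahoo", "meta")
)

counts <- toy %>% count(stock, sort = TRUE)

# Takes the first 2 rows of the sorted counts, ignoring ties
top_head <- counts %>% slice_head(n = 2)

# Keeps ties: google and yahoo share second place, so 3 rows come back
top_ties <- counts %>% slice_max(n, n = 2)

# Drops ties, so exactly 2 rows come back
top_strict <- counts %>% slice_max(n, n = 2, with_ties = FALSE)
```

If you need exactly 10 rows regardless of ties, slice_head on the sorted counts or with_ties = FALSE are the options to consider.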