I have a data frame with a column n that holds the frequency of each row. Which command do I need to select the 10 rows with the highest frequency, based on n?
An example of the structure of the data frame is below (unfortunately I can't provide the real data, as there is too much of it):
library(dplyr)

df <- data.frame(status = c("open", "close", "close", "open/close", "close"),
                 stock = c("google", "amazon", "amazon", "yahoo", "amazon"),
                 newspaper = c("times", "newyork", "london", "times", "times"))
# Count the number of occurrences of each alluvial (stock/newspaper/status combination)
df <- df %>% dplyr::group_by(stock, newspaper, status) %>%
  summarise(n = n(), .groups = "drop")  # drop grouping so later slicing is not done per group
CodePudding user response:
If you want to keep the original rows, I recommend using add_count instead:
df %>%
  add_count(status, stock, newspaper) %>%
  slice_max(n, n = 10)
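To make the difference between count and add_count concrete, here is a minimal sketch. The `toy` data frame is hypothetical (it is not the asker's data) and contains one repeated combination so that the counts differ.

```r
library(dplyr)

# Hypothetical toy data with one repeated combination (close/amazon/london)
toy <- data.frame(
  status    = c("open", "close", "close", "close"),
  stock     = c("google", "amazon", "amazon", "yahoo"),
  newspaper = c("times", "london", "london", "times")
)

# count() collapses the data to one row per combination
counted <- toy %>% count(status, stock, newspaper)

# add_count() keeps every original row and appends the column n
with_n <- toy %>% add_count(status, stock, newspaper)

nrow(counted)  # 3 distinct combinations
nrow(with_n)   # 4 original rows, each carrying its combination's count
```

So count is the one to use when you want one row per combination, and add_count when each original row should carry its frequency.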
CodePudding user response:
Is that what you need?
library(dplyr)
df %>%
  count(status, stock, newspaper, sort = TRUE) %>%
  slice_head(n = 10)
And, as suggested by @camille below (note that slice_max returns only the top row unless n is given):
df %>%
  count(status, stock, newspaper) %>%
  slice_max(n, n = 10)
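One thing to watch with slice_max is how it treats ties and its default for n. The sketch below uses a hypothetical, already-summarised `counts` data frame (made up for illustration) to show the behaviour.

```r
library(dplyr)

# Hypothetical summarised counts, with a tie at n = 3
counts <- data.frame(
  stock = c("amazon", "google", "yahoo", "apple"),
  n     = c(5, 3, 3, 1)
)

# Ties are kept by default, so asking for 2 rows can return more
top2_ties <- counts %>% slice_max(n, n = 2)

# with_ties = FALSE returns exactly the requested number of rows
top2_no_ties <- counts %>% slice_max(n, n = 2, with_ties = FALSE)

# Without the n argument, only the top row (plus any ties) is returned
top1_default <- counts %>% slice_max(n)

nrow(top2_ties)     # 3
nrow(top2_no_ties)  # 2
nrow(top1_default)  # 1
```

If you need exactly 10 rows, `with_ties = FALSE` is probably what you want; slice_head(n = 10) after a sorted count also returns at most 10 rows.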