Arrange in R for Frequency in Column

I want to find the Start Station Name which appears the most in the column. I used this code:

Top_Start_Station_Name <- df %>%
  count(start_station_name)

However, the output runs to around 82 pages of station names along with their occurrences. How can I arrange or sort it so that it shows only the 5 station names that appear most often?

Best Regards, Tu Le

CodePudding user response：

Try this. It should give you a plot of the top 5 stations:

library(tidyverse)
df %>% 
  mutate(start_station_name = fct_lump(start_station_name, n=5)) %>% 
  count(start_station_name) %>% 
  mutate(start_station_name = fct_reorder(start_station_name, n)) %>% 
  ggplot(aes(start_station_name, n)) +  # ggplot layers are chained with +
  geom_col() +
  coord_flip()
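
One thing to be aware of with the snippet above: by default fct_lump() keeps the n most frequent levels and pools everything else into an extra level named "Other", so the chart will likely show a sixth bar. A small sketch on made-up data illustrates the effect (the vector x is hypothetical, not the asker's data; fct_lump_n() is the newer forcats name for the same n-based lumping):

```r
library(forcats)

# Hypothetical station labels, only to illustrate the lumping
x <- factor(c("A", "A", "A", "B", "B", "C", "D"))

# Keep the 2 most frequent levels; all remaining levels are pooled into "Other"
lumped <- fct_lump_n(x, n = 2)
lumped_levels <- levels(lumped)
print(lumped_levels)
```

If the "Other" bar is unwanted, adding filter(start_station_name != "Other") after count() should drop it before plotting.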

CodePudding user response：

Composing arrange() with desc() sorts by the counts (the column named n) in the tibble that count() returns:

Top_5_Start_Station_Names <- df %>%
  count(start_station_name) %>%  # 2-column tibble; the first column holds the names
  arrange(desc(n)) %>%           # most frequent first
  .[1:5, 1]                      # pick the first 5 rows of the first column

If you wanted both the names and the counts, you could use .[1:5, ] as the last step. (Only now do I see the more compact and earlier comment by @MrFlick.)
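
For what it's worth, a more compact variant is possible because count() has a sort argument and dplyr (1.0.0 or later) provides slice_head(). This is only a sketch: the toy df below is made up to stand in for the real data, and top_5 is just an illustrative name.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's df
df <- tibble(
  start_station_name = c("A St", "B St", "A St", "C St", "A St", "B St",
                         "D St", "E St", "F St", "B St", "C St", "A St")
)

top_5 <- df %>%
  count(start_station_name, sort = TRUE) %>%  # sort = TRUE orders by n, largest first
  slice_head(n = 5)                           # keep the first five rows (names and counts)

print(top_5)
```

slice_head() cuts off at exactly five rows. If stations tied for fifth place should all be kept, slice_max(n, n = 5) is an alternative, since it keeps ties by default.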