I want to find the Start Station Name which appears the most in the column. I used this code:
Top_Start_Station_Name <- df %>%
count(start_station_name)
However, the output runs to around 82 pages, listing every station name along with its number of occurrences. How can I arrange or sort it so that it shows only the 5 most frequent station names?
Best Regards, Tu Le
CodePudding user response:
Try this. It should give you a bar plot of the top 5 stations (fct_lump groups all remaining stations into a single "Other" bar):
library(tidyverse)
df %>%
  # keep the 5 most frequent stations; all others are lumped into "Other"
  mutate(start_station_name = fct_lump(start_station_name, n = 5)) %>%
  count(start_station_name) %>%
  mutate(start_station_name = fct_reorder(start_station_name, n)) %>%
  ggplot(aes(start_station_name, n)) +  # ggplot layers are added with +, not %>%
  geom_col() +
  coord_flip()
CodePudding user response:
The arrange(desc(.)) function composition can sort by the counts (named n) in the tbl_df that results from the count function:
Top_5_Start_Station_Names <- df %>%
count(start_station_name) %>% # 2 col tbl, first col are names
arrange(desc(n)) %>%
.[1:5,1] # pick first 5 items from first col
If you wanted both the names and counts, you could use .[1:5, ] as the last step. (Only now do I see the more compact and earlier comment by @MrFlick.)
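For what it's worth, the same sort-then-take-five idea can probably be written more compactly. The following is only a sketch, assuming dplyr 1.0.0 or later; the toy df below is made up purely for illustration and stands in for your real data:

```r
library(dplyr)

# Hypothetical toy data standing in for the real `df`:
# station "A" appears 6 times, "B" 5 times, ... "F" once.
df <- tibble(
  start_station_name = rep(c("A", "B", "C", "D", "E", "F"), times = 6:1)
)

top_5 <- df %>%
  count(start_station_name, sort = TRUE) %>%  # counts, largest first
  slice_head(n = 5)                           # keep only the first 5 rows

print(top_5)
```

If several stations could tie for fifth place and you want to keep all of them, slice_max(n, n = 5) after a plain count() may be a better fit, since it keeps ties by default.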