I want to count rows grouped by multiple columns, sort the counts in ascending order, and take the head of the result for a plot. I found some answers on the internet, but nothing seems to work when I try to combine them with arrange and pipes.
df_Cleaned %>%
head(arrange(aggregate(df_Cleaned$Distance,
by = list(df_Cleaned$start_station_id, df_Cleaned$end_station_id),
FUN = nrow)))) %>%
ggplot(mapping = aes(x = ride_id, color = member_casual))
geom_bar()
It seems to have problems with df_Cleaned$, since it's required in front of each column.
CodePudding user response:
I hope I understood your meaning correctly. If you want to group your data by the columns Distance, start_station_id, and end_station_id, count how many rows fall into each group, and then take only the head of those counts, the following tidyverse code may help:
df_Cleaned %>%
group_by(Distance, start_station_id, end_station_id) %>%
count() %>%
head()
In addition, it seems like you are later trying to plot using a variable you did not group by, so either add it to your group_by or choose a different variable to plot by.
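To make the last point concrete, here is a minimal sketch on made-up data. The `rides` table and the `n_rides` column name are assumptions for illustration only; they are not from the original post. It counts rows per route, keeps member_casual in the grouping so it is still available for plotting, and sorts the counts in ascending order before taking the head:

```r
library(dplyr)

# Hypothetical toy data standing in for df_Cleaned
rides <- tibble(
  start_station_id = c("A", "A", "A", "B", "B", "C"),
  end_station_id   = c("X", "X", "Y", "X", "X", "Z"),
  member_casual    = c("member", "casual", "member", "member", "member", "casual")
)

route_counts <- rides %>%
  # count() is shorthand for group_by() + summarise(n = n()) + ungroup()
  count(start_station_id, end_station_id, member_casual, name = "n_rides") %>%
  # ascending order, as asked in the question
  arrange(n_rides) %>%
  # first six rows (fewer if the result is shorter)
  head()

print(route_counts)
```

Because member_casual is one of the counted columns, it survives into `route_counts` and can be mapped to color in a later ggplot call. Columns that were not grouped by, such as ride_id, are gone after counting.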
CodePudding user response:
We may use add_count to create a count column by 'start_station_id' and 'end_station_id', and sort it, then filter the first 6 unique values (head) or last 6 (tail) of 'n' and plot on the subset of the data:
library(dplyr)
library(ggplot2)
df_Cleaned %>%
add_count(start_station_id, end_station_id, sort = TRUE) %>%
filter(n %in% head(unique(n), 6)) %>%
ggplot(mapping = aes(x = ride_id, color = member_casual)) +
geom_bar()
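As a rough illustration of how add_count differs from count, the sketch below uses a small made-up table (`rides` and `top_routes` are hypothetical names, not from the original post). add_count keeps every original row and appends the group size as 'n'; with sort = TRUE the rows are ordered with the largest counts first, so head(unique(n), k) picks the k highest distinct counts:

```r
library(dplyr)

# Hypothetical toy data standing in for df_Cleaned
rides <- tibble(
  ride_id          = 1:6,
  start_station_id = c("A", "A", "A", "B", "B", "C"),
  end_station_id   = c("X", "X", "Y", "X", "X", "Z"),
  member_casual    = c("member", "casual", "member", "member", "member", "casual")
)

top_routes <- rides %>%
  # every ride is kept; n is the number of rides on the same route
  add_count(start_station_id, end_station_id, sort = TRUE) %>%
  # keep rides whose route count is the single highest distinct value
  filter(n %in% head(unique(n), 1))

print(top_routes)
```

Since the original columns are still present, ride_id and member_casual remain available for the plot. To get the least frequent routes instead, tail(unique(n), k) can be used on the same sorted column.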