How to aggregate count by grouped rows of multiple columns inside pipes?

I want to get the head of the counts of rows grouped by multiple columns, sorted in ascending order, for a plot. I found some answers online, but none of them seem to work when I try to combine them with arrange and pipes.

df_Cleaned %>%
  head(arrange(aggregate(df_Cleaned$Distance, 
                 by = list(df_Cleaned$start_station_id, df_Cleaned$end_station_id),
                 FUN = nrow)))) %>%
  ggplot(mapping = aes(x = ride_id, color = member_casual)) +
  geom_bar()

It seems to have problems with df_Cleaned$, since that prefix is required in front of each column.
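A minimal base-R sketch of what the aggregate() call seems to be aiming for. The formula interface of aggregate() looks the columns up in its data argument, so the df_Cleaned$ prefix is not needed. FUN = length is used instead of nrow because aggregate() passes each group's values to FUN as a plain vector, for which nrow() returns NULL. The small df_Cleaned below is hypothetical toy data that only reuses the column names from the question.

```r
# Hypothetical toy data; only the column names come from the question
df_Cleaned <- data.frame(
  ride_id          = sprintf("r%02d", 1:10),
  start_station_id = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D"),
  end_station_id   = c("X", "X", "Y", "X", "X", "Y", "Y", "Y", "Z", "Z"),
  Distance         = c(1.2, 1.1, 2.0, 0.8, 0.9, 3.1, 3.0, 3.2, 1.5, 2.2),
  member_casual    = c("member", "casual", "member", "member", "casual",
                       "member", "member", "casual", "casual", "member")
)

# Formula interface: columns are found in `data`, so no df_Cleaned$ prefix.
# FUN = length counts the values in each (start, end) group.
counts <- aggregate(Distance ~ start_station_id + end_station_id,
                    data = df_Cleaned, FUN = length)
names(counts)[names(counts) == "Distance"] <- "n"   # the column now holds counts

# Sort ascending by count, then keep the first 6 rows
top <- head(counts[order(counts$n), ], 6)
print(top)
```

Wrapping the same call in a pipe works too, since aggregate(formula, data = .) only needs the data frame once.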

Answer:

I hope I understood you correctly. If you want to group your data by the columns Distance, start_station_id and end_station_id, count how many rows fall into each group, and then keep only the first few of those counts, the following tidyverse code may help:

library(dplyr)

df_Cleaned %>%
  group_by(Distance, start_station_id, end_station_id) %>%
  count() %>%   # adds a column n with the number of rows in each group
  head()        # keep the first 6 groups

In addition, it seems you later try to plot variables you did not group by (ride_id and member_casual), so either add them to your group_by() or choose a different variable to plot.
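A sketch along the same lines that also sorts in ascending order, as the question asks, and plots only columns that still exist after counting. It assumes grouping by the two station columns only, as in the question's aggregate() call (adding a continuous column such as Distance would likely put most rides into groups of one). The toy df_Cleaned and the label column route are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; only the column names come from the question
df_Cleaned <- data.frame(
  ride_id          = sprintf("r%02d", 1:10),
  start_station_id = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D"),
  end_station_id   = c("X", "X", "Y", "X", "X", "Y", "Y", "Y", "Z", "Z"),
  Distance         = c(1.2, 1.1, 2.0, 0.8, 0.9, 3.1, 3.0, 3.2, 1.5, 2.2),
  member_casual    = c("member", "casual", "member", "member", "casual",
                       "member", "member", "casual", "casual", "member")
)

route_counts <- df_Cleaned %>%
  count(start_station_id, end_station_id) %>%   # one row per pair, count in column n
  arrange(n) %>%                                # ascending by count
  head(6) %>%                                   # first 6 pairs after sorting
  # hypothetical label column combining both station ids
  mutate(route = paste(start_station_id, end_station_id, sep = " -> "))

# Bar heights come from the precomputed n, so geom_col() rather than geom_bar()
p <- ggplot(route_counts, aes(x = reorder(route, n), y = n)) +
  geom_col() +
  labs(x = "start -> end station", y = "number of rides")
```

Because ride_id and member_casual no longer exist after count(), the plot maps the route label and n instead; print(p) draws it.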

Answer:

We could use add_count to create a count column n by 'start_station_id' and 'end_station_id', sorted in descending order, then keep the rows whose n is among the first 6 unique values (head) or the last 6 (tail), and plot that subset of the data:

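library(dplyr)
library(ggplot2)

df_Cleaned %>%
  # n = number of rows per (start_station_id, end_station_id) pair, sorted descending
  add_count(start_station_id, end_station_id, sort = TRUE) %>%
  # keep rows whose count is among the 6 largest distinct counts
  filter(n %in% head(unique(n), 6)) %>%
  ggplot(mapping = aes(x = ride_id, color = member_casual)) +
  geom_bar()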
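Since sort = TRUE orders the rows by n in descending order, head(unique(n), 6) picks the largest counts, while the tail() variant mentioned above picks the smallest, which is closer to the ascending order the question asks for. A small sketch of both on hypothetical toy data; k = 2 stands in for the answer's 6 only so that the toy example actually drops something.

```r
library(dplyr)

# Hypothetical toy data; only the column names come from the question
df_Cleaned <- data.frame(
  ride_id          = sprintf("r%02d", 1:10),
  start_station_id = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D"),
  end_station_id   = c("X", "X", "Y", "X", "X", "Y", "Y", "Y", "Z", "Z"),
  Distance         = c(1.2, 1.1, 2.0, 0.8, 0.9, 3.1, 3.0, 3.2, 1.5, 2.2),
  member_casual    = c("member", "casual", "member", "member", "casual",
                       "member", "member", "casual", "casual", "member")
)

k <- 2  # the answer uses 6; 2 keeps this small example meaningful

# Every ride keeps its own row; n is the size of its (start, end) group,
# and sort = TRUE orders the rows by n, largest first
with_counts <- df_Cleaned %>%
  add_count(start_station_id, end_station_id, sort = TRUE)

# head(): rows whose count is among the k largest distinct counts
largest <- with_counts %>% filter(n %in% head(unique(n), k))

# tail(): rows whose count is among the k smallest distinct counts
smallest <- with_counts %>% filter(n %in% tail(unique(n), k))
```

Unlike count(), add_count() keeps the ride-level columns, so the subset still has ride_id and member_casual for the ggplot() call in the answer.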