I have just started learning data analysis and am working on the Cyclistic case study in the Google Data Analytics program on Coursera. I am trying to make a `geom_col` plot, but I can't get it to work.
First, I created a new column called `route` by merging the start station name and the end station name. There are two types of users, casual riders and members. I want a `geom_col` of the top 10 routes that are common to both user types, with casual and member trips shown side by side for easier comparison.
So I want a `geom_col` with `route` on the y axis and the number of trips by members and casual users on the x axis, with the member and casual bars side by side for easy comparison.
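For reference, a `route` column like the one described above can be built with `dplyr::mutate()` and `paste()`. This is a minimal sketch that assumes the station columns are named `start_station_name` and `end_station_name` (adjust the names if your files differ); the `trips` data frame and the `" to "` separator are illustrative choices, not part of the original data.

```r
library(dplyr)

# Hypothetical trips using the assumed station-name columns
trips <- data.frame(
  ride_id = 1:3,
  start_station_name = c("Station 1", "Station 1", "Station 2"),
  end_station_name   = c("Station 2", "Station 2", "Station 3"),
  member_casual      = c("member", "casual", "member")
)

# Combine start and end station into a single route label
trips <- trips |>
  mutate(route = paste(start_station_name, end_station_name, sep = " to "))

trips$route
```

Because the label keeps the direction, a trip from Station 1 to Station 2 and one from Station 2 to Station 1 count as different routes.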
This is a sample of the original data set with only the relevant columns:
```
ride_id  route  member_casual
1        A      member
2        A      casual
3        B      member
4        C      casual
5        D      casual
```
There are approximately 500 routes.
Answer:
You can use the `count()` function to count the frequency of each `route` and `member_casual` combination. The `name =` parameter lets you name the new column that holds the counts. `slice_max()` then orders the rows by that new column, which we called `freq`, and keeps the top `n` (10) rows. By default `slice_max()` keeps ties, so you may get more than 10 rows if the 10th `freq` value is tied. Add `with_ties = FALSE` to return exactly 10 rows, ignoring any ties with the 10th row. Note that this selects the 10 most frequent route/user-type combinations, so a route can appear for only one of the two user types.

Then plot `route` against `freq` and tell `ggplot` to fill the bars using the `member_casual` values. Add `position = "dodge"` to `geom_col` (the default is `"stack"`) so the member and casual bars sit side by side for easier comparison. Finally, `coord_flip()` flips the plot so the columns are horizontal, with `route` on the y axis.
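To see the tie behaviour concretely, here is the question's five-row sample rebuilt as a data frame (`trips` is a hypothetical name; substitute your own data). Every route/user-type combination occurs exactly once, so all `freq` values tie at 1, and `n = 3` is used only so the difference shows up on such a small sample.

```r
library(dplyr)

# The five-row sample from the question
trips <- data.frame(
  ride_id = 1:5,
  route = c("A", "A", "B", "C", "D"),
  member_casual = c("member", "casual", "member", "casual", "casual")
)

# One row per route/user-type combination, with its count in `freq`
counts <- trips |>
  count(route, member_casual, name = "freq")

# All counts are tied at 1, so keeping ties returns every row...
kept_ties <- counts |>
  slice_max(order_by = freq, n = 3)

# ...while with_ties = FALSE returns exactly n rows
dropped_ties <- counts |>
  slice_max(order_by = freq, n = 3, with_ties = FALSE)

nrow(kept_ties)     # 5
nrow(dropped_ties)  # 3
```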
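```r
library(tidyverse)

# `dataframe` stands for your trip data; the native pipe `|>` needs R >= 4.1
dataframe |>
  count(route, member_casual, name = "freq") |>   # trips per route and user type
  slice_max(order_by = freq, n = 10) |>           # 10 most frequent combinations
  ggplot(aes(route, freq, fill = member_casual)) +
  geom_col(position = "dodge") +                  # bars side by side, not stacked
  coord_flip()                                    # horizontal bars
```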
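Slicing the ungrouped counts picks the 10 most frequent route/user-type rows, which is not quite the question's "top 10 routes common to both". One alternative, sketched here under that reading of the question rather than as the answer's own method, is to keep only routes ridden by both user types, rank those routes by their combined trip count, and then keep both rows for each selected route. The `trips` data is the question's sample; on it only route `A` qualifies.

```r
library(dplyr)

# The five-row sample from the question
trips <- data.frame(
  ride_id = 1:5,
  route = c("A", "A", "B", "C", "D"),
  member_casual = c("member", "casual", "member", "casual", "casual")
)

counts <- trips |>
  count(route, member_casual, name = "freq")

# Routes used by both user types, ranked by total trips across both
top_routes <- counts |>
  group_by(route) |>
  filter(n_distinct(member_casual) == 2) |>
  summarise(total = sum(freq)) |>
  slice_max(order_by = total, n = 10, with_ties = FALSE) |>
  pull(route)

# Keep both the member and the casual row for each selected route
top_counts <- counts |>
  filter(route %in% top_routes)

top_counts
```

`top_counts` can then be piped into the same `ggplot()` call as above, and every route will have both a member and a casual bar.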
Also, if you're looking for the top 10 routes for each user type (i.e., the top 10 routes for casual riders and the top 10 routes for members), you can add `group_by(member_casual)` before `slice_max()`.
```r
library(tidyverse)

dataframe |>
  count(route, member_casual, name = "freq") |>
  group_by(member_casual) |>                      # rank within each user type
  slice_max(order_by = freq, n = 10) |>           # top 10 per user type
  ggplot(aes(route, freq, fill = member_casual)) +
  geom_col(position = "dodge") +
  coord_flip()
```
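With per-type top 10 lists, a route can be in one user type's list but not the other's, and plain `position = "dodge"` then draws that route's single bar at full width. The sketch below is one possible refinement, not part of the original answer: `position_dodge(preserve = "single")` keeps all bars the same width, and `reorder(route, freq, FUN = sum)` sorts the routes by their total trips so the busiest route ends up at the top after `coord_flip()`. It uses the question's sample as `trips`, with `n = 2` only because the sample is so small; the axis labels are illustrative.

```r
library(dplyr)
library(ggplot2)

# The five-row sample from the question
trips <- data.frame(
  ride_id = 1:5,
  route = c("A", "A", "B", "C", "D"),
  member_casual = c("member", "casual", "member", "casual", "casual")
)

# Top n routes within each user type (n = 2 only for this tiny sample)
top_by_type <- trips |>
  count(route, member_casual, name = "freq") |>
  group_by(member_casual) |>
  slice_max(order_by = freq, n = 2, with_ties = FALSE) |>
  ungroup()

p <- ggplot(top_by_type,
            aes(x = reorder(route, freq, FUN = sum), y = freq, fill = member_casual)) +
  geom_col(position = position_dodge(preserve = "single")) +  # equal bar widths
  coord_flip() +
  labs(x = "Route", y = "Trips", fill = "User type")

p
```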