I am trying to examine the distribution of flights from NYC per day for each month of the year. The dataset I'm using is "nycflights13", which can be installed as a package. I transform the data as follows:
# load
flights <- nycflights13::flights
# rid NA
flights <- flights %>% tidyr::drop_na(tailnum)
# filter out only flights going FROM NYC and add month
flights_with_month <- flights %>%
  filter(origin != "EWR") %>%
  mutate(mth = month(time_hour), label = TRUE)
# calculate flights per day
flights_with_month <- flights_with_month %>%
  group_by(mth, day) %>%
  mutate(total_daily_flights = n())
# making boxplots
ggplot(flights_with_month, aes(y = total_daily_flights, group = mth)) +
  geom_boxplot()
This is the boxplot I'm getting. The x-axis does not show the names of the months, and the months also do not seem to be ordered from January to December.
CodePudding user response:
Your code has two issues. First, mutate(mth = month(time_hour), label = TRUE) should be mutate(mth = month(time_hour, label = TRUE)), so that label = TRUE is passed to month() rather than to mutate(). Second, your ggplot call should set x = mth rather than group = mth. The data were being plotted in the correct order, but with incorrect labeling.
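To see what the misplaced parenthesis does, here is a minimal sketch on a made-up three-row data frame (the `toy` data and its dates are assumptions for illustration, not part of nycflights13). It assumes dplyr and lubridate are installed.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for the flights data: three timestamps in 2013
toy <- data.frame(
  time_hour = as.POSIXct(
    c("2013-01-15 08:00:00", "2013-06-01 12:00:00", "2013-12-24 18:00:00"),
    tz = "UTC"
  )
)

# Parenthesis closed too early: label = TRUE goes to mutate(),
# which creates an extra column called `label`; mth stays numeric
wrong <- toy %>% mutate(mth = month(time_hour), label = TRUE)

# label = TRUE passed to month(): mth becomes an ordered factor of month names
right <- toy %>% mutate(mth = month(time_hour, label = TRUE))

names(wrong)       # includes a stray `label` column
class(wrong$mth)   # numeric month numbers
class(right$mth)   # ordered factor
```

Because the factor returned by month(..., label = TRUE) is ordered by calendar month, mapping it to x lets ggplot2 place the months from January to December with their names as axis labels.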
With both fixes applied, the full script becomes:
# packages providing %>%, filter(), mutate(), month() and ggplot()
library(dplyr)
library(lubridate)
library(ggplot2)
# load
flights <- nycflights13::flights
# rid NA
flights <- flights %>% tidyr::drop_na(tailnum)
# filter out only flights going FROM NYC and add month
flights_with_month <- flights %>%
  filter(origin != "EWR") %>%
  mutate(mth = month(time_hour, label = TRUE))
# calculate flights per day
flights_with_month <- flights_with_month %>%
  group_by(mth, day) %>%
  mutate(total_daily_flights = n())
# making boxplots
ggplot(flights_with_month, aes(y = total_daily_flights, x = mth)) +
  geom_boxplot()
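One further point, offered as a suggestion rather than part of the fix above: group_by() followed by mutate() keeps one row per flight, so each daily total appears in the boxplot once for every flight on that day. If the goal is the distribution of daily totals, a possible variation is to collapse the data to one row per day with dplyr's count() before plotting. The name `daily_counts` is only illustrative.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# One row per (month, day) with the number of flights on that day
daily_counts <- nycflights13::flights %>%
  tidyr::drop_na(tailnum) %>%
  filter(origin != "EWR") %>%
  mutate(mth = month(time_hour, label = TRUE)) %>%
  count(mth, day, name = "total_daily_flights")

# Each box now summarises roughly 28 to 31 daily totals per month
p <- ggplot(daily_counts, aes(x = mth, y = total_daily_flights)) +
  geom_boxplot()
p
```

With this version every day contributes exactly one observation, so busy days are not over-weighted in the quartiles.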