I have a table of maximum trip lengths by month that I am trying to graph in R. When I plot it, the x-axis is not ordered by month; instead, the values are ordered alphabetically.
I'm just getting started in R, and I used the following code from one of the videos I watched, adjusted for my table names:
max_trips <- read.csv("max_and_min_trips.csv")
ggplot(data = max_trips) +
  geom_point(mapping = aes(x = month, y = max_trip_duration)) +
  scale_x_month(month_labels = "%Y-%m")
Answer:
The simple answer is that the data in your "month" column is stored as a vector of strings, not as dates. In R, this data type is called "character" (or chr). You can confirm this by typing class(max_trips$month) in your console; the result is almost certainly "character". ggplot2 treats a character column as a discrete variable and orders its values alphabetically, which is why the axis is not chronological. The solution is therefore to (1) convert the column to a date type and (2) adjust the formatting of the dates on the x-axis using scale_x_date and/or related functions.
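As a quick check of the point above, here is a minimal sketch using a made-up character vector (months_chr is a hypothetical stand-in, since the contents of the OP's CSV are not shown). It only illustrates that text sorts as text, not as dates:

```r
# Hypothetical "month/year" strings standing in for the OP's month column
months_chr <- c("1/21", "2/20", "12/21", "3/19", "10/19", "9/19")

# The values are plain text, not dates
class(months_chr)

# Sorting compares the strings character by character, so the result
# is not in chronological order (method = "radix" uses the C locale)
sorted_chr <- sort(months_chr, method = "radix")
sorted_chr
```

The exact order of a text sort can vary with locale, but in no case does R know that "3/19" comes before "1/21" in time.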
I'll demonstrate the process with a simple example dataset and plot. Here's the basic data frame and plot. As you'll see, the x-axis is again arranged "alphabetically" rather than chronologically, as it would be if the mydf$dates values were stored as dates in "month/year" format.
library(ggplot2)
library(lubridate)
mydf <- data.frame(
  dates = c("1/21", "2/20", "12/21", "3/19", "10/19", "9/19"),
  yvals = c(13, 31, 14, 10, 20, 18))
ggplot(mydf, aes(x = dates, y = yvals)) +
  geom_point()
Convert to Date
To convert to a date, you can use a few different functions, but I find the lubridate package particularly useful here. The as_date() function will be used for the conversion; however, we cannot just apply as_date() directly to mydf$dates, or we will get NA values and the following warning in the console:
> as_date(mydf$dates)
[1] NA NA NA NA NA NA
Warning message:
All formats failed to parse. No formats found.
Since there are so many ways to format data that correspond to dates, date-times, etc., we need to specify that our data is in "month/year" format. The other key point is that data set up as a date must specify a year, a month and a day. Our data specifies only month and year, so we first need to add an arbitrary "day" to each value before converting. Here's something that works:
mydf$dates <- as_date(
  paste0("1/", mydf$dates),  # need to add a "day" to correctly format for date
  format = "%d/%m/%y"        # nomenclature from strptime()
)
The paste0(...) function serves to add "1/" before each value in mydf$dates, and then the format = argument specifies that the character values should be read as "day/month/year". For more information on the nomenclature for date formats, see the help for the strptime() function.
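To see the effect of the conversion, the following sketch rebuilds the example data frame, converts the column, and plots it on a date axis. The scale_x_date(date_labels = "%Y-%m") call is just one possible labeling choice and is an assumption on my part, not the only option:

```r
library(ggplot2)
library(lubridate)

# Same example data as above, rebuilt so this block runs on its own
mydf <- data.frame(
  dates = c("1/21", "2/20", "12/21", "3/19", "10/19", "9/19"),
  yvals = c(13, 31, 14, 10, 20, 18)
)

# Prepend a day, then parse the text as day/month/two-digit year
mydf$dates <- as_date(paste0("1/", mydf$dates), format = "%d/%m/%y")

# The column is now of class "Date"
class(mydf$dates)

# With a Date column, ggplot2 uses a continuous date axis in time order
p <- ggplot(mydf, aes(x = dates, y = yvals)) +
  geom_point() +
  scale_x_date(date_labels = "%Y-%m")
p
```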
If the labeling isn't quite what you are looking for, you may check the
In the OP's case, I would suggest the following code should work:
library(ggplot2)
library(lubridate)
max_trips <- read.csv("max_and_min_trips.csv")
# Assumes month is stored as "month/year" text, e.g. "1/21"
max_trips$month <- as_date(
  paste0("1/", max_trips$month),
  format = "%d/%m/%y")
ggplot(data = max_trips) +
  geom_point(mapping = aes(x = month, y = max_trip_duration)) +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m")
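Because the actual format of the month column in the CSV is not shown, the format string above is a guess. A minimal sketch of a sanity check follows, using a made-up data frame as a hypothetical stand-in for the result of read.csv(); if the format does not match the data, the conversion yields NA values, which the check catches before plotting:

```r
library(lubridate)

# Hypothetical stand-in for read.csv("max_and_min_trips.csv"); values are made up
max_trips <- data.frame(
  month = c("1/21", "2/21", "3/21"),
  max_trip_duration = c(100, 120, 90)
)

# Parse into a separate vector first so the original column is kept on failure
parsed <- as_date(paste0("1/", max_trips$month), format = "%d/%m/%y")

# NA values here mean the format string does not match the data
if (anyNA(parsed)) {
  stop("Some months failed to parse; check the format of the month column.")
}
max_trips$month <- parsed
```

If the check fails, inspecting head(max_trips$month) before conversion should show which format string is needed.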