Order ggplot geom_point in order by date on X axis

Time:11-09

I have a table of maximum trip lengths by month which I am trying to graph in R. When I graph it, the X-axis does not order the points by month; instead, it orders them alphabetically.

I'm just getting started in R, and I used the following code from one of the videos I watched, adjusted for my table names:

library(ggplot2)

max_trips <- read.csv("max_and_min_trips.csv")

ggplot(data=max_trips) +
  geom_point(mapping = aes(x=month,y=max_trip_duration)) +
  scale_x_month(month_labels = "%Y-%m")  # note: ggplot2 has no scale_x_month(); the answer uses scale_x_date()

Answer:

The simple answer is that the data in your "month" column is stored as a vector of strings, not as dates. In R, this data type is called "character" (or chr). You can confirm this by typing class(max_trips$month); the result should be "character" (or "factor" in R versions before 4.0, where read.csv() converted strings to factors by default). Because ggplot2 treats a character column as discrete, it places the values in sorted string order, which is why the months appear "alphabetically". Therefore, the solution is to (1) convert the column to a date and (2) adjust the formatting of the date on the x axis using scale_x_date() and/or related functions.
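A minimal sketch of that check. The real max_and_min_trips.csv isn't shown, so the few rows below are made up and only assume the months are written like "1/21":

```r
# Hypothetical stand-in for max_and_min_trips.csv; the real file's rows are not shown
max_trips <- read.csv(text = "month,max_trip_duration
1/21,13
2/20,31
12/21,14")

class(max_trips$month)
# "character" in R >= 4.0 ("factor" in older versions)

inherits(max_trips$month, "Date")
# FALSE, so ggplot2 will put it on a discrete axis rather than a date axis
```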

I'll demonstrate the process with a simple example dataset and plot. Here's the basic data frame and plot. As you'll see, the points are again arranged "alphabetically" instead of in the chronological order you'd expect if the mydf$dates values (in "month/year" form) were stored as dates.

library(ggplot2)
library(lubridate)
mydf <- data.frame(
  dates = c("1/21", "2/20", "12/21", "3/19", "10/19", "9/19"),
  yvals = c(13, 31, 14, 10, 20, 18))
ggplot(mydf, aes(x = dates, y = yvals)) + geom_point()
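The "alphabetical" order comes from the discrete axis: ggplot2 uses the sorted unique strings, much as factor() does when it builds its levels. A short sketch comparing that order with the chronological order, which is written out by hand here. The exact string order can vary a little between locales, so it is printed rather than hard-coded:

```r
dates <- c("1/21", "2/20", "12/21", "3/19", "10/19", "9/19")

# Sorted string order: how factor() (and a discrete axis) orders these values
string_order <- levels(factor(dates))
print(string_order)

# Chronological order of the same month/2-digit-year values, written out by hand
chronological <- c("3/19", "9/19", "10/19", "2/20", "1/21", "12/21")

identical(string_order, chronological)   # FALSE
```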


Convert to Date

To convert to a date, you can use a few different functions, but I find the lubridate package particularly useful here. The as_date() function will be used for the conversion. However, we cannot just apply as_date() directly to mydf$dates, or every value comes back as NA with a warning:

> as_date(mydf$dates)
[1] NA NA NA NA NA NA
Warning message:
All formats failed to parse. No formats found. 

Since there are so many ways to format data that represent dates, date-times, etc., we need to specify that our data is in "month/year" format. The other key point is that R's Date class represents a specific calendar day, so a value must specify year, month and day. Our data specify only month and year, so we first need to add a placeholder day (here, the 1st) to each value before converting. Here's something that works:

mydf$dates <- as_date(
  paste0("1/", mydf$dates),  # need to add a "day" to correctly format for date
  format = "%d/%m/%y"        # nomenclature from strptime()
)

The paste0(...) call adds "1/" before each value in mydf$dates, and the format = argument specifies that the character values should be read as "day/month/year", with %y meaning a two-digit year. For more information on the nomenclature for date formats, see the help for the strptime() function (?strptime).
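To see why the placeholder day is needed, compare parsing with and without a %d component. The ?strptime help notes that when a month is given, the day must also be given (via %d or %e), so the month/year-only version comes back NA. This sketch uses base R's as.Date(), which takes the same format codes:

```r
# Month and 2-digit year only: with no day, parsing fails
as.Date("1/21", format = "%m/%y")        # NA

# With a placeholder day prepended, parsed as day/month/2-digit year
as.Date("1/1/21", format = "%d/%m/%y")   # "2021-01-01"

# Codes used on this page (see ?strptime for the full list):
#   %d  day of the month       %m  month as a number
#   %y  2-digit year (00-68 become 20xx, 69-99 become 19xx)
#   %Y  4-digit year           %b  abbreviated month name (locale-dependent)
```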

If the labeling isn't quite what you are looking for, you may check the ggplot2 documentation for scale_x_date().
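Here is a sketch of the kind of adjustment scale_x_date() allows, using its date_breaks and date_labels arguments on the example data. The "3 months" spacing, the "%b %Y" labels and the rotated text are arbitrary illustrative choices:

```r
library(ggplot2)

mydf <- data.frame(
  dates = as.Date(paste0("1/", c("1/21", "2/20", "12/21", "3/19", "10/19", "9/19")),
                  format = "%d/%m/%y"),
  yvals = c(13, 31, 14, 10, 20, 18))

p <- ggplot(mydf, aes(x = dates, y = yvals)) +
  geom_point() +
  # one tick every 3 months, labelled like "Mar 2019" (month names follow the locale)
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y") +
  # rotate the tick labels so they don't overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```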

In the OP's case, the following code should work, assuming the month column uses the same "month/two-digit year" format as the example above:

library(ggplot2)
library(lubridate)

max_trips <- read.csv("max_and_min_trips.csv")
max_trips$month <- as_date(
  paste0("1/", max_trips$month),  # add a day so each value is a full date
  format = "%d/%m/%y")

ggplot(data=max_trips) +
  geom_point(mapping = aes(x=month,y=max_trip_duration)) +
  scale_x_date(date_breaks = "1 month", date_labels = "%Y-%m")
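The question's screenshot of the data is not available, so the "month/two-digit year" layout is an assumption. This is a small sketch of a guard worth adding after the conversion. It uses base R's as.Date() (same format string) and made-up month values standing in for max_trips$month. The same is.na() check applies to the result of as_date():

```r
# Hypothetical month values standing in for max_trips$month
month <- c("1/21", "2/20", "12/21", "3/19", "10/19", "9/19")

parsed <- as.Date(paste0("1/", month), format = "%d/%m/%y")

# Stop with a clear message if any value did not match the assumed format
bad <- month[is.na(parsed)]
if (length(bad) > 0) {
  stop("could not parse month values: ", paste(bad, collapse = ", "))
}

range(parsed)   # "2019-03-01" "2021-12-01"
```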