I have CSV data with two columns, and I need to plot Time on the x axis and count on the y axis. The time in the data ranges from Sep 2008 to Dec 2021, with one data point per month.
I want to show only 5 specific time points on the x axis, like below:
This is what I tried:
library(ggplot2)
theme_set(
  theme_bw() +
    theme(legend.position = "top")
)
result <- read.csv("Downloads/Questions Trend - Questions Trend.csv")
p <- ggplot(result, aes(result$Time_Formatted, result$VCS_Feature_History_Sanitize_Trnd)) +
  geom_point(size = 0.1) + xlab("Month") + ylab("Temporal Trend")
p + geom_smooth(method = "loess", color = "red")
I then tried the code below. It reduced the number of labels, but I still cannot set the ticks to specific dates.
library(ggplot2)
library(scales)
theme_set(
  theme_bw() +
    theme(legend.position = "top")
)
result <- read.csv("Downloads/Questions Trend - Questions Trend.csv")
result$Time_Formatted <- as.Date(result$Time_Formatted)
p <- ggplot(result, aes(result$Time_Formatted, result$VCS_Feature_History_Sanitize_Trnd)) +
  geom_point(size = 0.1) + xlab("Month") + ylab("Temporal Trend") +
  scale_x_date(date_breaks = "years", date_labels = "%b-%y")
p + geom_smooth(method = "loess", color = "red")
How can I put ticks at specific dates on the x axis?
CodePudding user response:
You should use the scale_x_date() function of the ggplot2 package.
For example, here is some code I regularly use in my work when I need to plot time data:
ggplot2::scale_x_date(name = " ",
breaks = function(date) seq.Date(from = lubridate::ymd("2020-01-01") + 1,
to = lubridate::today(),
by = "1 month"),
limits = c(lubridate::ymd("2020-07-13"),
lubridate::today()),
expand = c(0,0),
labels = scales::date_format("%b %Y"))
With breaks, you choose which dates get a tick; here there is one per month.
With labels and the function date_format() from the scales package, you choose how each date is displayed, in any format strftime understands. Here, I chose to show the month as letters and the year as a number.
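To see what monthly breaks and a "%b %Y" label format amount to, here is a minimal sketch in base R. The start and end dates are made-up values, and base format() is used as a stand-in for scales::date_format(), on the assumption that both follow the same strftime codes.

```r
# Monthly break positions, built with seq.Date() as in the breaks function above.
# The date range is a hypothetical example, not taken from the question's data.
monthly_breaks <- seq.Date(from = as.Date("2020-01-01"),
                           to   = as.Date("2020-06-01"),
                           by   = "1 month")

# Labels as a "%b %Y" format renders them.
# Month abbreviations depend on the current locale.
monthly_labels <- format(monthly_breaks, "%b %Y")

print(monthly_breaks)
print(monthly_labels)
```

Printing the two vectors side by side is a quick way to check tick positions and label text before attaching them to a plot.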
CodePudding user response:
Most of your problems here are related to reading in the date data so that the format is correctly recognised - this can be done by specifying it explicitly:
result <- read.csv("Test.csv")
result$Time_Formatted <- as.Date(result$Time_Formatted, "%m/%d/%y")
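As a quick check of why the format string matters, the sketch below parses the same strings with two different formats. The sample values are hypothetical and only assume that the file stores dates as month/day/two-digit year.

```r
# Hypothetical sample values in month/day/two-digit-year form.
raw <- c("9/1/08", "10/1/08", "12/1/21")

# Parsed with the format that matches the data: month first, then day.
parsed_mdy <- as.Date(raw, format = "%m/%d/%y")

# Parsed with day and month swapped: the same strings become different dates.
parsed_dmy <- as.Date(raw, format = "%d/%m/%y")

print(parsed_mdy)
print(parsed_dmy)
```

Both calls succeed without a warning, so a wrong format string shows up only as misplaced points on the plot. It is worth printing a few parsed values before plotting.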
Then it's simply a case of making a vector of the dates where you want breaks and passing it to scale_x_date through the breaks argument:
date_breaks <- as.Date(c("1/7/10", "1/12/12", "1/1/14", "1/2/15", "1/3/16"), "%d/%m/%y")
p <- ggplot(result, aes(Time_Formatted, VCS_Feature_History_Sanitize_Trnd)) +
  geom_point(size = 0.1) + xlab("Month") + ylab("Temporal Trend") +
  scale_x_date(breaks = date_breaks, date_labels = "%b-%y")
p + geom_smooth(method = "loess", color = "red")
Note that I've removed the explicit reference to "result" in the aes() function, as it is unnecessary and deprecated according to the warning it creates. The end result is: