I need to create a plot that shows the range between the earliest and the latest date for two groups. The data span several years, but I am only interested in the month-day part of each date (e.g. Feb-04), regardless of year. I can do this when I express the dates as Julian days (day of year), but I'd like to do it in month-day format (e.g. Feb-04).
This is the code and output I obtained when working with Julian dates:
library(dplyr)
library(ggplot2)
data.1 <-read.csv(text = "
trt,full_date
A,10/06/2020
A,09/19/2017
A,10/28/2014
A,09/02/2016
A,09/19/2017
A,09/26/2017
B,08/24/2020
B,09/24/2020
B,10/16/2018
B,09/16/2018
B,09/15/2016
B,09/09/2018
")
#day of year option
data.2 <- data.1 %>%
  mutate(full_date = as.Date(full_date, format = "%m/%d/%Y"),
         full_date.doy = as.numeric(strftime(full_date, format = "%j"))) %>%
  group_by(trt) %>%
  summarise(earliest.doy = min(full_date.doy),
            latest.doy = max(full_date.doy))
ggplot(data.2) +
  geom_segment(aes(x = trt, xend = trt, y = earliest.doy, yend = latest.doy), color = "grey") +
  geom_point(aes(x = trt, y = earliest.doy), color = rgb(0.2, 0.7, 0.1, 0.5), size = 3) +
  geom_point(aes(x = trt, y = latest.doy), color = rgb(0.7, 0.2, 0.1, 0.5), size = 3) +
  coord_flip() +
  ylab("Day of the year")
output:
What I would like to have is this (dates on the x axis are approximate):
The first problem I ran into was the calculation of the earliest and latest date. For trt="A", the earliest and latest dates are wrong.
The issue is that date_mm.dd seems to be in character format, and I can't find a way to convert it to a date. As a result, the plot is wrong:
Any hint would be really appreciated.
CodePudding user response:
One way to address this could be to take your doy variables and make them into dates in an arbitrary year like 2022. Here, day one will be one day after 2021-12-31, i.e. Jan 1 2022.
(2022 is not a leap year, so dates after Feb 28 that originate in a leap year will be shifted ahead by one day. That is, Feb 29, when it occurs, is the 60th day of the year, but in most years, like 2022, March 1 is the 60th day, so it would show up there. Depending on the context, you could potentially adjust for that.)
data.2 %>%
  mutate(across(contains("doy"), ~ as.Date("2021-12-31") + .x))
This is a shortcut that asks dplyr to apply the same function to every column whose name contains the string "doy". We could equivalently use:
data.2 %>%
  mutate(earliest.doy = as.Date("2021-12-31") + earliest.doy,
         latest.doy = as.Date("2021-12-31") + latest.doy)
Result
# A tibble: 2 × 3
  trt   earliest.doy latest.doy
  <chr> <date>       <date>
1 A     2022-09-03   2022-10-28
2 B     2022-08-25   2022-10-16
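The one-day leap-year shift is visible in this result: the earliest A observation was recorded on Sep 2 (2016, a leap year) but appears as 2022-09-03. If that matters, one possible adjustment is to skip the day-of-year step and rebuild each date in a fixed leap year, so the month and day are kept as recorded and Feb 29 stays valid. This is only a sketch of that idea, not part of the approach above; the names data.3 and md_date and the reference year 2020 are arbitrary choices made for illustration.

```r
library(dplyr)

# Same data as in the question, repeated so the sketch runs on its own
data.1 <- read.csv(text = "trt,full_date
A,10/06/2020
A,09/19/2017
A,10/28/2014
A,09/02/2016
A,09/19/2017
A,09/26/2017
B,08/24/2020
B,09/24/2020
B,10/16/2018
B,09/16/2018
B,09/15/2016
B,09/09/2018
")

data.3 <- data.1 %>%
  mutate(full_date = as.Date(full_date, format = "%m/%d/%Y"),
         # keep month and day, swap in 2020 (assumed reference year; a leap
         # year, so a Feb 29 observation would still parse)
         md_date = as.Date(format(full_date, "2020-%m-%d"))) %>%
  group_by(trt) %>%
  summarise(earliest = min(md_date),
            latest = max(md_date))

print(data.3)
```

With the data from the question, this gives Sep 2 to Oct 28 for A and Aug 24 to Oct 16 for B, all in the placeholder year 2020.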
You could then pipe the converted dates (the result of the mutate() above) into your existing code:
... %>%
  ggplot() +
  geom_segment(aes(x = trt, xend = trt, y = earliest.doy, yend = latest.doy), color = "grey") +
  geom_point(aes(x = trt, y = earliest.doy), color = rgb(0.2, 0.7, 0.1, 0.5), size = 3) +
  geom_point(aes(x = trt, y = latest.doy), color = rgb(0.7, 0.2, 0.1, 0.5), size = 3) +
  coord_flip() +
  ylab("Day of the year")
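Since the goal in the question is an axis labelled like Feb-04, the date axis can also be formatted so the placeholder year never shows. Here is a minimal sketch, assuming ggplot2's scale_y_date() with date_labels = "%b-%d". The y scale is the one to set, because coord_flip() swaps the axes afterwards. The small data frame re-enters the result shown above by hand so the block runs on its own; the names plot.data and p and the two-week break spacing are made up for this example.

```r
library(ggplot2)

# The converted result from above, typed in by hand so this runs standalone
plot.data <- data.frame(
  trt = c("A", "B"),
  earliest.doy = as.Date(c("2022-09-03", "2022-08-25")),
  latest.doy = as.Date(c("2022-10-28", "2022-10-16"))
)

p <- ggplot(plot.data) +
  geom_segment(aes(x = trt, xend = trt, y = earliest.doy, yend = latest.doy), color = "grey") +
  geom_point(aes(x = trt, y = earliest.doy), color = rgb(0.2, 0.7, 0.1, 0.5), size = 3) +
  geom_point(aes(x = trt, y = latest.doy), color = rgb(0.7, 0.2, 0.1, 0.5), size = 3) +
  # label ticks with month and day only, hiding the arbitrary year
  scale_y_date(date_labels = "%b-%d", date_breaks = "2 weeks") +
  coord_flip() +
  labs(x = "trt", y = "Date (month-day)")

print(p)
```

The month abbreviations produced by %b follow the current locale, so the labels may not be in English on every system.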