Using an example from a related issue: nearest month end in R
library(lubridate)
library(dplyr)
dt <- data.frame(orig_dt = as.Date(c("1997-04-01", "1997-06-29")))
dt %>% mutate(round_dt = round_date(orig_dt, unit = "month"),
              modified_dt = round_date(orig_dt, unit = "month") - days(1))
In one session I correctly get the rounded dates (R 4.0.0, Rcpp_1.0.4.6 loaded via a namespace):
orig_dt round_dt modified_dt
1 1997-04-01 1997-04-01 1997-03-31
2 1997-06-29 1997-07-01 1997-06-30
In another session I get floor instead of round (different machine, R 4.0.2, Rcpp not loaded via a namespace):
orig_dt round_dt modified_dt
1 1997-04-01 1997-04-01 1997-03-31
2 1997-06-29 1997-06-01 1997-05-31
I think it could be related to Rcpp, as earlier I got an error message:
Error in C_valid_tz(tzone) (rscrpt.R#27): function 'Rcpp_precious_remove' not provided by package 'Rcpp'
Although I am no longer getting the error, the values are different, and I wonder why this happens and how to fix it without a complete reinstallation.
CodePudding user response:
I am able to reproduce your issue in a vanilla R session.
$ R --vanilla
> R.version.string
[1] "R version 4.1.2 (2021-11-01)"
> packageVersion("lubridate")
[1] ‘1.8.0’
> library("lubridate")
> round_date(ymd("1997-06-29"), unit = "month")
[1] "1997-06-01"
It seems to be a bug in round_date, introduced in this commit. Prior to the commit, the body of round_date contained:
above <- unclass(as.POSIXct(ceiling_date(x, unit = unit, week_start = week_start)))
mid <- unclass(as.POSIXct(x))
below <- unclass(as.POSIXct(floor_date(x, unit = unit, week_start = week_start)))
Here, x = ymd("1997-06-29"), unit = "month", and below, mid, and above are defined as the number of seconds from 1970-01-01 00:00:00 UTC to the month-floor of x, x, and the month-ceiling of x, respectively (more precisely, time 00:00:00 on those three Dates, in your system's time zone). Thus, below < mid < above, and round_date would compare mid-below to above-mid to determine which of below and above was closer to mid.
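To make the pre-commit comparison concrete, here is a minimal base-R sketch of the same arithmetic. It does not call lubridate: the month floor and ceiling are written out by hand, the names lower, upper, and rounded are mine, and as.numeric() stands in for unclass() so that the results are plain numbers.

```r
# Hand-written stand-ins for floor_date()/ceiling_date() at month resolution.
x     <- as.Date("1997-06-29")
lower <- as.Date("1997-06-01")  # month floor of x
upper <- as.Date("1997-07-01")  # month ceiling of x

# Put all three values on the same scale: seconds since 1970-01-01.
below <- as.numeric(as.POSIXct(lower))
mid   <- as.numeric(as.POSIXct(x))
above <- as.numeric(as.POSIXct(upper))

# Distances from x to each candidate, converted to days for readability.
days_to_lower <- (mid - below) / 86400
days_to_upper <- (above - mid) / 86400

# Pick whichever candidate is closer to x.
rounded <- if (days_to_lower < days_to_upper) lower else upper
print(rounded)
```

Because all three numbers are in seconds, the two distances are directly comparable, and the later candidate is chosen for this date.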
Since the commit, mid has been defined as
mid <- unclass(x)
which is the number of days from 1970-01-01 to x. Now, mid << below < above, making mid-below negative and above-mid positive. As a result, round_date considers below to be "closer" to mid than above, and it incorrectly rounds 1997-06-29 down to 1997-06-01.
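The unit mismatch can be reproduced with plain numbers. The following is a base-R sketch under my own assumptions, not lubridate's source: the variable names mid_days, mid_secs, and pick_lower are hypothetical, and the floor and ceiling dates are typed in by hand.

```r
x <- as.Date("1997-06-29")

# Floor and ceiling expressed in seconds since 1970-01-01.
below <- as.numeric(as.POSIXct(as.Date("1997-06-01")))
above <- as.numeric(as.POSIXct(as.Date("1997-07-01")))

# A Date is stored as a count of days, so this is on a different scale.
mid_days <- as.numeric(unclass(x))

# Mixing days with seconds: the first difference is negative and the second
# is positive, so the floor always looks "closer".
pick_lower <- (mid_days - below) < (above - mid_days)
print(pick_lower)

# Converting days to seconds restores a like-for-like comparison.
mid_secs <- mid_days * 86400
pick_lower_fixed <- (mid_secs - below) < (above - mid_secs)
print(pick_lower_fixed)
```

With mismatched units the comparison selects the floor, and once mid is expressed in seconds it selects the ceiling.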
I have reported the issue to the package maintainers here. I imagine that it will be fixed soon...
In the meantime, you can try reverting to an older version of lubridate, from before the commit, or using this temporary work-around:
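# Temporary work-around: convert the Date to POSIXct before rounding,
# then convert the result back to Date.
round_date_patched <- function(x, unit) {
  as.Date(round_date(as.POSIXct(x), unit = unit))
}
round_date_patched(ymd("1997-06-29"), unit = "month") # "1997-07-01"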
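If you would rather not depend on round_date at all for this particular case, a base-R alternative is possible. The sketch below is only an illustration: round_month_base is a hypothetical helper of mine (not part of lubridate), it handles month rounding only, and it sends exact ties to the earlier month, which is an arbitrary choice on my part.

```r
# Hypothetical helper: round Dates to the nearest first-of-month, base R only.
round_month_base <- function(x) {
  x <- as.Date(x)
  y <- as.integer(format(x, "%Y"))
  m <- as.integer(format(x, "%m"))

  # First day of the month containing x.
  lower <- as.Date(sprintf("%04d-%02d-01", y, m))

  # First day of the following month, rolling December over to January.
  next_y <- y + (m == 12L)
  next_m <- ifelse(m == 12L, 1L, m + 1L)
  upper <- as.Date(sprintf("%04d-%02d-01", next_y, next_m))

  # Compare distances in days; ties stay with the earlier month (assumption).
  closer_to_upper <- as.numeric(upper - x) < as.numeric(x - lower)
  out <- lower
  idx <- which(closer_to_upper)
  out[idx] <- upper[idx]
  out
}

orig_dt <- as.Date(c("1997-04-01", "1997-06-29"))
round_dt <- round_month_base(orig_dt)
modified_dt <- round_dt - 1  # nearest month end, as in the question
print(data.frame(orig_dt, round_dt, modified_dt))
```

For the two dates in the question this gives the same round_dt and modified_dt values as the first (correct) session.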