I need to count differences in days between two mixed-structured dates. Here is an example dataset:
testdata <- data.frame(id = c(1,2,3),
date1 = c("2022/11/13 9:19:03 AM PST", "2022-11-01","2022-10-28"),
date2 = c("2022/12/12 1:52:29 PM PST","2022-10-21","2022/12/01 8:15:25 AM PST"))
> testdata
id date1 date2
1 1 2022/11/13 9:19:03 AM PST 2022/12/12 1:52:29 PM PST
2 2 2022-11-01 2022-10-21
3 3 2022-10-28 2022/12/01 8:15:25 AM PST
First I need to grab dates, exclude the hours, and calculate the number of days differences. So the expected dataset would be:
> df
id date1 date2. days.diff
1 1 2022/11/13 2022/12/12 19
2 2 2022-11-01 2022-10-21 11
3 3 2022-10-28 2022/12/01 34
CodePudding user response:
You could use the anytime
package with anytime
to calculate the difference in dates rowwise
like this:
library(dplyr)
library(anytime)
testdata %>%
rowwise() %>%
mutate(days.diff = anytime(date1) - anytime(date2))
#> # A tibble: 3 × 4
#> # Rowwise:
#> id date1 date2 days.diff
#> <dbl> <chr> <chr> <drtn>
#> 1 1 2022/11/13 9:19:03 AM PST 2022/12/12 1:52:29 PM PST -29.00000 days
#> 2 2 2022-11-01 2022-10-21 11.04167 days
#> 3 3 2022-10-28 2022/12/01 8:15:25 AM PST -34.04167 days
Created on 2023-01-20 with reprex v2.0.2
CodePudding user response:
Using as.Date
with tryFormats
library(dplyr)
testdata %>%
rowwise() %>%
mutate(across(starts_with("date"), ~ as.Date(.x,
tryFormats=c("%Y/%m/%d %H:%M:%S", "%Y-%m-%d"))),
days.diff = date2 - date1) %>%
ungroup()
# A tibble: 3 × 4
id date1 date2 days.diff
<dbl> <date> <date> <drtn>
1 1 2022-11-13 2022-12-12 29 days
2 2 2022-11-01 2022-10-21 -11 days
3 3 2022-10-28 2022-12-01 34 days