Home > Software engineering >  Calculate the days differences with mixed date format in R
Calculate the days differences with mixed date format in R

Time:01-21

I need to count differences in days between two mixed-structured dates. Here is an example dataset:

testdata <- data.frame(id = c(1,2,3),
                   date1 = c("2022/11/13 9:19:03 AM PST", "2022-11-01","2022-10-28"),
                   date2 = c("2022/12/12 1:52:29 PM PST","2022-10-21","2022/12/01 8:15:25 AM PST"))

> testdata
  id                     date1                     date2
1  1 2022/11/13 9:19:03 AM PST 2022/12/12 1:52:29 PM PST
2  2                2022-11-01                2022-10-21
3  3                2022-10-28 2022/12/01 8:15:25 AM PST

First I need to grab dates, exclude the hours, and calculate the number of days differences. So the expected dataset would be:

> df
      id                     date1                     date2.   days.diff
    1  1                2022/11/13                2022/12/12     19
    2  2                2022-11-01                2022-10-21     11
    3  3                2022-10-28                2022/12/01     34

CodePudding user response:

You could use the anytime package with anytime to calculate the difference in dates rowwise like this:

library(dplyr)
library(anytime)
testdata %>%
  rowwise() %>%
  mutate(days.diff = anytime(date1) - anytime(date2))
#> # A tibble: 3 × 4
#> # Rowwise: 
#>      id date1                     date2                     days.diff     
#>   <dbl> <chr>                     <chr>                     <drtn>        
#> 1     1 2022/11/13 9:19:03 AM PST 2022/12/12 1:52:29 PM PST -29.00000 days
#> 2     2 2022-11-01                2022-10-21                 11.04167 days
#> 3     3 2022-10-28                2022/12/01 8:15:25 AM PST -34.04167 days

Created on 2023-01-20 with reprex v2.0.2

CodePudding user response:

Using as.Date with tryFormats

library(dplyr)

testdata %>% 
  rowwise() %>% 
  mutate(across(starts_with("date"), ~ as.Date(.x, 
      tryFormats=c("%Y/%m/%d %H:%M:%S", "%Y-%m-%d"))), 
    days.diff = date2 - date1) %>% 
  ungroup()
# A tibble: 3 × 4
     id date1      date2      days.diff
  <dbl> <date>     <date>     <drtn>   
1     1 2022-11-13 2022-12-12  29 days 
2     2 2022-11-01 2022-10-21 -11 days 
3     3 2022-10-28 2022-12-01  34 days 
  • Related