How to calculate average number of days between series of dates?-CodePudding

I have a dataset with a series of dates listed for each variable. I want to calculate the average number of days for each variable. Ideally, I would get a numeric value that would have the average.

For example, VarA would be 1 because there is an average of one day between each value. Each variable has the dates in order (ex. VarA has the most recent date last).

How would I do this in R?

Example of my dataset:

Variable	Date1	Date2	Date3	Date 4
VarA	09/01/21	09/02/21	09/03/21	09/04/21
VarB	08/01/21	08/17/21	09/02/21
VarC	09/25/21

CodePudding user response：

I have a package on github which makes it easy to apply functions on pairs of variables.

In this case we can use dplyover::across2 to get the difference between Date2-Date1, Date3-Date2 and Date4-Date3. Then all we need is rowMeans in which we wrap the call to across2.

library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover/

# approach
dat %>% 
   mutate(diff = rowMeans(
                  across2(Date1:Date3, # .x
                          Date2:Date4, # .y
                           ~ as.numeric(.y - .x)),
                  na.rm = TRUE)
         )

# result
#> # A tibble: 3 x 6
#>   Variable Date1      Date2      Date3      Date4       diff
#>   <chr>    <date>     <date>     <date>     <date>     <dbl>
#> 1 VarA     2021-09-01 2021-09-02 2021-09-03 2021-09-04     1
#> 2 VarB     2021-08-01 2021-08-17 2021-09-02 NA            16
#> 3 VarC     2021-09-25 NA         NA         NA           NaN


# the preprocessed data
dat <- read.table(header = TRUE, text =
  "Variable Date1      Date2        Date3          Date4
   VarA '09/01/21'  '09/02/21'  '09/03/21'  '09/04/21'
   VarB '08/01/21'  '08/17/21'  '09/02/21'  ''
   VarC '09/25/21' ''  '' '' ")


dat <- dat %>% as_tibble %>% 
  mutate(across(starts_with("Date"),
                ~ lubridate::mdy(.x)))

^{Created on 2021-09-25 by the reprex package (v0.3.0)}

Alternatively, we can stick with only {dplyr} by subtracting two across calls from each other and wrapping them in rowMeans.

library(dplyr)

# approach
dat %>% 
   mutate(diff = rowMeans(
      across(Date2:Date4, as.numeric) - across(Date1:Date3, as.numeric),
      na.rm = TRUE))

# result
#> # A tibble: 3 x 6
#>   Variable Date1      Date2      Date3      Date4       diff
#>   <chr>    <date>     <date>     <date>     <date>     <dbl>
#> 1 VarA     2021-09-01 2021-09-02 2021-09-03 2021-09-04     1
#> 2 VarB     2021-08-01 2021-08-17 2021-09-02 NA            16
#> 3 VarC     2021-09-25 NA         NA         NA           NaN

^{Created on 2021-09-25 by the reprex package (v0.3.0)}

CodePudding user response：

You can take rowwise differences of 'Date' columns and take the average.

library(dplyr)

df %>%
  mutate(across(starts_with('Date'), as.Date, '%m/%d/%y')) %>%
  rowwise() %>%
  mutate(diff = as.numeric(mean(diff(c_across(starts_with('Date'))), na.rm = TRUE)))

#  Variable Date1      Date2      Date3      Date4       diff
#  <chr>    <date>     <date>     <date>     <date>     <dbl>
#1 VarA     2021-09-01 2021-09-02 2021-09-03 2021-09-04     1
#2 VarB     2021-08-01 2021-08-17 2021-09-02 NA            16
#3 VarC     2021-09-25 NA         NA         NA           NaN

data

It is easier to help if you provide data in a reproducible format

df <- structure(list(Variable = c("VarA", "VarB", "VarC"), Date1 = c("09/01/21", 
"08/01/21", "09/25/21"), Date2 = c("09/02/21", "08/17/21", ""
), Date3 = c("09/03/21", "09/02/21", ""), Date4 = c("09/04/21", 
"", "")), class = "data.frame", row.names = c(NA, -3L))