Calculate the mean number of days between dates per group

Time:06-11

My data are as follows:

year group date 
2019 A     2019-07-15
2019 A     2019-07-25
2019 A     2019-08-01
2019 B     2019-07-15
2019 B     2019-07-30
2020 A     2020-08-01
2020 A     2020-08-03
2020 B     2020-08-01
2020 B     2020-08-20
2020 B     2020-08-25

I would like to calculate the mean number of days between dates per year per group. I have tried the following code and receive the following error:

data_meandays <- data %>%
  group_by(year, group)%>% 
  mutate(Difference = date - lag(date)) %>%
  summarize(mean_time = mean(Difference, na.rm=TRUE))

Error in date - lag(date) : 
  non-numeric argument to binary operator

The class of my date column is Date.

Thank you in advance!

CodePudding user response:

The error occurs because the date column is stored as character, not as Date class (class(data$date) will confirm this). We need to convert it to Date class before taking the difference:

library(dplyr)
data %>%
   mutate(date = as.Date(date)) %>% 
   group_by(year, group) %>% 
   mutate(Difference = date - lag(date)) %>% 
   summarize(mean_time = mean(Difference, na.rm=TRUE), .groups = 'drop')

-output

# A tibble: 4 × 3
   year group mean_time
  <int> <chr> <drtn>   
1  2019 A      8.5 days
2  2019 B     15.0 days
3  2020 A      2.0 days
4  2020 B     12.0 days

NOTE: the differences between dates are difftime objects. If we want a plain numeric column instead, we can apply as.numeric to that column.
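
As a minimal sketch of that conversion, the pipeline below rebuilds the sample data and adds a numeric column next to the difftime one. The column name mean_days is only an illustrative choice, and passing units = "days" to as.numeric states the unit explicitly rather than relying on whatever unit the difftime happens to carry.

```r
library(dplyr)

# Same sample data as in the question; date is stored as character
data <- data.frame(
  year  = c(2019L, 2019L, 2019L, 2019L, 2019L,
            2020L, 2020L, 2020L, 2020L, 2020L),
  group = c("A", "A", "A", "B", "B", "A", "A", "B", "B", "B"),
  date  = c("2019-07-15", "2019-07-25", "2019-08-01", "2019-07-15",
            "2019-07-30", "2020-08-01", "2020-08-03", "2020-08-01",
            "2020-08-20", "2020-08-25")
)

data_meandays <- data %>%
  mutate(date = as.Date(date)) %>%            # character -> Date
  group_by(year, group) %>%
  mutate(Difference = date - lag(date)) %>%   # difftime, NA for first row of each group
  summarize(mean_time = mean(Difference, na.rm = TRUE), .groups = "drop") %>%
  # mean_days is a hypothetical column name: the same value as a plain number
  mutate(mean_days = as.numeric(mean_time, units = "days"))

data_meandays
```

Here mean_time keeps its difftime class, while mean_days can be used anywhere an ordinary numeric vector is expected.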


The OP's error can be reproduced if we don't convert to Date class:

data %>%  
  group_by(year, group)%>%  
  mutate(Difference = date - lag(date)) %>%  
  summarize(mean_time = mean(Difference, na.rm=TRUE))

Error in mutate():
! Problem while computing Difference = date - lag(date).
ℹ The error occurred in group 1: year = 2019, group = "A".
Caused by error in date - lag(date):
! non-numeric argument to binary operator
Run rlang::last_error() to see where the error occurred
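
A quick way to check which case applies is to look at the class of the column before subtracting. The sketch below uses a small hypothetical vector x standing in for data$date, and only base R: subtraction fails on character strings and works once they are converted.

```r
# Hypothetical stand-in for data$date, stored as character strings
x <- c("2019-07-15", "2019-07-25", "2019-08-01")

class(x)           # character: arithmetic on this fails
class(as.Date(x))  # Date

# Subtracting character strings raises an error; capture its message
res <- tryCatch(x[2] - x[1], error = function(e) conditionMessage(e))
res

# After conversion the same subtraction returns a difftime
d <- as.Date(x)
gap <- d[2] - d[1]
gap
```

If class() reports character (or factor) for the real column, converting with as.Date first is the fix shown in the answer above.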

data

data <- structure(list(year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2020L, 
2020L, 2020L, 2020L, 2020L), group = c("A", "A", "A", "B", "B", 
"A", "A", "B", "B", "B"), date = c("2019-07-15", "2019-07-25", 
"2019-08-01", "2019-07-15", "2019-07-30", "2020-08-01", "2020-08-03", 
"2020-08-01", "2020-08-20", "2020-08-25")), 
class = "data.frame", row.names = c(NA, 
-10L))
Tags: r