My data are as follows:
year group date
2019 A 2019-07-15
2019 A 2019-07-25
2019 A 2019-08-01
2019 B 2019-07-15
2019 B 2019-07-30
2020 A 2020-08-01
2020 A 2020-08-03
2020 B 2020-08-01
2020 B 2020-08-20
2020 B 2020-08-25
I would like to calculate the mean number of days between dates per year per group. I have tried the following code and receive the following error:
data_meandays <- data %>%
group_by(year, group)%>%
mutate(Difference = date - lag(date)) %>%
summarize(mean_time = mean(Difference, na.rm=TRUE))
Error in date - lag(date) :
non-numeric argument to binary operator
The class of my date column is Date.
Thank you in advance!
CodePudding user response:
The error occurred because the date column is of class character, not Date. We need to convert it to Date class before taking the difference:
library(dplyr)
data %>%
mutate(date = as.Date(date)) %>%
group_by(year, group) %>%
mutate(Difference = date - lag(date)) %>%
summarize(mean_time = mean(Difference, na.rm=TRUE), .groups = 'drop')
-output
# A tibble: 4 × 3
year group mean_time
<int> <chr> <drtn>
1 2019 A 8.5 days
2 2019 B 15.0 days
3 2020 A 2.0 days
4 2020 B 12.0 days
NOTE: the difference between two Date values is a difftime object, which is why mean_time prints with "days". To get a plain numeric column instead, apply as.numeric to that column.
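As a minimal sketch of that conversion, the difference can be wrapped in as.numeric with an explicit units argument, so the result stays in days regardless of how difftime would otherwise choose its units. The column name mean_days and the variable result are illustrative, not from the original post; the data is rebuilt inline so the block runs on its own.

```r
library(dplyr)

# Same data as in the question; date is stored as character
data <- data.frame(
  year  = c(2019L, 2019L, 2019L, 2019L, 2019L, 2020L, 2020L, 2020L, 2020L, 2020L),
  group = c("A", "A", "A", "B", "B", "A", "A", "B", "B", "B"),
  date  = c("2019-07-15", "2019-07-25", "2019-08-01", "2019-07-15", "2019-07-30",
            "2020-08-01", "2020-08-03", "2020-08-01", "2020-08-20", "2020-08-25")
)

result <- data %>%
  mutate(date = as.Date(date)) %>%                 # character -> Date
  group_by(year, group) %>%
  # difftime -> numeric, with the unit fixed to days
  mutate(Difference = as.numeric(date - lag(date), units = "days")) %>%
  summarize(mean_days = mean(Difference, na.rm = TRUE), .groups = "drop")

result
```

Here mean_days is an ordinary double column, so it can be used directly in arithmetic or plotting without the difftime class getting in the way.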
The OP's error can be reproduced if we don't convert to Date class:
data %>%
group_by(year, group)%>%
mutate(Difference = date - lag(date)) %>%
summarize(mean_time = mean(Difference, na.rm=TRUE))
Error in mutate():
! Problem while computing Difference = date - lag(date).
ℹ The error occurred in group 1: year = 2019, group = "A".
Caused by error in date - lag(date):
! non-numeric argument to binary operator
Run rlang::last_error() to see where the error occurred.
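A quick way to confirm this diagnosis is to inspect the column's class before and after conversion. The sketch below uses a small two-row stand-in for the data; the names class_before, class_after and gap are hypothetical and only there to hold the results.

```r
# Two-row stand-in with the date stored as character, as in the reproduced error
data <- data.frame(date = c("2019-07-15", "2019-07-25"), stringsAsFactors = FALSE)

class_before <- class(data$date)   # "character": subtraction would fail here

data$date <- as.Date(data$date)    # ISO yyyy-mm-dd strings parse without a format
class_after <- class(data$date)    # "Date"

gap <- data$date[2] - data$date[1] # now a difftime of 10 days
gap
```

If class(data$date) already reports "Date" on the real data and the error persists, the cause lies elsewhere (for example, a different object named data being picked up).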
data
data <- structure(list(year = c(2019L, 2019L, 2019L, 2019L, 2019L, 2020L,
2020L, 2020L, 2020L, 2020L), group = c("A", "A", "A", "B", "B",
"A", "A", "B", "B", "B"), date = c("2019-07-15", "2019-07-25",
"2019-08-01", "2019-07-15", "2019-07-30", "2020-08-01", "2020-08-03",
"2020-08-01", "2020-08-20", "2020-08-25")),
class = "data.frame", row.names = c(NA,
-10L))