There are several similar questions on this, but none covers the same problem.
MWE:
library(dplyr)
library(lubridate)
df= data.frame(id = c(1:5),
type = c("a", "b", "b", "a", "b"),
start = dmy(c("05/05/2005","06/06/2006", "07/07/2007", "08/08/2008", "09/09/2009")),
finish = dmy(c("08/08/2008", "09/09/2009","02/02/2011","02/02/2011", NA)),
not_used = c(F,T,F,T,F))
I want to produce a summary, grouped by type, including the total number of not_used by type and the mean difference between start and finish in months when not_used is FALSE. This is how I'm trying:
df%>%group_by(type)%>%
summarise(Not_used = sum(not_used),
`Mean_Lifespan_of_used(months)` = mean((interval(start,finish)/months(1), na.rm= T)[not_used == F]))
With this I'm getting unexpected token ',', resulting in the error:
Error: unexpected ',' in:
" summarise(Not_used = sum(not_used),
`Mean_Lifespan_of_used(months)` = mean((interval(start,finish)/months(1),"
I appreciate I could create a new column before the summarise function, but I'd like to understand what I'm doing wrong here.
CodePudding user response:
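If the dates are read with as.Date, the call needs format = "%d/%m/%Y" (dmy, as used in the question, already parses this layout). The error itself comes from na.rm = T sitting inside the parentheses that wrap interval(start, finish)/months(1) instead of being an argument to mean, so the parser hits a comma inside a plain parenthesised expression. Subset the result with [not_used == FALSE] before taking the mean, because mean returns a single value whereas not_used has one element per row of the group: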
library(dplyr)
library(lubridate)
df%>%
group_by(type)%>%
summarise(Not_used = sum(not_used),
`Mean_Lifespan_of_used(months)` = mean((interval(start,
finish)/months(1))[not_used == FALSE], na.rm = TRUE))
-output
# A tibble: 2 × 3
type Not_used `Mean_Lifespan_of_used(months)`
<chr> <int> <dbl>
1 a 1 39.1
2 b 1 42.8
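The question mentions creating a new column before summarise as an alternative. A minimal sketch of that route, assuming the same df as in the MWE; the column name lifespan_months and the result name res are made up for illustration and are not part of the original code:

```r
library(dplyr)
library(lubridate)

# Same data as the MWE in the question
df <- data.frame(
  id = 1:5,
  type = c("a", "b", "b", "a", "b"),
  start = dmy(c("05/05/2005", "06/06/2006", "07/07/2007", "08/08/2008", "09/09/2009")),
  finish = dmy(c("08/08/2008", "09/09/2009", "02/02/2011", "02/02/2011", NA)),
  not_used = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

res <- df %>%
  # lifespan_months is a hypothetical helper column: months between start and finish
  mutate(lifespan_months = interval(start, finish) / months(1)) %>%
  group_by(type) %>%
  summarise(
    Not_used = sum(not_used),
    # subset to the used rows first, then average; na.rm is an argument of mean()
    `Mean_Lifespan_of_used(months)` = mean(lifespan_months[!not_used], na.rm = TRUE)
  )

print(res)
```

This should give the same two-row summary as above. Splitting the calculation out keeps the brackets in summarise simple, which makes it harder to misplace na.rm.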