As I explained in previous posts I'm trying to count observations over 30 days windows grouping by id.
The data:
df<-structure(list(id=c(1,1,1,2),date=c("2001-07-29","2001-08-01","2001-07-20","2001-07-21")))
df<-as.data.frame(df)
df<-df%>%
mutate(date=ymd(date))
The code:
df<-df%>%
group_by(id)%>%
mutate(n=sapply(seq(length(date)),
function(x) sum(between(date[1:x],date[x]-days(30),date[x]))))
The output is wrong because the expected result is:
id date n
1 29/07/2001 2
1 01/08/2001 3
1 20/07/2001 1
2 21/07/2001 1
CodePudding user response:
Something like this?
library(dplyr)
library(lubridate)
df<-structure(list(id=c(1,1,1,2),date=c("2001-07-29","2001-08-01","2001-07-20","2001-07-21")))
df <- as.data.frame(df)
df %>%
mutate(date = ymd(date)) %>%
group_by(id, fl = floor_date(date - days(19), "month") days(19)) %>%
arrange(date, .by_group = T) %>%
mutate(n = row_number())
# A tibble: 4 × 4
# Groups: id, fl [2]
id date fl n
<dbl> <date> <date> <int>
1 1 2001-07-20 2001-07-20 1
2 1 2001-07-29 2001-07-20 2
3 1 2001-08-01 2001-07-20 3
4 2 2001-07-21 2001-07-20 1
CodePudding user response:
If I understand you correctly, you have to arrange
your date first and you can use the following code:
library(lubridate)
library(dplyr)
df %>%
group_by(id) %>%
arrange(date) %>%
mutate(n = sapply(seq(length(date)),
function(x) sum(between(date[1:x], date[x] - days(30), date[x]))))
Output:
# A tibble: 4 × 3
# Groups: id [2]
id date n
<dbl> <date> <int>
1 1 2001-07-20 1
2 2 2001-07-21 1
3 1 2001-07-29 2
4 1 2001-08-01 3