Calculate the sum of responses within the past year for each ID-CodePudding

My data looks like this:

 job    id     date         response
 ny45   088    2021-11-19   1         
 ny21   088    2021-10-12   0
 ny22   088    2021-10-12   0
 ny23   017    2020-09-16   0
 ny23   014    2020-09-16   1
 ny90   017    2020-07-15   1 
 ny67   017    2020-06-02   1

I want to calculate two things:

(1) The number of occurrences for each id for each date in the past year. If the same id has the same date, I would like this to count at 0 past occurrence (as it appears on the same day). Something like this:

 job    id     date         response  occurrences
 ny45   088    2021-11-19   1         2
 ny21   088    2021-10-12   0         0     
 ny22   088    2021-10-12   0         0
 ny23   017    2020-09-16   0         2
 ny23   014    2020-09-16   1         0
 ny90   017    2020-07-15   1         1
 ny67   017    2020-06-02   1         0

(2) The number of responses (response == 1) within the past year. Again, if the same id has the same date, then in this case I would like this to count as 0. I also want to only count previous responses (i.e. not for the current date). Like this:

 job    id     date         response  occurrences  response_count
 ny45   088    2021-11-19   1         2            0    
 ny21   088    2021-10-12   0         0            0
 ny22   088    2021-10-12   0         0            0
 ny23   017    2020-09-16   0         2            2
 ny23   014    2020-09-16   1         0            0
 ny90   017    2020-07-15   1         1            1
 ny67   017    2020-06-02   1         0            0

Any help would be greatly appreciated! Thank you.

CodePudding user response：

Here is one way to do this -

library(tidyverse)
library(lubridate)

df %>%
  mutate(date = as.Date(date)) %>%
  group_by(id) %>%
  mutate(data = map(date, ~list(
          occurrences = sum(between(date, .x - years(1), .x - 1)), 
          response_count = sum(response[between(date, .x - years(1),.x - 1)])))) %>%
  ungroup %>%
  unnest_wider(data)
         

#   job     id date       response occurrences response_count
#  <chr> <int> <date>        <int>       <int>          <int>
#1 ny45     88 2021-11-19        1           2              0
#2 ny21     88 2021-10-12        0           0              0
#3 ny22     88 2021-10-12        0           0              0
#4 ny23     17 2020-09-16        0           2              2
#5 ny23     14 2020-09-16        1           0              0
#6 ny90     17 2020-07-15        1           1              1
#7 ny67     17 2020-06-02        1           0              0

data*

df <- structure(list(job = c("ny45", "ny21", "ny22", "ny23", "ny23", 
"ny90", "ny67"), id = c(88L, 88L, 88L, 17L, 14L, 17L, 17L),date = c("2021-11-19",
"2021-10-12", "2021-10-12", "2020-09-16", "2020-09-16", "2020-07-15", 
"2020-06-02"), response = c(1L, 0L, 0L, 0L, 1L, 1L, 1L)), 
class = "data.frame", row.names = c(NA, -7L))

CodePudding user response：

Another solution

library(lubridate)
df$date=as_datetime(df$date)

cbind(
  df,
  t(
    sapply(1:nrow(df),function(i){
      tmp1=time_length(difftime(df$date[i],df$date),"years")
      tmp2=df$id==df$id[i] & tmp1>0 & tmp1<1
      data.frame(
        "occurences"=sum(tmp2),
        "response_count"=sum(df$response[tmp2])
      )
    })
  )
)

   job id       date response occurences response_count
1 ny45 88 2021-11-19        1          2              0
2 ny21 88 2021-10-12        0          0              0
3 ny22 88 2021-10-12        0          0              0
4 ny23 17 2020-09-16        0          2              2
5 ny23 14 2020-09-16        1          0              0
6 ny90 17 2020-07-15        1          1              1
7 ny67 17 2020-06-02        1          0              0