I prepared a reproducible example with an example of dataset here:
patient <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
month <- rep (1:10, 2)
fev1 <- c(58, NA, NA, NA, 57, NA, NA, NA, NA, 60, NA, NA, NA, NA, 32, NA, NA, NA, NA, 40)
adherence <- c (30, 32, 34, 36, 34, 32, 30, 34, 32, 36, 70, 65, 75, 70, 70, 55, 50, 65, 70, 70)
data <- tibble(patient, month, fev1, adherence)
data
I would like to obtain a 5th column, called mean adherence, which for each fev1 value which is not NA, would provide the mean adherence of the previous values since the last fev1 value, including the adherence value corresponding to the preceding fev1 value, and excluding the adherence value corresponding to the current fev1 value, and this for each patient id.
For example, for patient 1, at month 5, for fev1 = 57, it would compute the mean of (30, 32, 34, 36); then, for fev1 = 60, it would compute the mean of (34, 32, 30, 34, 32)
Thank you very much for your help
CodePudding user response:
We could create a grouping variable based on the NA values in fev1, and then get the mean
by group
library(dplyr)
data %>%
group_by(patient) %>%
mutate(lagadher = lag(adherence),
grp = lag(cumsum(!is.na(fev1)))) %>%
group_by(grp, .add = TRUE) %>%
mutate(Mean_adhere = mean(lagadher) * NA^(is.na(fev1))) %>%
ungroup %>%
select(-grp, -lagadher)
-output
# A tibble: 20 × 5
patient month fev1 adherence Mean_adhere
<dbl> <int> <dbl> <dbl> <dbl>
1 1 1 58 30 NA
2 1 2 NA 32 NA
3 1 3 NA 34 NA
4 1 4 NA 36 NA
5 1 5 57 34 33
6 1 6 NA 32 NA
7 1 7 NA 30 NA
8 1 8 NA 34 NA
9 1 9 NA 32 NA
10 1 10 60 36 32.4
11 2 1 NA 70 NA
12 2 2 NA 65 NA
13 2 3 NA 75 NA
14 2 4 NA 70 NA
15 2 5 32 70 70
16 2 6 NA 55 NA
17 2 7 NA 50 NA
18 2 8 NA 65 NA
19 2 9 NA 70 NA
20 2 10 40 70 62