I have the following data frame, containing each subject's ID, the number of months until follow-up, and the subject's age.
df1 <- structure(list(USUBJID = c(1, 2, 3),
                      follow_up = c(24, 36, 56),
                      AGE = c(65, 34, 65)),
                 row.names = c(NA, -3L),
                 class = c("tbl_df", "tbl", "data.frame"))
# A tibble: 3 x 3
  USUBJID follow_up   AGE
    <dbl>     <dbl> <dbl>
1       1        24    65
2       2        36    34
3       3        56    65
For each subject, I need to create annual entries based on the value of the follow_up column (e.g., if the follow-up is 36 months, I need entries for 0, 12, 24, and 36 months). For each of these entries I also need to calculate the subject's age by adding one year per 12-month step to the value in the original AGE column.
This is my desired output:
# A tibble: 12 x 3
   USUBJID Month   AGE
     <dbl> <dbl> <dbl>
 1       1     0    65
 2       1    12    66
 3       1    24    67
 4       2     0    34
 5       2    12    35
 6       2    24    36
 7       2    36    37
 8       3     0    65
 9       3    12    66
10       3    24    67
11       3    36    68
12       3    48    69
Answer:
The conditions are not entirely clear, but this may help: replicate the rows (uncount from tidyr) based on the value in the 'follow_up' column, then, grouped by 'USUBJID', create a sequence from 0 in increments of 12 and increment 'AGE' by 1 per row (using row_number as the sequence).
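As a rough illustration of the row-count arithmetic described above, here is a minimal base R sketch. It assumes that a partial year (e.g. 56 months) is rounded down, which matches the desired output in the question; the names n_rows and months are hypothetical and used only for this example.

```r
# follow-up values from the question's data
follow_up <- c(24, 36, 56)

# integer-divide by 12 to count whole years, then add 1 for the month-0 entry
n_rows <- follow_up %/% 12 + 1
n_rows
# 3 4 5

# the Month values generated for each subject
months <- lapply(n_rows, function(n) seq(0, by = 12, length.out = n))
months[[3]]
# 0 12 24 36 48
```

These counts are what gets passed to uncount to decide how many times each row is repeated.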
library(dplyr)
library(tidyr)
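df2 <- df1 %>%
  # repeat each row once per whole 12-month step, plus one for month 0
  uncount(follow_up %/% 12 + 1) %>%
  group_by(USUBJID) %>%
  mutate(follow_up = seq(0, length.out = n(), by = 12),
         # age increases by one year per 12-month step
         AGE = first(AGE) + row_number() - 1) %>%
  ungroup() %>%
  rename(Month = follow_up)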
-output
df2
# A tibble: 12 × 3
   USUBJID Month   AGE
     <int> <dbl> <dbl>
 1       1     0    65
 2       1    12    66
 3       1    24    67
 4       2     0    34
 5       2    12    35
 6       2    24    36
 7       2    36    37
 8       3     0    65
 9       3    12    66
10       3    24    67
11       3    36    68
12       3    48    69
Or using data.table
library(data.table)
# repeat each row follow_up %/% 12 + 1 times, then build Month and AGE per subject
setDT(df1)[rep(seq_len(.N), follow_up %/% 12 + 1)][,
  .(Month = seq(0, length.out = .N, by = 12),
    AGE = first(AGE) + seq_len(.N) - 1), .(USUBJID)]
-output
    USUBJID Month   AGE
      <num> <num> <num>
 1:       1     0    65
 2:       1    12    66
 3:       1    24    67
 4:       2     0    34
 5:       2    12    35
 6:       2    24    36
 7:       2    36    37
 8:       3     0    65
 9:       3    12    66
10:       3    24    67
11:       3    36    68
12:       3    48    69
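For comparison, the same idea can be sketched in base R without any packages. This is not part of the approaches above, only an illustrative alternative under the same assumption that partial years are rounded down; the names n, idx, and res are hypothetical.

```r
df1 <- data.frame(USUBJID = 1:3,
                  follow_up = c(24, 36, 56),
                  AGE = c(65, 34, 65))

# number of entries per subject: whole years of follow-up plus month 0
n <- df1$follow_up %/% 12 + 1

# per-subject year index starting at 0: 0,1,2, 0,1,2,3, 0,1,2,3,4
idx <- sequence(n) - 1

res <- data.frame(
  USUBJID = rep(df1$USUBJID, times = n),
  Month   = idx * 12,                       # 0, 12, 24, ...
  AGE     = rep(df1$AGE, times = n) + idx   # one year older per step
)
res
```

Here sequence() generates the within-subject counter in one call, so no grouping step is needed.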
data
df1 <- structure(list(USUBJID = 1:3, follow_up = c(24, 36, 56), AGE = c(65,
34, 65)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L))