I have 4 datasets from 4 rounds of a survey, with the first round containing 5 variables and the next ones containing only 3. This is because the ID (same sample) and the other two variables (v1 and v2) are fixed over time.
df1 <- data.frame(id = c(1:5), round=1, v1 = c(6:10), v2 = c(11:15), v3=c(16:20))
df2 <- data.frame(id = c(1:5), round=2, v3=c(26:30))
df3 <- data.frame(id = c(1:5), round=3, v3=c(36:40))
df4 <- data.frame(id = c(1:5), round=4, v3=c(46:50))
** rbind
list(df1, df2, df3, df4) %>%
bind_rows(.id = 'grp') %>%
group_by(id)
Now when I rbind
them, I end up with missing rows for the two fixed variables for rounds 1 to 3:
grp id round v1 v2 v3
<chr> <int> <dbl> <int> <int> <int>
1 1 1 1 6 11 16
2 1 2 1 7 12 17
3 1 3 1 8 13 18
4 1 4 1 9 14 19
5 1 5 1 10 15 20
6 2 1 2 NA NA 26
7 2 2 2 NA NA 27
8 2 3 2 NA NA 28
9 2 4 2 NA NA 29
10 2 5 2 NA NA 30
11 3 1 3 NA NA 36
12 3 2 3 NA NA 37
13 3 3 3 NA NA 38
14 3 4 3 NA NA 39
15 3 5 3 NA NA 40
16 4 1 4 NA NA 46
17 4 2 4 NA NA 47
18 4 3 4 NA NA 48
19 4 4 4 NA NA 49
20 4 5 4 NA NA 50
but I need v1 and v2 to be filled for the next rounds as well by matching the respective ID.
Please let me know if there is any way to do this in R (or in Python).
Thank you.
CodePudding user response:
list(df1, df2, df3, df4) %>%
bind_rows(.id = 'grp') %>%
group_by(id) %>%
fill(v1:v3) # from tidyr
#fill(4:6) # alternative syntax: columns 4-6
#fill(-c(1:3)) # alternative syntax: everything except columns 1:3
#fill(everything()) # alternative syntax: fill NAs in all columns
grp id round v1 v2 v3
<chr> <int> <dbl> <int> <int> <int>
1 1 1 1 6 11 16
2 1 2 1 7 12 17
3 1 3 1 8 13 18
4 1 4 1 9 14 19
5 1 5 1 10 15 20
6 2 1 2 6 11 26
7 2 2 2 7 12 27
8 2 3 2 8 13 28
9 2 4 2 9 14 29
10 2 5 2 10 15 30
11 3 1 3 6 11 36
12 3 2 3 7 12 37
13 3 3 3 8 13 38
14 3 4 3 9 14 39
15 3 5 3 10 15 40
16 4 1 4 6 11 46
17 4 2 4 7 12 47
18 4 3 4 8 13 48
19 4 4 4 9 14 49
20 4 5 4 10 15 50