Home > Net >  rbind dataframes by filling missing rows from the first dataframe
rbind dataframes by filling missing rows from the first dataframe

Time:08-05

I have 4 datasets from 4 rounds of a survey, with the first round containing 5 variables and the next ones containing only 3. This is because the ID (same sample) and the other two variables (v1 and v2) are fixed over time.

df1 <- data.frame(id = c(1:5),  round=1, v1 = c(6:10), v2 = c(11:15), v3=c(16:20))
df2 <- data.frame(id = c(1:5),  round=2, v3=c(26:30))
df3 <- data.frame(id = c(1:5),  round=3, v3=c(36:40))
df4 <- data.frame(id = c(1:5),  round=4, v3=c(46:50))

** rbind

list(df1, df2, df3, df4) %>% 
bind_rows(.id = 'grp') %>% 
group_by(id) 

Now when I rbind them, I end up with missing rows for the two fixed variables for rounds 1 to 3:

   grp      id round    v1    v2    v3
   <chr> <int> <dbl> <int> <int> <int>
 1 1         1     1     6    11    16
 2 1         2     1     7    12    17
 3 1         3     1     8    13    18
 4 1         4     1     9    14    19
 5 1         5     1    10    15    20
 6 2         1     2    NA    NA    26
 7 2         2     2    NA    NA    27
 8 2         3     2    NA    NA    28
 9 2         4     2    NA    NA    29
10 2         5     2    NA    NA    30
11 3         1     3    NA    NA    36
12 3         2     3    NA    NA    37
13 3         3     3    NA    NA    38
14 3         4     3    NA    NA    39
15 3         5     3    NA    NA    40
16 4         1     4    NA    NA    46
17 4         2     4    NA    NA    47
18 4         3     4    NA    NA    48
19 4         4     4    NA    NA    49
20 4         5     4    NA    NA    50

but I need v1 and v2 to be filled for the next rounds as well by matching the respective ID.

Please let me know if there is any way to do this in R (or in Python).

Thank you.

CodePudding user response:

list(df1, df2, df3, df4) %>% 
bind_rows(.id = 'grp') %>% 
group_by(id) %>%
fill(v1:v3)    # from tidyr
#fill(4:6)     # alternative syntax: columns 4-6
#fill(-c(1:3)) # alternative syntax: everything except columns 1:3 
#fill(everything()) # alternative syntax: fill NAs in all columns


   grp      id round    v1    v2    v3
   <chr> <int> <dbl> <int> <int> <int>
 1 1         1     1     6    11    16
 2 1         2     1     7    12    17
 3 1         3     1     8    13    18
 4 1         4     1     9    14    19
 5 1         5     1    10    15    20
 6 2         1     2     6    11    26
 7 2         2     2     7    12    27
 8 2         3     2     8    13    28
 9 2         4     2     9    14    29
10 2         5     2    10    15    30
11 3         1     3     6    11    36
12 3         2     3     7    12    37
13 3         3     3     8    13    38
14 3         4     3     9    14    39
15 3         5     3    10    15    40
16 4         1     4     6    11    46
17 4         2     4     7    12    47
18 4         3     4     8    13    48
19 4         4     4     9    14    49
20 4         5     4    10    15    50
  • Related