Home > Software design >  how to bind/merge different length col together with
how to bind/merge different length col together with

Time:05-25

I have 3 dfs with different length. I would like to bind thme together by id. there is no row connection among those 3 dfs. I did not have a clue how to do it. Could anyone guide me on this? Thanks.

enter image description here

df1<-structure(list(id = c("A0003", "A0004", "A0005"), Gender = c("F", 
"M", "M")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
"data.frame"))

df2<-structure(list(id = c("A0003", "A0003", "A0003", "A0004", "A0004", 
"A0005", "A0005", "A0005", "A0005", "A0005"), Score = c(96, 97, 
88, 76, 88, 98, 68, 89, 91, 35)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

df3<-structure(list(id = c("A0003", "A0003", "A0004", "A0005", "A0005", 
"A0005"), grade = c("A", "B", "B", "A", "B", "F")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

We may need to create a sequence column in each dataset by id as there are duplicates for 'id' in 'df2' and 'df3'. Keep all the datasets in a list, then loop over the list with map (from purrr), create a row sequence (rowid from data.table), reduce the list elements to a single dataset by joining (full_join) by 'id' and the row id ('rid')

library(dplyr)
library(purrr)
library(data.table)
lst(df1, df2, df3) %>% 
   map(~.x %>% 
          mutate(rid = rowid(id))) %>% 
  reduce(full_join, by = c('id', 'rid')) %>% 
  select(-rid) %>% 
  arrange(id)

-output

# A tibble: 10 × 4
   id    Gender Score grade
   <chr> <chr>  <dbl> <chr>
 1 A0003 F         96 A    
 2 A0003 <NA>      97 B    
 3 A0003 <NA>      88 <NA> 
 4 A0004 M         76 B    
 5 A0004 <NA>      88 <NA> 
 6 A0005 M         98 A    
 7 A0005 <NA>      68 B    
 8 A0005 <NA>      89 F    
 9 A0005 <NA>      91 <NA> 
10 A0005 <NA>      35 <NA> 

CodePudding user response:

Another possible solution, based on dplyr:

library(dplyr)

bind_rows(df1, df2, df3) %>% 
  group_by(id) %>% 
  mutate(Score = lead(Score, 1)) %>% 
  mutate(grade = lead(grade, n_distinct(data.frame(id, Gender, Score)))) %>% 
  arrange(id) %>% 
  filter(!if_all(everything(), is.na)) %>% 
  ungroup

#> # A tibble: 10 × 4
#>    id    Gender Score grade
#>    <chr> <chr>  <dbl> <chr>
#>  1 A0003 F         96 A    
#>  2 A0003 <NA>      97 B    
#>  3 A0003 <NA>      88 <NA> 
#>  4 A0004 M         76 B    
#>  5 A0004 <NA>      88 <NA> 
#>  6 A0005 M         98 A    
#>  7 A0005 <NA>      68 B    
#>  8 A0005 <NA>      89 F    
#>  9 A0005 <NA>      91 <NA> 
#> 10 A0005 <NA>      35 <NA>
  • Related