Home > Net >  R: Merger of data frames produces duplicate columns
R: Merger of data frames produces duplicate columns

Time:06-19

I merged these three data frames. For some reason the column "Survey" duplicates. Can you tell me why, and how to fix this?

I used this method. Do you know any better method? I don't like to use merge because it merges only two data frames at a time, if I've understood correctly.

total_data <- merge(data1, data2, by="ID","Survey",all=TRUE)
new_data <- merge(total_data, data3, by="ID","Survey",all=TRUE)

data1

     ID Survey     X1

1     1      1 0.0903
2     2      1 0.467 
3     3      1 0.470 
4     4      1 0.495 
5     5      1 0.391

data2

     ID Survey    X2

1     6      2 0.365
2     7      2 0.199
3     8      2 0.954

data3

     ID Survey    X3

1     9      3 0.853
2    10      3 0.671
3    11      3 0.843

For data reconstruction↓

data1 <- structure(list(ID = c(1, 2, 3, 4, 5), Survey = c(1, 1, 1, 1, 
1), X1 = c(0.0902787881140183, 0.467047377042095, 0.470110389458694, 
0.495419984673155, 0.391416116942865)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -5L))

data2 <- structure(list(ID = c(6, 7, 8), Survey = c(2, 2, 2), X2 = c(0.364798969002616, 
0.198573761749706, 0.954432093454516)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -3L))

data3 <- structure(list(ID = c(9, 10, 11), Survey = c(3, 3, 3), X3 = c(0.853265053661332, 
0.670957082877364, 0.843229201077576)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -3L))

CodePudding user response:

You could use reduce {purrr} from the purrr package within the tidyverse

data1 <- structure(list(ID = c(1, 2, 3, 4, 5), 
                        Survey = c(1, 1, 1, 1, 
                                                          1), 
                        X1 = c(0.0902787881140183, 0.467047377042095, 0.470110389458694, 
                                                                     0.495419984673155, 0.391416116942865)), class = c("tbl_df", "tbl", 
                                                                                                                       "data.frame"), 
                   row.names = c(NA, -5L))

data2 <- structure(list(ID = c(6, 7, 8), Survey = c(2, 2, 2), X2 = c(0.364798969002616, 
                                                                     0.198573761749706, 0.954432093454516)), class = c("tbl_df", "tbl", 
                                                                                                                       "data.frame"), row.names = c(NA, -3L))

data3 <- structure(list(ID = c(9, 10, 11), 
                        Survey = c(3, 3, 3), 
                        X3 = c(0.853265053661332, 
                                                                       0.670957082877364, 0.843229201077576)), class = c("tbl_df", "tbl", 
                                                                                                                         "data.frame"), row.names = c(NA, -3L))


dfList <- list(data1, data2, data3)
  
dfNew <- 
reduce(dfList, full_join, by = c("ID","Survey"))

sample

  • Related