I merged these three data frames. For some reason the column "Survey" duplicates. Can you tell me why, and how to fix this?
I used this method. Do you know any better method? I don't like to use merge
because it merges only two data frames at a time, if I've understood correctly.
total_data <- merge(data1, data2, by="ID","Survey",all=TRUE)
new_data <- merge(total_data, data3, by="ID","Survey",all=TRUE)
data1
ID Survey X1
1 1 1 0.0903
2 2 1 0.467
3 3 1 0.470
4 4 1 0.495
5 5 1 0.391
data2
ID Survey X2
1 6 2 0.365
2 7 2 0.199
3 8 2 0.954
data3
ID Survey X3
1 9 3 0.853
2 10 3 0.671
3 11 3 0.843
For data reconstruction↓
data1 <- structure(list(ID = c(1, 2, 3, 4, 5), Survey = c(1, 1, 1, 1,
1), X1 = c(0.0902787881140183, 0.467047377042095, 0.470110389458694,
0.495419984673155, 0.391416116942865)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -5L))
data2 <- structure(list(ID = c(6, 7, 8), Survey = c(2, 2, 2), X2 = c(0.364798969002616,
0.198573761749706, 0.954432093454516)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -3L))
data3 <- structure(list(ID = c(9, 10, 11), Survey = c(3, 3, 3), X3 = c(0.853265053661332,
0.670957082877364, 0.843229201077576)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -3L))
CodePudding user response:
You could use reduce {purrr} from the purrr package within the tidyverse
data1 <- structure(list(ID = c(1, 2, 3, 4, 5),
Survey = c(1, 1, 1, 1,
1),
X1 = c(0.0902787881140183, 0.467047377042095, 0.470110389458694,
0.495419984673155, 0.391416116942865)), class = c("tbl_df", "tbl",
"data.frame"),
row.names = c(NA, -5L))
data2 <- structure(list(ID = c(6, 7, 8), Survey = c(2, 2, 2), X2 = c(0.364798969002616,
0.198573761749706, 0.954432093454516)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -3L))
data3 <- structure(list(ID = c(9, 10, 11),
Survey = c(3, 3, 3),
X3 = c(0.853265053661332,
0.670957082877364, 0.843229201077576)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -3L))
dfList <- list(data1, data2, data3)
dfNew <-
reduce(dfList, full_join, by = c("ID","Survey"))