file1
Country | Name | Race | A | B | C |
---|---|---|---|---|---|
.. | .. | .. | .. | .. | .. |
file2
Country | Name | Race | A | B | C |
---|---|---|---|---|---|
.. | .. | .. | .. | .. | .. |
file3
Country | Name | Race | A | B | C |
---|---|---|---|---|---|
.. | .. | .. | .. | .. | .. |
file4
Country | Name | Race | A | B | C |
---|---|---|---|---|---|
.. | .. | .. | .. | .. | .. |
The above are a some .csv data frames that I assigned to a variable called file.list
, and then used lapply
. The purpose was to full join every data frame into a single one, as you can see by the code below.
file.list= c(file1.csv, file2.csv, file3.csv, file4.csv)
df.list <- lapply(file.list, read.csv)
data <-df.list %>% reduce(full_join, by=c("Country", "Name", "Race"))
data
Country | Name | Race | A.x | B.x | C.x | A.y | B.y | C.y | A.x.x | B.x.x | C.x.x | A.y.y | B.y.y | C.y.y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
.. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
Although the full join works using the above code, the headers have a .x or .y added behind their names.
How can I full join such that the headers retain their original names without the .x.x... and .y.y.. behind? It should be a horizontal join retaining every single column. I find that if I add
,suffix=c("","")
, the "A", "B" and "C" columns will merge. I've also tried if (endsWith(colnames(data), ".x")==TRUE){
but not sure how to proceed.
Sample data
df_list <- list(structure(list(Country = c("Rf", "Ew"), Name = c("wef",
"eggw"), Race = c("wed", "qw"), A = 1:2, B = 1:2, C = 1:2), class = "data.frame", row.names = c(NA,
-2L)), structure(list(Country = c("Gw", "Wd"), Name = c("wef",
"eggw"), Race = c("wed", "qw"), A = 3:4, B = 5:6, C = 7:8), row.names = c(NA,
-2L), class = "data.frame"), structure(list(Country = c("Qp",
"Ew"), Name = c("wef", "eggw"), Race = c("wed", "qw"), A = 3:4,
B = 10:11, C = 7:8), row.names = c(NA, -2L), class = "data.frame"))
Desired output
Country Name Race A B C A B C A B C
1 Ew eggw qw 2 2 2 NA NA NA 4 11 8
2 Gw wef wed NA NA NA 3 5 7 NA NA NA
3 Qp wef wed NA NA NA NA NA NA 3 10 7
4 Rf wef wed 1 1 1 NA NA NA NA NA NA
5 Wd eggw qw NA NA NA 4 6 8 NA NA NA
CodePudding user response:
I don't know why you want the names to be duplicated, but while I don't think we can force dplyr::*_join
to do it, we can force base::merge
to do it.
purrr::reduce(df_list, merge, by=c("Country","Name","Race"), suffix=c("",""), all=TRUE, check.names=FALSE)
# Warning in merge.data.frame(out, elt, ...) :
# column names 'A', 'B', 'C' are duplicated in the result
# Warning in merge.data.frame(out, elt, ...) :
# column names 'A', 'B', 'C', 'A', 'B', 'C' are duplicated in the result
# Country Name Race A B C A B C A B C
# 1 Ew eggw qw 2 2 2 NA NA NA 4 11 8
# 2 Gw wef wed NA NA NA 3 5 7 NA NA NA
# 3 Qp wef wed NA NA NA NA NA NA 3 10 7
# 4 Rf wef wed 1 1 1 NA NA NA NA NA NA
# 5 Wd eggw qw NA NA NA 4 6 8 NA NA NA
The warnings cannot be avoided, though they can be suppressed with suppressWarnings
.