Home > Blockchain >  How do I remove .x.x.x.... and .y.y.y... from headings when I do a full join of multiple dataframes
How do I remove .x.x.x.... and .y.y.y... from headings when I do a full join of multiple dataframes

Time:01-13

file1

Country Name Race A B C
.. .. .. .. .. ..

file2

Country Name Race A B C
.. .. .. .. .. ..

file3

Country Name Race A B C
.. .. .. .. .. ..

file4

Country Name Race A B C
.. .. .. .. .. ..

The above are a some .csv data frames that I assigned to a variable called file.list, and then used lapply. The purpose was to full join every data frame into a single one, as you can see by the code below.

file.list= c(file1.csv, file2.csv, file3.csv, file4.csv)

df.list <- lapply(file.list, read.csv)

data <-df.list %>% reduce(full_join, by=c("Country", "Name", "Race"))

data

Country Name Race A.x B.x C.x A.y B.y C.y A.x.x B.x.x C.x.x A.y.y B.y.y C.y.y
.. .. .. .. .. .. .. .. .. .. .. .. .. .. ..

Although the full join works using the above code, the headers have a .x or .y added behind their names. How can I full join such that the headers retain their original names without the .x.x... and .y.y.. behind? It should be a horizontal join retaining every single column. I find that if I add ,suffix=c("",""), the "A", "B" and "C" columns will merge. I've also tried if (endsWith(colnames(data), ".x")==TRUE){ but not sure how to proceed.

Sample data

df_list <- list(structure(list(Country = c("Rf", "Ew"), Name = c("wef",
"eggw"), Race = c("wed", "qw"), A = 1:2, B = 1:2, C = 1:2), class = "data.frame", row.names = c(NA,
-2L)), structure(list(Country = c("Gw", "Wd"), Name = c("wef",
"eggw"), Race = c("wed", "qw"), A = 3:4, B = 5:6, C = 7:8), row.names = c(NA,
-2L), class = "data.frame"), structure(list(Country = c("Qp",
"Ew"), Name = c("wef", "eggw"), Race = c("wed", "qw"), A = 3:4,
    B = 10:11, C = 7:8), row.names = c(NA, -2L), class = "data.frame"))

Desired output

  Country Name Race  A  B  C  A  B  C  A  B  C
1      Ew eggw   qw  2  2  2 NA NA NA  4 11  8
2      Gw  wef  wed NA NA NA  3  5  7 NA NA NA
3      Qp  wef  wed NA NA NA NA NA NA  3 10  7
4      Rf  wef  wed  1  1  1 NA NA NA NA NA NA
5      Wd eggw   qw NA NA NA  4  6  8 NA NA NA

CodePudding user response:

I don't know why you want the names to be duplicated, but while I don't think we can force dplyr::*_join to do it, we can force base::merge to do it.

purrr::reduce(df_list, merge, by=c("Country","Name","Race"), suffix=c("",""), all=TRUE, check.names=FALSE)
# Warning in merge.data.frame(out, elt, ...) :
#   column names 'A', 'B', 'C' are duplicated in the result
# Warning in merge.data.frame(out, elt, ...) :
#   column names 'A', 'B', 'C', 'A', 'B', 'C' are duplicated in the result
#   Country Name Race  A  B  C  A  B  C  A  B  C
# 1      Ew eggw   qw  2  2  2 NA NA NA  4 11  8
# 2      Gw  wef  wed NA NA NA  3  5  7 NA NA NA
# 3      Qp  wef  wed NA NA NA NA NA NA  3 10  7
# 4      Rf  wef  wed  1  1  1 NA NA NA NA NA NA
# 5      Wd eggw   qw NA NA NA  4  6  8 NA NA NA

The warnings cannot be avoided, though they can be suppressed with suppressWarnings.

  •  Tags:  
  • r
  • Related