removing duplicate cols after full_join R

Time:08-16

I am trying to full_join multiple data frames inside a function. The join itself works, but the resulting data frame has duplicated columns, as shown below. Is there a way to fix this within the function?

library(tidyverse)
# list.files() expects a regular expression, not a glob
inputs <- list.files(pattern = "dtu\\.tsv$")
inputs <- inputs %>% map(read_tsv)

merge_dtu<-function(input){
  df<-map(input, ~filter(.x, ESid %in% filt_site$ESid))
  df_merge<-df%>%reduce(full_join, by = c("ESid","allele")) # join! 
  write_tsv(df_merge, "out.tsv")
}

merge_dtu(inputs)
This produces a data frame (the output of merge_dtu(inputs)) like:

ID  value_A.x  value_B.x  value_A.y  value_B.y
id  a          b          a          b
id  c          d          c          d

So I wish to drop value_A.y and value_B.y, which are the duplicate columns produced by the full_join.

How can I achieve this? Thank you

CodePudding user response:

library(tidyverse)
df <- tibble(fake_a.x = "a",
             fake_b.x = "b",
             fake_a.y = "a",
             fake_b.y = "b")

df %>%
  select(-ends_with(".y"))
#> # A tibble: 1 × 2
#>   fake_a.x fake_b.x
#>   <chr>    <chr>   
#> 1 a        b

Created on 2022-08-15 by the reprex package (v2.0.1)
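
To do the same thing inside the function, as the question asks, the select() step can go at the end of the pipeline. This is only a minimal sketch: the toy tibbles stand in for the TSV files, keep_ids stands in for filt_site$ESid, and merge_dtu_dedup is a made-up name. It uses two inputs, so the suffixes are exactly .x and .y.

```r
library(dplyr)
library(purrr)

# Toy stand-ins for two of the TSV files (hypothetical data).
inputs <- list(
  tibble(ESid = c("e1", "e2"), allele = c("A", "B"),
         value_A = c("a", "c"), value_B = c("b", "d")),
  tibble(ESid = c("e1", "e2"), allele = c("A", "B"),
         value_A = c("a", "c"), value_B = c("b", "d"))
)
keep_ids <- c("e1", "e2")  # assumption: plays the role of filt_site$ESid

# Hypothetical variant of merge_dtu() that drops the .y columns before returning.
merge_dtu_dedup <- function(input, ids) {
  input %>%
    map(~ filter(.x, ESid %in% ids)) %>%          # keep only the wanted ESid rows
    reduce(full_join, by = c("ESid", "allele")) %>%
    select(-ends_with(".y"))                      # drop columns contributed by the right-hand table
}

result <- merge_dtu_dedup(inputs, keep_ids)
names(result)
```

With more than two inputs the suffixes that full_join generates get more involved, so it is worth checking names() on the result before relying on the ".y" pattern alone.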

CodePudding user response:

For this type of task, you can use the tidy-select helpers (https://tidyselect.r-lib.org/reference/language.html), for example:

df1 <- merge_dtu(inputs)

df1 <- select(df1, !ends_with(".y"))  # match ".y", not "y", so other columns ending in "y" are kept

However, I am not sure why you want to join two dataframes, and then drop all the columns that you obtained from one of them. It might be better just to select the first dataframe and not attempt to join it with the rest. Let me know if you have further questions.
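
If the value columns really are identical in every file, one possible alternative is to join on all of the shared columns, so that no suffixed duplicates are created in the first place. The sketch below assumes exactly that; the data and the make_input helper are invented for illustration.

```r
library(dplyr)
library(purrr)

# Hypothetical toy inputs: three tables whose value columns agree exactly.
make_input <- function() {
  tibble(ESid = c("e1", "e2"), allele = c("A", "B"),
         value_A = c("a", "c"), value_B = c("b", "d"))
}
inputs <- list(make_input(), make_input(), make_input())

# Joining on every shared column means no column is duplicated,
# so no .x/.y suffixes are created.
shared <- c("ESid", "allele", "value_A", "value_B")
merged <- inputs %>% reduce(full_join, by = shared)
names(merged)
```

Note that if the value columns differ between files for the same ESid and allele, this approach keeps those as separate rows rather than as side-by-side columns, so it only fits the case where the duplicates are true duplicates.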
