Home > front end >  How do I add an .x and .y suffix to all duplicate variables after the join?
How do I add an .x and .y suffix to all duplicate variables after the join?

Time:01-11

I was using dplyr::left_join() initially, but it didn't have the match = "first" argument that I wanted that was available in plyr::join. The problem now is that my remaining code has .x and .y for duplicate variables so I can't just use janitor::clean_names().

To give you a sense of what I'm talking about:

id <- c(1, 2, 3, 4, 5)
name1 <- c("a", "b", "c", "d", "e")
name2 <- c("k", "l", "m", "n", "o")
name3 <- c("u", "v", "w", "x", "y")
name4 <- c("z", "a", "b", "c", "d")

df <- data.frame(id, name1, name2, name3, name4)

id <- c(1, 2, 3, 4, 5)
name1 <- c("f", "i", "j", "k", "l")
name2 <- c("p", "q", "r", "s", "t")
name3 <- c("z", "a", "b", "c", "d")
name5 <- c("z", "a", "b", "c", "d")
df2 <- data.frame(id, name1, name2, name3, name5)

library(plyr)

df_join <- plyr::join(df, df2, by=c("id"), type="left", match="first")

I want to turn the duplicate variables (name1, name2, name3) into the following:

Right side: name1.x, name2.x, name3.x Left side: name1.y, name2.y, name3.y

How would I go about doing this? janitor::clean_names() seems to only add "_2" for the left hand side variables.

CodePudding user response:

You can still use dplyr::left_join. To mimic the behavior of plyr::join, just turn duplicated ids in your right dataframe into NAs or any other values that will not be found in the id column of your left dataframe. For instance,

library(dplyr)

df2 %>% 
  mutate(id = replace(id, duplicated(id), NA_integer_)) %>% 
  left_join(df, ., by = "id", suffix = c(".y", ".x"))

Output

  id name1.y name2.y name3.y name4 name1.x name2.x name3.x name5
1  1       a       k       u     z       f       p       z     z
2  2       b       l       v     a       i       q       a     a
3  3       c       m       w     b       j       r       b     b
4  4       d       n       x     c       k       s       c     c
5  5       e       o       y     d       l       t       d     d

The two dataframes are

> df
  id name1 name2 name3 name4
1  1     a     k     u     z
2  2     b     l     v     a
3  3     c     m     w     b
4  4     d     n     x     c
5  5     e     o     y     d
> df2
  id name1 name2 name3 name5
1  1     f     p     z     z
2  2     i     q     a     a
3  3     j     r     b     b
4  4     k     s     c     c
5  5     l     t     d     d
6  1   XXX   XXX   XXX   XXX

CodePudding user response:

library(dplyr)

id <- c(1, 2, 3, 4, 5)
name1 <- c("a", "b", "c", "d", "e")
name2 <- c("k", "l", "m", "n", "o")
name3 <- c("u", "v", "w", "x", "y")
name4 <- c("z", "a", "b", "c", "d")

df <- data.frame(id, name1, name2, name3, name4)

id <- c(1, 1, 3, 3, 5)
name1 <- c("f", "i", "j", "k", "l")
name2 <- c("p", "q", "r", "s", "t")
name3 <- c("z", "a", "b", "c", "d")
name5 <- c("z", "a", "b", "c", "d")
df2 <- data.frame(id, name1, name2, name3, name5)

#not desired result
df %>% left_join(df2, by = 'id')
#>   id name1.x name2.x name3.x name4 name1.y name2.y name3.y name5
#> 1  1       a       k       u     z       f       p       z     z
#> 2  1       a       k       u     z       i       q       a     a
#> 3  2       b       l       v     a    <NA>    <NA>    <NA>  <NA>
#> 4  3       c       m       w     b       j       r       b     b
#> 5  3       c       m       w     b       k       s       c     c
#> 6  4       d       n       x     c    <NA>    <NA>    <NA>  <NA>
#> 7  5       e       o       y     d       l       t       d     d

df_join <- plyr::join(df, df2, by=c("id"), type="left", match="first")
df_join 
#>   id name1 name2 name3 name4 name1 name2 name3 name5
#> 1  1     a     k     u     z     f     p     z     z
#> 2  2     b     l     v     a  <NA>  <NA>  <NA>  <NA>
#> 3  3     c     m     w     b     j     r     b     b
#> 4  4     d     n     x     c  <NA>  <NA>  <NA>  <NA>
#> 5  5     e     o     y     d     l     t     d     d


df %>% left_join(df2, by = 'id') %>% group_by(id) %>% slice_head() %>% as.data.frame()
#>   id name1.x name2.x name3.x name4 name1.y name2.y name3.y name5
#> 1  1       a       k       u     z       f       p       z     z
#> 2  2       b       l       v     a    <NA>    <NA>    <NA>  <NA>
#> 3  3       c       m       w     b       j       r       b     b
#> 4  4       d       n       x     c    <NA>    <NA>    <NA>  <NA>
#> 5  5       e       o       y     d       l       t       d     d

Created on 2022-01-11 by the reprex package (v2.0.1)

  •  Tags:  
  • Related