How to add values from other column if conditional join does not execute?-CodePudding

I have two tables this one is old names

Last Name|First Name|ID
Clay      Cassius    1
Alcindor  Lou        2
Artest    Ron        3
Jordan    Michael    4
Scottie   Pippen     5
Kanter    Enes       6

New Names

Last Name|   First Name|   ID
Ali          Muhammad       1
Abdul Jabbar Kareem         2
World Peace  Metta          3
Jordan       Michael        4
Pippen       Scottie        5
Freedom      Enes Kanter    6

Basically I want to do a join to the first table (old names) where it will show the new last name if there has been a name change otherwise blank

Last Name|First Name|ID|Discrepancies
Clay      Cassius    1  Ali
Alcindor  Lou        2  Abdul Jabbar
Artest    Ron        3  World Peace
Jordan    Michael    4  
Pippen   Scottie     5  
Kanter    Enes       6  Freedom

Note that Michael and Scottie's name did not change so in Discrepancies there is a blank.

CodePudding user response：

You could use

library(dplyr)

df1 %>% 
  left_join(df2, by = "ID", suffix = c("", ".y")) %>% 
  mutate(Discrepancies = ifelse(Last_Name.y == Last_Name, "", Last_Name.y)) %>% 
  select(-ends_with(".y"))

to get

# A tibble: 6 x 4
  Last_Name First_Name    ID Discrepancies 
  <chr>     <chr>      <dbl> <chr>         
1 Clay      Cassius        1 "Ali"         
2 Alcindor  Lou            2 "Abdul Jabbar"
3 Artest    Ron            3 "World Peace" 
4 Jordan    Michael        4 ""            
5 Scottie   Pippen         5 "Pippen"      
6 Kanter    Enes           6 "Freedom"

Note:

I named the columns Last_Name and First_Name.
The first data frame contains Scottie Pippen instead of Pippen Scottie.

CodePudding user response：

Another possible solution:

library(tidyverse)

old <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  Last = c("Clay",
           "Alcindor","Artest","Jordan","Scottie","Kanter"),
  `First` = c("Cassius","Lou",
              "Ron","Michael","Pippen","Enes"),
  `ID` = c(1L, 2L, 3L, 4L, 5L, 6L)
)

new <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  `Last` = c("Ali",
             "Abdul Jabbar","World Peace","Jordan","Pippen","Freedom"),
  `First` = c("Muhammad",
              "Kareem","Metta","Michael","Scottie","Enes Kanter"),
  ID = c(1L, 2L, 3L, 4L, 5L, 6L)
)

old %>% 
  bind_rows(new) %>% 
  group_by(ID) %>% 
  summarise(
    discrepancies = if_else(n_distinct(Last) > 1, last(Last), NA_character_), 
    Last = first(Last), First = first(First), .groups = "drop" )

#> # A tibble: 6 × 4
#>      ID discrepancies Last     First  
#>   <int> <chr>         <chr>    <chr>  
#> 1     1 Ali           Clay     Cassius
#> 2     2 Abdul Jabbar  Alcindor Lou    
#> 3     3 World Peace   Artest   Ron    
#> 4     4 <NA>          Jordan   Michael
#> 5     5 Pippen        Scottie  Pippen 
#> 6     6 Freedom       Kanter   Enes

CodePudding user response：

You can simply merge your data, and then filter duplicate occurrences by grouping with aggregate.

dfinal <- setNames( merge( dat1, dat2, "ID", suffixes=c(1,2) )[
  ,c("Last.Name1","First.Name1","ID","Last.Name2")], c(colnames(dat1),"Discrepancies")  )

dfinal$Discrepancies[ aggregate( cbind(Last.Name,First.Name) ~ ID, 
  rbind(dat1, dat2), function(x) !length(unique(x))>1 )$Last.Name ] <- ""

dfinal
  Last.Name First.Name ID Discrepancies
1      Clay    Cassius  1           Ali
2  Alcindor        Lou  2  Abdul Jabbar
3    Artest        Ron  3   World Peace
4    Jordan    Michael  4
5   Scottie     Pippen  5        Pippen
6    Kanter       Enes  6       Freedom

Data

dat1 <- structure(list(Last.Name = c("Clay", "Alcindor", "Artest", "Jordan",
"Scottie", "Kanter"), First.Name = c("Cassius", "Lou", "Ron",
"Michael", "Pippen", "Enes"), ID = 1:6), class = "data.frame", row.names = c(NA,
-6L))

dat2 <- structure(list(Last.Name = c("Ali", "Abdul Jabbar", "World Peace",
"Jordan", "Pippen", "Freedom"), First.Name = c("Muhammad", "Kareem",
"Metta", "Michael", "Scottie", "Enes Kanter"), ID = 1:6), class = "data.frame", row.names = c(NA,
-6L))