I have two tables this one is old names
Last Name|First Name|ID
Clay Cassius 1
Alcindor Lou 2
Artest Ron 3
Jordan Michael 4
Scottie Pippen 5
Kanter Enes 6
New Names
Last Name| First Name| ID
Ali Muhammad 1
Abdul Jabbar Kareem 2
World Peace Metta 3
Jordan Michael 4
Pippen Scottie 5
Freedom Enes Kanter 6
Basically I want to do a join to the first table (old names) where it will show the new last name if there has been a name change otherwise blank
Last Name|First Name|ID|Discrepancies
Clay Cassius 1 Ali
Alcindor Lou 2 Abdul Jabbar
Artest Ron 3 World Peace
Jordan Michael 4
Pippen Scottie 5
Kanter Enes 6 Freedom
Note that Michael and Scottie's name did not change so in Discrepancies there is a blank.
CodePudding user response:
You could use
library(dplyr)
df1 %>%
left_join(df2, by = "ID", suffix = c("", ".y")) %>%
mutate(Discrepancies = ifelse(Last_Name.y == Last_Name, "", Last_Name.y)) %>%
select(-ends_with(".y"))
to get
# A tibble: 6 x 4
Last_Name First_Name ID Discrepancies
<chr> <chr> <dbl> <chr>
1 Clay Cassius 1 "Ali"
2 Alcindor Lou 2 "Abdul Jabbar"
3 Artest Ron 3 "World Peace"
4 Jordan Michael 4 ""
5 Scottie Pippen 5 "Pippen"
6 Kanter Enes 6 "Freedom"
Note:
- I named the columns
Last_Name
andFirst_Name
. - The first data frame contains
Scottie Pippen
instead ofPippen Scottie
.
CodePudding user response:
Another possible solution:
library(tidyverse)
old <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
Last = c("Clay",
"Alcindor","Artest","Jordan","Scottie","Kanter"),
`First` = c("Cassius","Lou",
"Ron","Michael","Pippen","Enes"),
`ID` = c(1L, 2L, 3L, 4L, 5L, 6L)
)
new <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
`Last` = c("Ali",
"Abdul Jabbar","World Peace","Jordan","Pippen","Freedom"),
`First` = c("Muhammad",
"Kareem","Metta","Michael","Scottie","Enes Kanter"),
ID = c(1L, 2L, 3L, 4L, 5L, 6L)
)
old %>%
bind_rows(new) %>%
group_by(ID) %>%
summarise(
discrepancies = if_else(n_distinct(Last) > 1, last(Last), NA_character_),
Last = first(Last), First = first(First), .groups = "drop" )
#> # A tibble: 6 × 4
#> ID discrepancies Last First
#> <int> <chr> <chr> <chr>
#> 1 1 Ali Clay Cassius
#> 2 2 Abdul Jabbar Alcindor Lou
#> 3 3 World Peace Artest Ron
#> 4 4 <NA> Jordan Michael
#> 5 5 Pippen Scottie Pippen
#> 6 6 Freedom Kanter Enes
CodePudding user response:
You can simply merge
your data, and then filter duplicate occurrences by grouping with aggregate
.
dfinal <- setNames( merge( dat1, dat2, "ID", suffixes=c(1,2) )[
,c("Last.Name1","First.Name1","ID","Last.Name2")], c(colnames(dat1),"Discrepancies") )
dfinal$Discrepancies[ aggregate( cbind(Last.Name,First.Name) ~ ID,
rbind(dat1, dat2), function(x) !length(unique(x))>1 )$Last.Name ] <- ""
dfinal
Last.Name First.Name ID Discrepancies
1 Clay Cassius 1 Ali
2 Alcindor Lou 2 Abdul Jabbar
3 Artest Ron 3 World Peace
4 Jordan Michael 4
5 Scottie Pippen 5 Pippen
6 Kanter Enes 6 Freedom
Data
dat1 <- structure(list(Last.Name = c("Clay", "Alcindor", "Artest", "Jordan",
"Scottie", "Kanter"), First.Name = c("Cassius", "Lou", "Ron",
"Michael", "Pippen", "Enes"), ID = 1:6), class = "data.frame", row.names = c(NA,
-6L))
dat2 <- structure(list(Last.Name = c("Ali", "Abdul Jabbar", "World Peace",
"Jordan", "Pippen", "Freedom"), First.Name = c("Muhammad", "Kareem",
"Metta", "Michael", "Scottie", "Enes Kanter"), ID = 1:6), class = "data.frame", row.names = c(NA,
-6L))