Let's say I have two data frames like so:
test_1 = game_df = read.table(text = "id winner_name
1 Jon
2 Bob
3 Lucas
4 Marcus
5 Toad
6 Donkey", header = T)
test_2 = game_df = read.table(text = "id_1 id_2 loser_name
9 1 Henry
2 2 George
3 3 Bagel
4 4 Cat
5 5 Giraffe
7 6 Monkey", header = T)
What I want to do is to first is to left_join
test_1
on the id = id_=1
match like so:
test_1 %>% left_join(test_2 %>% select(id_1, loser_name), by = c('id' = 'id_1'))
There will be certain NA matches (John, and Donkey)
id winner_name loser_name
1 1 Jon <NA>
2 2 Bob George
3 3 Lucas Bagel
4 4 Marcus Cat
5 5 Toad Giraffe
6 6 Donkey <NA>
And then I'd like to use id_2
as the matching column to fill the NA's, so I currently do something like this:
test_1 %>% left_join(test_2 %>% select(id_1, loser_name), by = c('id' = 'id_1')) %>%
left_join(test_2 %>% select(id_2, loser_name), by = c('id' = 'id_2'))
id winner_name loser_name.x loser_name.y
1 1 Jon <NA> Henry
2 2 Bob George George
3 3 Lucas Bagel Bagel
4 4 Marcus Cat Cat
5 5 Toad Giraffe Giraffe
6 6 Donkey <NA> Monkey
This seems to work, but it generates a bunch of duplicate columns with x
and y
suffixes. In my actual dataset, I have to go through this conditional matching approach through a ton of id
matches, so it will generate a large number of duplicate columns, that I then have to manually deselect and rename.
The issue is that there are hundreds of columns in the actual test_2
data.frame (loser_name, loser_country, loser_elo, loser_record, loser_win_rate), etc., so I'd need to manually specify the name and the columns to coalesce for each one. Furthermore, because I do this sequential id matching with more than one id, I'll have loser_name.x, loser_name.y, loser_name.z, and I don't know in advance how many suffixes there will be for each column.
Is there a simpler approach to this?
CodePudding user response:
We can do a coalesce
at the end
library(dplyr)
test_1 %>%
left_join(test_2 %>% select(id_1, loser_name), by = c('id' = 'id_1')) %>%
left_join(test_2 %>% select(id_2, loser_name), by = c('id' = 'id_2')) %>%
transmute(id, winner_name, loser_name = coalesce(loser_name.x, loser_name.y))
-output
id winner_name loser_name
1 1 Jon Henry
2 2 Bob George
3 3 Lucas Bagel
4 4 Marcus Cat
5 5 Toad Giraffe
6 6 Donkey Monkey
CodePudding user response:
You can try joining test_1 on a molten (=long) version of test_2. Only workt if the order of the molten test_2 resembles the search-order of your id's.. Now you can have id_1, id_2, ..., id_100
library(data.table)
#make them data.tables
setDT(test_1)
setDT(test_2)
#join on molten set
test_1[melt(test_2, id.vars = "loser_name"),
loser_name := i.loser_name,
on = .(id = value)]
# id winner_name loser_name
# 1: 1 Jon Henry
# 2: 2 Bob George
# 3: 3 Lucas Bagel
# 4: 4 Marcus Cat
# 5: 5 Toad Giraffe
# 6: 6 Donkey Monkey