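I have a problem in R that is similar to this pandas question: "merge pandas dataframe with key duplicates". I want to merge two data frames on a `key` column that contains duplicates, pairing the n-th occurrence of a key in one frame with the n-th occurrence in the other instead of getting every combination.
It shouldn't be that hard to do this in R, but I just can't find the solution.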
Thank you very much for your help!
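To see why a plain join is not enough, here is a minimal base-R sketch. It assumes the same `df1`/`df2` as in the data section of the answer below, rebuilt with `data.frame()` for brevity. Joining on `key` alone pairs every `K2` row of one frame with every `K2` row of the other, so the duplicates multiply instead of lining up.

```r
# Same values as the answer's data, rebuilt with data.frame()
df1 <- data.frame(key = c("K0", "K1", "K2", "K2", "K2", "K3"),
                  A   = c("A0", "A1", "A2", "A3", "A4", "A5"))
df2 <- data.frame(key = c("K0", "K1", "K2", "K2", "K3", "K3", "K4"),
                  B   = c("B0", "B1", "B2", "B3", "B4", "B5", "B6"))

# Full outer join on 'key' only: duplicated keys form all combinations
naive <- merge(df1, df2, by = "key", all = TRUE)
nrow(naive)       # 11 rows: K2 gives 3 x 2 = 6, K3 gives 1 x 2 = 2
table(naive$key)
```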
Answer:
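Create a sequence column in each data frame that numbers the rows within each 'key', then do a `full_join` on 'key' plus that column, so that the n-th occurrence of a key in `df1` is matched with the n-th occurrence in `df2`: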
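```r
library(dplyr)
library(data.table)  # only needed for rowid()

df1 %>%
  mutate(rn = rowid(key)) %>%          # 1, 2, 3, ... within each key
  full_join(df2 %>%
              mutate(rn = rowid(key)),
            by = c("key", "rn")) %>%   # match n-th occurrence with n-th occurrence
  select(-rn)                          # drop the helper column
```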
Output:
```
  key    A    B
1  K0   A0   B0
2  K1   A1   B1
3  K2   A2   B2
4  K2   A3   B3
5  K2   A4 <NA>
6  K3   A5   B4
7  K3 <NA>   B5
8  K4 <NA>   B6
```
Data:
```r
df1 <- structure(list(key = c("K0", "K1", "K2", "K2", "K2", "K3"),
                      A = c("A0", "A1", "A2", "A3", "A4", "A5")),
                 class = "data.frame",
                 row.names = c("0", "1", "2", "3", "4", "5"))
df2 <- structure(list(key = c("K0", "K1", "K2", "K2", "K3", "K3", "K4"),
                      B = c("B0", "B1", "B2", "B3", "B4", "B5", "B6")),
                 class = "data.frame",
                 row.names = c("0", "1", "2", "3", "4", "5", "6"))
```
Answer:
akrun's answer is fantastic (see also the comments), and I learned something new again: most of all `rowid()` from data.table, a convenience function that generates unique row ids within each group.
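For the example keys, `rowid()` simply counts the occurrences of each value, so the n-th `K2` in `df1` gets the same number as the n-th `K2` in `df2`. A minimal illustration, assuming data.table is installed:

```r
library(data.table)

# rowid() returns 1, 2, 3, ... for successive occurrences of each value
id1 <- rowid(c("K0", "K1", "K2", "K2", "K2", "K3"))        # keys of df1
id2 <- rowid(c("K0", "K1", "K2", "K2", "K3", "K3", "K4"))  # keys of df2
id1  # 1 1 1 2 3 1
id2  # 1 1 1 2 1 2 1
```

Joining on key plus this counter leaves (K2, 3) in `df1` without a partner, which is why `A4` ends up next to `NA`, while (K3, 2) and (K4, 1) exist only in `df2`.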
A dplyr-only solution needs two steps for this, `group_by(key)` followed by `mutate(id = row_number())`:
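```r
library(dplyr)

df1 %>%
  group_by(key) %>%
  mutate(id = row_number()) %>%       # counter within each key
  full_join(df2 %>%
              group_by(key) %>%
              mutate(id = row_number()),
            by = c("key", "id")) %>%
  select(-id) %>%
  ungroup()                           # drop the grouping left by group_by()
```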
```
  key   A     B
  <chr> <chr> <chr>
1 K0    A0    B0
2 K1    A1    B1
3 K2    A2    B2
4 K2    A3    B3
5 K2    A4    NA
6 K3    A5    B4
7 K3    NA    B5
8 K4    NA    B6
```
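Assuming dplyr 1.1.0 or later, the two steps can be folded into one with the per-operation grouping argument `.by` of `mutate()`, which also leaves the result ungrouped. This is a sketch on the same data, not code from either answer:

```r
library(dplyr)

df1 <- data.frame(key = c("K0", "K1", "K2", "K2", "K2", "K3"),
                  A   = c("A0", "A1", "A2", "A3", "A4", "A5"))
df2 <- data.frame(key = c("K0", "K1", "K2", "K2", "K3", "K3", "K4"),
                  B   = c("B0", "B1", "B2", "B3", "B4", "B5", "B6"))

# .by groups only for this mutate() call; no group_by()/ungroup() needed
res <- df1 %>%
  mutate(id = row_number(), .by = key) %>%
  full_join(df2 %>% mutate(id = row_number(), .by = key),
            by = c("key", "id")) %>%
  select(-id)
res
```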