I have two dataframes. I want to add to df1 a row out of df2, based on conditions. The conditions are basically the ID. Imagine you have df1 as so..
ID Val1 Val2
1 1 1 3
2 2 3 2
3 3 1 2
4 4 3 4
5 5 2 3
and to that df1 you want to add a specific row out of df2, which looks like this..
ID A B
1 2 4 9
2 1 5 7
3 3 2 6
4 5 4 9
5 4 2 8
So the new dataframe should be automatically matching/merging the df1 and df2, based on ID, and keep in mind, I can't simply use cbind, because the order is different.
ID Val1 Val2 A B
1 1 1 3 5 7
2 2 3 2 4 9
3 3 1 2 2 6
4 4 3 4 4 9
5 5 2 3 2 8
I have used semi_join, as such
df1 %>%
bind_rows(semi_join(df2, df1, by = "ID"))
but I get NA's for df2. The dataframes are connteced by ID, but where the values of A and B should be, there is only NA, without any exception. What could I be doing wrong, is there maybe another better alternative solution to this?
CodePudding user response:
We can use merge
which will take care of rows order
merge(df1 , df2 , by = "ID")
- output
ID Val1 Val2 A B
1 1 1 3 5 7
2 2 3 2 4 9
3 3 1 2 2 6
4 4 3 4 2 8
5 5 2 3 4 9
CodePudding user response:
Since you're using dplyr
for your example, here is a solution using left_join
. That is the kind of join in the situation like yours, where you want to add information to df1
from df2
while preserving the format of df1
.
When you use joins, there is no need to use bind_rows
, and a semi_join
is a so-called filtering join that won't do what you're after here.
library(dplyr)
df1 |>
left_join(df2)
Output:
Joining, by = "ID"
# A tibble: 5 × 5
ID Val1 Val2 A B
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 3 5 7
2 2 3 2 4 9
3 3 1 2 2 6
4 4 3 4 2 8
5 5 2 3 4 9
Data:
library(readr)
df1 <- read_table("ID Val1 Val2
1 1 3
2 3 2
3 1 2
4 3 4
5 2 3")
df2 <- read_table("ID A B
2 4 9
1 5 7
3 2 6
5 4 9
4 2 8")