Home > Blockchain >  Add row to dataframe, based on condition (matching ID)
Add row to dataframe, based on condition (matching ID)

Time:08-03

I have two dataframes. I want to add to df1 a row out of df2, based on conditions. The conditions are basically the ID. Imagine you have df1 as so..

    ID   Val1     Val2       
1   1       1        3          
2   2       3        2             
3   3       1        2             
4   4       3        4               
5   5       2        3            

and to that df1 you want to add a specific row out of df2, which looks like this..

    ID      A        B      
1   2       4        9          
2   1       5        7             
3   3       2        6             
4   5       4        9               
5   4       2        8           

So the new dataframe should be automatically matching/merging the df1 and df2, based on ID, and keep in mind, I can't simply use cbind, because the order is different.

    ID   Val1     Val2      A      B     
1   1       1        3      5      7
2   2       3        2      4      9       
3   3       1        2      2      6       
4   4       3        4      4      9          
5   5       2        3      2      8

I have used semi_join, as such

df1 %>% 
  bind_rows(semi_join(df2, df1, by = "ID"))

but I get NA's for df2. The dataframes are connteced by ID, but where the values of A and B should be, there is only NA, without any exception. What could I be doing wrong, is there maybe another better alternative solution to this?

CodePudding user response:

We can use merge which will take care of rows order

merge(df1 , df2 , by = "ID")

  • output
  ID Val1 Val2 A B
1  1    1    3 5 7
2  2    3    2 4 9
3  3    1    2 2 6
4  4    3    4 2 8
5  5    2    3 4 9

CodePudding user response:

Since you're using dplyr for your example, here is a solution using left_join. That is the kind of join in the situation like yours, where you want to add information to df1 from df2 while preserving the format of df1.

When you use joins, there is no need to use bind_rows, and a semi_join is a so-called filtering join that won't do what you're after here.

library(dplyr)
df1 |> 
  left_join(df2)

Output:

Joining, by = "ID"
# A tibble: 5 × 5
     ID  Val1  Val2     A     B
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     3     5     7
2     2     3     2     4     9
3     3     1     2     2     6
4     4     3     4     2     8
5     5     2     3     4     9

Data:

library(readr)

df1 <- read_table("ID   Val1     Val2
1       1        3          
2       3        2             
3       1        2             
4       3        4               
5       2        3")

df2 <- read_table("ID      A        B
2       4        9          
1       5        7             
3       2        6             
5       4        9               
4       2        8")
  • Related