Home > Software engineering >  Merging two datasets rows based on a single column variable
Merging two datasets rows based on a single column variable

Time:03-04

I have a two dataframes in R containing the same type of data (same variables). For this example I named them blank and data.

>blank    
    Prey  Sample Value
1   act1      IP    0
2   act1    Mock    0
3   act2      IP    0
4   act2    Mock    0
5   act3      IP    0


>data
    Prey  Sample Value
1   act1      IP  2.2
2   act1    Mock    0
3   act3      IP  3.6
4   act3      IP  3.1
5   act3      IP  2.1

I'd like to combine the two dataframes into one. I want to keep all the rows from data intact while adding all the rows from blank that are missing in data according ONLY to Prey column. To be clear, I don't need union, since I don't want to consider the entire row contents, but just the content of the Prey column to merge the two datasets. Below is the expected result for this small example, where rows with act2 from blank were added while the rows from data were left intact.

>result    
    Prey  Sample Value
1   act1      IP  2.2
2   act1    Mock    0
3   act3      IP  3.6
4   act3      IP  3.1
5   act3      IP  2.1
6   act2      IP    0
4   act2    Mock    0

I looked around for a while for an easy solution but I am still stuck. Appreciate any suggestion.

CodePudding user response:

rbind(
  data,
  blank[!blank$Prey %in% data$Prey,]
)

CodePudding user response:

Here is a dyplr solution using anti_join and bind_rows

library(dplyr)
library(tibble)

blank %>% 
  anti_join(data, by="Prey") %>% 
  bind_rows(data, .) %>% 
  as_tibble()
  Prey  Sample Value
  <chr> <chr>  <dbl>
1 act1  IP       2.2
2 act1  Mock     0  
3 act3  IP       3.6
4 act3  IP       3.1
5 act3  IP       2.1
6 act2  IP       0  
7 act2  Mock     0  
  • Related