Home > Back-end >  Returning data of the match in matching two data frames in R
Returning data of the match in matching two data frames in R

Time:08-30

I'm having a problem with filling in a column in my data frame.

I've got two data frames. DF.A is my main data frame that already has some other measurements, which results in some Variables showing up multiple times per ID. Then there is DF.B that has measurements for almost every variable for each ID.

DF.A<- data.frame(Id=c("001","001","001","002","002","003","003","003"),
                  Variable=c("0_1","2_1","2_1","0_1","0_2","0_1","0_1","2_1"),
                  Measurement=c(NA,NA,NA,NA,NA,NA,NA,NA))

DF.B<- data.frame(Id=c("001","002","002","003","003"),
                  Variable=c("2_1","0_1","0_2","0_1","2_1"),
                  Measurement=c(1,2,3,4,5))

What I want to have is DF.A but with the measurement filled in if the ID and Variable match like so:

DF.C<- data.frame(Id=c("001","001","001","002","002","003","003","003"),
                  Variable=c("0_1","2_1","2_1","0_1","0_2","0_1","0_1","2_1"),
                  Measurement=c(NA,1,1,2,3,4,4,5))

I was looking into match and paste statements, but then it would just insert the measurement of row 12 in DF.B if row 12 of DF.A had any match in DF.B.

Anyone know how I get the measurement from the match in DF.B to show up at that same match in DF.A?

CodePudding user response:

dplyr option using full_join where you remove the "Measurement" column of DF.A to combine the dataframes:

library(dplyr)
full_join(DF.A %>% select(-Measurement), DF.B)
#> Joining, by = c("Id", "Variable")
#>    Id Variable Measurement
#> 1 001      0_1          NA
#> 2 001      2_1           1
#> 3 001      2_1           1
#> 4 002      0_1           2
#> 5 002      0_2           3
#> 6 003      0_1           4
#> 7 003      0_1           4
#> 8 003      2_1           5

Created on 2022-08-29 with reprex v2.0.2

CodePudding user response:

We could use rows_update

library(dplyr)
DF.A$Measurement <- NA_real_
rows_update(DF.A, DF.B, by = c("Id", "Variable"))
   Id Variable Measurement
1 001      0_1          NA
2 001      2_1           1
3 001      2_1           1
4 002      0_1           2
5 002      0_2           3
6 003      0_1           4
7 003      0_1           4
8 003      2_1           5
  • Related