Home > Blockchain >  Create new column based on matches in another table
Create new column based on matches in another table

Time:02-16

I have 2 data frames that look something like this (it is a count table). data1 has a column called "Method" that I would like to add into data2. I want to match it based on ONLY Col1, Col2, Col3 (the Count column doesn't need to match). If there is a match, take the Method from data1 and put it in data2. If there is no match, then make the value "Not Determined". I have an example of what I'm looking for in the data frame below called data_final.

data1 <- data.frame("Col1" = c("ABC", "ABC", "EFG", "XYZ"), "Col2" = c("AA", "AA", 
"AA", "BB"), "Col3" = c("Al", "B", "Al", "Al"), "Count" = c(1, 4, 6, 2), "Method" = 
c("Sample", "Dry", "Sample", "Sample"))

data2 <- data.frame("Col1" = c("ABC", "ABC", "ABC", "EFG", "XYZ", "XYZ"), "Col2" = 
c("AA", "AA","CC", "AA", "BB", "CC"), "Col3" = c("Al", "B", "C", "Al", "Al", "C"), 
"Count" = c(1, 4, 5, 6, 2, 1))

I would like to create a new data frame that looks like this with what I described above:

data_final <- data.frame("Col1" = c("ABC", "ABC", "ABC", "EFG", "XYZ", "XYZ"), "Col2" 
= c("AA", "AA","CC", "AA", "BB", "CC"), "Col3" = c("Al", "B", "C", "Al", "Al", "C"), 
"Count" = c(1, 4, 5, 6, 2, 1), "Method" = c("Sample", "Dry", "Not Determined", 
"Sample", "Sample", "Not Determined"))

Thank you for your help!

CodePudding user response:

You can use left_join on data2, and remove the Count column in data1 for the left_join. Then replace NA with "Not Determined" to match your output.

library(dplyr)

left_join(data2, 
          data1 %>% select(-Count), 
          by = c("Col1", "Col2", "Col3")) %>%
  mutate(Method = replace_na(Method, "Not Determined"))

  Col1 Col2 Col3 Count         Method
1  ABC   AA   Al     1         Sample
2  ABC   AA    B     4            Dry
3  ABC   CC    C     5 Not Determined
4  EFG   AA   Al     6         Sample
5  XYZ   BB   Al     2         Sample
6  XYZ   CC    C     1 Not Determined

CodePudding user response:

In Base R, you could do a merge:

data3 <- merge( data1, data2, all.y = TRUE )

and then replace the NAs with your string of choice:

data3[ is.na( data3[ 5 ] ), 5 ] <- "Not Determined"

which gives you

> data3
  Col1 Col2 Col3 Count         Method
1  ABC   AA   Al     1         Sample
2  ABC   AA    B     4            Dry
3  ABC   CC    C     5 Not Determined
4  EFG   AA   Al     6         Sample
5  XYZ   BB   Al     2         Sample
6  XYZ   CC    C     1 Not Determined

Attention: If you are on an older version of R (< 4.0), you might be dealing with factors and need to add the additional factor level before with

levels( data3$Method ) <- c( levels( data3$Method ), "Not Determined" )
  • Related