I have 2 data frames that look something like this (it is a count table). data1 has a column called "Method" that I would like to add into data2. I want to match it based on ONLY Col1, Col2, Col3 (the Count column doesn't need to match). If there is a match, take the Method from data1 and put it in data2. If there is no match, then make the value "Not Determined". I have an example of what I'm looking for in the data frame below called data_final.
data1 <- data.frame("Col1" = c("ABC", "ABC", "EFG", "XYZ"), "Col2" = c("AA", "AA",
"AA", "BB"), "Col3" = c("Al", "B", "Al", "Al"), "Count" = c(1, 4, 6, 2), "Method" =
c("Sample", "Dry", "Sample", "Sample"))
data2 <- data.frame("Col1" = c("ABC", "ABC", "ABC", "EFG", "XYZ", "XYZ"), "Col2" =
c("AA", "AA","CC", "AA", "BB", "CC"), "Col3" = c("Al", "B", "C", "Al", "Al", "C"),
"Count" = c(1, 4, 5, 6, 2, 1))
I would like to create a new data frame that looks like this with what I described above:
data_final <- data.frame("Col1" = c("ABC", "ABC", "ABC", "EFG", "XYZ", "XYZ"), "Col2"
= c("AA", "AA","CC", "AA", "BB", "CC"), "Col3" = c("Al", "B", "C", "Al", "Al", "C"),
"Count" = c(1, 4, 5, 6, 2, 1), "Method" = c("Sample", "Dry", "Not Determined",
"Sample", "Sample", "Not Determined"))
Thank you for your help!
CodePudding user response:
You can use left_join
on data2, and remove the Count
column in data1 for the left_join
. Then replace NA
with "Not Determined" to match your output.
library(dplyr)
left_join(data2,
data1 %>% select(-Count),
by = c("Col1", "Col2", "Col3")) %>%
mutate(Method = replace_na(Method, "Not Determined"))
Col1 Col2 Col3 Count Method
1 ABC AA Al 1 Sample
2 ABC AA B 4 Dry
3 ABC CC C 5 Not Determined
4 EFG AA Al 6 Sample
5 XYZ BB Al 2 Sample
6 XYZ CC C 1 Not Determined
CodePudding user response:
In Base R, you could do a merge:
data3 <- merge( data1, data2, all.y = TRUE )
and then replace the NAs with your string of choice:
data3[ is.na( data3[ 5 ] ), 5 ] <- "Not Determined"
which gives you
> data3
Col1 Col2 Col3 Count Method
1 ABC AA Al 1 Sample
2 ABC AA B 4 Dry
3 ABC CC C 5 Not Determined
4 EFG AA Al 6 Sample
5 XYZ BB Al 2 Sample
6 XYZ CC C 1 Not Determined
Attention: If you are on an older version of R (< 4.0), you might be dealing with factors and need to add the additional factor level before with
levels( data3$Method ) <- c( levels( data3$Method ), "Not Determined" )