I have dataframe A like this:
Sample1
Salmon
Mouse
Rooster
Monkey
My dataframe B is like below:
Sample1 Sample2
Rooster Bird
Mouse Rodent
Salmon Fish
In my final dataframe, I would like the Sample2 column to be filled in by matching the Sample1 values of dataframe A against the Sample1 column of dataframe B. For this, I have used this command:
final_df$Sample2 <- dataframe_B$Sample2[match(dataframe_A$Sample1, dataframe_B$Sample1)]
The command works, but when there is no match (like Monkey here), NA is returned. How can I modify my code so that the original value (Monkey, for example) is returned when there is no match? My real dataset has thousands of rows. Thanks.
In short, my final dataframe currently looks like the one below. I don't want NA to be shown for Monkey; I'd like Monkey to appear there instead. This is just an example, and I want the same rule applied to every value that has no match:
Sample1 Sample2
Salmon Fish
Mouse Rodent
Rooster Bird
Monkey NA
CodePudding user response:
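I'm not sure I fully understand your question, but does merge() work for you? A left merge (all.x = TRUE) keeps every row of dataframe_A, and the last assignment replaces the remaining NA values with the Sample1 value.
dataframe_A = data.frame(
  stringsAsFactors = FALSE,
  Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey")
)
dataframe_B = data.frame(
  stringsAsFactors = FALSE,
  Sample1 = c("Rooster", "Mouse", "Salmon"),
  Sample2 = c("Bird", "Rodent", "Fish")
)
# Left join on the shared Sample1 column; unmatched rows get NA in Sample2.
# Note: merge() sorts the result by the key column by default (sort = TRUE).
dataframe_C = merge(
  dataframe_A,
  dataframe_B,
  by = "Sample1",
  all.x = TRUE
)
# Fall back to the Sample1 value wherever no match was found
no_match = is.na(dataframe_C$Sample2)
dataframe_C$Sample2[no_match] = dataframe_C$Sample1[no_match]
dataframe_C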
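If the original row order matters, the match() lookup from the question can probably be kept as it is, with a fallback for rows that have no match. The following is a minimal sketch under that assumption; the data frames are redefined so the block runs on its own, and `idx` and `final_df` are just illustrative names.

```r
dataframe_A <- data.frame(
  Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"),
  stringsAsFactors = FALSE
)
dataframe_B <- data.frame(
  Sample1 = c("Rooster", "Mouse", "Salmon"),
  Sample2 = c("Bird", "Rodent", "Fish"),
  stringsAsFactors = FALSE
)

# Position of each dataframe_A$Sample1 value in dataframe_B$Sample1 (NA if absent)
idx <- match(dataframe_A$Sample1, dataframe_B$Sample1)

# Use the looked-up Sample2 where a match exists, otherwise keep Sample1
final_df <- dataframe_A
final_df$Sample2 <- ifelse(is.na(idx), dataframe_A$Sample1, dataframe_B$Sample2[idx])
final_df
```

Unlike merge(), this keeps the rows in the order of dataframe_A, because match() returns one position per input value.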
CodePudding user response:
If I understand you correctly, you can just use left_join like this:
library(dplyr)
df1 %>%
left_join(., df2, by = "Sample1")
Output:
Sample1 Sample2
1 Salmon Fish
2 Mouse Rodent
3 Rooster Bird
4 Monkey <NA>
Data
df1 <- data.frame(Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"))
df2 <- data.frame(Sample1 = c("Rooster", "Mouse", "Salmon"),
Sample2 = c("Bird", "Rodent", "Fish"))
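The join above still leaves NA for Monkey. One way to fill it in, sketched here on the assumption that both columns are character vectors, is to add dplyr's coalesce(), which takes the first non-missing value at each position. The name `result` is only illustrative.

```r
library(dplyr)

df1 <- data.frame(Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"))
df2 <- data.frame(Sample1 = c("Rooster", "Mouse", "Salmon"),
                  Sample2 = c("Bird", "Rodent", "Fish"))

result <- df1 %>%
  left_join(df2, by = "Sample1") %>%
  # Where the join found no match, reuse the Sample1 value
  mutate(Sample2 = coalesce(Sample2, Sample1))
result
```

left_join() keeps the rows of df1 in their original order, so no re-sorting is needed afterwards.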