Match the column value of a dataframe to another, and if no match, the old value stays as it is

Time:05-15

I have dataframe A like this:

Sample1 
Salmon    
Mouse    
Rooster   
Monkey

My dataframe B looks like this:

    Sample1 Sample2
    Rooster  Bird     
    Mouse    Rodent
    Salmon   Fish

In my final dataframe, I would like the Sample2 column to be filled in by matching the Sample1 values of the two dataframes. For this, I used the following command:

final_df$Sample2 <- dataframe_B$Sample2[match(dataframe_A$Sample1, dataframe_B$Sample1)]

The command works, but when there is no match, as for Monkey here, NA is returned. How can I modify my code so that the original value (Monkey, for example) is returned when there is no match? My real dataset has thousands of rows. Thanks.

In short, my final dataframe currently looks like the one below. I don't want NA to be shown for Monkey; I'd like Monkey to appear there instead. This is just an example from thousands of rows, and I want the same rule applied to every value that has no match:

   Sample1  Sample2
    Salmon    Fish     
    Mouse     Rodent
    Rooster   Bird
    Monkey     NA

CodePudding user response:

I'm not sure I fully understand your question, but does merge() work for you?

dataframe_A = data.frame(
  stringsAsFactors = FALSE,
           Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey")
)

dataframe_B = data.frame(
  stringsAsFactors = FALSE,
  Sample1 = c("Rooster",  "Mouse", "Salmon"),
  Sample2 = c("Bird", "Rodent", "Fish")
)

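# Left join on the shared column Sample1; rows of A without a match get NA in Sample2.
dataframe_C = merge(
  dataframe_A, 
  dataframe_B, 
  all.x = TRUE
)
# Fall back to the Sample1 value wherever no match was found.
no_match = is.na(dataframe_C$Sample2)
dataframe_C$Sample2[no_match] = dataframe_C$Sample1[no_match]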

dataframe_C
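
One thing to be aware of: merge() sorts the result by the key column by default, so the rows come back in alphabetical order rather than in the order of dataframe_A. If the original order matters, a minimal base R sketch that stays close to the match() approach from the question could look like this. The names idx and final_df are only illustrative, and it assumes each Sample1 value appears at most once in dataframe_B.

```r
# Same example data as above.
dataframe_A <- data.frame(
  Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"),
  stringsAsFactors = FALSE
)
dataframe_B <- data.frame(
  Sample1 = c("Rooster", "Mouse", "Salmon"),
  Sample2 = c("Bird", "Rodent", "Fish"),
  stringsAsFactors = FALSE
)

# Position of each Sample1 value of A within B; NA where there is no match.
idx <- match(dataframe_A$Sample1, dataframe_B$Sample1)

# Keep the original value where no match exists, otherwise take Sample2 from B.
final_df <- dataframe_A
final_df$Sample2 <- ifelse(is.na(idx), dataframe_A$Sample1, dataframe_B$Sample2[idx])

final_df
```

Because nothing is joined or sorted, the rows of final_df stay in the same order as dataframe_A, and Monkey keeps its own value in Sample2.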

CodePudding user response:

If I understand you correctly, you can just use left_join() like this:

library(dplyr)
df1 %>%
  left_join(., df2, by = "Sample1")

Output:

  Sample1 Sample2
1  Salmon    Fish
2   Mouse  Rodent
3 Rooster    Bird
4  Monkey    <NA>

Data

df1 <- data.frame(Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"))
df2 <- data.frame(Sample1 = c("Rooster", "Mouse", "Salmon"),
                  Sample2 = c("Bird", "Rodent", "Fish"))
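
The join on its own still leaves NA for Monkey, which is what the question wants to avoid. One possible way to finish the job, sketched here under the assumption that both columns are character vectors, is to follow the join with dplyr's coalesce(), which takes the first non-missing value from its arguments. The name result is only illustrative.

```r
library(dplyr)

# Same example data, with strings kept as character rather than factor.
df1 <- data.frame(
  Sample1 = c("Salmon", "Mouse", "Rooster", "Monkey"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Sample1 = c("Rooster", "Mouse", "Salmon"),
  Sample2 = c("Bird", "Rodent", "Fish"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  left_join(df2, by = "Sample1") %>%
  # Where the join found no match (NA), fall back to the Sample1 value.
  mutate(Sample2 = coalesce(Sample2, Sample1))

result
```

left_join() keeps the rows in the order of df1, so this variant also preserves the original row order.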