I have two data frames as below:
df1:
col1 <- c("Snhg6", "Mt1")
col2 <- c("Lpar6", "Nufip2")
col3 <- c("Lasp1", "Pipox")
col4 <- c("Bcam", "Rhot2")
df1 <- data.frame(col1, col2, col3, col4)
# col1 col2 col3 col4
# 1 Snhg6 Lpar6 Lasp1 Bcam
# 2 Mt1 Nufip2 Pipox Rhot2
df2:
col10 <- c("Bcam", "Rhot2", "Lpar6" , "Snhg6")
df2 <- data.frame(col10)
# col10
# 1 Bcam
# 2 Rhot2
# 3 Lpar6
# 4 Snhg6
I want to merge them in a way to find how many of the items in each column of df1
is matching with the items of the df2
. I would like to have an output as:
output_df:
# col1 col2 col3 col4
# Snhg6 Lpar6 NA Bcam
# NA NA NA Rhot2
I tried to use the methods below but non of them gave the expected results.
library(dplyr)
output_df <- df1 %>% full_join(df2)
and
library(tidyr)
output_df <- crossing(df1,df2)
How can I get the desired output? Thanks.
CodePudding user response:
base
res <- df1
res[array(!unlist(res) %in% df2$col10, dim(res))] <- NA
res
# col1 col2 col3 col4
# 1 Snhg6 Lpar6 <NA> Bcam
# 2 <NA> <NA> <NA> Rhot2
dplyr
library(dplyr)
df1 %>%
mutate(across(everything(), ~ replace(.x, !.x %in% df2$col10, NA)))
# col1 col2 col3 col4
# 1 Snhg6 Lpar6 <NA> Bcam
# 2 <NA> <NA> <NA> Rhot2
CodePudding user response:
df1_mat <- as.matrix(df1)
df1_mat[!df1_mat %in% df2$col10] <- NA
as.data.frame(df1_mat)
#> col1 col2 col3 col4
#> 1 Snhg6 Lpar6 <NA> Bcam
#> 2 <NA> <NA> <NA> Rhot2
CodePudding user response:
Another option is to use an ifelse
statement with str_detect
wrap in across
:
library(dplyr)
library(stringr)
pattern<- paste(df2$col10, collapse = "|")
df1 %>%
mutate(across(everything(), ~ifelse(str_detect(., pattern), ., NA_character_)))
col1 col2 col3 col4
1 Snhg6 Lpar6 <NA> Bcam
2 <NA> <NA> <NA> Rhot2