Home > Blockchain >  How to check matching between dataframes?
How to check matching between dataframes?

Time:08-09

I have df:

A <- c("010410", "010420","010511","010519","010519","010591","010591",NA,NA)
B <- c("010392","010410","010420","010511","010512","010519","010592","010592","010593")
C <- c("010410", "010420", "010511", "010512", "010519", "010594", "010594", "010599",NA)
D <- c("010231", "010239", "010290", "010290", "010310", "010391", "010392", "010410", "010420")

df <- data.frame(A,B,C,D)

Now I want to find out in which columns match the following codes and return the dataframe where will be YES and NOW across A B C & D columns

df2 <- as.data.frame(c("010410","010420","010511","010512","010519","010290"))
  • YES - if it matches
  • NO- if such code does not exist in this column

Expected outcome:

         A      B       C       D
010410  YES    YES     YES     YES 
010420  YES    YES     YES     YES
010511  YES    YES     YES     NO
010512  NO     YES     YES     NO
010519  YES    YES     YES     NO
010290  NO      NO      NO     YES

CodePudding user response:

Here's a simple way using TRUE and FALSE:

lapply(df, \(x) df2[[1]] %in% x) |>
  as.data.frame() |>
  `row.names<-`(df2[[1]])
#            A     B     C     D
# 010410  TRUE  TRUE  TRUE  TRUE
# 010420  TRUE  TRUE  TRUE  TRUE
# 010511  TRUE  TRUE  TRUE FALSE
# 010512 FALSE  TRUE  TRUE FALSE
# 010519  TRUE  TRUE  TRUE FALSE
# 010290 FALSE FALSE FALSE  TRUE

If the Yes/Nos are strongly preferred, we can get them with this modification:

lapply(df, \(x) c("No", "Yes")[(df2[[1]] %in% x)   1]) |>
  as.data.frame() |>
  `row.names<-`(df2[[1]])
#          A   B   C   D
# 010410 Yes Yes Yes Yes
# 010420 Yes Yes Yes Yes
# 010511 Yes Yes Yes  No
# 010512  No Yes Yes  No
# 010519 Yes Yes Yes  No
# 010290  No  No  No Yes
  • Related