Match values in two columns within group in R-CodePudding

I have a big data frame in which I would like to flag the rows as follows:

within each group, I want to find whether any col1 value is present in col2 column. If so, mark those lines where same values appear.

this is my dataset with the desired flag values:

this is my dataset with the desired flag values

my try was the following:

df %>%
 group_by(attr) %>%
 mutate(flag=ifelse(any(col1)==any(col2), 1, 0))

but it does not work.

thanks in advance!

CodePudding user response：

Please note: I am not sure if you made a typo in your dataframe for row ABC-LKJ 210-31.

You can use the or | so that the 1 returns for both of the rows where the values match per group_by. You can use the following code:

df <- data.frame(col1 = c("ABC", "FGC", "ZHU", "IIN", "OIL", "ABC", "LKJ", "SNM", "QQA"),
                 col2 = c("DLK", "CBN", "ABC", "ZHU", "HSC", "LJK", "QQA", "KDL", "CBV"),
                 attr = c("100-30", "100-30", "100-30", "21-667", "21-667", "210-31", "201-31", "201-31", "201-31"))

library(dplyr)
df %>%
  group_by(attr) %>%
  mutate(flag = ifelse(col1 %in% col2 | col2 %in% col1, 1, 0))
#> # A tibble: 9 × 4
#> # Groups:   attr [4]
#>   col1  col2  attr    flag
#>   <chr> <chr> <chr>  <dbl>
#> 1 ABC   DLK   100-30     1
#> 2 FGC   CBN   100-30     0
#> 3 ZHU   ABC   100-30     1
#> 4 IIN   ZHU   21-667     0
#> 5 OIL   HSC   21-667     0
#> 6 ABC   LJK   210-31     0
#> 7 LKJ   QQA   201-31     1
#> 8 SNM   KDL   201-31     0
#> 9 QQA   CBV   201-31     1

^{Created on 2022-07-08 by the reprex package (v2.0.1)}

CodePudding user response：

I guess, I misunderstood the OPs question. The following solution only fits, if single letters are to be compared rowwise!

The first two lines include the relevant cases:

df <- tibble(col1 = c("ABC", "FGC"),
             col2 = c("DLK", "CBN"))

You could split col2 to single letters and check if it's present in col1. This could be done with library(stringr) like this:

df %>%
  rowwise() %>%
  mutate(flag = ifelse(any(str_detect(col1, unlist(str_split(col2, ""))) == TRUE), 1, 0))

rowwise groups by rows, then str_split returns a list of substrings that has to be unlisted with unlist. str_detect detect the presence of a pattern match in a string (true/false). Finally you check if there are any TRUEs.