I have a big data frame in which I would like to flag the rows as follows:
within each group, I want to find whether any col1 value is present in col2 column. If so, mark those lines where same values appear.
this is my dataset with the desired flag values:
my try was the following:
df %>%
group_by(attr) %>%
mutate(flag=ifelse(any(col1)==any(col2), 1, 0))
but it does not work.
thanks in advance!
CodePudding user response:
Please note: I am not sure if you made a typo in your dataframe for row ABC-LKJ 210-31.
You can use the or |
so that the 1 returns for both of the rows where the values match per group_by
. You can use the following code:
df <- data.frame(col1 = c("ABC", "FGC", "ZHU", "IIN", "OIL", "ABC", "LKJ", "SNM", "QQA"),
col2 = c("DLK", "CBN", "ABC", "ZHU", "HSC", "LJK", "QQA", "KDL", "CBV"),
attr = c("100-30", "100-30", "100-30", "21-667", "21-667", "210-31", "201-31", "201-31", "201-31"))
library(dplyr)
df %>%
group_by(attr) %>%
mutate(flag = ifelse(col1 %in% col2 | col2 %in% col1, 1, 0))
#> # A tibble: 9 × 4
#> # Groups: attr [4]
#> col1 col2 attr flag
#> <chr> <chr> <chr> <dbl>
#> 1 ABC DLK 100-30 1
#> 2 FGC CBN 100-30 0
#> 3 ZHU ABC 100-30 1
#> 4 IIN ZHU 21-667 0
#> 5 OIL HSC 21-667 0
#> 6 ABC LJK 210-31 0
#> 7 LKJ QQA 201-31 1
#> 8 SNM KDL 201-31 0
#> 9 QQA CBV 201-31 1
Created on 2022-07-08 by the reprex package (v2.0.1)
CodePudding user response:
I guess, I misunderstood the OPs question. The following solution only fits, if single letters are to be compared rowwise!
The first two lines include the relevant cases:
df <- tibble(col1 = c("ABC", "FGC"),
col2 = c("DLK", "CBN"))
You could split col2 to single letters and check if it's present in col1. This could be done with library(stringr)
like this:
df %>%
rowwise() %>%
mutate(flag = ifelse(any(str_detect(col1, unlist(str_split(col2, ""))) == TRUE), 1, 0))
rowwise
groups by rows, then str_split
returns a list of substrings that has to be unlisted with unlist
. str_detect
detect the presence of a pattern match in a string (true/false). Finally you check if there are any TRUEs.