I want to create a match variable in a data frame that:
- is 1 if the value of one string variable (str1) is contained in the value of another string variable (str2),
- is 0 if it is not, and
- is NA if either of the two string variables is NA.
So far I have tried the following, using the str_contains() function from the sjmisc package:

```r
library(sjmisc)

df$match[(df$str1 == "left" & str_contains(df$str2, "left"))
         | (df$str1 == "right" & str_contains(df$str2, "right"))] = 1
df$match[(df$str1 == "left" & str_contains(df$str2, "left", logic = "not"))
         | (df$str1 == "right" & str_contains(df$str2, "right", logic = "not"))] = 0
df$match[is.na(df$str1) | is.na(df$str2)] = NA
```
Only the NA part works correctly; every other row gets match = 1, which is wrong given the data.
Data example:
str1 | str2 | match |
---|---|---|
left | right | - |
right | somewhat left | - |
left | very left | - |
right | right | - |
right | somewhat right | - |
In this example, match should be 0, 0, 1, 1, 1, but every row ends up as 1. I'd be grateful for any suggestions about what is wrong here, or for alternative ways to get the result I want!
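The three rules above can be checked one row at a time. Below is a minimal base R sketch, assuming a literal substring test is wanted (hence fixed = TRUE rather than a regular expression). match_one is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: applies the three rules to a single pair of strings
match_one <- function(needle, haystack) {
  # Rule 3: NA if either string is missing
  if (is.na(needle) || is.na(haystack)) return(NA_integer_)
  # Rules 1 and 2: 1 if needle occurs literally inside haystack, otherwise 0
  as.integer(grepl(needle, haystack, fixed = TRUE))
}

str1 <- c("left", "right", "left", "right", "right")
str2 <- c("right", "somewhat left", "very left", "right", "somewhat right")

# mapply() calls match_one() once per row; USE.NAMES = FALSE keeps the result unnamed
matched <- mapply(match_one, str1, str2, USE.NAMES = FALSE)
matched
#> [1] 0 0 1 1 1
```

Because match_one() handles the NA case itself, the NA rule does not depend on how grepl() treats missing inputs.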
Answer:
```r
library(tidyverse)

data <- tribble(
  ~str1,   ~str2,            ~match,
  "left",  "right",          "-",
  "right", "somewhat left",  "-",
  "left",  "very left",      "-",
  "right", "right",          "-",
  "right", "somewhat right", "-",
  NA,      NA,               "-"
)

data %>%
  mutate(
    # str_detect() is vectorised over both the string (str2) and the pattern (str1);
    # a missing value gives NA, which ifelse() passes through unchanged
    match = ifelse(str_detect(str2, str1), 1, 0)
  )
#> # A tibble: 6 × 3
#>   str1  str2           match
#>   <chr> <chr>          <dbl>
#> 1 left  right              0
#> 2 right somewhat left      0
#> 3 left  very left          1
#> 4 right right              1
#> 5 right somewhat right     1
#> 6 <NA>  <NA>              NA
```
Created on 2022-05-23 by the reprex package (v2.0.0)
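Note that str_detect(str2, str1) interprets each value of str1 as a regular expression, and so does base R's grepl() by default. That makes no difference for "left" and "right", but a value containing metacharacters such as "." would match more than the literal text. A small base R illustration with a made-up pattern:

```r
# By default the pattern is a regular expression:
# "." matches any single character, so "l.ft" matches "left"
regex_hit <- grepl("l.ft", "very left")
# fixed = TRUE treats the pattern as a literal string, so "l.ft" is not found
fixed_hit <- grepl("l.ft", "very left", fixed = TRUE)
c(regex_hit, fixed_hit)
#> [1]  TRUE FALSE
```

In stringr, the literal-match equivalent is to wrap the pattern in fixed(), e.g. str_detect(str2, fixed(str1)).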
Answer:
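A base R solution:

```r
within(df, {
  # grepl(pattern = str1[i], x = str2[i]) for each row; as.integer() turns TRUE/FALSE into 1/0
  match <- as.integer(mapply(grepl, str1, str2))
  # Set NA explicitly wherever either string is missing
  match[is.na(str1) | is.na(str2)] <- NA
})
#    str1           str2 match
# 1  left          right     0
# 2 right  somewhat left     0
# 3  left      very left     1
# 4 right          right     1
# 5 right somewhat right     1
# 6  <NA>           <NA>    NA
```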
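The example data only contains a row where both strings are missing. When only one side is NA, the raw mapply() result depends on how grepl() handles a missing pattern versus a missing text, so it can be safer to set those rows to NA explicitly after the call. A small check with made-up vectors (illustrative only, not the question's data):

```r
# Made-up vectors: the first two pairs each have exactly one missing string
str1 <- c("left", NA, "left")
str2 <- c(NA, "very left", "very left")

# Row-wise grepl(pattern = str1[i], x = str2[i]), converted to 1/0
res <- as.integer(mapply(grepl, str1, str2))
# Enforce the "NA if either string is NA" rule explicitly
res[is.na(str1) | is.na(str2)] <- NA
res
#> [1] NA NA  1
```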
Data:

```r
df <- structure(list(str1 = c("left", "right", "left", "right", "right",
NA), str2 = c("right", "somewhat left", "very left", "right",
"somewhat right", NA)), row.names = c(NA, -6L), class = "data.frame")
```