How to set column value based on string (mis)match between two other columns?

Time:05-23

I want to create a match variable in a dataframe that

  • is 1 if the value of another variable (string) is contained in the value of a third variable (string)
  • is 0 if that is not the case
  • and is NA if either of the string variables is NA

So far I have tried (str_contains function from the sjmisc package):

df$match[(df$str1 == "left"  & str_contains(df$str2, "left"))
                  | (df$str1== "right"  & str_contains(df$str2, "right"))] = 1

df$match[(df$str1== "left"  & str_contains(df$str2, "left", logic = "not")) 
                  | (df$str1== "right"  & str_contains(df$str2, "right", logic = "not"))] = 0

df$match[is.na(df$str1)| is.na(df$str2)] = NA

But only the NA part works. For all other rows match ends up as 1, which is wrong given the data.

Data example:

str1   str2             match
left   right            -
right  somewhat left    -
left   very left        -
right  right            -
right  somewhat right   -

match should be 0, 0, 1, 1, 1 in the example, but ends up as 1 in every row. I'd be grateful for any suggestions about what's wrong here, or for alternative ways to achieve the result I want!

CodePudding user response:

library(tidyverse)

data <- tribble(
  ~str1, ~str2, ~match,
  "left", "right", "-",
  "right", "somewhat left", "-",
  "left", "very left", "-",
  "right", "right", "-",
  "right", "somewhat right", "-",
  NA, NA, "-"
)

data %>%
  mutate(
    match = ifelse(str_detect(str2, str1), 1, 0)
  )
#> # A tibble: 6 × 3
#>   str1  str2           match
#>   <chr> <chr>          <dbl>
#> 1 left  right              0
#> 2 right somewhat left      0
#> 3 left  very left          1
#> 4 right right              1
#> 5 right somewhat right     1
#> 6 <NA>  <NA>              NA

Created on 2022-05-23 by the reprex package (v2.0.0)
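
One caveat with the approach above: str_detect() interprets str1 as a regular expression, so values containing characters such as "." or "(" could match in unexpected ways. The following is a minimal sketch of a literal-match variant, assuming dplyr and stringr are installed. Wrapping the pattern in fixed() and converting with as.integer() are my own choices here, not part of the original answer.

```r
library(dplyr)
library(stringr)

# Same example rows as in the question, plus the NA row from the answer.
data <- tibble(
  str1 = c("left", "right", "left", "right", "right", NA),
  str2 = c("right", "somewhat left", "very left", "right", "somewhat right", NA)
)

result <- data %>%
  mutate(
    # fixed() makes str_detect() compare str1 as a literal substring, not a regex;
    # as.integer() turns TRUE/FALSE/NA into 1/0/NA.
    match = as.integer(str_detect(str2, fixed(str1)))
  )

print(result)
```

For the example data, the match column should come out the same as in the answer above, because "left" and "right" contain no regex metacharacters.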

CodePudding user response:

A base solution:

within(df, {
  # the unary + converts the logical result of grepl() to 0/1
  match <- +mapply(grepl, str1, str2)
})

#    str1           str2 match
# 1  left          right     0
# 2 right  somewhat left     0
# 3  left      very left     1
# 4 right          right     1
# 5 right somewhat right     1
# 6  <NA>           <NA>    NA

Data
df <- structure(list(str1 = c("left", "right", "left", "right", "right", 
NA), str2 = c("right", "somewhat left", "very left", "right", 
"somewhat right", NA)), row.names = c(NA, -6L), class = "data.frame")
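
The question asks for NA when either string is NA. In the data above both columns are NA in the same row, so that case is covered, but grepl() returns FALSE when only the searched string is NA. Below is a hedged sketch of a base R variant that sets NA explicitly. The seventh row (str1 present, str2 missing) is a hypothetical addition for illustration, and fixed = TRUE is my assumption that a literal substring match is wanted.

```r
# Example data from the answer, plus one hypothetical row where only str2 is NA.
df <- data.frame(
  str1 = c("left", "right", "left", "right", "right", NA, "left"),
  str2 = c("right", "somewhat left", "very left", "right", "somewhat right", NA, NA),
  stringsAsFactors = FALSE
)

# Row-wise literal substring test; unname() drops the names mapply() attaches.
hit <- unname(mapply(grepl, df$str1, df$str2, MoreArgs = list(fixed = TRUE)))

# Force NA wherever either input is missing; otherwise convert TRUE/FALSE to 1/0.
df$match <- ifelse(is.na(df$str1) | is.na(df$str2), NA_integer_, as.integer(hit))

print(df)
```

The explicit is.na() check keeps the result independent of how grepl() itself treats missing values.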