I'm using a df with the following structure:

A <- c(1:3)
B <- c("Sweet", "Home", "Sweet Home")
df <- data.frame(A,B)

A	B
1	Sweet
2	Home
3	Sweet Home

I want to drop all the rows that contain the word "Sweet", unless they also contain the word "Home". I have been using the following code: df1 <- df[!grepl("Sweet", df$B),] but this deletes both rows 1 and 3, like so:

A	B
2	Home

How can I do this so that rows where "Sweet" appears together with "Home" are kept?

CodePudding user response：

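Here are two ways using Boolean logic. The first drops the rows that match "sweet" but not "home"; the second is the same condition rewritten with De Morgan's laws, since !(i & !j) is equivalent to !i | j.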

A <- c(1:3)
B <- c("Sweet", "Home", "Sweet Home")
df <- data.frame(A,B)

i <- grepl("sweet", df$B, ignore.case = TRUE)
j <- grepl("home", df$B, ignore.case = TRUE)

df[!(i & !j), ]
#>   A          B
#> 2 2       Home
#> 3 3 Sweet Home

df[!i | j, ]
#>   A          B
#> 2 2       Home
#> 3 3 Sweet Home

Created on 2022-09-13 with reprex v2.0.2
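To see that the two index expressions really select the same rows, here is a small sketch that enumerates every combination of the two flags. The data frame name truth and its columns first and second are made up for this illustration.

```r
# All four combinations of i ("contains sweet") and j ("contains home")
truth <- expand.grid(i = c(FALSE, TRUE), j = c(FALSE, TRUE))

truth$first  <- !(truth$i & !truth$j)  # drop only "sweet without home"
truth$second <- !truth$i | truth$j     # De Morgan rewrite of the same rule

# Both columns should agree on every row
identical(truth$first, truth$second)
```

The only combination that is dropped is i = TRUE with j = FALSE, which is exactly the "Sweet but not Home" case from the question.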

Edit

If you have one word to drop unless a list of words is present, the function below might be what you want. The words vector must have the unwanted word first, followed by the other, wanted words.

# X is the input data.frame
# col is the name of the column to search
specialfilter <- function(X, col, words) {
  l <- lapply(words, \(w) grepl(w, X[[col]], ignore.case = TRUE))
  l[[1]] <- !l[[1]]
  i <- Reduce(`|`, l)
  X[i, ]
}

specialfilter(df, "B", c("sweet", "home"))
#>   A          B
#> 2 2       Home
#> 3 3 Sweet Home

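As a rough illustration of the several-wanted-words case, the sketch below applies the same logic to a slightly larger data frame. The data frame df2 and its extra rows are invented for this example, and the function is repeated (with function(w) instead of the \(w) shorthand) only so the snippet runs on its own.

```r
# Hypothetical data, extended from the example in the question
df2 <- data.frame(
  A = 1:5,
  B = c("Sweet", "Home", "Sweet Home", "Sweet Dreams", "Sweet Tooth")
)

# Same logic as specialfilter() above, repeated so the snippet is self-contained
specialfilter <- function(X, col, words) {
  l <- lapply(words, function(w) grepl(w, X[[col]], ignore.case = TRUE))
  l[[1]] <- !l[[1]]    # rows that do NOT contain the unwanted word
  i <- Reduce(`|`, l)  # ... or that contain any of the wanted words
  X[i, ]
}

# Drop "sweet" rows unless they also contain "home" or "dreams"
res <- specialfilter(df2, "B", c("sweet", "home", "dreams"))
res$B
```

Keep in mind that grepl() treats each word as a regular expression and matches substrings, so "sweet" would also match a value such as "Sweetener".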

CodePudding user response：

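Here is an extension of your code that drops rows containing Sweet but keeps them if they also contain Home. Negate the logical vector with !, not -: a minus sign turns TRUE into -1, which drops the first row no matter which row matched.

df[!(grepl("Sweet", df$B) & !grepl("Home", df$B)), ]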

  A          B
2 2       Home
3 3 Sweet Home

CodePudding user response：

If the rows to drop contain nothing but "Sweet", an anchored pattern is enough:

library(dplyr)
library(stringr)
df %>% filter(!str_detect(B, '^Sweet$'))
  A          B
1 2       Home
2 3 Sweet Home
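For values where "Sweet" appears next to other words (for example a hypothetical "Sweet Dreams"), one possible sketch is to combine two str_detect() calls, mirroring the base R logic: keep a row if it does not contain "Sweet" or if it contains "Home". The name res is just a placeholder for this example.

```r
library(dplyr)
library(stringr)

df <- data.frame(A = 1:3, B = c("Sweet", "Home", "Sweet Home"))

# Keep rows without "Sweet", plus rows that contain "Home"
res <- df %>%
  filter(!str_detect(B, "Sweet") | str_detect(B, "Home"))
res
```

On the example data this keeps the "Home" and "Sweet Home" rows, the same as the anchored pattern does.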