I'm using a df with the following structure:
A <- c(1:3)
B <- c("Sweet", "Home", "Sweet Home")
df <- data.frame(A,B)
A | B |
---|---|
1 | Sweet |
2 | Home |
3 | Sweet Home |
I want to be able to drop all the rows that contain the word "Sweet", unless they contain the word "Home".
I have been using the following code:
df1 <- df[!grepl("Sweet", df$B),]
but this deletes both rows 1 and 3, like so:
A | B |
---|---|
2 | Home |
How can I do this so I can keep the values where Sweet has Home in it, too?
CodePudding user response:
Here are two ways using Boolean logic. The second follows from the first by De Morgan's laws.
A <- c(1:3)
B <- c("Sweet", "Home", "Sweet Home")
df <- data.frame(A,B)
i <- grepl("sweet", df$B, ignore.case = TRUE)
j <- grepl("home", df$B, ignore.case = TRUE)
df[!(i & !j), ]
#> A B
#> 2 2 Home
#> 3 3 Sweet Home
df[!i | j, ]
#> A B
#> 2 2 Home
#> 3 3 Sweet Home
Created on 2022-09-13 with reprex v2.0.2
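To see why the two index expressions are interchangeable, here is a small sketch that checks them against every combination of "contains sweet" and "contains home". The names `truth`, `keep_a` and `keep_b` are made up for this illustration.

```r
# All four combinations of "contains sweet" (i) and "contains home" (j)
truth <- expand.grid(i = c(TRUE, FALSE), j = c(TRUE, FALSE))

# Drop a row only when sweet is present and home is absent
keep_a <- !(truth$i & !truth$j)

# De Morgan form: keep a row when sweet is absent or home is present
keep_b <- !truth$i | truth$j

# Side-by-side truth table, then a check that both columns agree
cbind(truth, keep_a, keep_b)
identical(keep_a, keep_b)
```

The only combination that gets dropped is sweet present with home absent, which is the behaviour asked for in the question.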
Edit
If you have one word to drop unless any word from a list is also present, the function below might be what you want. The words
vector must have the unwanted word first, followed by the wanted words.
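# X is the input data.frame
# col is the name of the column to search in
# words is a character vector: the unwanted word first, then the wanted words
specialfilter <- function(X, col, words) {
  # one logical vector per word, TRUE where the word occurs
  l <- lapply(words, \(w) grepl(w, X[[col]], ignore.case = TRUE))
  # negate the first: TRUE where the unwanted word is absent
  l[[1]] <- !l[[1]]
  # keep a row if the unwanted word is absent or any wanted word is present
  i <- Reduce(`|`, l)
  X[i, ]
}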
specialfilter(df, "B", c("sweet", "home"))
#> A B
#> 2 2 Home
#> 3 3 Sweet Home
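As a rough illustration of the multi-word case, the sketch below calls the helper with two wanted words. The data frame `df2` and the extra word "candy" are hypothetical and not part of the question. The function is repeated so the block runs on its own.

```r
# Same helper as in the answer, repeated so this sketch is self-contained
specialfilter <- function(X, col, words) {
  l <- lapply(words, \(w) grepl(w, X[[col]], ignore.case = TRUE))
  l[[1]] <- !l[[1]]      # TRUE where the unwanted word is absent
  i <- Reduce(`|`, l)    # ... or where any wanted word is present
  X[i, ]
}

# Hypothetical data, not from the question
df2 <- data.frame(
  A = 1:5,
  B = c("Sweet", "Home", "Sweet Home", "Sweet Candy", "Sweet Tooth")
)

# Drop "sweet" rows unless they also contain "home" or "candy"
res <- specialfilter(df2, "B", c("sweet", "home", "candy"))
res
```

Rows 2, 3 and 4 are kept. Note that `grepl()` treats each word as a regular expression, so words containing characters such as `.` or `(` would need escaping.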
CodePudding user response:
Here is an extension of your code that filters out rows containing Sweet but keeps them if they also contain Home:
df[!(grepl("Sweet", df$B) & !grepl("Home", df$B)), ]
A B
2 2 Home
3 3 Sweet Home
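One detail worth checking when negating a logical index: `-` and `!` are not interchangeable. A minimal sketch, using a hypothetical data frame `df3` in which the row to drop is not the first one:

```r
# Hypothetical data: the row to drop sits in position 2
df3 <- data.frame(A = 1:3, B = c("Home", "Sweet", "Sweet Home"))

# TRUE for rows that contain "Sweet" but not "Home"
drop <- grepl("Sweet", df3$B) & !grepl("Home", df3$B)
drop
# FALSE TRUE FALSE

# Unary minus coerces the logical vector to c(0, -1, 0),
# which means "remove row 1": the wrong row here
wrong <- df3[-drop, ]

# Logical negation keeps exactly the rows where drop is FALSE
right <- df3[!drop, ]

wrong$B
right$B
```

With the data from the question both forms happen to give the same result, because the only row to drop is row 1. With other row orders only `!` gives the intended rows.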
CodePudding user response:
Does this work? Note that the pattern only drops rows where B is exactly "Sweet":
library(dplyr)
library(stringr)
df %>% filter(!str_detect(B, '^Sweet$'))
A B
1 2 Home
2 3 Sweet Home
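If rows such as "Sweet Candy" should also be dropped, one possible variant is to combine two `str_detect()` calls instead of anchoring the pattern. This is only a sketch: the data frame `df4` and its fourth row are hypothetical, and it assumes dplyr and stringr are installed.

```r
library(dplyr)
library(stringr)

# Hypothetical data, extended with a row the anchored pattern would keep
df4 <- data.frame(
  A = 1:4,
  B = c("Sweet", "Home", "Sweet Home", "Sweet Candy")
)

# Keep a row when it lacks "Sweet" or when it contains "Home"
res4 <- df4 %>%
  filter(!str_detect(B, "Sweet") | str_detect(B, "Home"))
res4
```

This keeps "Home" and "Sweet Home" and drops both "Sweet" and "Sweet Candy". Unlike the `grepl(..., ignore.case = TRUE)` answers above, the match here is case-sensitive.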