From the data frame below:
df <- data.frame(col1 = c("ap(pl)e", "or(a)ng%e", "pe%ar", "bl(u%)e", "red"),
                 col2 = c(1, 3, 5, 4, 8))
df
       col1 col2
1   ap(pl)e    1
2 or(a)ng%e    3
3     pe%ar    5
4   bl(u%)e    4
5       red    8
I want to filter out the rows whose col1 value contains both "(" and "%", so that the result is:
     col1 col2
1 ap(pl)e    1
2   pe%ar    5
3     red    8
So I am using case_when() together with grepl(), because this is going to be part of a dplyr pipeline.
# works
df %>%
  mutate(result = case_when((grepl("p", .[[1]]) & !grepl("r", .[[1]])) ~ "Yes",
                            TRUE ~ "No"))
# does not work
df %>%
  mutate(result = case_when((grepl("(", .[[1]]) & !grepl("%", .[[1]])) ~ "Yes",
                            TRUE ~ "No"))
This does not work for "%" and "(". Is there any trick to make it work?
Answer:
We could match the pattern "(" followed by any characters (".*") and then "%" with str_detect(), and use negate = TRUE so that it returns TRUE for the values that do not match; filter() then keeps exactly those rows.
library(dplyr)
library(stringr)
df %>%
  filter(str_detect(col1, "\\(.*%", negate = TRUE))
Output:
     col1 col2
1 ap(pl)e    1
2   pe%ar    5
3     red    8
If the result needs to be a column instead:
df %>%
  mutate(result = case_when(str_detect(col1, "\\(.*%", negate = TRUE) ~ "Yes",
                            TRUE ~ "No"))
       col1 col2 result
1   ap(pl)e    1    Yes
2 or(a)ng%e    3     No
3     pe%ar    5    Yes
4   bl(u%)e    4     No
5       red    8    Yes
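Without dplyr or stringr, the same Yes/No column can probably be built with base R's ifelse() in place of case_when(). A minimal sketch, assuming the same "\\(.*%" pattern:

```r
df <- data.frame(col1 = c("ap(pl)e", "or(a)ng%e", "pe%ar", "bl(u%)e", "red"),
                 col2 = c(1, 3, 5, 4, 8))

# grepl() is TRUE where "(" is followed somewhere later by "%";
# those rows get "No", every other row gets "Yes"
df$result <- ifelse(grepl("\\(.*%", df$col1), "No", "Yes")
df
```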
Or, using base R:
subset(df, seq_along(col1) %in% grep("\\(.*%", col1, invert = TRUE))
     col1 col2
1 ap(pl)e    1
3   pe%ar    5
5     red    8
Answer:
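If you are wondering why your code did not work: "(" is a regex metacharacter (it opens a group), so it has to be escaped with two backslashes. "%" has no special meaning in a regex and needs no escaping. Note that this keeps your original condition ("(" present and "%" absent), so only ap(pl)e gets "Yes".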
df %>%
  mutate(result = case_when((grepl("\\(", .[[1]]) & !grepl("%", .[[1]])) ~ "Yes",
                            TRUE ~ "No"))
Output:
       col1 col2 result
1   ap(pl)e    1    Yes
2 or(a)ng%e    3     No
3     pe%ar    5     No
4   bl(u%)e    4     No
5       red    8     No
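To see the failure directly, here is a small sketch (base R only, variable names are illustrative) that captures the error grepl() raises for an unescaped "(" and compares three ways of matching a literal "(" that should give identical results:

```r
x <- c("ap(pl)e", "or(a)ng%e", "pe%ar", "bl(u%)e", "red")

# An unescaped "(" opens a regex group that is never closed, so grepl() errors;
# tryCatch() turns the error into its message string instead of stopping
err <- tryCatch(grepl("(", x), error = function(e) conditionMessage(e))
print(err)

# Three equivalent ways to match a literal "("
escaped <- grepl("\\(", x)              # escape it with two backslashes
bracket <- grepl("[(]", x)              # put it inside a character class
literal <- grepl("(", x, fixed = TRUE)  # switch off regex interpretation entirely
```

fixed = TRUE is handy when the search string comes from elsewhere and may contain other metacharacters, although it cannot be combined with patterns such as ".*".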
Answer:
You could use a regex with grepl() and negate the result:
df[!grepl('\\(.*%', df$col1, perl=TRUE), ]
#      col1 col2
# 1 ap(pl)e    1
# 3   pe%ar    5
# 5     red    8
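perl = TRUE is not actually required for "\\(.*%"; the default regex engine handles that pattern too. Where PCRE does help is lookaheads. The pattern "\\(.*%" only catches "(" appearing before "%", so a value such as "a%b(c" (made up for illustration) would be kept. A minimal sketch, assuming you want to drop values containing both characters in either order, using one pattern with two lookaheads:

```r
df <- data.frame(col1 = c("ap(pl)e", "or(a)ng%e", "pe%ar", "bl(u%)e", "red"),
                 col2 = c(1, 3, 5, 4, 8))

# Two lookaheads anchored at the start: the value must contain "(" somewhere
# AND "%" somewhere, in any order. Lookaheads need perl = TRUE.
both_pattern <- "^(?=.*\\()(?=.*%)"
both <- grepl(both_pattern, df$col1, perl = TRUE)

kept <- df[!both, ]
kept
```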