select rows that contain a character but does not contain another in R

Time:12-23

From the data frame below:

df <- data.frame(col1 = c("ap(pl)e", "or(a)ng%e", "pe%ar", "bl(u%)e", "red"),
                 col2 = c(1,3,5,4,8))
df
       col1 col2
1   ap(pl)e    1
2 or(a)ng%e    3
3     pe%ar    5
4   bl(u%)e    4
5       red    8

I want to filter rows whose values in col1 contain ( but not %.

     col1 col2
1 ap(pl)e    1
2   pe%ar    5
3     red    8

So I am using case_when along with grepl. This is going to be part of a dplyr pipe.

library(dplyr)

# works
df %>%
    mutate(result = case_when((grepl("p", .[[1]]) & !grepl("r", .[[1]])) ~ "Yes",
                              TRUE ~ "No"))

# does not work
df %>%
    mutate(result = case_when((grepl("(", .[[1]]) & !grepl("%", .[[1]])) ~ "Yes",
                              TRUE ~ "No"))

This does not work for % and (. Is there any trick to make it work?

CodePudding user response:

We could match the pattern ( followed by any characters (.*) and then % with str_detect, and use negate = TRUE so that it returns TRUE for the values that do not match the pattern; filter then keeps those rows.

library(dplyr)
library(stringr)
df %>% 
  filter(str_detect(col1, "\\(.*%", negate = TRUE))

Output:

      col1 col2
1 ap(pl)e    1
2   pe%ar    5
3     red    8
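Note that "\\(.*%" only matches when ( appears before %. If the goal is to drop any value that contains both characters in either order, a minimal base R sketch is to combine two literal tests. The helper name has_both is hypothetical, and base R is used here only so the example runs without dplyr or stringr:

```r
# Sample data from the question
df <- data.frame(col1 = c("ap(pl)e", "or(a)ng%e", "pe%ar", "bl(u%)e", "red"),
                 col2 = c(1, 3, 5, 4, 8))

# Hypothetical helper: TRUE when a value contains both "(" and "%", in either order.
# fixed = TRUE matches the characters literally instead of as regex syntax.
has_both <- function(s) grepl("(", s, fixed = TRUE) & grepl("%", s, fixed = TRUE)

# Keep the rows that do not contain both characters
kept <- df[!has_both(df$col1), ]
kept

# The two approaches only differ when "%" comes before "(" (made-up value):
grepl("\\(.*%", "a%b(c")  # FALSE: the "(" is not followed by a "%"
has_both("a%b(c")         # TRUE
```

On the sample data both approaches keep the same three rows.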

If the result needs to be a new column instead:

df %>% 
  mutate(result = case_when(str_detect(col1, "\\(.*%", negate = TRUE) ~ "Yes",
                            TRUE ~ "No"))
       col1 col2 result
1   ap(pl)e    1    Yes
2 or(a)ng%e    3     No
3     pe%ar    5    Yes
4   bl(u%)e    4     No
5       red    8    Yes

Or, using base R:

subset(df, seq_along(col1) %in% grep("\\(.*%", col1, invert = TRUE))
      col1 col2
1 ap(pl)e    1
3   pe%ar    5
5     red    8
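Since grepl() returns one TRUE/FALSE per element, a slightly shorter base R sketch passes its negation straight to subset() instead of going through grep(), seq_along() and %in%:

```r
# Sample data from the question
df <- data.frame(col1 = c("ap(pl)e", "or(a)ng%e", "pe%ar", "bl(u%)e", "red"),
                 col2 = c(1, 3, 5, 4, 8))

# subset() evaluates the condition inside df, so col1 can be used directly;
# !grepl(...) keeps the rows where "(" is not followed by "%"
res <- subset(df, !grepl("\\(.*%", col1))
res
```

The row names (1, 3, 5) are preserved, matching the output above.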

CodePudding user response:

If you are wondering why your code did not work: ( is a special character in regular expressions (it opens a group), so you need to escape it with two backslashes, "\\(".

df %>%
  mutate(result = case_when((grepl("\\(", .[[1]]) & !grepl("%", .[[1]])) ~"Yes",TRUE~"No"))

Output:

       col1 col2 result
1   ap(pl)e    1    Yes
2 or(a)ng%e    3     No
3     pe%ar    5     No
4   bl(u%)e    4     No
5       red    8     No
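A small sketch of the difference, run on the question's col1 values: the unescaped pattern is invalid, so grepl() stops with an error, while both an escaped pattern and grepl()'s fixed = TRUE argument (literal matching, no regex) find the parenthesis. The variable names are illustrative only:

```r
x <- c("ap(pl)e", "or(a)ng%e", "pe%ar", "bl(u%)e", "red")

# An unescaped "(" is an unclosed regex group: grepl() raises an error.
# suppressWarnings() hides the accompanying pattern compilation warning.
unescaped <- tryCatch(suppressWarnings(grepl("(", x)),
                      error = function(e) "error")
unescaped

# Escaping the parenthesis makes it a literal character ...
escaped <- grepl("\\(", x)
# ... and fixed = TRUE skips regex interpretation altogether.
literal <- grepl("(", x, fixed = TRUE)

identical(escaped, literal)
```

fixed = TRUE is convenient when the characters you search for are plain text rather than part of a pattern.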

CodePudding user response:

You could use a regex with grepl and keep the rows where it does not match:

df[!grepl('\\(.*%', df$col1, perl=TRUE), ]
#      col1 col2
# 1 ap(pl)e    1
# 3   pe%ar    5
# 5     red    8
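Since perl = TRUE is already on, lookaheads are available. As a sketch, assuming the goal is to drop values that contain both characters in any order, two lookaheads anchored at the start of the string express that in a single pattern (the name both_pattern is just for illustration):

```r
# Sample data from the question
df <- data.frame(col1 = c("ap(pl)e", "or(a)ng%e", "pe%ar", "bl(u%)e", "red"),
                 col2 = c(1, 3, 5, 4, 8))

# Each lookahead checks, from the start of the string, that "(" (resp. "%")
# occurs somewhere later; lookaheads require the PCRE engine (perl = TRUE).
both_pattern <- "^(?=.*\\()(?=.*%)"

res <- df[!grepl(both_pattern, df$col1, perl = TRUE), ]
res

# Also matches when "%" comes first (made-up value)
grepl(both_pattern, "a%b(c", perl = TRUE)  # TRUE
```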