extract keywords using grepl

Time:07-31

I have a data frame called 'data' and I want to create a new subset 'data_1' containing the rows where the column 'apple' or 'orange' contains the keyword 'sweet'.

library(tidyverse) 
data_1 <- data %>%
      grepl ('sweet',apple,) 

It worked the first time, but it doesn't work now. Also, I want to apply the condition to either the column apple or the column orange. Is there a way to do so?

CodePudding user response:

We can use filter with if_any to return the rows where the keyword 'sweet' is found in either of the columns, or with if_all to return only the rows that have 'sweet' in both columns.

library(dplyr)
data_1 <- data %>%
  filter(if_any(c(apple, orange),
                ~ grepl('\\bsweet\\b', .x, ignore.case = TRUE)))

If we also want to keep longer words such as 'sweetheart', take the word boundary (\\b) out of the pattern:

data_1 <- data %>%
  filter(if_any(c(apple, orange),
                ~ grepl('sweet', .x, ignore.case = TRUE)))
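
To see what the word boundary changes, here is a small sketch in base R. The vector `x` is made up for illustration and is not part of the question's data.

```r
# Hypothetical sample values, for illustration only
x <- c("sweet", "sweetheart", "very sweet taste", "sour")

# With word boundaries, 'sweet' has to appear as a whole word
with_boundary <- grepl("\\bsweet\\b", x)

# Without them, any string that contains 'sweet' matches
without_boundary <- grepl("sweet", x)

# Show the two results side by side
print(data.frame(x, with_boundary, without_boundary))
```

Only the 'sweetheart' entry should differ between the two columns.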

Or we can combine two grepl calls with | if there are only two columns:

data_1 <- data %>%
  filter(grepl('\\bsweet\\b', apple, ignore.case = TRUE) |
           grepl('\\bsweet\\b', orange, ignore.case = TRUE))

Another option is to paste the two columns together and apply grepl once:

data_1 <- data %>%
  filter(grepl('\\bsweet\\b', paste(apple, orange), ignore.case = TRUE))
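
As a quick sanity check, the three approaches can be run side by side on a small made-up data frame. This is only a sketch: the values below are hypothetical, the variable names `pattern`, `res_if_any`, `res_or` and `res_paste` are introduced just for this example, and it assumes a dplyr version that provides if_any (1.0.4 or later).

```r
library(dplyr)

# Hypothetical toy data, for illustration only
data <- data.frame(
  apple  = c("sweet", "sour", "tart", "Sweetheart"),
  orange = c("bitter", "very SWEET", "sour", "tangy"),
  stringsAsFactors = FALSE
)

pattern <- "\\bsweet\\b"

# Option 1: if_any across both columns
res_if_any <- data %>%
  filter(if_any(c(apple, orange),
                ~ grepl(pattern, .x, ignore.case = TRUE)))

# Option 2: two grepl calls joined by |
res_or <- data %>%
  filter(grepl(pattern, apple, ignore.case = TRUE) |
           grepl(pattern, orange, ignore.case = TRUE))

# Option 3: paste the columns, then a single grepl call
res_paste <- data %>%
  filter(grepl(pattern, paste(apple, orange), ignore.case = TRUE))

print(res_if_any)
```

On this toy data all three options keep the first two rows and drop the 'Sweetheart' row, because the word boundary excludes it.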

EDIT:

  1. Added word boundary (\\b) in the pattern so that it wouldn't match sweetheart
  2. By default ignore.case = FALSE in grepl. If we want to match either case, use ignore.case = TRUE
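
The effect of ignore.case can be checked with a short base R sketch. The vector `fruit` is made up for illustration.

```r
# Hypothetical sample values, for illustration only
fruit <- c("sweet", "Sweet", "SWEET and sour", "bitter")

# Default behaviour: ignore.case = FALSE, so only lower-case 'sweet' matches
case_sensitive <- grepl("sweet", fruit)

# With ignore.case = TRUE, upper- and mixed-case variants match as well
case_insensitive <- grepl("sweet", fruit, ignore.case = TRUE)

# Show the two results side by side
print(data.frame(fruit, case_sensitive, case_insensitive))
```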