I have a data frame called 'data' and I want to create a new subset 'data_1' containing the rows where the column 'apple' or 'orange' contains the keyword 'sweet'.
library(tidyverse)
data_1 <- data %>%
grepl ('sweet',apple,)
It worked the first time, but it doesn't work now. Also, I want the condition to match if 'sweet' appears in either the apple or the orange column. Is there a way to do so?
CodePudding user response:
We may use filter with if_any (if we want to return rows where the keyword 'sweet' is found in either of the columns) or use if_all (if we want the rows having 'sweet' in both columns).
library(dplyr)
data_1 <- data %>%
filter(if_any(c(apple, orange), ~ grepl('\\bsweet\\b', .x,
ignore.case = TRUE)))
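To see how if_any and if_all differ, here is a minimal sketch on a made-up data frame. The name toy and its values are hypothetical and only stand in for the real data; it assumes a dplyr version that provides if_any and if_all (1.0.4 or later).

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  apple  = c("sweet", "sour", "Sweet and crisp", "bland"),
  orange = c("sweet", "very sweet", "bitter", "tart")
)

# Rows where at least one of the two columns contains the word 'sweet'.
any_sweet <- toy %>%
  filter(if_any(c(apple, orange),
                ~ grepl('\\bsweet\\b', .x, ignore.case = TRUE)))

# Rows where both columns contain the word 'sweet'.
all_sweet <- toy %>%
  filter(if_all(c(apple, orange),
                ~ grepl('\\bsweet\\b', .x, ignore.case = TRUE)))

nrow(any_sweet)  # 3 (rows 1, 2 and 3)
nrow(all_sweet)  # 1 (row 1 only)
```

Row 4 is dropped by both calls because neither column mentions 'sweet'.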
If we want to keep words that merely contain 'sweet', such as 'sweetheart', as well, take out the word boundary (\\b)
data_1 <- data %>%
filter(if_any(c(apple, orange), ~ grepl('sweet', .x,
ignore.case = TRUE)))
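The effect of the word boundary and of ignore.case can be checked directly with grepl on a small character vector. This is only a sketch in base R; the vector x is a made-up example, not part of the original data.

```r
# Hypothetical test strings.
x <- c("sweet", "sweetheart", "bittersweet", "Sweet tooth")

# With word boundaries: only the standalone word matches.
with_boundary <- grepl('\\bsweet\\b', x, ignore.case = TRUE)
# TRUE FALSE FALSE TRUE

# Without word boundaries: any string containing 'sweet' matches.
without_boundary <- grepl('sweet', x, ignore.case = TRUE)
# TRUE TRUE TRUE TRUE

# Default ignore.case = FALSE: the capitalised 'Sweet' is missed.
case_sensitive <- grepl('sweet', x)
# TRUE TRUE TRUE FALSE
```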
Or we may also use | if there are only two columns
data_1 <- data %>%
filter(grepl('\\bsweet\\b', apple, ignore.case = TRUE)|
grepl('\\bsweet\\b', orange, ignore.case = TRUE))
Or another option is to paste the two columns together and apply grepl once
data_1 <- data %>%
filter(grepl('\\bsweet\\b', paste(apple, orange),
ignore.case = TRUE))
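As a quick sanity check, the pasted version should select the same rows as testing each column separately. The sketch below uses base R only and a hypothetical data frame toy; note that paste turns NA into the string "NA", which is harmless for this particular pattern but could matter for a pattern that matches "NA".

```r
# Hypothetical toy data with some missing values.
toy <- data.frame(
  apple  = c("sweet", NA, "sour", "sweetheart"),
  orange = c("tart", "Sweet", NA, "bitter"),
  stringsAsFactors = FALSE
)

pat <- '\\bsweet\\b'

# Test each column separately and combine with |.
via_or <- grepl(pat, toy$apple, ignore.case = TRUE) |
  grepl(pat, toy$orange, ignore.case = TRUE)

# Paste the columns (space separator) and test once.
via_paste <- grepl(pat, paste(toy$apple, toy$orange), ignore.case = TRUE)

identical(via_or, via_paste)  # TRUE for this toy data
```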
EDIT:
- Added a word boundary (\\b) to the pattern so that it wouldn't match 'sweetheart'.
- By default, ignore.case = FALSE in grepl. If we want to match either case, use ignore.case = TRUE.