I am trying to filter a dataset by a vector (the column of another dataset), but instead of matching the items using %in%, I am looking to return items with a similar pattern to the items in the vector.
By similar, I mean if an item in the vector has 2 words e.g. "Orange juice", I would want to filter the data frame for all items with the word "Orange" i.e. the first word.
Below is an example, which hopefully explains better what I'm looking for! Thank you so much in advance.
# Here is some sample data
Data <- data.frame(
col_1=c("Orange juice", "Orange cake", "Lemon curd", "Lemon pie", "Strawberry", "Lime tree"),
col_2=c("food", "food", "food", "food", "fruit", "tree"))
# I want to filter this data by a vector (taken from another data frame) to return items that are similar to the first word of items in the vector
vector <- c("Orange", "Lemon ltd", "Grapefruit", "Peach juice")
# I'm looking for something like this:
Data %>% filter(col_1 %like% vector)
# or something like this:
Data %>% filter(str_detect(col_1, pattern = "first word of items in vector" ))
To get this output:
- col_1 <- "Orange juice", "Orange cake", "Lemon curd", "Lemon pie"
- col_2 <- "food", "food", "food", "food"
CodePudding user response:
Something like this?
library(dplyr, warn.conflicts = FALSE)
library(stringr)
df <- tibble(
food_name = c("Orange juice", "Orange cake", "Lemon curd", "Lemon pie", "Strawberry", "Lime tree"),
food_category = c("food", "food", "food", "food", "fruit", "tree")
)
patterns <- c("Orange", "Lemon ltd", "Grapefruit", "Peach juice")
df %>%
  filter(
    # First word of `food_name` is 'in' first words of `patterns`
    str_extract(food_name, "[^\\s]+") %in% str_extract(patterns, "[^\\s]+")
  )
#> # A tibble: 4 × 2
#>   food_name    food_category
#>   <chr>        <chr>
#> 1 Orange juice food
#> 2 Orange cake  food
#> 3 Lemon curd   food
#> 4 Lemon pie    food
Created on 2022-10-18 with reprex v2.0.2
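If you would rather stay close to the `str_detect()` form from the question, here is a rough sketch of one way to do it: take the first word of each pattern and collapse them into a single regular expression. The names `first_words`, `regex` and `result` are made up for illustration, and this assumes the first words contain no regex metacharacters (they would need escaping otherwise).

```r
library(dplyr, warn.conflicts = FALSE)
library(stringr)

df <- tibble(
  food_name = c("Orange juice", "Orange cake", "Lemon curd", "Lemon pie", "Strawberry", "Lime tree"),
  food_category = c("food", "food", "food", "food", "fruit", "tree")
)
patterns <- c("Orange", "Lemon ltd", "Grapefruit", "Peach juice")

# First space-separated word of each pattern, e.g. "Lemon ltd" -> "Lemon"
first_words <- unique(word(patterns, 1))

# Single regex: start of string, any one of the first words, then a word boundary
# (assumes the words contain no regex metacharacters)
regex <- str_c("^(", str_c(first_words, collapse = "|"), ")\\b")

# Keep rows whose food_name starts with one of the first words
result <- df %>%
  filter(str_detect(food_name, regex))
```

Unlike the `%in%` version, this only builds the pattern once, and it is easy to relax (for example, dropping the `^` anchor would match the word anywhere in `food_name`).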