I have two dataframes:
df1:
name |
---|
Apple page |
Mango page |
Lychee juice |
Cranberry club |
df2:
fruit |
---|
Apple |
Grapes |
Strawberry |
Mango |
lychee |
cranberry |
If df1$name contains a value in df2$fruit (non case-sensitive), I want to add a column to df1 that has the value from df2$fruit that df1$name contains. df1 would then look like this:
name | category |
---|---|
Apple page | Apple |
Mango page | Mango |
Lychee juice | lychee |
Cranberry club | cranberry |
CodePudding user response:
This should work:
library(stringr)
df1$category = str_extract(
df1$name,
pattern = regex(paste(df2$fruit, collapse = "|"), ignore_case = TRUE)
)
df1
# name category
# 1 Apple page Apple
# 2 Mango page Mango
# 3 Lychee juice Lychee
# 4 Cranberry club Cranberry
Using this data:
df1 = read.table(text = 'name
Apple page
Mango page
Lychee juice
Cranberry club', header = T, sep = ";")
df2 = read.table(text = 'fruit
Apple
Grapes
Strawberry
Mango
lychee
cranberry', header = T, sep = ";")
CodePudding user response:
First you could a column for each of the possible categories to the dataframe with the names, as placeholders (just filled with NA). Then for each of those columns, check whether the column name (so the category) appears in the name. Turn it into a long dataframe, and then remove the FALSE
rows -- those that didn't detect the category in the name.
library(tidyverse)
df1 <- tribble(
~name,
"Apple page",
"Mango page",
"Lychee juice",
"Cranberry club"
)
df2 <- tribble(
~fruit,
"Apple",
"Grapes",
"Strawberry",
"Mango",
"lychee",
"cranberry"
)
fruits <- df2$fruit %>%
str_to_lower() %>%
set_names(rep(NA_character_, length(.)), .)
df1 %>%
add_column(!!!fruits) %>%
mutate(across(-name, ~str_detect(str_to_lower(name), cur_column()))) %>%
pivot_longer(-name, names_to = "category") %>%
filter(value) %>%
select(-value)
#> # A tibble: 4 × 2
#> name category
#> <chr> <chr>
#> 1 Apple page apple
#> 2 Mango page mango
#> 3 Lychee juice lychee
#> 4 Cranberry club cranberry