Home > Net >  R function that detects if a dataframe column contains string values from another dataframe column a
R function that detects if a dataframe column contains string values from another dataframe column a

Time:05-20

I have two dataframes:

df1:

name
Apple page
Mango page
Lychee juice
Cranberry club

df2:

fruit
Apple
Grapes
Strawberry
Mango
lychee
cranberry

If df1$name contains a value in df2$fruit (non case-sensitive), I want to add a column to df1 that has the value from df2$fruit that df1$name contains. df1 would then look like this:

name category
Apple page Apple
Mango page Mango
Lychee juice lychee
Cranberry club cranberry

CodePudding user response:

This should work:

library(stringr)
df1$category = str_extract(
  df1$name, 
  pattern = regex(paste(df2$fruit, collapse = "|"), ignore_case = TRUE)
)

df1
#             name  category
# 1     Apple page     Apple
# 2     Mango page     Mango
# 3   Lychee juice    Lychee
# 4 Cranberry club Cranberry

Using this data:

df1 = read.table(text = 'name
Apple page
Mango page
Lychee juice
Cranberry club', header = T, sep = ";")

df2 = read.table(text = 'fruit
Apple
Grapes
Strawberry
Mango
lychee
cranberry', header = T, sep = ";")

CodePudding user response:

First you could a column for each of the possible categories to the dataframe with the names, as placeholders (just filled with NA). Then for each of those columns, check whether the column name (so the category) appears in the name. Turn it into a long dataframe, and then remove the FALSE rows -- those that didn't detect the category in the name.

library(tidyverse)

df1 <- tribble(
  ~name,
  "Apple page",
  "Mango page",
  "Lychee juice",
  "Cranberry club"
)
df2 <- tribble(
  ~fruit,
  "Apple",
  "Grapes",
  "Strawberry",
  "Mango",
  "lychee",
  "cranberry"
)

fruits <- df2$fruit %>%
  str_to_lower() %>% 
  set_names(rep(NA_character_, length(.)), .)

df1 %>% 
  add_column(!!!fruits) %>% 
  mutate(across(-name, ~str_detect(str_to_lower(name), cur_column()))) %>% 
  pivot_longer(-name, names_to = "category") %>% 
  filter(value) %>% 
  select(-value)

#> # A tibble: 4 × 2
#>   name           category 
#>   <chr>          <chr>    
#> 1 Apple page     apple    
#> 2 Mango page     mango    
#> 3 Lychee juice   lychee   
#> 4 Cranberry club cranberry
  • Related