Dynamic variables from dataframe value in R with value names?

Given a dataframe of topics and keywords like so:

topic	keyword
cheese	cheddar
meat	beef
meat	chicken
cheese	swiss
bread	focaccia
bread	sourdough
cheese	gouda

My aim is to make a set of dynamic regexes, one per topic, but I don't know how to build the variable names from the topics. I can do this individually like so:

library(dplyr) # filter()

fn_get_topic_regex <- function(targettopic,df)
{
  filter_df <- df |>
    filter(topic == targettopic)
  regex <- paste(filter_df$keyword, collapse =  "|")
}

and do things like:

cheese_regex <- fn_get_topic_regex("cheese",df)

But what I'd like to be able to do is build all these regexes automatically without having to define each one.

The intended output would be something like:

cheese_regex: "cheddar|swiss|gouda"
bread_regex: "focaccia|sourdough"
meat_regex: "beef|chicken"

Where the start of the variable name is the distinct topic.

What's the best way to do that without defining each regex individually by hand?

CodePudding user response：

Here is a base R solution with your intended output in a named list.

df <- structure(list(topic = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"), 
                     keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda")), 
                class = "data.frame", row.names = c(NA, -7L))

#split into a list per topic
topics <- split(df, df$topic)

#collapse the keyword column
topics <- lapply(topics, function(t) {
   paste(t$keyword, collapse =  "|")
})

#rename
names(topics)<- paste0(names(topics), "_regex")

topics

$bread_regex
[1] "focaccia|sourdough"

$cheese_regex
[1] "cheddar|swiss|gouda"

$meat_regex
[1] "beef|chicken"
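
If you want to go one step further and have actual variables rather than list elements, here is a minimal sketch, assuming the same example data. It builds the named list with split() and lapply() and then copies it into an environment with list2env(). The name regex_env is hypothetical; passing globalenv() instead of a new environment would create cheese_regex and the others at top level.

```r
df <- data.frame(
  topic = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"),
  keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda")
)

# split the keyword vector by topic, then collapse each group into one pattern
keywords_by_topic <- split(df$keyword, df$topic)
topics <- lapply(keywords_by_topic, paste, collapse = "|")
names(topics) <- paste0(names(topics), "_regex")

# copy every list element into an environment as a variable of the same name
# (regex_env is a hypothetical name; use globalenv() for top-level variables)
regex_env <- list2env(topics, envir = new.env())

regex_env$cheese_regex
get("meat_regex", envir = regex_env)
```

Keeping the patterns in a list or a dedicated environment is usually easier to work with than scattering them over the global environment, because you can still loop over all of them by name.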

CodePudding user response：

We could do something like this:

After grouping by topic, we can use summarise() together with paste(collapse = "|") to build one regex per topic.
Then, when a regex is needed, we can refer to it by indexing, as in the example below:

library(dplyr)
library(stringr) #str_detect
my_regex <- df %>% 
  group_by(topic) %>% 
  summarise(regex = paste(keyword, collapse = "|"))

my_regex

# A tibble: 3 × 2
  topic  regex              
  <chr>  <chr>              
1 bread  focaccia|sourdough 
2 cheese cheddar|swiss|gouda
3 meat   beef|chicken       

# my_regex$regex[1] is the bread pattern, because the topics come out sorted alphabetically
df %>% 
  mutate(new_col = ifelse(str_detect(keyword, my_regex$regex[1]), "it is bread", "it is not bread"))

   topic   keyword         new_col
1 cheese   cheddar it is not bread
2   meat      beef it is not bread
3   meat   chicken it is not bread
4 cheese     swiss it is not bread
5  bread  focaccia     it is bread
6  bread sourdough     it is bread
7 cheese     gouda it is not bread
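
Indexing by position breaks as soon as a topic is added or removed, so it may be safer to look the pattern up by topic name. A minimal sketch, assuming the same example data; regex_by_topic, result and is_bread are hypothetical names used only for illustration:

```r
library(dplyr)
library(stringr)

df <- data.frame(
  topic = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"),
  keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda")
)

my_regex <- df %>%
  group_by(topic) %>%
  summarise(regex = paste(keyword, collapse = "|"))

# named character vector: names are the topics, values are the patterns
regex_by_topic <- setNames(my_regex$regex, my_regex$topic)

# look the pattern up by name instead of by row position
result <- df %>%
  mutate(is_bread = str_detect(keyword, regex_by_topic[["bread"]]))

result
```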

CodePudding user response：

You can use dplyr's group_by() and summarise():

df %>%
  group_by(topic) %>%
  summarise(regex = paste(keyword, collapse = "|"))

# A tibble: 3 × 2
  topic  regex              
  <chr>  <chr>              
1 bread  focaccia|sourdough 
2 cheese cheddar|swiss|gouda
3 meat   beef|chicken

Or you can apply your function to every unique value in df$topic with purrr's map_chr():

map_chr(unique(df$topic) %>% setNames(paste0(., "_regex")),
        fn_get_topic_regex, df = df)

         cheese_regex            meat_regex           bread_regex 
"cheddar|swiss|gouda"        "beef|chicken"  "focaccia|sourdough"

Just remember to add return(regex) to the end of your function, or don't assign the last line to a variable at all; as written, the function returns the value of the assignment invisibly. I would even put everything in a single pipe chain:

fn_get_topic_regex <- function(targettopic,df)
{
  df |>
    filter(topic == targettopic) |>
    pull(keyword) |>
    paste(collapse =  "|")
}
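
If you would rather not depend on purrr, the same function can be applied with base R's vapply(). This is only a sketch under the assumption that the example data and the piped version of the function are used; topic_names and all_regex are hypothetical names.

```r
library(dplyr)

df <- data.frame(
  topic = c("cheese", "meat", "meat", "cheese", "bread", "bread", "cheese"),
  keyword = c("cheddar", "beef", "chicken", "swiss", "focaccia", "sourdough", "gouda")
)

fn_get_topic_regex <- function(targettopic, df) {
  df |>
    filter(topic == targettopic) |>
    pull(keyword) |>
    paste(collapse = "|")
}

# topics in order of first appearance
topic_names <- unique(df$topic)

# character(1) tells vapply() that every call must return a single string
all_regex <- vapply(topic_names, fn_get_topic_regex, character(1), df = df)
names(all_regex) <- paste0(topic_names, "_regex")

all_regex[["meat_regex"]]
```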