Create a Dynamic Number of New Columns in a Data Frame Based on the Levels in a Vector

I'm looking to add an unknown number of new columns to a data frame based on the number of unique values found in a column of another data frame. I have the following two data frames:

user	text.string
Bob	I like yellow submarines
Jane	I like red cars

my.base.df <- data.frame(
  "user" = c("Bob", "Jane")
  , "text.string" = c("I like yellow submarines", "I like red cars")
)

theme	term
colours	yellow
colours	red
colours	blue
cars	ford
cars	toyota
cars	fiat

my.theme.df <- data.frame(
  "theme" = c(rep("cars", 3), rep("colours", 3))
  , "term" = c("ford", "toyota", "fiat", "red", "yellow", "blue")
)

I want to flag which themes appear in each text.string, ending up with something like this:

user	text.string	cars	colours
Bob	I like yellow submarines	0	1
Jane	I like red cars	1	1

I think I can match the terms to text.string with a for loop, but I'm worried it won't scale beyond this toy example. The part I'm really stuck on is that I can't create the "cars" or "colours" columns in my.base.df dynamically from the result of levels(my.theme.df$theme).
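Creating columns on the fly needs nothing special: assigning to `df[[name]]`, where `name` is a variable holding a string, adds a column with that name. One likely snag is that since R 4.0.0 `data.frame()` no longer converts strings to factors by default, so `levels()` on a plain character column returns `NULL`. A minimal sketch, assuming the theme column may be either character or factor (wrapping it in `factor()` makes `levels()` work in both cases); the `0L` starting value is only a placeholder for the matching step to overwrite:

```r
my.base.df <- data.frame(
  user = c("Bob", "Jane"),
  text.string = c("I like yellow submarines", "I like red cars")
)
my.theme.df <- data.frame(
  theme = c(rep("cars", 3), rep("colours", 3)),
  term = c("ford", "toyota", "fiat", "red", "yellow", "blue")
)

# factor() makes levels() work whether theme is character or already a factor;
# the levels come back sorted alphabetically
themes <- levels(factor(my.theme.df$theme))

# [[<- with a name held in a variable creates a new column of that name
for (th in themes) {
  my.base.df[[th]] <- 0L  # placeholder, to be filled in by the matching step
}

# the columns are now: user, text.string, cars, colours
names(my.base.df)
```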

In the real data, my.theme.df$theme could have up to twenty levels, each with over one hundred terms in my.theme.df$term, and my.base.df could contain up to one thousand observations, so I'm worried about efficiency too.
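At that scale the key point is that `grepl()` is vectorised over the strings, so building one alternation pattern per theme means one regex pass per theme (about twenty calls), not a loop over rows. The sketch below runs on synthetic data invented to match the sizes described (the names `theme.df`, `base.df`, `picked` and the `themeNN`/`termN` values are hypothetical), and assumes whole-word matching and that terms contain no regex metacharacters:

```r
set.seed(42)
n_themes <- 20
n_terms  <- 100
n_rows   <- 1000

# Synthetic stand-in for my.theme.df: theme01..theme20, 100 distinct terms each
theme.df <- data.frame(
  theme = rep(sprintf("theme%02d", seq_len(n_themes)), each = n_terms),
  term  = sprintf("term%d", seq_len(n_themes * n_terms))
)

# Synthetic stand-in for my.base.df: each string is 5 randomly chosen terms
picked <- replicate(n_rows, sample(theme.df$term, 5), simplify = FALSE)
base.df <- data.frame(
  user        = sprintf("user%d", seq_len(n_rows)),
  text.string = vapply(picked, paste, character(1), collapse = " ")
)

# One pattern per theme, "\\b(term1|term2|...)\\b"; the \b word boundaries
# stop e.g. "term15" from matching inside "term150"
patterns <- vapply(
  split(theme.df$term, theme.df$theme),
  function(terms) paste0("\\b(", paste(terms, collapse = "|"), ")\\b"),
  character(1)
)

# One vectorised grepl() call per theme fills the whole 1000 x 20 logical
# matrix; columns are named after the themes, and there is no loop over rows
flags <- vapply(
  patterns,
  function(p) grepl(p, base.df$text.string, perl = TRUE),
  logical(n_rows)
)

# Multiplying by 1L turns TRUE/FALSE into 1/0; cbind() adds one column per theme
base.df <- cbind(base.df, flags * 1L)
```

The loop-free `vapply()` and the per-theme `for` loop do the same amount of regex work; what matters for speed is that neither iterates over the rows.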

Any help or pointers would be great.

Thank you,

Jamie

Answer:

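Split the theme data frame by theme, then, for each theme, search the strings for any of that theme's terms (plus the theme name itself) and store the result in a new column named after the theme.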

my.base.df <- data.frame(
  "user" = c("Bob", "Jane"),
  "text.string" = c("I like yellow submarines", "I like red cars")
)

my.theme.df <- data.frame(
  "theme" = c(rep("cars", 3), rep("colours", 3)),
  "term" = c("ford", "toyota", "fiat", "red", "yellow", "blue")
)

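# one sub-data-frame per theme, named after the theme
theme_split <- split(my.theme.df, my.theme.df$theme)

for (x in names(theme_split)) {
  # include theme itself in search
  terms <- paste(c(theme_split[[x]][["term"]], x), collapse = "|")
  # new column named after the theme: TRUE if any term occurs in the string
  my.base.df[[x]] <- grepl(terms, my.base.df$text.string)
}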

my.base.df
#>   user              text.string  cars colours
#> 1  Bob I like yellow submarines FALSE    TRUE
#> 2 Jane          I like red cars  TRUE    TRUE
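Two details differ from the output the question asked for: the new columns are logical rather than 0/1, and `grepl()` matches substrings, so a term like "red" would also flag strings containing "bored" or "Fred". Below is a sketch of the same loop with both adjusted, assuming whole-word matching is what's wanted and that the terms contain no regex metacharacters. The `\\b` anchors, `perl = TRUE` and `as.integer()` are additions, not part of the answer above:

```r
my.base.df <- data.frame(
  "user" = c("Bob", "Jane"),
  "text.string" = c("I like yellow submarines", "I like red cars")
)
my.theme.df <- data.frame(
  "theme" = c(rep("cars", 3), rep("colours", 3)),
  "term" = c("ford", "toyota", "fiat", "red", "yellow", "blue")
)

theme_split <- split(my.theme.df, my.theme.df$theme)

for (x in names(theme_split)) {
  # same alternation as the answer (terms plus the theme name itself) ...
  terms <- paste(c(theme_split[[x]][["term"]], x), collapse = "|")
  # ... wrapped in \b word boundaries so "red" no longer matches inside "bored"
  pattern <- paste0("\\b(", terms, ")\\b")
  # as.integer() turns TRUE/FALSE into the 1/0 the question asked for
  my.base.df[[x]] <- as.integer(grepl(pattern, my.base.df$text.string, perl = TRUE))
}

my.base.df
#>   user              text.string cars colours
#> 1  Bob I like yellow submarines    0       1
#> 2 Jane          I like red cars    1       1
```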