Create a Dynamic Number of New Columns in a Data Frame Based on the Levels in a Vector

Time:12-09

I'm looking to add an unknown number of new columns to a data frame based on the number of unique values found in a column of another data frame. I have the following two data frames:

user  text.string
Bob   I like yellow submarines
Jane  I like red cars

my.base.df <- data.frame(
  "user" = c("Bob", "Jane"),
  "text.string" = c("I like yellow submarines", "I like red cars")
)

theme    term
colours  yellow
colours  red
colours  blue
cars     ford
cars     toyota
cars     fiat

my.theme.df <- data.frame(
  "theme" = c(rep("cars", 3), rep("colours", 3)),
  "term" = c("ford", "toyota", "fiat", "red", "yellow", "blue")
)

And I want to flag the themes found in each text.string, to end up with something like this:

user  text.string               cars  colours
Bob   I like yellow submarines  0     1
Jane  I like red cars           1     1

I think I can match the terms to text.string with a for loop, but I'm worried that won't scale beyond this toy example. The bit I'm really stuck on is that I can't create the "cars" or "colours" columns in my.base.df dynamically from the result of levels(my.theme.df$theme).

In the real world the number of levels in my.theme.df$theme could be up to twenty, with over one hundred my.theme.df$term values matching a single my.theme.df$theme. Similarly, my.base.df could contain up to one thousand observations, so I'm worried about efficiency too.

Any help or pointers would be great.

Thank you,

Jamie

CodePudding user response:

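Split your theme data frame by theme, then, for each theme, search every text string for any of that theme's terms (plus the theme name itself). Assigning with [[ creates one new column per theme, however many themes there are.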

my.base.df <- data.frame(
  "user" = c("Bob", "Jane"),
  "text.string" = c("I like yellow submarines", "I like red cars")
)

my.theme.df <- data.frame(
  "theme" = c(rep("cars", 3), rep("colours", 3)),
  "term" = c("ford", "toyota", "fiat", "red", "yellow", "blue")
)

theme_split <- split(my.theme.df, my.theme.df$theme)

for (x in names(theme_split)) {
  # include theme itself in search
  terms <- paste(c(theme_split[[x]][["term"]], x), collapse = "|")
  my.base.df[[x]] <- grepl(terms, my.base.df$text.string)
}

my.base.df
#>   user              text.string  cars colours
#> 1  Bob I like yellow submarines FALSE    TRUE
#> 2 Jane          I like red cars  TRUE    TRUE
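
If you want 0/1 flags as in the question rather than TRUE/FALSE, a variation along these lines might work. It is only a sketch, not a tested solution for the real data: the names terms_by_theme, patterns, flags and flagged.df are made up for illustration, and the whole-word matching (\\b with perl = TRUE) and ignore.case = TRUE are my own assumptions about what counts as a match, so that a term like "red" would not also flag "bored".

```r
my.base.df <- data.frame(
  "user" = c("Bob", "Jane"),
  "text.string" = c("I like yellow submarines", "I like red cars")
)

my.theme.df <- data.frame(
  "theme" = c(rep("cars", 3), rep("colours", 3)),
  "term" = c("ford", "toyota", "fiat", "red", "yellow", "blue")
)

# One character vector of terms per theme (as.character guards against factors)
terms_by_theme <- split(as.character(my.theme.df$term),
                        as.character(my.theme.df$theme))

# One regex per theme: its terms plus the theme name, matched as whole words
# (whole-word matching is an assumption, not part of the loop above)
patterns <- vapply(
  names(terms_by_theme),
  function(x) {
    alternatives <- paste(c(terms_by_theme[[x]], x), collapse = "|")
    paste0("\\b(", alternatives, ")\\b")
  },
  character(1)
)

# Named list with one integer 0/1 vector per theme
flags <- lapply(
  patterns,
  function(p) {
    as.integer(grepl(p, my.base.df$text.string, perl = TRUE, ignore.case = TRUE))
  }
)

# Add every theme column in one assignment, whatever the number of themes
flagged.df <- my.base.df
flagged.df[names(flags)] <- flags

flagged.df
```

Only one grepl call runs per theme, so with around twenty themes and a thousand rows this should stay cheap. If the terms can contain regex metacharacters they would need escaping before being pasted into a pattern.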