Easiest way to label and count string occurrences in freeform text in R?

Time:08-09

Given these two R data frames, one holding a set of freeform texts and the other a set of arbitrary keywords:

library(dplyr)  # provides rename()

# One row per freeform text; the single column is renamed to `phrase`
df <- as.data.frame(c(
  "I have a social media account on Twitter",
  "I love cheese recipes on Facebook",
  "I love cheese recipes on Pinterest",
  "I am a social media marketer on Instagram who loves social media",
  "I love posting cheese recipes on social media",
  "Conspiracy theories are logical fallacies"
)) |>
  rename(phrase = 1)

keyword_df <- as.data.frame(c(
"social media",
"cheese recipe",
"tinfoil hat"
))

What's the easiest tidyverse way to create this outcome?

phrase                                                            social_media  cheese_recipe  tinfoil_hat
I have a social media account on Twitter                                     1              0            0
I love cheese recipes on Facebook                                            0              1            0
I love cheese recipes on Pinterest                                           0              1            0
I am a social media marketer on Instagram who loves social media             2              0            0
I love posting cheese recipes on social media                                1              1            0
Conspiracy theories are logical fallacies                                    0              0            0

CodePudding user response:

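library(dplyr)  # provides %>% and mutate()

# For each keyword, count its matches in every phrase.
# Each keyword is treated as a regular expression by str_extract_all().
# as.data.frame() makes the names syntactic, so spaces become dots.
df %>%
  mutate(as.data.frame(lapply(
    setNames(nm = keyword_df[[1]]),
    function(z) lengths(stringr::str_extract_all(phrase, z))
  )))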
#                                                             phrase social.media cheese.recipe tinfoil.hat
# 1                         I have a social media account on Twitter            1             0           0
# 2                                I love cheese recipes on Facebook            0             1           0
# 3                               I love cheese recipes on Pinterest            0             1           0
# 4 I am a social media marketer on Instagram who loves social media            2             0           0
# 5                    I love posting cheese recipes on social media            1             1           0
# 6                        Conspiracy theories are logical fallacies            0             0           0
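
The output above names the columns social.media, cheese.recipe and tinfoil.hat, while the question asked for underscores. A minimal sketch of one way to get those names is below. It is a variation on the answer, not the answerer's own code. It assumes the keywords should be matched literally rather than as regular expressions, so it swaps in stringr::str_count() with fixed(). The plain `keywords` vector stands in for keyword_df[[1]] and is introduced here only for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(phrase = c(
  "I have a social media account on Twitter",
  "I love cheese recipes on Facebook",
  "I love cheese recipes on Pinterest",
  "I am a social media marketer on Instagram who loves social media",
  "I love posting cheese recipes on social media",
  "Conspiracy theories are logical fallacies"
))

# Illustrative stand-in for keyword_df[[1]]
keywords <- c("social media", "cheese recipe", "tinfoil hat")

result <- df |>
  mutate(as.data.frame(lapply(
    # Name each keyword with spaces replaced by underscores,
    # so the new columns are already valid R names
    setNames(keywords, gsub(" ", "_", keywords)),
    # fixed() matches the keyword literally (assumption: no regex wanted)
    function(k) str_count(phrase, fixed(k))
  )))

print(result)
```

If the keywords can contain regex metacharacters such as "." or "+", literal matching with fixed() avoids surprises. Drop fixed() if pattern matching is actually wanted.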