R - If column contains a string from vector, append flag into another column

Time:12-17

My Data

I have a vector of words like the one below. This is a simplification; my real vector has over 600 words:

myvec <- c("cat", "dog", "bird")

I have a data frame with the following structure:

df <- structure(list(id = c(1, 2, 3), onetext = c("cat furry pink british", 
"dog cat fight", "bird cat issues"), cop= c("Little Grey Cat is the nickname given to a kitten of the British Shorthair breed that rose to viral fame on Tumblr through a variety of musical tributes and photoshopped parodies in late September 2014", 
"Dogs have soft fur and tails so do cats Do cats like to chase their tails", 
"A cat and bird can coexist in a home but you will have to take certain measures to ensure that a cat cannot physically get to the bird at any point"
), text3 = c("On October 4th the first single topic blog devoted to the little grey cat was launched On October 20th Tumblr blogger Torridgristle shared a cutout exploitable image of the cat, which accumulated over 21000 notes in just over three months.", 
"there are many fights going on and this is just an example text", 
"Some cats will not care about a pet bird at all while others will make it its life mission to get at a bird You will need to assess the personalities of your pets and always remain on guard if you allow your bird and cat to interact"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L))

It looks like the picture below:

[Image: sample dataset]

My issue

For each keyword in my vector myvec, I need to go through the dataset and check the columns onetext, cop and text3. If the keyword is found in any of those three columns, I need to append it to a new column. The result should look like the image below:

[Image: expected result]

My original dataset is quite large (the last column holds the longest text), so nested loops (which is what I tried) are not ideal.

EDIT: A keyword only needs to appear once in a row to be listed for that row. All matching keywords should be listed.
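One way to avoid nested row-by-keyword loops is to loop only over the keywords and let str_detect() scan every row of a column in one vectorised call. The sketch below is one possible approach, not the accepted answer: it recreates the sample df and myvec from above with tibble() so it runs on its own, matches each keyword case-sensitively as a literal substring via fixed(), and lists the matched keywords in myvec order. The names text_cols, hits and hit_mat are illustrative only.

```r
library(dplyr)   # re-exports tibble()
library(purrr)
library(stringr)

myvec <- c("cat", "dog", "bird")

df <- tibble(
  id = c(1, 2, 3),
  onetext = c("cat furry pink british", "dog cat fight", "bird cat issues"),
  cop = c(
    "Little Grey Cat is the nickname given to a kitten of the British Shorthair breed that rose to viral fame on Tumblr through a variety of musical tributes and photoshopped parodies in late September 2014",
    "Dogs have soft fur and tails so do cats Do cats like to chase their tails",
    "A cat and bird can coexist in a home but you will have to take certain measures to ensure that a cat cannot physically get to the bird at any point"
  ),
  text3 = c(
    "On October 4th the first single topic blog devoted to the little grey cat was launched On October 20th Tumblr blogger Torridgristle shared a cutout exploitable image of the cat, which accumulated over 21000 notes in just over three months.",
    "there are many fights going on and this is just an example text",
    "Some cats will not care about a pet bird at all while others will make it its life mission to get at a bird You will need to assess the personalities of your pets and always remain on guard if you allow your bird and cat to interact"
  )
)

text_cols <- c("onetext", "cop", "text3")

# For each keyword: one vectorised str_detect() per text column, OR-ed
# together, giving TRUE for every row where the keyword occurs anywhere.
hits <- map(myvec, function(kw) {
  reduce(map(text_cols, ~ str_detect(df[[.x]], fixed(kw))), `|`)
})

# Logical matrix: rows = rows of df, columns = keywords
hit_mat <- do.call(cbind, hits)

# Keep the keywords flagged in each row, in myvec order
df$topic <- apply(hit_mat, 1, function(row) paste(myvec[row], collapse = ", "))
df$topic
```

This makes one pass over the data per keyword (600 keywords times 3 columns of str_detect() calls) instead of looping over rows as well. For the sample data the topic column comes out as "cat", "cat, dog" and "cat, bird".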

How could I do this? I'm using the tidyverse, so my dataset is actually a tibble.

Similar Posts (but not quite)

The following posts are somewhat similar, but not quite:

Answer:

Update: if a list column is preferred, use str_extract_all (with pattern built from myvec as in the main answer below):

library(dplyr)
library(stringr)
df %>%  
  transmute(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}")) 

gives:

  new_colonetext new_colcop new_coltext3
  <list>         <list>     <list>      
1 <chr [1]>      <NULL>     <chr [2]>   
2 <chr [2]>      <chr [2]>  <NULL>      
3 <chr [2]>      <chr [4]>  <chr [5]>  
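The three list columns can then be collapsed into a single character column, one entry per row with each keyword listed once. A minimal sketch, assuming the sample data from the question and the same pattern: it drops the case_when() wrapper because str_extract_all() already returns character(0) for rows without a match, and uses purrr::pmap_chr() to combine the list columns row by row.

```r
library(dplyr)
library(purrr)
library(stringr)

myvec <- c("cat", "dog", "bird")
pattern <- paste(myvec, collapse = "|")

df <- tibble(
  id = c(1, 2, 3),
  onetext = c("cat furry pink british", "dog cat fight", "bird cat issues"),
  cop = c(
    "Little Grey Cat is the nickname given to a kitten of the British Shorthair breed that rose to viral fame on Tumblr through a variety of musical tributes and photoshopped parodies in late September 2014",
    "Dogs have soft fur and tails so do cats Do cats like to chase their tails",
    "A cat and bird can coexist in a home but you will have to take certain measures to ensure that a cat cannot physically get to the bird at any point"
  ),
  text3 = c(
    "On October 4th the first single topic blog devoted to the little grey cat was launched On October 20th Tumblr blogger Torridgristle shared a cutout exploitable image of the cat, which accumulated over 21000 notes in just over three months.",
    "there are many fights going on and this is just an example text",
    "Some cats will not care about a pet bird at all while others will make it its life mission to get at a bird You will need to assess the personalities of your pets and always remain on guard if you allow your bird and cat to interact"
  )
)

res <- df %>%
  # One list column of matches per text column (character(0) when none)
  mutate(across(c(onetext, cop, text3), ~ str_extract_all(.x, pattern),
                .names = "new_col{col}")) %>%
  # Combine the three list columns row by row, keeping each keyword once
  mutate(topic = pmap_chr(list(new_colonetext, new_colcop, new_coltext3),
                          function(...) paste(unique(c(...)), collapse = ","))) %>%
  select(id, topic)
res
```

For the sample data this gives "cat", "dog,cat" and "bird,cat", with keywords in order of first appearance rather than in myvec order.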

Here is how you could achieve the result:

  1. Create a regex pattern from the vector.
  2. Use mutate() with across() to check the relevant columns.
  3. If a keyword is detected, extract it into a new column.

myvec <- c("cat", "dog", "bird")

pattern <- paste(myvec, collapse="|")

library(dplyr)
library(tidyr)
library(stringr)
df %>% 
  mutate(across(-id, ~case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}")) %>% 
  unite(topic, starts_with('new'), na.rm = TRUE, sep = ',')

    id onetext                cop                                                                        text3                                                                              topic                                     
  <dbl> <chr>                  <chr>                                                                      <chr>                                                                              <chr>                                     
1     1 cat furry pink british Little Grey Cat is the nickname given to a kitten of the British Shorthai~ On October 4th the first single topic blog devoted to the little grey cat was lau~ "cat,NULL,c(\"cat\", \"cat\")"            
2     2 dog cat fight          Dogs have soft fur and tails so do cats Do cats like to chase their tails  there are many fights going on and this is just an example text                    "c(\"dog\", \"cat\"),c(\"cat\", \"cat\"),~
3     3 bird cat issues        A cat and bird can coexist in a home but you will have to take certain me~ Some cats will not care about a pet bird at all while others will make it its lif~ "c(\"bird\", \"cat\"),c(\"cat\", \"bird\"~                                                                                    
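The pattern built with paste(myvec, collapse = "|") is a case-sensitive regex that also matches inside longer words, which is why "Little Grey Cat" is not picked up while "cats" counts as "cat". With a 600-word vector, keywords containing regex metacharacters could also change the meaning of the pattern. The sketch below tightens it under a few assumptions: stringr 1.5.0 or later (for str_escape()), whole-word matching with \\b, and case-insensitive matching; the txt strings are short excerpts from the sample data.

```r
library(stringr)

myvec <- c("cat", "dog", "bird")

# Escape regex metacharacters in each keyword, join them into one
# alternation and wrap it in word boundaries (non-capturing group).
pattern <- paste0("\\b(?:", paste(str_escape(myvec), collapse = "|"), ")\\b")

# Short excerpts taken from the sample data
txt <- c(
  "Little Grey Cat is the nickname",
  "Dogs have soft fur and tails so do cats",
  "Some cats will not care about a pet bird"
)

# ignore_case = TRUE lets "Cat" match; \\b stops "Dogs" and "cats" matching
matches <- str_extract_all(txt, regex(pattern, ignore_case = TRUE))
matches
```

Whether \\b is wanted depends on the data: with it, plurals such as "Dogs" and "cats" no longer count. The extracted text keeps its original case ("Cat"), so applying str_to_lower() before unique() may be needed.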