Finding which larger string contains a list of smaller strings

Time:01-12

Apologies if this is a duplicate; I could not find anything on this, and everything I try ends up excessively complex.

I would like to find which of the larger strings, if any, do NOT contain any entry from the list of smaller strings.

channels <- tolower(c("Digital","Social","TV","YouTube","Radio","OOH","Facebook","Reddit","Podcast"
,"Instagram","LinkedIn","Twitter","TikTok","Print","Cinema","VOD","OTS","xmedia"))

larger_strings <- c("weights_digital","weights_tv","weights_social","weights_ooh","weights_print",
"weights_appletv")

Desired output: "weights_appletv" should be selected, because 'appletv' is not part of the channels vector.

not_included <- "weights_appletv"

CodePudding user response:

One base R option is to split larger_strings at "_" with strsplit, loop over the resulting list with sapply, and keep only the strings for which none of the pieces are %in% channels. (The \(x) lambda shorthand requires R 4.1 or later; use function(x) on older versions.)

larger_strings[sapply(strsplit(larger_strings, "_"), \(x) all(!x %in% channels))]
[1] "weights_appletv"
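
Splitting at "_" matters because a plain substring search would find "tv" inside "appletv" and wrongly treat that string as included. As a rough sketch of the same whole-piece idea using a single regular expression (the pattern and not_included names are only illustrative, and this assumes the channel names contain no regex metacharacters):

```r
channels <- tolower(c("Digital","Social","TV","YouTube","Radio","OOH","Facebook","Reddit","Podcast",
                      "Instagram","LinkedIn","Twitter","TikTok","Print","Cinema","VOD","OTS","xmedia"))

larger_strings <- c("weights_digital","weights_tv","weights_social","weights_ooh","weights_print",
                    "weights_appletv")

# Match a channel only when it is a whole "_"-delimited piece,
# i.e. preceded by start-of-string or "_" and followed by "_" or end-of-string
pattern <- paste0("(^|_)(", paste(channels, collapse = "|"), ")(_|$)")

# Keep the strings in which no whole piece is a channel
not_included <- larger_strings[!grepl(pattern, larger_strings)]
not_included
```

With the data from the question this should again select only "weights_appletv".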

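Alternatively, read the strings into a two-column data.frame with read.table, use filter with if_any to drop the rows where any column matches a channel, then unite the remaining columns back into a single string and pull it out as a vector.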

library(dplyr)
library(tidyr)
read.table(text = larger_strings, header = FALSE, sep = "_") %>% 
  filter(!if_any(everything(), ~ .x %in% channels)) %>% 
  unite(V1, V1, V2) %>%
  pull(V1)
[1] "weights_appletv"
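
If every string is known to start with the same "weights_" prefix (an assumption based on the example data, not something the question guarantees), a shorter base R sketch is to strip that prefix and compare what is left against channels directly. The suffixes and not_included names are only illustrative:

```r
channels <- tolower(c("Digital","Social","TV","YouTube","Radio","OOH","Facebook","Reddit","Podcast",
                      "Instagram","LinkedIn","Twitter","TikTok","Print","Cinema","VOD","OTS","xmedia"))

larger_strings <- c("weights_digital","weights_tv","weights_social","weights_ooh","weights_print",
                    "weights_appletv")

# Assumes each string has the form "weights_<channel>"
suffixes <- sub("^weights_", "", larger_strings)

# Keep the strings whose suffix is not a known channel
not_included <- larger_strings[!(suffixes %in% channels)]
not_included
```

This does not handle strings with more than one "_"-separated piece after the prefix; the strsplit approach is the more general of the two.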