I have a really large dataframe and I need to check that every value in a certain column matches the items in a list. A value can be any single item from the list, or a combination of items separated by commas.
In the example below, I only want the last value ('no colour') to fail, as it doesn't appear in the list called Type.
Type <- list(c('blue','green','black','red'))
Data <- data.frame(colour = c("blue", "blue,green", "blue,black,red", "black,red", "no colour"))
Thanks
CodePudding user response:
We may paste the elements of the list into a single regex pattern and filter:
library(stringr)
library(dplyr)
Data %>%
  filter(str_detect(colour, str_c(Type[[1]], collapse = "|")))
-output
colour
1 blue
2 blue,green
3 blue,black,red
4 black,red
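One caveat: the pattern built by pasting with "|" is not anchored, so it matches as soon as any one allowed colour appears in the string, and a value such as "blue,purple" would also pass. A minimal sketch of a stricter check follows, assuming every comma-separated piece must be in Type and that no spaces surround the commas. The names alt, pattern, and Extra are illustrative and not from the original post, and base R's grepl is used so the block has no package dependencies.

```r
Type <- list(c('blue', 'green', 'black', 'red'))
# Extra is a hypothetical test vector that includes one partially valid entry
Extra <- c("blue", "blue,green", "blue,purple", "no colour")

# Alternation of the allowed colours: "blue|green|black|red"
alt <- paste(Type[[1]], collapse = "|")
# Anchored pattern: the whole string must be allowed colours separated by commas
pattern <- paste0("^(", alt, ")(,(", alt, "))*$")

strict_ok <- grepl(pattern, Extra)  # TRUE TRUE FALSE FALSE
loose_ok  <- grepl(alt, Extra)      # TRUE TRUE TRUE FALSE (unanchored)
```

The same anchored pattern string could be passed to str_detect inside filter if the dplyr pipeline is preferred.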
CodePudding user response:
Using strsplit to split each value on commas, then checking that every piece is in Type:
sapply(strsplit(Data$colour, ','), \(x) all(x %in% Type[[1]]))
# [1] TRUE TRUE TRUE TRUE FALSE
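The logical vector can then be used to flag or subset the rows. A minimal sketch, assuming the Type and Data objects from the question; valid, passing, and failing are illustrative names, and stringsAsFactors = FALSE is set explicitly in case an R version older than 4.0 is in use.

```r
Type <- list(c('blue', 'green', 'black', 'red'))
Data <- data.frame(colour = c("blue", "blue,green", "blue,black,red", "black,red", "no colour"),
                   stringsAsFactors = FALSE)

# TRUE when every comma-separated piece is one of the allowed colours
valid <- sapply(strsplit(Data$colour, ",", fixed = TRUE),
                function(x) all(x %in% Type[[1]]))

passing <- Data[valid, , drop = FALSE]   # rows that adhere to Type
failing <- Data[!valid, , drop = FALSE]  # rows that should fail the check
```

Here failing contains only the 'no colour' row, so it can be inspected or reported directly.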
CodePudding user response:
Here's another possible option: remove any colours in the list from each value, then clean up the remaining characters (strip the commas and any surrounding white space). Then I use nzchar to detect whether any text is left over; if so, that row is removed.
# drop = FALSE keeps the one-column result as a data frame instead of a vector
Data[!nzchar(trimws(gsub(
  "[[:punct:]]", "", gsub(paste0(Type[[1]], collapse = "|"), "", Data$colour)
))), , drop = FALSE]
Output
colour
1 blue
2 blue,green
3 blue,black,red
4 black,red
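To make the nested calls easier to follow, the same idea can be unrolled into separate steps. This is only a sketch under the question's assumptions; no_colours, leftover, and kept are illustrative names that do not appear in the original answer.

```r
Type <- list(c('blue', 'green', 'black', 'red'))
Data <- data.frame(colour = c("blue", "blue,green", "blue,black,red", "black,red", "no colour"),
                   stringsAsFactors = FALSE)

# Step 1: delete every allowed colour name from each value
no_colours <- gsub(paste0(Type[[1]], collapse = "|"), "", Data$colour)
# Step 2: delete punctuation (the commas) and trim surrounding white space
leftover <- trimws(gsub("[[:punct:]]", "", no_colours))
# Step 3: keep only rows where nothing is left over
kept <- Data[!nzchar(leftover), , drop = FALSE]
```

Because the colour names are removed as plain substrings, a value such as "redgreen" with no comma would also pass this check; if that matters, splitting on commas and comparing whole pieces is the safer route.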