Check if a value in an R dataframe column matches any value, or any combination of the values, in a list

Time:05-05

I have a really large dataframe and I need to check that every value in a certain column matches an item in a list. A value can be any single item from the list or a combination of items separated by commas.

In the example below, I only want the last value (‘no colour’) to fail, as it doesn’t appear in the list called Type.

Type <- list(c('blue','green','black','red'))

Data <- data.frame(colour = c("blue", "blue,green", "blue,black,red", "black,red", "no colour"))

Thanks

CodePudding user response:

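We can paste the elements of the list into a single regex alternation and filter on it. Note that this keeps any row containing at least one listed colour, so a value such as "blue,purple" would also pass.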

library(stringr)
library(dplyr)
Data %>% 
  filter(str_detect(colour, str_c(Type[[1]], collapse = "|")))

Output

          colour
1           blue
2     blue,green
3 blue,black,red
4      black,red
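
If every comma-separated item has to be in the list, one possible tightening of the regex idea is to anchor the pattern so the whole string must be made of listed values. This is only a sketch in base R; the names alt, pattern and valid are made up for illustration, and it assumes the colour names contain no regex metacharacters and that items are separated by a bare comma with no spaces.

```r
Type <- list(c('blue', 'green', 'black', 'red'))
Data <- data.frame(colour = c("blue", "blue,green", "blue,black,red", "black,red", "no colour"))

# Alternation of the allowed values: "blue|green|black|red"
alt <- paste0(Type[[1]], collapse = "|")

# The whole string must be one allowed value, optionally followed by
# further ",value" groups; ^ and $ rule out partial matches
pattern <- paste0("^(", alt, ")(,(", alt, "))*$")

valid <- grepl(pattern, Data$colour)

# Rows that fail the check
Data[!valid, , drop = FALSE]
```

With this pattern a value like "blue,purple" is rejected as well, because "purple" is not one of the alternatives.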

CodePudding user response:

Using strsplit to split each value on the commas, then checking that all of the pieces are in the list.

sapply(strsplit(Data$colour, ','), \(x) all(x %in% Type[[1]]))
# [1]  TRUE  TRUE  TRUE  TRUE FALSE
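
Building on the strsplit idea, a possible extension is to also record which items caused a row to fail. This is a sketch, not part of the answer above: the column names valid and invalid_items are invented here, and trimming white space around each item is an assumption about how the data should be treated.

```r
Type <- list(c('blue', 'green', 'black', 'red'))
Data <- data.frame(colour = c("blue", "blue,green", "blue,black,red", "black,red", "no colour"))

allowed <- Type[[1]]

# Split each value on literal commas
parts <- strsplit(Data$colour, ",", fixed = TRUE)

# Per row, the items that are not in the allowed set
# (white space around items is trimmed first: an assumption)
bad <- lapply(parts, function(x) setdiff(trimws(x), allowed))

# A row is valid when nothing is left over
Data$valid <- lengths(bad) == 0

# Offending items joined back into one string, "" when there are none
Data$invalid_items <- vapply(bad, paste, character(1), collapse = ",")

Data
```

Keeping the offending items next to the flag makes it easier to see why a row failed when the dataframe is too large to inspect by eye.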

CodePudding user response:

Here's another possible option: remove every colour in the list from each value, then clean up what remains by stripping punctuation (the commas) and any surrounding white space. nzchar then detects whether any text is left over; if so, that row is removed.

Data[!nzchar(trimws(gsub(
  "[[:punct:]]", "", gsub(paste0(Type[[1]], collapse = "|"), "", Data$colour)
))), ]

Output

          colour
1           blue
2     blue,green
3 blue,black,red
4      black,red
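
Since the question mentions a really large dataframe, it may be worth checking each distinct value only once and mapping the result back to the rows. The sketch below assumes the column has many repeated values; the repeated Data is a made-up stand-in for the real data, and the names u, u_ok and valid are illustrative.

```r
Type <- list(c('blue', 'green', 'black', 'red'))

# Hypothetical larger input: the example values repeated 1000 times
Data <- data.frame(colour = rep(
  c("blue", "blue,green", "blue,black,red", "black,red", "no colour"),
  times = 1000
))

allowed <- Type[[1]]

# Validate each distinct value once
u <- unique(Data$colour)
u_ok <- vapply(
  strsplit(u, ",", fixed = TRUE),
  function(x) all(x %in% allowed),
  logical(1)
)

# Map the per-value result back onto every row
valid <- u_ok[match(Data$colour, u)]

table(valid)
```

Whether this helps depends on how many distinct values the column really has; with mostly unique values it does the same amount of work as checking every row.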