Filter rows with semicolon-separated text in a column based on a list

Time:03-25

Apologies if this has been answered before. I struggled to find the answer that helps me.

Let's say I have a data frame:

Name <- c('P1','P2;P3','P4','P5','P6;P7', "P8", "P9")
Count <- c(15,3,10,4,3,11,9)
df <- data.frame(Name, Count)

I want to keep the rows where the text in column Name matches any value in the list below:

list <- c("P1", "P2", "P6", "P9")

Note that list has fewer values than df has rows. The resulting data frame should be:

Name Count
P1 15
P2;P3 3
P6;P7 3
P9 9

Every way I try, R doesn't recognize the values separated by semicolons and leaves them out of the filtering process. I'd prefer tidyverse-based functions, but any help would be gratefully received.

Many thanks in advance, Andy

CodePudding user response:

You could do:

library(dplyr)

# Keep a row if any entry of `list` is found in its Name string
df %>% filter(sapply(Name, function(x) any(stringr::str_detect(x, list))))
#>    Name Count
#> 1    P1    15
#> 2 P2;P3     3
#> 3 P6;P7     3
#> 4    P9     9

Or in full tidyverse idiom:

library(tidyverse)

df %>% filter(map_lgl(Name, ~any(str_detect(.x, list))))
#>    Name Count
#> 1    P1    15
#> 2 P2;P3     3
#> 3 P6;P7     3
#> 4    P9     9

As an obligatory side note, it is bad practice to call a variable list, since this shadows the name of the base function list().
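
One caveat with the str_detect approach above: it matches substrings, so a pattern such as P1 would also match a name like P10. Here is a minimal sketch of an exact-match variant, assuming the entries are always separated by ";" with no surrounding spaces. The vector name targets and the extra P10 row are illustrative additions, not part of the question's data:

```r
library(dplyr)
library(purrr)
library(stringr)

# Question's data plus one hypothetical row ("P10") to show the difference
df <- data.frame(
  Name  = c("P1", "P2;P3", "P4", "P5", "P6;P7", "P8", "P9", "P10"),
  Count = c(15, 3, 10, 4, 3, 11, 9, 7),
  stringsAsFactors = FALSE
)

# Renamed from `list` so it no longer shares a name with base::list()
targets <- c("P1", "P2", "P6", "P9")

# Split each Name on ";" and keep the row if any whole piece is in `targets`
result <- df %>%
  filter(map_lgl(str_split(Name, ";"), ~ any(.x %in% targets)))

print(result)
```

Because the comparison uses %in% on the split pieces, P10 is dropped here even though it contains the text P1.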

CodePudding user response:

Split names and compare with the list:

df[ sapply(strsplit(df$Name, ";"), function(i) any(i %in% list)), ]

Or use grepl with the values joined by the regex OR operator "|":

df[ grepl(paste(list, collapse = "|"), df$Name), ]
#    Name Count
# 1    P1    15
# 2 P2;P3     3
# 5 P6;P7     3
# 7    P9     9
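
Note that the plain "|" pattern matches anywhere in the string, so it would also pick up a name such as P10. A possible way around this, sketched below, is to wrap the alternatives in word boundaries. This assumes the values contain no regex metacharacters; the names targets, loose and strict and the extra P10 row are hypothetical, added only for illustration:

```r
# Question's data plus one hypothetical row ("P10")
df <- data.frame(
  Name  = c("P1", "P2;P3", "P4", "P5", "P6;P7", "P8", "P9", "P10"),
  Count = c(15, 3, 10, 4, 3, 11, 9, 7),
  stringsAsFactors = FALSE
)
targets <- c("P1", "P2", "P6", "P9")

loose  <- paste(targets, collapse = "|")   # "P1|P2|P6|P9"
strict <- paste0("\\b(", loose, ")\\b")    # same alternatives, whole tokens only

loose_hits  <- grepl(loose, df$Name)                # also TRUE for "P10"
strict_hits <- grepl(strict, df$Name, perl = TRUE)  # FALSE for "P10"

result <- df[strict_hits, ]
print(result)
```

The strsplit version earlier in this answer already compares whole pieces with %in%, so it does not need this adjustment.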