R - Testing which rows of a multi-column dataframe contain keyword

Time:11-28

Assume a data frame dat of p-values stored as character strings, some of which carry an asterisk.

dat <- data.frame(var1 = c("0.12", "0.12", "0.12*"), 
                  var2 = c("0.12", "0.12", "0.12"), 
                  var3 = c("0.12", "0.12", "0.12"))

How do I test which rows contain an asterisk?

Attempt 1:

dat %>%
  mutate(anyTRUE = if_any(.rows = contains('\\*'), isTRUE))
   var1 var2 var3 anyTRUE
1  0.12 0.12 0.12    TRUE
2  0.12 0.12 0.12    TRUE
3 0.12* 0.12 0.12    TRUE

CodePudding user response:

Use str_detect or grepl. contains, matches, starts_with, and ends_with are all select helpers: they match and select column names based on a pattern. Here we want to detect a pattern in the values of each row instead.

library(stringr)
library(dplyr)
dat <- dat %>%
    mutate(anyTRUE = if_any(everything(), ~ str_detect(.x, fixed("*"))))

Output:

dat
   var1 var2 var3 anyTRUE
1  0.12 0.12 0.12   FALSE
2  0.12 0.12 0.12   FALSE
3 0.12* 0.12 0.12    TRUE

NOTE: fixed() is used because the pattern is interpreted as a regular expression by default, and * is a regex metacharacter meaning zero or more of the preceding character. Either escape it (\\*) or wrap it in fixed(), which is also faster.
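As a minimal base R sketch of the note above (the vector x is made up for illustration and is not part of the question's data), the escaped pattern, a character class, and fixed = TRUE all find the literal asterisk and agree with each other:

```r
# Hypothetical example values, chosen only to illustrate the matching modes
x <- c("0.12", "0.12*", "*")

escaped <- grepl("\\*", x)              # regex mode, asterisk escaped
bracket <- grepl("[*]", x)              # regex mode, asterisk inside a character class
literal <- grepl("*", x, fixed = TRUE)  # pattern taken literally, no regex parsing

escaped
# [1] FALSE  TRUE  TRUE
identical(escaped, literal) && identical(bracket, literal)
# [1] TRUE
```

The same choice applies to stringr: str_detect(x, "\\*") and str_detect(x, fixed("*")) are the corresponding forms.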


Or, using base R:

dat$anyTRUE <- Reduce(`|`, lapply(dat, grepl, pattern = "*", fixed = TRUE))
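To make the one-liner easier to follow, here is a sketch that splits it into steps, using the data frame from the question. The intermediate names hits, any_hit, and any_hit2 are introduced here for illustration only, and the rowSums variant is an alternative formulation rather than part of the original answer:

```r
dat <- data.frame(var1 = c("0.12", "0.12", "0.12*"),
                  var2 = c("0.12", "0.12", "0.12"),
                  var3 = c("0.12", "0.12", "0.12"))

# Step 1: one logical vector per column, TRUE where the value contains "*"
hits <- lapply(dat, grepl, pattern = "*", fixed = TRUE)

# Step 2: combine the per-column vectors element-wise with OR
any_hit <- Reduce(`|`, hits)

# Alternative: bind the vectors into a logical matrix and count hits per row
any_hit2 <- rowSums(do.call(cbind, hits)) > 0

# Row numbers that contain an asterisk
which(any_hit)
# [1] 3
```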

CodePudding user response:

Here is an alternative approach with unite, which pastes all columns into a single string column that is then searched once:

library(dplyr)
library(tidyr)
library(stringr)

dat %>% 
  unite(check, remove = FALSE) %>% 
  mutate(check = str_detect(check, '\\*')) 
  check  var1 var2 var3
1 FALSE  0.12 0.12 0.12
2 FALSE  0.12 0.12 0.12
3  TRUE 0.12* 0.12 0.12
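The same paste-then-search idea can be sketched in base R, assuming the "_" separator that unite uses by default. The names pasted and check are chosen here to mirror the output above and are otherwise arbitrary:

```r
dat <- data.frame(var1 = c("0.12", "0.12", "0.12*"),
                  var2 = c("0.12", "0.12", "0.12"),
                  var3 = c("0.12", "0.12", "0.12"))

# Paste each row into one string, e.g. "0.12*_0.12_0.12" for row 3
pasted <- do.call(paste, c(dat, sep = "_"))

# Search the combined string once per row for a literal asterisk
check <- grepl("*", pasted, fixed = TRUE)
check
# [1] FALSE FALSE  TRUE
```

This only makes sense when every column is character-like, since pasting coerces all values to strings.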