Ignore case with multiple strings using str_detect in R

Time:07-13

I'm trying to use an "or" condition while ignoring case with str_detect. I want to convert everything that contains "ag" or "field" to "Agricultural".

Here's an example:

(dat <- 
  data.frame(Class = c("ag", "Agricultural--misc", "old field")))

This works:

(dat %>% 
  mutate(Class_2 = case_when(
    # Aggregate all Agricultural classes
    str_detect(Class, fixed("Ag", ignore_case=TRUE)) ~ "Agricultural",
    # Convert 'field' to Agricultural
    str_detect(Class, fixed("field", ignore_case=TRUE)) ~ "Agricultural",
    TRUE ~ Class)))

But I want to condense the two conditions into a single one, as below (this attempt does not match any of the rows):

(dat %>% 
    mutate(Class_2 = case_when(
      # Aggregate all Agricultural and field classes to Agricultural
      str_detect(Class, fixed("Ag|field", ignore_case=TRUE)) ~ "Agricultural",
      TRUE ~ Class)))

CodePudding user response:

A possible solution would be to convert everything to lower case and match that against the pattern ag|field.

dat %>%
  mutate(Class_2 = case_when(
    str_detect(string = str_to_lower(Class),
               pattern = "ag|field") ~ "Agricultural",
    TRUE ~ Class
  ))

# A tibble: 3 × 2
  Class              Class_2     
  <chr>              <chr>       
1 ag                 Agricultural
2 Agricultural--misc Agricultural
3 old field          Agricultural
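
For context, the condensed attempt in the question matches nothing because fixed() compares the pattern as literal text, so the "|" is an ordinary character rather than alternation. A minimal sketch of the difference, using only stringr and the question's three values (the variable names are just for illustration):

```r
library(stringr)

classes <- c("ag", "Agricultural--misc", "old field")

# fixed() looks for the literal text "Ag|field", pipe included, so nothing matches
fixed_hits <- str_detect(classes, fixed("Ag|field", ignore_case = TRUE))

# A plain string pattern is a regular expression, so "|" means "or";
# lower-casing the input first takes care of case
lower_hits <- str_detect(str_to_lower(classes), "ag|field")

print(fixed_hits)  # FALSE FALSE FALSE
print(lower_hits)  # TRUE TRUE TRUE
```

Note that with this approach the pattern itself has to be written in lower case.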

CodePudding user response:

I just came across the (?i) inline flag. So another solution would be to drop the fixed() wrapper and prepend (?i) to the pattern, like this:

(dat %>%
    mutate(Class_2 = case_when(
      # Aggregate all Agricultural and field classes to Agricultural
      str_detect(Class, "(?i)Ag|field") ~ "Agricultural",
      TRUE ~ Class)))

But I think I like the str_to_lower option, as the code is more readable.
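
As far as I understand the ICU regex engine that stringr uses, (?i) at the start of a pattern applies to everything after it, so both alternatives become case-insensitive, not only "Ag". A quick check under that assumption (the mixed-case test strings are made up):

```r
library(stringr)

# Made-up inputs in mixed case to exercise both alternatives
shouting <- c("AG", "Old FIELD", "forest")

# (?i) switches on case-insensitive matching for the rest of the pattern
flag_hits <- str_detect(shouting, "(?i)Ag|field")

# Without the flag the same pattern is case-sensitive
plain_hits <- str_detect(shouting, "Ag|field")

print(flag_hits)
print(plain_hits)
```

If the flag only covered the first alternative, "Old FIELD" would not be detected in the first call.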

CodePudding user response:

You could also use regex() with its ignore_case argument, like this:

dat <- data.frame(Class = c("ag", "Agricultural--misc", "old field"))

library(dplyr)
library(stringr)
dat %>% 
  mutate(Class_2 = case_when(str_detect(Class, regex('Ag|field', ignore_case = TRUE)) ~ 'Agricultural',
                             TRUE ~ Class))
#>                Class      Class_2
#> 1                 ag Agricultural
#> 2 Agricultural--misc Agricultural
#> 3          old field Agricultural

Created on 2022-07-12 by the reprex package (v2.0.1)
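
If the list of strings grows, the pattern can be assembled from a character vector instead of typed by hand. A sketch, assuming a hypothetical vector `keywords` and an extra "forest" row added only to show the fall-through case:

```r
library(dplyr)
library(stringr)

dat <- data.frame(Class = c("ag", "Agricultural--misc", "old field", "forest"))

# Hypothetical keyword list; each entry is treated as a regular expression
keywords <- c("ag", "field")

# Collapse to "ag|field" and wrap in regex() for case-insensitive matching
pattern <- regex(str_c(keywords, collapse = "|"), ignore_case = TRUE)

result <- dat %>%
  mutate(Class_2 = case_when(
    str_detect(Class, pattern) ~ "Agricultural",
    TRUE ~ Class
  ))

print(result)
```

Keywords that contain regex metacharacters such as "." or "(" would need escaping before being collapsed into the pattern.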
