I'm trying to use filter(grepl()) to match some words in my column. Let's suppose I want to extract the word "Guartelá". In my column, I have variations such as "guartela", "guartelá" and "Guartela". To match upper/lowercase words I'm using (?i). However, I haven't found a good way to match both the accented and unaccented forms (i.e., "guartelá" and "guartela").
I know that I can simply substitute á by a, but is there a way to make the match accent-insensitive in the code itself? It can be base R, tidyverse or anything else, I don't mind.
Here's my current code:
cobras <- final %>% filter(grepl("(?i)guartelá", NAME)
| grepl("(?i)guartelá", locality))
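A quick check of why the current pattern misses the unaccented spellings: (?i) folds case only, so the "á" in the pattern still has to meet an "á" in the data. This sketch assumes a UTF-8 session; the vector x is made-up sample data.

```r
# Made-up sample values
x <- c("Guartelá", "guartela", "GUARTELA", "Other")

# (?i) makes the match case-insensitive, but "á" and "a"
# are still treated as two different characters
grepl("(?i)guartelá", x)
# [1]  TRUE FALSE FALSE FALSE
```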
Cheers
Answer:
You can use stri_trans_general() from stringi to remove all accents:
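unaccent_chars <- stringi::stri_trans_general(c("guartelá", "with_é", "with_â", "with_ô"), "Latin-ASCII")
unaccent_chars
# [1] "guartela" "with_e" "with_a" "with_o"
# Strip the accents from the searched text as well, e.g.:
# grepl(paste(unaccent_chars, collapse = "|"), stringi::stri_trans_general(string, "Latin-ASCII"))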
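For the question's filter, the accents have to be removed from the columns being searched, not only from the pattern. A minimal sketch, assuming dplyr and stringi are installed and the session is UTF-8; the final data frame below and the helper fold() are hypothetical stand-ins, not part of the original answer.

```r
library(dplyr)
library(stringi)

# Hypothetical stand-in for the question's `final` data frame
final <- tibble(
  NAME     = c("Guartelá", "guartela", "Other", "Somewhere"),
  locality = c("a", "b", "c", "GUARTELÁ canyon")
)

# Hypothetical helper: transliterate to ASCII (drops accents), then
# lower-case, so "Guartelá", "GUARTELÁ" and "guartela" all become "guartela"
fold <- function(x) tolower(stri_trans_general(x, "Latin-ASCII"))

cobras <- final %>%
  filter(grepl("guartela", fold(NAME)) | grepl("guartela", fold(locality)))
```

Only the copies used for matching are folded; cobras keeps the original spellings, here rows 1, 2 and 4.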
Answer:
You can list the accepted alternatives for a single character inside square brackets [ ] (a character class) to capture the different combinations:
> string <- c("Guartelá", "Guartela", "guartela", "guartelá", "any")
> grepl("[Gg]uartel[aá]", string)
[1]  TRUE  TRUE  TRUE  TRUE FALSE
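The character class can also be combined with the question's (?i) flag, which then covers all-caps spellings such as "GUARTELA" without writing [Gg]. A sketch applied to the question's filter, assuming dplyr is installed; the data frame is made-up example data.

```r
library(dplyr)

# Made-up example data standing in for the question's `final`
final <- tibble(
  NAME     = c("Guartelá", "GUARTELA", "guartela", "Other", "Other"),
  locality = c("x", "y", "z", "Guartela", "w")
)

# (?i) handles case; [aá] accepts the last letter with or without the accent
cobras <- final %>%
  filter(grepl("(?i)guartel[aá]", NAME) | grepl("(?i)guartel[aá]", locality))
```

This keeps the first four rows. Note that [aá] only covers "á"; other accented forms (e.g. "â" or "ã") would need to be added to the class.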
Answer:
Another option, using str_detect() from stringr (loaded with the tidyverse):
library(tidyverse)
tibble(name = c("guartela", "guartelá", "Guartela", "Other")) |>
  filter(str_detect(name, "guartela|guartelá|Guartela|Guartelá"))
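Instead of listing every spelling, ICU collation can ignore accents and case directly. This swaps in stringi's collator-based stri_detect_coll(), which none of the answers above use: with strength = 1 (primary strength) the collator compares base letters only, so both case and accent differences are ignored. A sketch assuming stringi and dplyr are installed; the tibble is made-up example data.

```r
library(dplyr)
library(stringi)

# Primary-strength collator: compares base letters only, ignoring
# accent (secondary) and case (tertiary) differences
primary <- stri_opts_collator(strength = 1)

matches <- tibble(name = c("guartela", "guartelá", "Guartela", "GUARTELÁ", "Other")) |>
  filter(stri_detect_coll(name, "guartela", opts_collator = primary))

matches
```

Here the pattern is a literal string compared through the collator, not a regular expression, so regex syntax such as | or [ ] has no special meaning.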