I would like to remove the accent from "é" (i.e. replace it with "e") in a large dataset, but only for the strings in a given list.
Here is a small reproducible example:
library(tidyverse)
library(stringr)
library(dplyr)
library(tidyr)
library(stringi)
data <- data.frame(territory = c("Abbécourt", "Achéres", "Beaumé", "Belvezé",
                                 "Marré"))
# Vector of the strings whose accent I want to remove
strings <- c("Abbécourt", "Achéres", "Belvezé")
# Collapse them into one anchored regex: "^Abbécourt$|^Achéres$|^Belvezé$"
strings <- paste(paste0("^", strings[order(-nchar(strings))], "$"), collapse = "|")
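Before choosing how to match against the list, it helps to check what `strings` holds after the `paste()` step. It is a single regex string, so `%in%` (exact whole-element matching) finds nothing, while a regex matcher such as stringr's `str_detect()` matches the anchored names. The names `targets` and `pattern` below are illustrative only, used to keep both forms side by side in this sketch.

```r
library(stringr)

# Names to de-accent, as in the question (illustrative name `targets`)
targets <- c("Abbécourt", "Achéres", "Belvezé")
# Same construction as the question: longest names first, each anchored, joined with "|"
pattern <- paste(paste0("^", targets[order(-nchar(targets))], "$"), collapse = "|")
print(pattern)          # one string: "^Abbécourt$|^Achéres$|^Belvezé$"
print(length(pattern))  # 1, not 3

territory <- c("Abbécourt", "Achéres", "Beaumé", "Belvezé", "Marré")
# %in% compares whole strings, so no territory equals the collapsed pattern
print(territory %in% pattern)
# str_detect() interprets the pattern as a regex and matches the listed names
print(str_detect(territory, pattern))
```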
What I do is:
data <- data %>% dplyr::mutate(territory = gsub("é", "e", territory))
but of course this replaces every "é" in the dataset, not only those in the listed strings.
I can't find a way to get the following output:
territory
1 Abbecourt
2 Acheres
3 Beaumé
4 Belveze
5 Marré
Thank you very much for your help, Best Regards,
Answer:
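Create a condition with case_when (or ifelse) that flags the rows matching one of the target strings, and modify only those rows with gsub/str_replace_all.
Note that in the question's code `strings` has already been collapsed into a single regex pattern, so test it with str_detect(); `%in%` only works against the original character vector of names, not against the collapsed pattern.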
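The ifelse/gsub route can also be sketched in base R without any packages. This sketch assumes the target names are kept as a plain character vector (called `targets` here, a hypothetical name) rather than the collapsed regex, so that `%in%` performs an exact whole-name match.

```r
# Example data from the question; stringsAsFactors = FALSE keeps territory as character on R < 4.0
data <- data.frame(territory = c("Abbécourt", "Achéres", "Beaumé", "Belvezé", "Marré"),
                   stringsAsFactors = FALSE)

# Plain vector of names (hypothetical name `targets`), not the collapsed regex
targets <- c("Abbécourt", "Achéres", "Belvezé")

# ifelse() is vectorised: rows found in `targets` get "é" replaced, the rest are kept as-is
data$territory <- ifelse(data$territory %in% targets,
                         gsub("é", "e", data$territory, fixed = TRUE),
                         data$territory)
print(data)
```

`fixed = TRUE` treats "é" as a literal string rather than a regex; for a single character the result is the same, but it avoids any regex interpretation.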
library(stringr)
library(dplyr)
data %>%
  mutate(territory = case_when(
    # Only rows matching one of the anchored names get "é" replaced
    str_detect(territory, strings) ~ str_replace_all(territory, "é", "e"),
    TRUE ~ territory
  ))
Output:
territory
1 Abbecourt
2 Acheres
3 Beaumé
4 Belveze
5 Marré
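Since the question already loads stringi: if the target names might contain other accented letters as well (è, ê, ...), one option is to transliterate only the matching rows with `stri_trans_general()` and the ICU "Latin-ASCII" transform. This swaps the single "é" to "e" substitution for a general accent-stripping step, so it is a different technique from the answer above; `targets` and `hit` are illustrative names, and the target names are assumed to be a plain character vector.

```r
library(stringi)

data <- data.frame(territory = c("Abbécourt", "Achéres", "Beaumé", "Belvezé", "Marré"),
                   stringsAsFactors = FALSE)
# Plain vector of names to de-accent (illustrative name `targets`)
targets <- c("Abbécourt", "Achéres", "Belvezé")

# Logical index of the rows whose name is in the target list
hit <- data$territory %in% targets
# "Latin-ASCII" maps accented Latin letters to plain ASCII (é -> e, è -> e, ...)
data$territory[hit] <- stri_trans_general(data$territory[hit], "Latin-ASCII")
print(data)
```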