Remove accent for specific strings in R

Time:11-05

I would like to replace the accented character "é" with "e" in a large dataset, but only for the strings in a given list.

Here is a small reproducible example:

library(tidyverse)  # attaches dplyr, stringr and tidyr
library(stringi)

data <- data.frame(territory = c("Abbécourt", "Achéres", "Beaumé", "Belvezé", "Marré"))

# Strings for which I want to remove the accent
strings <- c("Abbécourt", "Achéres", "Belvezé")
# Collapse them into one anchored regex alternation, longest string first
# (this overwrites the character vector above with a single pattern string)
strings <- paste(paste0("^", strings[order(-nchar(strings))], "$"), collapse = "|")

What I currently do is:

data <- data %>% dplyr::mutate(territory = gsub("é", "e", territory))

but of course this replaces every "é" in the dataset.

I can't find a way to get the following output:

territory
1 Abbecourt
2   Acheres
3    Beaumé
4   Belveze
5     Marré

Thank you very much for your help, Best Regards,

CodePudding user response:

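Use case_when (or ifelse) to test whether each element is %in% the target strings, and apply gsub/str_replace_all only to those elements. Note that %in% does exact matching, so strings must be the plain character vector c("Abbécourt", "Achéres", "Belvezé"), not the collapsed regex pattern; to use the regex instead, test with str_detect(territory, strings) or grepl.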

library(stringr)
library(dplyr)
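data %>%
  mutate(territory = case_when(
    # Replace "é" only where the value is one of the target strings
    territory %in% strings ~ str_replace_all(territory, "é", "e"),
    # Leave every other value unchanged
    TRUE ~ territory
  ))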

-output

  territory
1 Abbecourt
2   Acheres
3    Beaumé
4   Belveze
5     Marré
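
For comparison, here is a minimal base R sketch of the same idea, with no package dependencies. The names targets, pattern, by_membership and by_regex are made up for this illustration and do not appear in the question. It shows both the exact-match test with %in% and the anchored regex from the question tested with grepl; for this sample data the two should give the same result.

```r
# Sample data from the question (stringsAsFactors = FALSE keeps the column as character)
data <- data.frame(
  territory = c("Abbécourt", "Achéres", "Beaumé", "Belvezé", "Marré"),
  stringsAsFactors = FALSE
)

# Plain character vector of the names to change (hypothetical name: targets)
targets <- c("Abbécourt", "Achéres", "Belvezé")

# Variant 1: exact membership test with %in%
by_membership <- ifelse(data$territory %in% targets,
                        gsub("é", "e", data$territory),
                        data$territory)

# Variant 2: the anchored regex alternation built in the question, tested with grepl()
pattern <- paste(paste0("^", targets[order(-nchar(targets))], "$"), collapse = "|")
by_regex <- ifelse(grepl(pattern, data$territory),
                   gsub("é", "e", data$territory),
                   data$territory)

# Write the result back and inspect it
data$territory <- by_membership
print(data)
```

The %in% variant is usually the simpler choice when the targets are complete strings, because it does not require escaping regex metacharacters that may occur in the names.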