Remove accented characters, don't replace them

Time:07-13

A lot of people ask how to remove accents from data, but I'm looking for how to remove the entire accented character. Accented characters are retained when I use [[:alnum:]] and [[A-Za-z]]. What would I have to do to get rid of them?

Thanks

CodePudding user response:

You can do this in base R:

gsub("[^A-Za-z ]", "", "à la volée d'où est-il")
[1] " la vole do estil"

This removes every character that is not an unaccented letter or a space. If you also want to keep punctuation, add [:punct:] to the character class.
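As a minimal sketch of the punctuation variant mentioned above (the variable names x and cleaned are only illustrative), [:punct:] can be placed inside the same negated bracket expression, so the apostrophe and hyphen survive while the accented letters are still dropped:

```r
# Sample string from the answer above
x <- "à la volée d'où est-il"

# Keep unaccented letters, punctuation and spaces; delete everything else
cleaned <- gsub("[^A-Za-z[:punct:] ]", "", x)
print(cleaned)
# [1] " la vole d'o est-il"
```

The leading space is what remains after "à" is deleted; wrapping the result in trimws() would remove it if that matters for your data.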

CodePudding user response:

Without using regex, you could define the set of characters you accept and drop everything else. Base R provides the unaccented letters as the letters (lowercase) and LETTERS (uppercase) vectors.

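# Accepted characters: lowercase letters and the space (add LETTERS to keep uppercase too)
characters_to_keep <- c(letters, " ")
accentstring <- "éqodio diq ozàoih"
# Split the string into single characters
result <- unlist(strsplit(accentstring,""))
# Keep only the accepted characters, then glue them back together
result <- result[result %in% characters_to_keep]
result <- paste0(result, collapse="")
> result
[1] "qodio diq ozoih"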
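If the same filtering is needed for a whole character vector, the steps above could be wrapped in a small helper. This is only a sketch: the function name keep_plain_letters and its extra argument are hypothetical, and it assumes uppercase letters should be kept as well, which the snippet above does not do.

```r
# Hypothetical helper: keep only unaccented letters plus any extra characters
keep_plain_letters <- function(x, extra = " ") {
  allowed <- c(letters, LETTERS, extra)
  # strsplit() with "" splits every element of x into single characters
  pieces <- strsplit(x, "")
  # Filter each element separately and paste the survivors back together
  vapply(
    pieces,
    function(ch) paste0(ch[ch %in% allowed], collapse = ""),
    character(1)
  )
}

cleaned_vec <- keep_plain_letters(c("éqodio diq ozàoih", "À la Volée"))
print(cleaned_vec)
# [1] "qodio diq ozoih" " la Vole"
```

Passing extra = c(" ", "'", "-") would additionally keep apostrophes and hyphens.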