There are a lot of people asking how to remove accents from data, but I'm looking for how to remove the accented characters entirely. They are retained when I use [[:alnum:]] and [[A-Za-z]]. What would I have to do to get rid of them?
Thanks
CodePudding user response:
You can do this in base R:
gsub("[^A-Za-z ]", "", "à la volée d'où est-il")
[1] " la vole do estil"
This removes everything that is not an unaccented letter (A-Z, a-z) or a space, so the apostrophe and hyphen are dropped too. If you want to keep punctuation, have a look at [:punct:].
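If punctuation should survive, one option is to put the POSIX class [:punct:] inside the same negated bracket expression. A minimal sketch, where strip_keep_punct is a hypothetical helper name and not part of base R:

```r
# Hypothetical helper (name is an assumption, not a base R function):
# keep unaccented letters, spaces, and punctuation; drop everything else,
# which includes accented letters such as "à" or "é".
strip_keep_punct <- function(x) {
  gsub("[^A-Za-z[:punct:] ]", "", x)
}

strip_keep_punct("à la volée d'où est-il")
# [1] " la vole d'o est-il"
```

Note that R's regex documentation warns that ranges such as A-Z can be locale-dependent, so it is worth checking the result on your own data.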
CodePudding user response:
Without using regex, you could define the set of characters you would accept. The unaccented letters are available in base R as the letters and LETTERS vectors.
# Lowercase and uppercase unaccented letters, plus the space character
characters_to_keep <- c(letters, LETTERS, " ")
accentstring <- "éqodio diq ozàoih"
result <- unlist(strsplit(accentstring,""))
result <- result[result %in% characters_to_keep]
result <- paste0(result, collapse="")
> result
[1] "qodio diq ozoih"
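The same idea can be wrapped in a function so it works on a whole character vector rather than a single string. This is only a sketch; keep_only and its allowed argument are hypothetical names introduced here for illustration:

```r
# Hypothetical helper: for each element of x, keep only the characters
# that appear in `allowed` (default: unaccented letters and the space).
keep_only <- function(x, allowed = c(letters, LETTERS, " ")) {
  vapply(
    strsplit(x, ""),  # split every string into single characters
    function(chars) paste0(chars[chars %in% allowed], collapse = ""),
    character(1)
  )
}

keep_only(c("éqodio diq ozàoih", "Crème Brûlée"))
# [1] "qodio diq ozoih" "Crme Brle"
```

Because the accepted set is an ordinary vector, adding digits or specific punctuation is just a matter of extending allowed, for example c(letters, LETTERS, 0:9, " ").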