Can't remove � symbols in R with regex

Time:06-10

I am trying to clean text data before processing it, but the � symbol is not removed even when I target it specifically with gsub, and tolower then throws an error.

normalize_name <- function(name){
  
  normalized_name <- gsub("[^[0-9A-Za-z][:blank:]]", "", name) #Removes special characters and spaces
  normalized_name <- gsub("�", "", normalized_name)
  normalized_name <- tolower(normalized_name)
  return(normalized_name)
}

CodePudding user response:

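Try matching the replacement character by its Unicode escape, \UFFFD, rather than pasting the symbol itself: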

normalize_name <- function(name){
  
  normalized_name <- gsub("[^0-9A-Za-z[:blank:]]", "", name) # Keeps only ASCII letters, digits, spaces and tabs
  normalized_name <- gsub("\UFFFD", "", normalized_name)
  normalized_name <- tolower(normalized_name)
  return(normalized_name)
}
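
If tolower still fails, the input may contain bytes that are not valid UTF-8. That is an assumption: the question does not quote the error message. Such bytes are often only displayed as �, so a pattern for U+FFFD never matches them. A minimal sketch under that assumption, where normalize_name_safe is a hypothetical name and the input is assumed to be intended as UTF-8:

```r
# Hypothetical variant of normalize_name; assumes the input is meant to be UTF-8.
normalize_name_safe <- function(name) {
  # Drop bytes that are not valid UTF-8 (these are often displayed as the replacement symbol)
  cleaned <- iconv(name, from = "UTF-8", to = "UTF-8", sub = "")
  # Remove a literal U+FFFD replacement character if one is present
  cleaned <- gsub("\uFFFD", "", cleaned, fixed = TRUE)
  # Keep only ASCII letters, digits, spaces and tabs
  cleaned <- gsub("[^0-9A-Za-z[:blank:]]", "", cleaned)
  tolower(cleaned)
}

bad <- "Caf\xe9 Name"   # a lone Latin-1 byte, which is invalid as UTF-8
normalize_name_safe(bad)
normalize_name_safe("Foo\uFFFDBar 42")
```

Running validUTF8(name) on the raw input first should show whether invalid bytes are really the cause. If it returns TRUE for every element, the iconv step is unnecessary.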