I am trying to clean text data before working with it, but the � symbol is not removed even when I target it specifically with gsub, and tolower then throws an error when I try to convert the text to lower case.
normalize_name <- function(name) {
  normalized_name <- gsub("[^[0-9A-Za-z][:blank:]]", "", name) # Removes special characters and spaces
  normalized_name <- gsub("�", "", normalized_name)
  normalized_name <- tolower(normalized_name)
  return(normalized_name)
}
CodePudding user response:
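Try matching the replacement character by its Unicode escape rather than pasting the literal symbol into the pattern: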
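normalize_name <- function(name) {
  # Keep only ASCII letters, digits, and blanks. The ranges and [:blank:] must sit
  # inside a single bracket expression; "[^[0-9A-Za-z][:blank:]]" does not parse that way.
  normalized_name <- gsub("[^0-9A-Za-z[:blank:]]", "", name)
  # Remove the Unicode replacement character (U+FFFD), written as an escape
  normalized_name <- gsub("\uFFFD", "", normalized_name)
  normalized_name <- tolower(normalized_name)
  return(normalized_name)
}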
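If tolower still fails, the strings may contain bytes that are not valid in their declared encoding. R prints such bytes as � even though the string holds no actual U+FFFD character, so a pattern for that character has nothing to match. The following is a minimal sketch under that assumption, and it also assumes the input is meant to be UTF-8. The name `normalize_name_safe` is hypothetical, and `iconv` behaviour with `sub` can vary somewhat across platforms.

```r
# Hypothetical variant of normalize_name; assumes the input is meant to be UTF-8.
normalize_name_safe <- function(name) {
  # Drop bytes that cannot be converted, so later string functions
  # do not fail on invalid multibyte input.
  cleaned <- iconv(name, from = "UTF-8", to = "UTF-8", sub = "")
  # Keep only ASCII letters, digits, and blanks.
  cleaned <- gsub("[^0-9A-Za-z[:blank:]]", "", cleaned)
  tolower(cleaned)
}

# Example input: \xe9 is a lone Latin-1 byte, which is invalid as UTF-8.
bad <- "Jos\xe9 Smith"
result <- normalize_name_safe(bad)
```

Running `validUTF8(name)` on the original data first is a quick way to check whether invalid bytes are actually the cause before switching to this approach.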