Remove all non-alphanumerical values from dataframe in R

Time:07-12

I'm trying to remove special characters from the values in my data frame, keeping only letters and numbers. I tried the code below, but it also strips out the letters. Ultimately, I only want to remove the special characters.

df[] <- lapply(df, function(x) gsub("[^-0-9/.]+", "", x))

CodePudding user response:

What exactly do you mean by "special characters"? Do you want to keep accented characters, or remove them? E.g.

df <- data.frame(test1 = c("1", "a", "!"),
                 test2 = c("á", "22.", "()*^$"))
df
#>   test1 test2
#> 1     1     á
#> 2     a   22.
#> 3     ! ()*^$
# keep letters and digits; [:alnum:] is locale-aware, so accented letters can survive
df[] <- lapply(df, function(x) gsub("[^[:alnum:]]+", "", x))
df
#>   test1 test2
#> 1     1     á
#> 2     a    22
#> 3

# keep ASCII letters and digits only; accented letters are removed
df[] <- lapply(df, function(x) gsub("[^a-zA-Z0-9]+", "", x))
df
#>   test1 test2
#> 1     1      
#> 2     a    22
#> 3

Created on 2022-07-12 by the reprex package (v2.0.1)
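
If the data frame also has numeric columns, it may be safer to touch only the text columns, since `gsub()` turns everything it processes into character. A minimal sketch, assuming that is what you want: the helper name `strip_non_alnum` and its `ascii_only` argument are made up here for illustration, and the sample data is invented.

```r
# Hypothetical helper (not part of the answer above): strip non-alphanumeric
# characters from character/factor columns and leave other columns untouched.
strip_non_alnum <- function(df, ascii_only = FALSE) {
  # [:alnum:] is locale-aware and may keep accented letters;
  # a-zA-Z0-9 keeps ASCII letters and digits only.
  pattern <- if (ascii_only) "[^a-zA-Z0-9]+" else "[^[:alnum:]]+"
  is_text <- vapply(df, function(col) is.character(col) || is.factor(col),
                    logical(1))
  # Note: factor columns come back as character vectors.
  df[is_text] <- lapply(df[is_text],
                        function(col) gsub(pattern, "", as.character(col)))
  df
}

# Invented sample data: one numeric column, one text column
df <- data.frame(id = 1:3,
                 code = c("ab-12", "c_3!", "()*^$"),
                 stringsAsFactors = FALSE)
res <- strip_non_alnum(df, ascii_only = TRUE)
res
```

Here `id` stays an integer column, and a value made up entirely of special characters becomes an empty string rather than `NA`.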

CodePudding user response:

Try this

x <- "I'will remove all 999 - @ ?? -9/."

gsub("\\W", " ", x)

#> "I will remove all 999         9  "

If you want to collapse the long runs of spaces, use the following (the `_` placeholder requires R 4.2 or later):

gsub("\\W", " ", x) |> gsub("\\s{2,}" , " " , x = _)

#> "I will remove all 999 9 "
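
One thing to watch for: `\\W` means "not a letter, digit, or underscore", so underscores survive it. If underscores should count as special characters too, a possible variant is to negate `[:alnum:]` and replace each run of matches with a single space, then trim the ends with `trimws()`. This is only a sketch, and the input string is a slightly modified version of the one above with an underscore added.

```r
# Modified example input (underscore added for illustration)
x <- "I'will remove_all 999 - @ ?? -9/."

# \\W does not match "_", so the underscore is kept here
kept_underscore <- gsub("\\W", "", "a_b!")

# Replace each run of non-alphanumeric characters with one space,
# then drop leading and trailing whitespace.
step1 <- gsub("[^[:alnum:]]+", " ", x)
clean <- trimws(step1)
clean
```

Because the `+` consumes a whole run at once, no second `gsub()` pass is needed to squeeze repeated spaces.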