How to negate any alphanumeric character with [:alnum:] in R (str_ functions)

Time:11-29

I would like to rewrite the following regular expression in R using [:alnum:], which, as I understand it, should do the same thing:

starwars %>% mutate(name = str_replace_all(name, "[^a-zA-Z\\d\\s:\u00C0-\u00FF]", ""))

But the behaviour I get is not at all what I expected:

starwars %>% mutate(name = str_replace_all(name, "[^:alnum:]", ""))

I also need to remove underscores (_) and all spaces.

Answer:

You can use

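library(dplyr)    # provides starwars, mutate() and %>%
library(stringr)
starwars %>% mutate(name = str_replace_all(name, "[^[:alnum:]]+", ""))
## or
starwars %>% mutate(name = str_replace_all(name, "[:^alnum:]+", ""))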

The [^[:alnum:]] pattern is a negated bracket expression ([^...]) that matches any character other than a letter or digit ([:alnum:] is a POSIX character class).
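To try the pattern without the starwars data, here is a small sketch on a character vector. The sample strings are hypothetical, chosen to contain a space, a hyphen, an underscore and an accented letter (which [:alnum:] treats as a letter in stringr).

```r
library(stringr)

# Hypothetical sample strings (not taken from starwars)
x <- c("Luke Skywalker", "R2-D2", "Padm\u00e9_Amidala")

# Remove every run of characters that are neither letters nor digits
cleaned <- str_replace_all(x, "[^[:alnum:]]+", "")
print(cleaned)
# cleaned holds "LukeSkywalker", "R2D2" and "PadméAmidala"
```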

The [:^alnum:] pattern uses an extension of the POSIX character class syntax in which the ^ after the opening colon inverts the class, so it matches the same characters as [^[:alnum:]].
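A quick way to check that the two spellings are interchangeable in stringr is to run both on the same input and compare the results. The sample strings below are hypothetical.

```r
library(stringr)

# Hypothetical sample strings with spaces, punctuation and an underscore
x <- c("Luke Skywalker", "R2-D2", "Padm\u00e9_Amidala", "Jabba! (the Hutt)")

a <- str_replace_all(x, "[^[:alnum:]]+", "")  # negated bracket expression
b <- str_replace_all(x, "[:^alnum:]+", "")    # negated POSIX-style class

# TRUE when both patterns removed exactly the same characters
identical(a, b)
```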

The + is a quantifier: it matches one or more consecutive occurrences of the pattern it quantifies, so each run of unwanted characters is removed with a single match.
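With an empty replacement the result is the same with or without +, since + only reduces the number of separate matches. Its effect becomes visible with a non-empty replacement, as in this sketch (the input string and the "-" replacement are just for illustration):

```r
library(stringr)

s <- "Jabba  the_Hutt"  # hypothetical input: two spaces and an underscore

# Without +: every non-alphanumeric character is replaced on its own
one_by_one <- str_replace_all(s, "[^[:alnum:]]", "-")   # "Jabba--the-Hutt"

# With +: each run of such characters is replaced only once
per_run <- str_replace_all(s, "[^[:alnum:]]+", "-")     # "Jabba-the-Hutt"
```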

Also, in stringr the shorthand character classes are Unicode-aware, so you may also use

starwars %>% mutate(name = str_replace_all(name, "[\\W_]+", ""))

where \W matches any non-word character (anything other than a Unicode letter, digit or underscore), and _ matches the underscore itself, which \W alone would leave in place.
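The role of the extra _ in the class can be seen by comparing it with \W on its own. The input string is hypothetical:

```r
library(stringr)

s <- "Padm\u00e9_Amidala (senator)"  # hypothetical input

# \W alone keeps the underscore, because _ counts as a word character
w_only <- str_replace_all(s, "\\W+", "")              # "Padmé_Amidalasenator"

# Adding _ to the class removes the underscore as well
w_and_underscore <- str_replace_all(s, "[\\W_]+", "")  # "PadméAmidalasenator"
```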
