I would like to rewrite the following regular expression in R using [:alnum:], which in my understanding should do the same thing:
starwars %>% mutate(name = str_replace_all(name, "[^a-zA-Z\\d\\s:\u00C0-\u00FF]", ""))
But the behaviour I get is not at all what I expected:
starwars %>% mutate(name = str_replace_all(name, "[^:alnum:]", ""))
By the way, I need to remove the underscores (_) and all the spaces.
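The first pattern does not quite meet that requirement. Because \\s is listed inside the negated class, spaces are kept and only characters outside the class (such as the underscore) are removed. Here is a quick check on a made-up sample string (hypothetical, not taken from starwars):

```r
library(stringr)

# Hypothetical sample name containing an underscore and a space
x <- "Obi_Wan Kenobi"

# The original pattern: \\s is inside the negated class, so spaces survive,
# while "_" is not in the class and is therefore removed
original <- str_replace_all(x, "[^a-zA-Z\\d\\s:\u00C0-\u00FF]", "")
print(original)  # "ObiWan Kenobi"
```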
Answer:
You can use
library(dplyr)
library(stringr)
starwars %>% mutate(name = str_replace_all(name, "[^[:alnum:]]+", ""))
## or
starwars %>% mutate(name = str_replace_all(name, "[:^alnum:]+", ""))
The [^[:alnum:]] pattern is a negated bracket expression ([^...]) that matches any character other than a letter or digit ([:alnum:] is a POSIX character class).
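A small sketch of the negated class on a made-up vector of names (hypothetical sample data, chosen to include a hyphen, an underscore, spaces and an accented letter). With stringr's ICU engine, [:alnum:] covers Unicode letters, so é is kept, which is what the \u00C0-\u00FF range in the question was aiming for:

```r
library(stringr)

# Made-up sample names (hypothetical, for illustration only)
x <- c("Luke Skywalker", "R2-D2", "Obi_Wan Kenobi", "Padm\u00e9 Amidala")

# Remove every run of characters that are neither letters nor digits
cleaned <- str_replace_all(x, "[^[:alnum:]]+", "")
print(cleaned)
# "LukeSkywalker" "R2D2" "ObiWanKenobi" "PadméAmidala"
```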
The [:^alnum:] pattern is an ICU extension of the POSIX character class syntax with the inverse meaning.
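To check that the ICU-style negated class behaves like the bracket-expression version, the two patterns can be compared on the same hypothetical sample vector:

```r
library(stringr)

# Hypothetical sample names
x <- c("Luke Skywalker", "R2-D2", "Obi_Wan Kenobi", "Padm\u00e9 Amidala")

# [:^alnum:] is the complement of [:alnum:], written without an outer [^...]
with_bracket <- str_replace_all(x, "[^[:alnum:]]+", "")
with_icu_neg <- str_replace_all(x, "[:^alnum:]+", "")
identical(with_bracket, with_icu_neg)  # TRUE
```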
The + is a quantifier: it matches one or more occurrences of the pattern it quantifies.
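The effect of the + quantifier is easiest to see with a non-empty replacement string. With "" as the replacement, both forms produce the same text; the + version simply makes fewer, longer matches. A sketch on a made-up string:

```r
library(stringr)

y <- "a  b__c"  # hypothetical input: two spaces, then two underscores

# Without +, each non-alphanumeric character is replaced on its own
per_char <- str_replace_all(y, "[^[:alnum:]]", "-")   # "a--b--c"
# With +, each run of such characters is replaced once
per_run  <- str_replace_all(y, "[^[:alnum:]]+", "-")  # "a-b-c"

# Number of matches found by each pattern
str_count(y, "[^[:alnum:]]")   # 4
str_count(y, "[^[:alnum:]]+")  # 2
```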
Also, in stringr, the shorthand character classes are Unicode-aware, so you may also use
starwars %>% mutate(name = str_replace_all(name, "[\\W_]+", ""))
where \W matches any character other than a Unicode letter, digit or underscore, and _ matches the underscore itself, so [\W_]+ removes every run of characters that are neither letters nor digits.
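A quick comparison of the \W-based pattern with the [:alnum:]-based one on the same hypothetical sample vector. They agree here, though the two classes are not defined identically in ICU (for example, \w also counts combining marks as word characters), so they may differ on some less common input:

```r
library(stringr)

# Hypothetical sample names
x <- c("Luke Skywalker", "R2-D2", "Obi_Wan Kenobi", "Padm\u00e9 Amidala")

# \W covers non-word characters; _ is added because \w would otherwise keep it
via_w     <- str_replace_all(x, "[\\W_]+", "")
via_alnum <- str_replace_all(x, "[^[:alnum:]]+", "")
print(via_w)
identical(via_w, via_alnum)  # TRUE for this sample
```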