How can I replace '\U' using regular expressions?

Time:09-28

The question is pretty simple. I'm trying to replace "\U" throughout a vector of strings, and for this I'm using the package {stringr}, but I'm having issues matching the pattern.

text <- "\U0001f517"

stringr::str_detect(text, "\U")
#> Error: '\U' used without hex digits in character string starting ""\U"

stringr::str_detect(text, "\\U")
#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) : 
#>   Unrecognized backslash escape sequence in pattern. (U_REGEX_BAD_ESCAPE_SEQUENCE, context=`\U`)

stringr::str_detect(text, "\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\U"

stringr::str_detect(text, "\\\\U")
#> FALSE

stringr::str_detect(text, "\\\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\\\U"

stringr::str_detect(text, "\\\\\\U")
#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) : 
#>   Unrecognized backslash escape sequence in pattern. (U_REGEX_BAD_ESCAPE_SEQUENCE, context=`\\\U`)

stringr::str_detect(text, "\\\\\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\\\\\U"

# ... you get the idea
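Of the attempts above, "\\\\U" is the one a regex engine reads as a literal backslash followed by "U". It returns FALSE only because text contains no backslash. A small base-R check (using grepl instead of stringr; the variable names are illustrative) contrasts a string that really holds a backslash-U with the emoji itself:

```r
# A literal backslash-U (e.g. text escaped somewhere upstream) vs. the real emoji
escaped <- "\\U0001f517"  # 10 characters: "\", "U", then "0001f517"
emoji   <- "\U0001f517"   # 1 character: the code point U+1F517

# "\\\\U" in R source is the regex \\U: an escaped (literal) backslash, then "U"
matches <- grepl("\\\\U", c(escaped, emoji))
print(matches)  # TRUE for the escaped text, FALSE for the emoji
```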

As far as I can tell, the problem is that "\U" is read as the start of a Unicode hex escape: R's string parser raises the "used without hex digits" errors, and the regex engine raises the "bad escape sequence" errors. Other characters work fine:
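One way to see what the parser actually stored in text is to inspect its characters. This is a minimal base-R sketch using nchar, utf8ToInt and grepl. The variable names are chosen for illustration:

```r
# Inspect what R's parser stored in `text` (base R only)
text <- "\U0001f517"

n_chars    <- nchar(text)                      # 1: a single character
code_point <- sprintf("U+%X", utf8ToInt(text)) # "U+1F517"
has_slash  <- grepl("\\", text, fixed = TRUE)  # FALSE: no backslash to match

print(n_chars)
print(code_point)
print(has_slash)
```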

text <- "\a0001f517"

stringr::str_detect(text, "\a")
#> TRUE
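This match works, but not because the pattern contains a backslash followed by "a". In R, "\a" is the escape for the BEL (alert) control character. Both text and the pattern therefore contain that one control character. A quick base-R check, using grepl with fixed = TRUE in place of str_detect:

```r
# "\a" is R's escape for the BEL (alert) control character, code point 7
text <- "\a0001f517"

n_chars  <- nchar(text)                     # 9: BEL plus the 8 characters "0001f517"
first_cp <- utf8ToInt(substr(text, 1, 1))   # 7, the BEL character
found    <- grepl("\a", text, fixed = TRUE) # TRUE: same BEL character in both

print(c(n_chars = n_chars, first_cp = first_cp))
print(found)
```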

I've seen other questions around this issue, e.g. here, but still can't get this to work. Can anyone give me a working regex for this?

Answer:

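The \U in your text <- "\U0001f517" is not a separate character sequence. It is part of R's Unicode escape notation for a code point (\U followed by up to eight hex digits). The literal text in the text variable is in fact a single character, U+1F517 (🔗, LINK SYMBOL), so it contains no backslash or "U" to match.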
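To do the actual replacement, the pattern has to target the character itself, not a backslash and a "U". Below is a minimal base-R sketch that uses gsub with fixed = TRUE or perl = TRUE instead of the stringr functions from the question. The sample vector x is made up for illustration:

```r
x <- c("link \U0001f517 here", "plain text", "\U0001f600 smile")

# 1. Remove one specific character: the pattern holds the real character,
#    and fixed = TRUE skips regex parsing entirely
no_link <- gsub("\U0001f517", "", x, fixed = TRUE)

# 2. Match by code point with a PCRE escape; the doubled backslash means
#    R's parser passes \x{1F517} through to the regex engine unchanged
no_link_re <- gsub("\\x{1F517}", "", x, perl = TRUE)

# 3. Remove every character outside the Basic Multilingual Plane
#    (U+10000 to U+10FFFF), which covers most emoji
no_astral <- gsub("[\\x{10000}-\\x{10FFFF}]", "", x, perl = TRUE)

print(no_link)
print(no_astral)
```

With stringr, which uses ICU regular expressions, the same character class should carry over as str_replace_all(x, "[\\x{10000}-\\x{10FFFF}]", ""). ICU also treats \U as a code point escape, but it requires exactly eight hex digits (e.g. "\\U0001f517"). That requirement is why "\\U" on its own raised U_REGEX_BAD_ESCAPE_SEQUENCE. Option 3 is a blunt instrument. It keeps emoji that sit inside the BMP, such as U+263A, and it also removes supplementary characters that are not emoji.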
