The question is pretty simple. I'm trying to replace "\U" throughout a vector of strings using the {stringr} package, but I'm having trouble matching the pattern.
text <- "\U0001f517"
stringr::str_detect(text, "\U")
#> Error: '\U' used without hex digits in character string starting ""\U"
stringr::str_detect(text, "\\U")
#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) :
#> Unrecognized backslash escape sequence in pattern. (U_REGEX_BAD_ESCAPE_SEQUENCE, context=`\U`)
stringr::str_detect(text, "\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\U"
stringr::str_detect(text, "\\\\U")
#> FALSE
stringr::str_detect(text, "\\\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\\\U"
stringr::str_detect(text, "\\\\\\U")
#> Error in stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) :
#> Unrecognized backslash escape sequence in pattern. (U_REGEX_BAD_ESCAPE_SEQUENCE, context=`\\\U`)
stringr::str_detect(text, "\\\\\\\U")
#> Error: '\U' used without hex digits in character string starting ""\\\\\\\U"
# ... you get the idea
As far as I can tell, the issue is that the regex engine sees "\U" as the beginning of a new hex code, as the first error indicates. Other characters work fine:
text <- "\a0001f517"
stringr::str_detect(text, "\a")
#> TRUE
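To see what these two strings actually contain, they can be inspected directly. This is a minimal sketch using base R only; the names text_u, text_a, n_u, n_a, has_backslash and code_point are made up for illustration and are not part of the original code.

```r
# Inspect the contents of the two example strings (base R only).
text_u <- "\U0001f517"  # parsed by R as one Unicode character, U+1F517
text_a <- "\a0001f517"  # "\a" is the bell control character, then 8 ordinary characters

# Number of characters stored in each string.
n_u <- nchar(text_u)    # 1
n_a <- nchar(text_a)    # 9

# Look for a literal backslash; fixed = TRUE avoids any regex interpretation.
has_backslash <- grepl("\\", text_u, fixed = TRUE)  # FALSE

# Integer code point of the single character in text_u.
code_point <- utf8ToInt(text_u)  # 128279, i.e. 0x1F517
```

Under these assumptions, neither string stores a backslash: "\a" matches in the second case because the pattern and the text both contain the same single control character.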
I've seen other questions around this issue, e.g. here, but still can't get this to work. Can anyone give me a working regex for this?
CodePudding user response:
The \U in your text <- "\U0001f517" is not a separate character sequence; it is part of the Unicode code point escape notation. The literal text in the text variable is in fact