R raw strings with interpolation

Time:10-25

As of version 4.0, R supports a special syntax for raw strings. How can it be used in tandem with string interpolation? That would be very useful for passing regular expressions without doubling the backslashes, e.g. 123\b instead of 123\\b. I've tried using glue:

> tmp = "123\b"
> str_detect("123 4", glue(r"[{tmp}]"))
[1] FALSE

Using a raw string directly does work:

> str_detect("123 4", r"[123\b]")
[1] TRUE

Answer:

The problem here is that by the time tmp is defined, it is too late to have the \b interpreted as a literal backslash followed by b. The parser has already converted the escape into a single backspace character, so the string is stored internally as the byte sequence 31 32 33 08, not the byte sequence 31 32 33 5c 62, which is what you would need for your example to work.
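To make the parse-time point concrete, here is a small base-R sketch (it assumes R >= 4.0 for the r"[...]" syntax, and the names escaped, raw_lit and pattern are only for illustration). If you control where the fragment is defined, writing it as a raw literal from the start keeps the backslash, and ordinary interpolation then works. paste0 stands in for glue here:

```r
# Escapes are resolved when the literal is parsed, not when it is used.
escaped <- "123\b"      # the parser turns \b into a backspace (0x08)
raw_lit <- r"[123\b]"   # the raw literal keeps backslash + b

charToRaw(escaped)
#> [1] 31 32 33 08
charToRaw(raw_lit)
#> [1] 31 32 33 5c 62

# Interpolating the raw-defined fragment into a larger pattern works,
# because the backslash is already part of the string.
pattern <- paste0(raw_lit, r"[\s\d]")
grepl(pattern, "123 4", perl = TRUE)
#> [1] TRUE
```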

If you have existing character strings you wish to use in this way, you need to convert the escape sequences back into literal backslash-character pairs before you use them. One fairly hacky way to do this is to use the console's printing method itself.

As you showed yourself, this doesn't work:

tmp  <- "123\b"

charToRaw(tmp)
#> [1] 31 32 33 08

stringr::str_detect("123 4", tmp)
#> [1] FALSE

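But if we write a little wrapper around capture.output, we can get the characters that R needs to replicate the original intended string. Printing re-escapes the backspace as \b, and substr drops the leading "[1] " index (this assumes a single, short string):

# Capture the printed form of x and strip the "[1] " prefix
f <- function(x) substr(capture.output(noquote(x)), 5, 1e4)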

charToRaw(f(tmp))
#> [1] 31 32 33 5c 62

stringr::str_detect("123 4", f(tmp))
#> [1] TRUE

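So the function f can be thought of as a way of recovering the literal characters that were originally typed. The new raw string syntax can't really help here, because it only changes how a literal is parsed, not the contents of a string that already exists.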
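If parsing printed output feels too fragile, base R's encodeString() may be a tidier way to get the same effect, since it is documented to escape strings the same way print does. This is a minimal sketch of an alternative, not the method used above, and fixed is just an illustrative name:

```r
tmp <- "123\b"

# Re-escape non-printable characters as backslash sequences
fixed <- encodeString(tmp)

charToRaw(fixed)
#> [1] 31 32 33 5c 62
grepl(fixed, "123 4", perl = TRUE)
#> [1] TRUE

# Caveat: backslashes that are already literal get doubled as well.
# "a\\d" holds 3 characters (a, backslash, d); the encoded form holds 4.
nchar(encodeString("a\\d"))
#> [1] 4
```

Either way, this only makes sense when the control character was really meant as a regex escape. A string that legitimately contains a backspace would be changed too.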

Created on 2021-10-24 by the reprex package (v2.0.0)
