I'm looking to wrap parts of a string in R, following certain rules, in a vectorised way.
Put simply, if I had a vector:
c("x^2", "x^2:z", "z", "x:z", "z:x:b", "z:x^2:b")
the function would sweep through each element and wrap I()
around those parts where there is an exponent, resulting in the following output:
c("I(x^2)", "I(x^2):z", "z", "x:z", "z:x:b", "z:I(x^2):b")
I've tried various approaches where I first split by : and then gsub, but this isn't particularly scalable.
CodePudding user response:
Something like below?
> gsub("(x(\\^\\d+))", "I(\\1)", c("x^2", "x^2:z", "z", "x:z", "z:x:b", "z:x^2:b"))
[1] "I(x^2)" "I(x^2):z" "z" "x:z" "z:x:b"
[6] "z:I(x^2):b"
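The pattern above hard-codes the variable name x. Here is a minimal sketch of one way to loosen that. It assumes that names consist only of letters, digits, dots and underscores and that exponents are plain integers; neither assumption comes from the question, and wrap_powers and the last two test strings are made up for illustration.

```r
# Input from the question, plus two hypothetical names for illustration.
s <- c("x^2", "x^2:z", "z", "x:z", "z:x:b", "z:x^2:b", "my.var^3:k", "w_1^10")

# [[:alnum:]._]+   a run of name characters (assumed name alphabet)
# \\^[[:digit:]]+  a literal caret followed by one or more digits
wrap_powers <- function(x) {
  gsub("([[:alnum:]._]+\\^[[:digit:]]+)", "I(\\1)", x)
}

wrap_powers(s)
## [1] "I(x^2)"        "I(x^2):z"      "z"             "x:z"
## [5] "z:x:b"         "z:I(x^2):b"    "I(my.var^3):k" "I(w_1^10)"
```

This still only handles terms of the form name^integer; anything like x^0.5 or (x+1)^2 would need a different pattern.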
CodePudding user response:
These seem reasonably general. They don't assume that the variable in the complex fields is named x, but handle any names made of word characters. They also don't assume that the arithmetic expression is an exponential, but handle any arithmetic expression that includes non-word characters. For example, they would surround y+pi with I(...).
1) This one-liner captures each field and processes it using the indicated function, expressed in formula notation. It surrounds each field that contains a non-word character with I(...). It works with any variables whose names are made of word characters.
library(gsubfn)
gsubfn("[^:]+", ~ if (grepl("\\W", x)) sprintf("I(%s)", x) else x, s)
## [1] "I(x^2)" "I(x^2):z" "z" "x:z" "z:x:b"
## [6] "z:I(x^2):b"
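If installing gsubfn is not an option, the same field-by-field idea can be sketched in base R with gregexpr and the replacement form of regmatches. This is a rough equivalent rather than what gsubfn does internally; the names surround, m and out are just chosen for this example.

```r
s <- c("x^2", "x^2:z", "z", "x:z", "z:x:b", "z:x^2:b")

# Wrap a field in I(...) only if it contains a non-word character.
surround <- function(f) ifelse(grepl("\\W", f), sprintf("I(%s)", f), f)

out <- s                         # work on a copy so s stays untouched
m <- gregexpr("[^:]+", out)      # locate every colon-free field
# Replace each matched field with its (possibly wrapped) version.
regmatches(out, m) <- lapply(regmatches(out, m), surround)
out
## [1] "I(x^2)"     "I(x^2):z"   "z"          "x:z"        "z:x:b"
## [6] "z:I(x^2):b"
```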
2) This surrounds any field containing a character that is not a colon, letter or digit with I(...).
gsub("([^:]*[^:[:alnum:]][^:]*)", "I(\\1)", s)
## [1] "I(x^2)" "I(x^2):z" "z" "x:z" "z:x:b"
## [6] "z:I(x^2):b"
3) In this alternative we split the strings at colon, then surround fields containing a non-word character with I(...) and paste them back together.
surround <- function(x) ifelse(grepl("\\W", x), sprintf("I(%s)", x), x)
s |>
  strsplit(":") |>
  sapply(function(x) paste(surround(x), collapse = ":"))
## [1] "I(x^2)" "I(x^2):z" "z" "x:z" "z:x:b"
## [6] "z:I(x^2):b"
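As a quick sanity check of the "reasonably general" claim, here is a sketch that runs 2) and 3) on a few inputs that are not plain powers of x. The strings in s2 are hypothetical, and vapply is used in place of sapply only to pin down the return type.

```r
# Hypothetical test strings, not from the question.
s2 <- c("y+pi", "log(w):z", "a:b", "my_var^3:k")

# Approach 2: a single gsub over the whole string.
by_gsub <- gsub("([^:]*[^:[:alnum:]][^:]*)", "I(\\1)", s2)

# Approach 3: split at colons, wrap fields, paste back together.
surround <- function(x) ifelse(grepl("\\W", x), sprintf("I(%s)", x), x)
by_split <- vapply(strsplit(s2, ":"),
                   function(f) paste(surround(f), collapse = ":"),
                   character(1))

by_gsub
## [1] "I(y+pi)"       "I(log(w)):z"   "a:b"           "I(my_var^3):k"
identical(by_gsub, by_split)
## [1] TRUE
```

One difference to be aware of: \\W treats the underscore as a word character, whereas [^:[:alnum:]] does not, so a bare field such as my_var would be wrapped by 2) but left alone by 3).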
Note
The input used is the following:
s <- c("x^2", "x^2:z", "z", "x:z", "z:x:b", "z:x^2:b")