I have strings such as the following:
2 - 5-< 2
6 - 10-< 2
6 - 10-2 - 5
> 15-2 - 5
I want to split each string at the point where the - is neither preceded nor followed by a blank space. The strings above would therefore be split as follows:
"2 - 5" "< 2"
"6 - 10" "< 2"
"6 - 10" "2 - 5"
"> 15" "2 - 5"
In RStudio I have tried using sub() and strsplit(), but I have found it hard to write the right regular expression. Does anyone have a clue?
CodePudding user response:
Use perl=TRUE with lookaround:
vec <- c("2 - 5-< 2", "6 - 10-< 2", "6 - 10-2 - 5", "> 15-2 - 5")
strsplit(vec, "(?<! )-(?! )", perl=TRUE)
# [[1]]
# [1] "2 - 5" "< 2"
# [[2]]
# [1] "6 - 10" "< 2"
# [[3]]
# [1] "6 - 10" "2 - 5"
# [[4]]
# [1] "> 15" "2 - 5"
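A quick way to sanity-check a lookaround pattern like this is to also try an input where only one side of the dash has a space. The following is a minimal sketch in base R, not a definitive implementation; the names pattern, parts and the test string "5- 2" are illustrative assumptions. Note that a negative lookahead is spelled (?! ), whereas (?!= ) would test for the two characters "= " instead.

```r
vec <- c("2 - 5-< 2", "6 - 10-< 2", "6 - 10-2 - 5", "> 15-2 - 5")

# Split on a dash that has no space directly before it and none directly after it
pattern <- "(?<! )-(?! )"
parts <- strsplit(vec, pattern, perl = TRUE)

# Every string here splits into exactly two pieces, so the list binds into a 4 x 2 matrix
mat <- do.call(rbind, parts)
print(mat)

# Edge case: a dash followed by a space must NOT be a split point
strict <- strsplit("5- 2", pattern, perl = TRUE)[[1]]

# For comparison: (?!= ) only rejects a dash followed by "= ", so it splits here
loose <- strsplit("5- 2", "(?<! )-(?!= )", perl = TRUE)[[1]]

print(strict)
print(loose)
```

The do.call(rbind, ...) step assumes every string contains exactly one such dash; with a varying number of pieces it is safer to keep the list.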
CodePudding user response:
I guess this is an easier-to-understand solution:
library(stringr)
str_split(vec, "(?<=\\S)-(?=\\S)")
[[1]]
[1] "2 - 5" "< 2"
[[2]]
[1] "6 - 10" "< 2"
[[3]]
[1] "6 - 10" "2 - 5"
[[4]]
[1] "> 15" "2 - 5"
First off, no perl = TRUE is needed (but a new package, stringr, is).
Then, (?<=\\S) and (?=\\S) are positive lookarounds, which are arguably easier to read than negative ones. The first means: if you see a non-whitespace character on the left ...; the second says: if you see a non-whitespace character on the right ... (\\S rather than \\d, because in "5-< 2" the dash is followed by <, not by a digit.) And str_split (with the underscore) says: if these two conditions are met, then split on the dash -.
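To see which dash a positive-lookaround pattern actually selects, one can ask for the match positions instead of the split result. This is a minimal sketch in base R (so perl = TRUE is required here), assuming the lookarounds test for a non-whitespace character (\\S) on each side and that each string contains exactly one such dash; pos, left and right are illustrative names.

```r
vec <- c("2 - 5-< 2", "6 - 10-< 2", "6 - 10-2 - 5", "> 15-2 - 5")

# A dash with a non-whitespace character immediately on both sides
pattern <- "(?<=\\S)-(?=\\S)"

# gregexpr() returns, per string, the start position(s) of the matches;
# as.integer() drops the extra attributes. One match per string is assumed.
pos <- sapply(gregexpr(pattern, vec, perl = TRUE), function(m) as.integer(m))
print(pos)

# Cut each string just before and just after the matched dash
left  <- substr(vec, 1, pos - 1)
right <- substr(vec, pos + 1, nchar(vec))
print(cbind(left, right))
```

Because the lookarounds consume no characters, the match is only the dash itself, which is why cutting at pos - 1 and pos + 1 reproduces the two pieces.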