Splitting a string that has a not-space hyphen

Time:10-27

I have strings such as the following:

2 - 5-< 2
6 - 10-< 2
6 - 10-2 - 5
> 15-2 - 5

I want to split these strings only at the point where the - is neither preceded nor followed by a blank space. The strings above would therefore be split as follows:

"2 - 5" "< 2"
"6 - 10" "< 2"
"6 - 10" "2 - 5"
"> 15" "2 - 5"

In RStudio I have tried using sub() and strsplit(), but I have found it hard to write the right regular expression. Does anyone have a clue?

CodePudding user response:

Use perl=TRUE with lookaround:

vec <- c("2 - 5-< 2", "6 - 10-< 2", "6 - 10-2 - 5", "> 15-2 - 5")
strsplit(vec, "(?<! )-(?! )", perl=TRUE)
# [[1]]
# [1] "2 - 5" "< 2"  
# [[2]]
# [1] "6 - 10" "< 2"   
# [[3]]
# [1] "6 - 10" "2 - 5" 
# [[4]]
# [1] "> 15"  "2 - 5"
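
To see how the lookarounds behave beyond the four strings in the question, here is a minimal sketch in base R. The helper name split_tight_hyphen and the two extra test strings are hypothetical, added only for illustration. Note that a negative lookahead is spelled (?!...); a pattern written as (?!= ) would instead forbid a literal "= " after the hyphen.

```r
# Hypothetical helper: split on hyphens that have no space on either side.
split_tight_hyphen <- function(x) {
  # (?<! ) : the character before the hyphen is not a space
  # (?! )  : the character after the hyphen is not a space
  # Both are zero-width, so only the hyphen itself is consumed by the split.
  strsplit(x, "(?<! )-(?! )", perl = TRUE)
}

vec <- c("2 - 5-< 2", "6 - 10-< 2", "6 - 10-2 - 5", "> 15-2 - 5")
parts <- split_tight_hyphen(vec)

# Every element here yields exactly two pieces, so they stack into a
# 4 x 2 character matrix (this would not hold for arbitrary input).
mat <- do.call(rbind, parts)
print(mat)

# Inputs that are not in the question: a space on only one side of the
# hyphen is enough to block the split at that hyphen.
print(split_tight_hyphen(c("a- b-c", "x -y")))
```

With these inputs, "a- b-c" should split only at the second hyphen and "x -y" should come back unsplit.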

CodePudding user response:

I guess this is an easier-to-understand solution:

library(stringr)
str_split(vec, "(?<=\\S)-(?=\\S)")
[[1]]
[1] "2 - 5" "< 2"

[[2]]
[1] "6 - 10" "< 2"

[[3]]
[1] "6 - 10" "2 - 5"

[[4]]
[1] "> 15"  "2 - 5"

First off, no perl = TRUE is needed (though you do need an extra package, stringr). Second, (?<=\\S) and (?=\\S) are positive lookarounds, which many readers find easier to follow than negative ones. The first means: there is a non-whitespace character immediately to the left; the second means: there is a non-whitespace character immediately to the right. str_split() (with the underscore) then splits on the dash - only where both conditions hold. A digit-only version, (?<=\\d)-(?=\\d), would be too narrow here, because in the first two strings the dash is followed by < rather than by a digit.
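
Positive and negative lookarounds are not always interchangeable, so it can help to compare them directly. The sketch below uses base R with perl = TRUE rather than stringr, so that it runs without extra packages. The variable names neg, pos and dig, and the test strings "-5" and "5-< 2", are assumptions made for illustration.

```r
neg <- "(?<! )-(?! )"       # no space on either side of the hyphen
pos <- "(?<=\\S)-(?=\\S)"   # a non-whitespace character on both sides
dig <- "(?<=\\d)-(?=\\d)"   # a digit on both sides

vec <- c("2 - 5-< 2", "6 - 10-< 2", "6 - 10-2 - 5", "> 15-2 - 5")

# On the question's data, the negative and the non-whitespace positive
# patterns produce the same pieces.
same <- identical(strsplit(vec, neg, perl = TRUE),
                  strsplit(vec, pos, perl = TRUE))
print(same)

# They differ at the edge of a string: a negative lookbehind is satisfied
# when there is nothing to the left, a positive one is not.
edge_neg <- grepl(neg, "-5", perl = TRUE)
edge_pos <- grepl(pos, "-5", perl = TRUE)
print(c(edge_neg, edge_pos))

# A digit-only lookahead does not fire when the hyphen is followed by "<".
lt_dig <- grepl(dig, "5-< 2", perl = TRUE)
lt_pos <- grepl(pos, "5-< 2", perl = TRUE)
print(c(lt_dig, lt_pos))
```

Which variant fits best depends on whether a hyphen at the very start or end of a string should count as a split point in your data.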
