I want to use strsplit to split a string before every capital letter, using a positive lookahead. However, it also splits after every capital letter, and I'm confused about that. Is this regex incompatible with strsplit? Why is that so, and what needs to change?
strsplit('AaaBbbCcc', '(?=\\p{Lu})', perl=TRUE)[[1]]
strsplit('AaaBbbCcc', '(?=[A-Z])', perl=TRUE)[[1]]
strsplit('AaaBbbCcc', '(?=[ABC])', perl=TRUE)[[1]]
# [1] "A" "aa" "B" "bb" "C" "cc"
Expected result:
# [1] "Aaa" "Bbb" "Ccc"
In the Demo it actually looks fine.
Ideally it should split only before camel case, i.e. before Aa and not before AA; there's \\p{Lt}, but this doesn't seem to work at all.
strsplit('AaaABbbBCcc', '(?=\\p{Lt})', perl=TRUE)[[1]]
# [1] "AaaABbbBCcc"
Expected result:
# [1] "AaaA" "BbbB" "Ccc"
CodePudding user response:
It seems that by adding (?!^) you can obtain the desired result.
strsplit('AaaBbbCcc', "(?!^)(?=[A-Z])", perl=TRUE)
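As to why the plain lookahead also splits after each capital, here is a rough sketch of the loop strsplit appears to run, going by the algorithm described in its help page and the output in the question. emulate_strsplit is a hypothetical helper written only for illustration, not part of R, and it assumes that an empty match at the very start of the remaining string yields a one-character piece.

```r
# Hypothetical helper: a simplified emulation of strsplit(x, pattern, perl = TRUE)
# for a single string. This is an assumption about the behaviour, not R's actual code.
emulate_strsplit <- function(x, pattern) {
  out <- character(0)
  while (nchar(x) > 0) {
    m <- regexpr(pattern, x, perl = TRUE)
    start <- as.integer(m)            # 1-based start of the match, -1 if none
    if (start == -1) {
      out <- c(out, x)                # no match left: keep the remainder as is
      break
    }
    end <- start - 1 + attr(m, "match.length")  # characters consumed up to the match end
    if (end > 0) {
      # Match ends somewhere inside the string: emit what lies to its left,
      # then drop the match and everything before it.
      out <- c(out, substr(x, 1, start - 1))
      x <- substr(x, end + 1, nchar(x))
    } else {
      # Empty match at the very start: emit a single character and move on.
      out <- c(out, substr(x, 1, 1))
      x <- substr(x, 2, nchar(x))
    }
  }
  out
}

emulate_strsplit('AaaBbbCcc', '(?=[A-Z])')
# [1] "A"  "aa" "B"  "bb" "C"  "cc"
emulate_strsplit('AaaBbbCcc', '(?!^)(?=[A-Z])')
# [1] "Aaa" "Bbb" "Ccc"
```

Each pass searches only what is left of the string, so ^ matches at the start of every remainder. That is why (?!^) suppresses the empty match at offset 0 on every pass, not just the first one, and the loop then splits only at the next capital.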
For the camel case we may do
strsplit('AaaABbbBCcc', '(?!^)(?=\\p{Lu}\\p{Ll})', perl=TRUE)[[1]]
strsplit('AaaABbbBCcc', '(?!^)(?=[A-Z][a-z])', perl=TRUE)[[1]] ## or
# [1] "AaaA" "BbbB" "Ccc"
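On the \\p{Lt} attempt from the question: as far as I know, Lt is the Unicode "titlecase letter" category (single digraph characters), not "an uppercase letter followed by a lowercase one", so a plain ASCII string contains no Lt characters and nothing is split. A small check, where x is just an illustrative variable name:

```r
x <- 'AaaABbbBCcc'

# No character in x belongs to the titlecase category, so the lookahead never matches.
has_titlecase <- grepl('\\p{Lt}', x, perl = TRUE)
has_titlecase
# [1] FALSE

# An uppercase letter followed by a lowercase one is what marks the camel-case starts.
camel_starts <- regmatches(x, gregexpr('\\p{Lu}\\p{Ll}', x, perl = TRUE))[[1]]
camel_starts
# [1] "Aa" "Bb" "Cc"
```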
CodePudding user response:
This all sounds like a problem of parsing a CamelCase name, which appears to be already solved.
> gsub("([a-z])([A-Z])", replacement = "\\1 \\2", x = 'AaaBbbCcc')
[1] "Aaa Bbb Ccc"
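Note that gsub returns one string rather than the character vector asked for. A minimal sketch of getting the vector from it, assuming the input contains no spaces of its own (spaced and parts are illustrative names):

```r
x <- 'AaaBbbCcc'

# Insert a space at every lowercase-to-uppercase boundary ...
spaced <- gsub("([a-z])([A-Z])", "\\1 \\2", x)

# ... then split on that literal space.
parts <- strsplit(spaced, " ", fixed = TRUE)[[1]]
parts
# [1] "Aaa" "Bbb" "Ccc"
```

This only covers the first case in the question; for 'AaaABbbBCcc' it would break after every lowercase letter that precedes a capital, which is not the "AaaA" "BbbB" "Ccc" result asked for.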