How to use `strsplit` before every capital letter of a camel case?

I want to use strsplit to split before every capital letter, using a positive lookahead as the pattern. However, it also splits after every capital letter, and I'm confused about that. Is this regex incompatible with strsplit? Why is that so, and what do I need to change?

strsplit('AaaBbbCcc', '(?=\\p{Lu})', perl=TRUE)[[1]]
strsplit('AaaBbbCcc', '(?=[A-Z])', perl=TRUE)[[1]]
strsplit('AaaBbbCcc', '(?=[ABC])', perl=TRUE)[[1]]
# [1] "A"  "aa" "B"  "bb" "C"  "cc"

Expected result:

# [1] "Aaa" "Bbb" "Ccc"

In the regex demo, the pattern actually looks fine.

Ideally it should split only before a camel-case boundary, i.e. before Aa and not before AA. There's \\p{Lt}, but this doesn't seem to work at all.

strsplit('AaaABbbBCcc', '(?=\\p{Lt})', perl=TRUE)[[1]]
# [1] "AaaABbbBCcc"

Expected result:

# [1] "AaaA" "BbbB" "Ccc"

CodePudding user response：

It seems that by adding (?!^) you can obtain the desired result.

strsplit('AaaBbbCcc', "(?!^)(?=[A-Z])", perl=TRUE)
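
As far as I can tell, the reason lies in how strsplit consumes the string: it searches the remaining text, cuts at the match, and starts again from there. A lookahead matches zero characters, and when such an empty match sits at the very start of the remaining text, strsplit emits a single character as its own piece, hence "A", "aa", and so on. (?!^) forbids a match at the start, so that case never arises. A minimal sketch comparing both patterns (the variable names are only for illustration):

```r
# Base R only; no packages needed.
x <- "AaaBbbCcc"

# Plain lookahead: an empty match at the start of the remaining text
# makes strsplit() cut off one character as a separate piece.
plain <- strsplit(x, "(?=[A-Z])", perl = TRUE)[[1]]

# Guarded lookahead: (?!^) rejects the empty match at the start,
# so each cut happens only before the next capital letter.
guarded <- strsplit(x, "(?!^)(?=[A-Z])", perl = TRUE)[[1]]

print(plain)    # "A" "aa" "B" "bb" "C" "cc"
print(guarded)  # "Aaa" "Bbb" "Ccc"
```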

For the camel-case variant, we can additionally require a lowercase letter right after the uppercase one:

strsplit('AaaABbbBCcc', '(?!^)(?=\\p{Lu}\\p{Ll})', perl=TRUE)[[1]]
strsplit('AaaABbbBCcc', '(?!^)(?=[A-Z][a-z])', perl=TRUE)[[1]]  ## or
# [1] "AaaA" "BbbB" "Ccc"
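
On \\p{Lt}: this is the Unicode "titlecase letter" category, which holds single code points such as U+01C5 (ǅ), not an uppercase letter followed by a lowercase one. No ASCII letter belongs to it, which would explain why nothing was split. A small check, assuming a PCRE build with Unicode property support:

```r
x <- "AaaABbbBCcc"

# No ASCII letter is a titlecase letter, so \p{Lt} finds nothing to split on.
has_titlecase <- grepl("\\p{Lt}", x, perl = TRUE)
print(has_titlecase)  # FALSE

# U+01C5 is a titlecase letter: a digraph stored as one code point.
print(grepl("\\p{Lt}", "\u01C5", perl = TRUE))

# "Uppercase followed by lowercase" has to be spelled out as two classes.
parts <- strsplit(x, "(?!^)(?=\\p{Lu}\\p{Ll})", perl = TRUE)[[1]]
print(parts)  # "AaaA" "BbbB" "Ccc"
```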

CodePudding user response：

This sounds like the problem of parsing a CamelCase name, which can be handled without lookarounds by inserting a separator between a lowercase letter and the uppercase letter that follows it.

gsub("([a-z])([A-Z])", replacement = "\\1 \\2", x = 'AaaBbbCcc')
# [1] "Aaa Bbb Ccc"
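
If a character vector is needed rather than one string, the result can be split on the inserted space. Note that this pattern marks a different boundary than the lookahead in the other answer (lowercase then uppercase, rather than uppercase then lowercase), so the second example from the question comes out differently. A sketch, with illustrative variable names:

```r
# Insert a space at each lowercase-to-uppercase transition, then split on it.
spaced <- gsub("([a-z])([A-Z])", "\\1 \\2", "AaaBbbCcc")
pieces <- strsplit(spaced, " ", fixed = TRUE)[[1]]
print(pieces)  # "Aaa" "Bbb" "Ccc"

# On the second example the boundary falls after the lowercase run,
# not before the last capital of a run of capitals.
spaced2 <- gsub("([a-z])([A-Z])", "\\1 \\2", "AaaABbbBCcc")
print(spaced2)  # "Aaa ABbb BCcc"
```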