I have strings containing camel-case text and numbers and would like to split them.
E.g. the string "abcDefGhi345J6"
should be split into
["abc", "Def", "Ghi", "345", "J", "6"]
My best effort is
"abcDefGhi345J6".split("(?=\\p{Lu})|(?!\\p{Lu})(?=\\d+)")
which gives me
["abc", "Def", "Ghi", "3", "4", "5", "J", "6"]
PS: The answers in the questions marked as duplicates do NOT give the expected output, as they are not Unicode-aware.
CodePudding user response:
You may use this regex for splitting:
(?=\p{Lu})|(?<!\d)(?=\d)
For Java code:
String[] arr = string.split("(?=\\p{Lu})|(?<!\\d)(?=\\d)");
(?<!\d)(?=\d)
matches a position that has a digit ahead but no digit behind it, i.e. the start of each run of digits, so "345" stays in one piece.
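For reference, here is a minimal runnable sketch of the split above. The class name CamelCaseSplit and the helper splitCamelCase are hypothetical names chosen for illustration; only the regex itself comes from the answer.

```java
import java.util.Arrays;

public class CamelCaseSplit {
    // Zero-width boundary: before any uppercase letter (\p{Lu}),
    // or before a digit that is not preceded by another digit.
    private static final String BOUNDARY = "(?=\\p{Lu})|(?<!\\d)(?=\\d)";

    // Hypothetical helper: splits the input at every boundary position.
    public static String[] splitCamelCase(String input) {
        return input.split(BOUNDARY);
    }

    public static void main(String[] args) {
        String[] parts = splitCamelCase("abcDefGhi345J6");
        System.out.println(Arrays.toString(parts));
        // prints [abc, Def, Ghi, 345, J, 6]
    }
}
```

Note that \d matches only ASCII digits in Java by default. If the input may contain other Unicode decimal digits, replacing \d with \p{Nd} in both lookarounds should behave the same way for those digits.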