How to split alphabetic strings by dot using regex?

Time:04-10

I'd like to split strings by "." only if all other characters are alphabetic and the string doesn't start or end with ".".

So the expected result for abc.def.xyz would be [abc,def,xyz].

The following strings should be left as they are: abc. xy.a3 1a.ab abc.def,xyz

Basically I'm looking for a more elegant solution to my current code:

if(canSplit(x)){
   var parts = x.split("\\.");
   ...
}

boolean canSplit(String text) {
    if(text.startsWith(".") || text.endsWith(".")) return false;
    
    for(var s : text.split("\\.")) {
        for(int i = 0; i < s.length(); i++) {
            if(!Character.isAlphabetic(s.charAt(i))) return false;
        }
    }
    return true;        
}

CodePudding user response:

You may use this regex and grab captured group #1

(?:^(?=\p{L}+(?:\.\p{L}+)+$)|(?!^)\G\.)(\p{L}+)


Details:

  • (?=\p{L}+(?:\.\p{L}+)+$) is a lookahead that ensures the line consists only of dot-separated runs of letters
  • \G asserts position at the end of the previous match or the start of the string for the first match
  • (?!^) ensures that we don't allow \G to match at the start

Java Code:

jshell> String str = "abc.def.xyz";
str ==> "abc.def.xyz"

jshell> String re = "(?:^(?=\\p{L}+(?:\\.\\p{L}+)+$)|(?!^)\\G\\.)(\\p{L}+)";
re ==> "(?:^(?=\\p{L}+(?:\\.\\p{L}+)+$)|(?!^)\\G\\.)(\\p{L}+)"

jshell> Pattern.compile(re, Pattern.MULTILINE).matcher(str).results().flatMap(mr -> IntStream.rangeClosed(1, mr.groupCount()).mapToObj(mr::group)).collect(Collectors.toList());
$6 ==> [abc, def, xyz]
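
For use outside jshell, the same idea can be wrapped in a small helper. This is only a sketch: the class name DotSplitter and the method name splitIfAlphabetic are made up for illustration, and the pattern is the one from this answer with its + quantifiers written out. It collects group 1 of every match, so an input that fails the lookahead produces an empty list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class DotSplitter {
    // Group 1 captures one run of letters per match. The first match must
    // start at a line start that passes the lookahead; later matches must
    // continue at \G (the end of the previous match) with a dot.
    private static final Pattern PARTS = Pattern.compile(
            "(?:^(?=\\p{L}+(?:\\.\\p{L}+)+$)|(?!^)\\G\\.)(\\p{L}+)",
            Pattern.MULTILINE);

    // Returns the dot-separated parts, or an empty list if the input
    // is not made up solely of letters separated by single dots.
    static List<String> splitIfAlphabetic(String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = PARTS.matcher(text);
        while (m.find()) {
            parts.add(m.group(1));
        }
        return parts;
    }
}
```

With this sketch, DotSplitter.splitIfAlphabetic("abc.def.xyz") gives [abc, def, xyz], while the strings from the question that should be left alone (abc. xy.a3 1a.ab abc.def,xyz) each give an empty list, so the caller can fall back to the unsplit string.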

CodePudding user response:

Nothing wrong with your approach. But if you want to use a regex, your canSplit method could look like this:

boolean canSplit(String text) {
    String regex = "[a-z]+(?:\\.[a-z]+)+";
    return text.matches(regex);
}
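
Note that [a-z] only accepts lowercase ASCII letters, while the question's code uses Character.isAlphabetic. A possible variant, sketched under the assumption that the same notion of "alphabetic" is wanted, uses Java's \p{IsAlphabetic} property class and compiles the pattern once. The class name SplitCheck and the method splitOrKeep are hypothetical names for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class SplitCheck {
    // One or more alphabetic characters, followed by at least one
    // ".letters" group. \p{IsAlphabetic} is the Unicode Alphabetic
    // binary property, the same property Character.isAlphabetic tests.
    private static final Pattern DOTTED_ALPHA =
            Pattern.compile("\\p{IsAlphabetic}+(?:\\.\\p{IsAlphabetic}+)+");

    static boolean canSplit(String text) {
        // matches() requires the whole string to fit the pattern.
        return DOTTED_ALPHA.matcher(text).matches();
    }

    // Splits on dots when allowed, otherwise returns the input unchanged.
    static String[] splitOrKeep(String text) {
        return canSplit(text) ? text.split("\\.") : new String[] { text };
    }
}
```

Because the group is quantified with +, a string without any dot is reported as not splittable; the result is the same, since there is nothing to split in that case.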