Java regex to capture all words in strings

Time:09-16

I am looking for a regex able to capture all the words in a string.

I have the following input strings:

  1. JOHN SMITH MR
  2. JOHN MR
  3. J MISS

Expected output

  1. {"JOHN", "SMITH", "MR"}
  2. {"JOHN", "MR"}
  3. {"J", "MISS"}

I have written the regex below. It works for input string 1, but not for input strings 2 and 3. The input string should contain only alphabetical characters (no numbers or special characters).

((?:[a-z]*[a-z]+)).*?((?:[a-z][a-z]+)).*?((?:[a-z][a-z]+))

If the input string contains numbers like JOHN 12345 then the regex should not capture anything.

Could you please help me to improve my regex to capture the expected result?

CodePudding user response:

Can I suggest an alternative? Use the String split() method together with a regex that rejects inputs containing digits. See below:

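        String[] arr = {"JOHN SMITH MR",
                "JOHN MR",
                "J MISS",
                "JOHN 1234"};

        // Compile once, outside the loop; any run of digits disqualifies the input
        Pattern digits = Pattern.compile("[0-9]+");

        for (String s : arr) {
            if (digits.matcher(s).find()) {
                continue; // skip inputs that contain numbers
            }
            // Split on single spaces to get the individual words
            String[] arr2 = s.split(" ");
            log.info("arr2 = {}", Arrays.asList(arr2));
        }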

There you have your output
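The digit check above still lets special characters through. As a rough sketch of a stricter variant, the whole string can first be validated as letter-only words and the words then collected with a second pattern. The class name `WordCapture`, the method `words`, and the rule that words are separated by exactly one space are assumptions made for illustration; they are not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original thread.
class WordCapture {
    // Assumed rule: the whole input is letter-only words separated by single spaces.
    private static final Pattern VALID = Pattern.compile("[A-Za-z]+(?: [A-Za-z]+)*");
    // One run of letters is one word.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Returns every word, or an empty list when the input fails validation.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        if (!VALID.matcher(input).matches()) {
            return result; // digits or special characters: capture nothing
        }
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"JOHN SMITH MR", "JOHN MR", "J MISS", "JOHN 12345"}) {
            System.out.println(s + " -> " + words(s));
        }
    }
}
```

Using `matches()` for the validation step means the pattern has to cover the entire input, so no `^` or `$` anchors are needed there.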

CodePudding user response:

I have existing code that extracts the first name and title from the input rawNameString. The code works when the input string contains only a first name and a title, like "JOHN MR", but not when it contains a first name, a middle name and a title, like "JOHN SMITH MR". In that case the title is mapped to the wrong value. The goal is to extract only the first name and the title.

Existing Code:

        Pattern NAME_PATTERN = Pattern.compile("((?:[a-z]*[a-z]+))" + ".*?" + "((?:[a-z][a-z]+))", Pattern.CASE_INSENSITIVE);
        String rawNameString = "JOHN MR";

        Name name = new Name();
        Matcher nameMatcher = NAME_PATTERN.matcher(rawNameString);
        if (nameMatcher.find()) {
            // title
            name.setTitle(nameMatcher.group(2));
            // first name
            name.setFirstName(nameMatcher.group(1));
        } else {
            throw new Exception("Invalid Name");
        }

I have modified the code to capture the first name and title irrespective of the number of words in the input string.

        // Accept only letters and spaces across the whole input
        Pattern NAME_PATTERN = Pattern.compile("^[a-zA-Z ]*$");
        String rawNameString = "JOHN SMITH MR";

        Name name = new Name();
        Matcher matcher = NAME_PATTERN.matcher(rawNameString);
        if (matcher.find()) {
            // Split on runs of whitespace; first word is the first name, last word is the title
            List<String> nameList = Arrays.asList(rawNameString.split("\\s+"));
            if (nameList.isEmpty())
                throw new Exception("Invalid Name ");
            name.setFirstName(nameList.get(0));
            name.setTitle(nameList.get(nameList.size() - 1));
        } else {
            throw new Exception("Invalid Name ");
        }

I was looking to change the regex instead of changing the Java code.
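One way this could be done with a regex change alone is to anchor the pattern, capture the first word as group 1 and the last word as group 2, and match any middle words in a non-capturing group. The sketch below is only an illustration under some assumptions: the input needs at least two words, words contain letters only, and the class `NameRegexSketch` and method `firstNameAndTitle` are hypothetical names. Because the group numbers are the same as in the existing code, `group(1)` and `group(2)` can keep their meaning.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not from the original thread.
class NameRegexSketch {
    // group(1) = first word, group(2) = last word.
    // Middle words are matched by the non-capturing group and ignored.
    private static final Pattern NAME_PATTERN =
            Pattern.compile("^([a-z]+)(?: +[a-z]+)* +([a-z]+)$", Pattern.CASE_INSENSITIVE);

    // Returns {firstName, title}, or null when the input is not valid.
    static String[] firstNameAndTitle(String rawNameString) {
        Matcher m = NAME_PATTERN.matcher(rawNameString);
        if (!m.find()) {
            return null; // digits, special characters, or fewer than two words
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        for (String s : new String[] {"JOHN SMITH MR", "JOHN MR", "J MISS", "JOHN 12345"}) {
            String[] r = firstNameAndTitle(s);
            System.out.println(s + " -> " + (r == null ? "invalid" : r[0] + " / " + r[1]));
        }
    }
}
```

A single-word input such as "JOHN" is rejected by this pattern. If that case should be accepted, the pattern would need a different shape, since group 2 could then be absent.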
