Java: split a string using a regex

Time:04-24

I want to split a sentence and keep only the words in an array.

For example,

"I bought a A7A for $36,000".

After the split, the array should contain only ["I", "bought", "a", "for"].

"A7A" is ignored because it contains a digit.

"$36,000" is ignored because it contains special characters.

I want to keep only purely alphabetic words.

I tried using \W, but it does not give the correct result.
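
For reference, here is a minimal sketch of why splitting on \W falls short. It assumes the attempt was something like split("\\W+"); the class and method names are hypothetical and not part of the original question. \W only treats non-word characters as separators, and digits count as word characters, so "A7A" and the pieces of "$36,000" survive the split.

```java
import java.util.Arrays;

public class SplitOnNonWord {
    // Splits on runs of non-word characters.
    // \W means "anything except [A-Za-z0-9_]", so digits are NOT separators.
    static String[] splitOnNonWord(String input) {
        return input.split("\\W+");
    }

    public static void main(String[] args) {
        String[] parts = splitOnNonWord("I bought a A7A for $36,000");
        // Tokens containing digits are still present in the result.
        System.out.println(Arrays.toString(parts)); // [I, bought, a, A7A, for, 36, 000]
    }
}
```

This suggests that splitting alone is not enough here: the tokens also have to be filtered, or the words have to be matched directly instead of split.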

CodePudding user response:

I would instead phrase this as a regex find-all search on the pattern (?i)\b[A-Z]+\b:

import java.util.Arrays;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

// Matcher.results() requires Java 9 or later.
String input = "I bought a A7A for $36,000";
String[] matches = Pattern.compile("(?i)\\b[A-Z]+\\b")
    .matcher(input)
    .results()
    .map(MatchResult::group)
    .toArray(String[]::new);
System.out.println(Arrays.toString(matches));  // [I, bought, a, for]

CodePudding user response:

The regex pattern for this purpose is (based on this link):

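(?<!\S)[A-Za-z]+(?!\S)|(?<!\S)[A-Za-z]+(?=:(?!\S))

You can then use it like this:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexWords { // the class name is arbitrary
    public static void main(String[] args) {
        List<String> result = new ArrayList<>();

        // First alternative: a run of letters with whitespace (or the string boundary) on both sides.
        // Second alternative: a run of letters followed by a colon that ends the token.
        String regex = "(?<!\\S)[A-Za-z]+(?!\\S)|(?<!\\S)[A-Za-z]+(?=:(?!\\S))";

        Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("I bought a A7A for $36,000");
        while (matcher.find()) {
            result.add(matcher.group());
        }

        System.out.println(result); // [I, bought, a, for]
    }
}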

You can adapt it to your requirements, for example by changing the case sensitivity.
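
The two patterns above agree on the sample sentence but differ on punctuation inside a token. The following is a minimal sketch of that difference, assuming both patterns are written with the + quantifier; the class, constant, and method names are hypothetical. The word-boundary pattern treats a hyphen as a boundary and returns both halves of a hyphenated word, while the whitespace-delimited pattern rejects the whole token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternComparison {
    // Word-boundary pattern, as in the first answer.
    static final Pattern WORD_BOUNDARY = Pattern.compile("(?i)\\b[A-Z]+\\b");
    // Whitespace-delimited pattern (first alternative of the second answer).
    static final Pattern WHITESPACE_DELIMITED = Pattern.compile("(?<!\\S)[A-Za-z]+(?!\\S)");

    // Collects every match of the pattern in the input, in order.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a well-known fact";
        System.out.println(findAll(WORD_BOUNDARY, input));        // [a, well, known, fact]
        System.out.println(findAll(WHITESPACE_DELIMITED, input)); // [a, fact]
    }
}
```

Which behavior is preferable depends on whether tokens such as "well-known" or "don't" should contribute their alphabetic parts or be dropped entirely.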

CodePudding user response:

There are already some answers here, but this is the approach I prefer. Try this code:

List<String> wordsOnlyList = new ArrayList<>(); // collects the words that consist of letters only
String sentence = "I bought a A7A for $36,000"; // sample sentence to test the result
String[] words = sentence.trim().split("\\s+"); // split on runs of whitespace so repeated spaces yield no empty tokens

// Matches any single character that is NOT a letter; compiled once, outside the loop
Pattern nonLetter = Pattern.compile("[^a-z]", Pattern.CASE_INSENSITIVE);

for (String word : words) {
   // find() returns true if the word contains at least one non-letter character
   boolean hasNonLetter = nonLetter.matcher(word).find();

   if (!word.isEmpty() && !hasNonLetter) {
      wordsOnlyList.add(word); // letters only, so keep the word
   }
}

String[] wordsArray = wordsOnlyList.toArray(new String[0]); // convert to an array for printing
System.out.println(Arrays.toString(wordsArray)); // display in console

Result:

[I, bought, a, for]

In the end, my class looks like this:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyClass {
    public static void main(String[] args) {
        List<String> wordsOnlyList = new ArrayList<>(); // collects the words that consist of letters only
        String sentence = "I bought a A7A for $36,000"; // sample sentence to test the result
        String[] words = sentence.trim().split("\\s+"); // split on runs of whitespace so repeated spaces yield no empty tokens

        // Matches any single character that is NOT a letter; compiled once, outside the loop
        Pattern nonLetter = Pattern.compile("[^a-z]", Pattern.CASE_INSENSITIVE);

        for (String word : words) {
            // find() returns true if the word contains at least one non-letter character
            boolean hasNonLetter = nonLetter.matcher(word).find();

            if (!word.isEmpty() && !hasNonLetter) {
                wordsOnlyList.add(word); // letters only, so keep the word
            }
        }

        String[] wordsArray = wordsOnlyList.toArray(new String[0]); // convert to an array for printing
        System.out.println(Arrays.toString(wordsArray)); // prints [I, bought, a, for]
    }
}
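
The same split-then-filter idea can be written more compactly with streams. This is only a sketch of an alternative, not the code from the answer above; the class and method names are hypothetical. It relies on String.matches, which succeeds only when the entire token matches the pattern.

```java
import java.util.Arrays;

public class WordsOnlyStream {
    // Splits on whitespace and keeps only tokens made up entirely of ASCII letters.
    static String[] wordsOnly(String sentence) {
        return Arrays.stream(sentence.trim().split("\\s+"))
                .filter(word -> word.matches("[A-Za-z]+")) // whole token must be letters
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] words = wordsOnly("I bought a A7A for $36,000");
        System.out.println(Arrays.toString(words)); // [I, bought, a, for]
    }
}
```

Because [A-Za-z]+ requires at least one letter, empty tokens are dropped as well, so an empty input produces an empty array.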

