I want to split a sentence and keep only the words in an array.
For example: "I bought a A7A for $36,000".
After the split, the array should contain only ["I", "bought", "a", "for"].
"A7A" is ignored because it contains a digit, and "$36,000" is ignored because it contains digits and special characters.
I want to keep only purely alphabetic words.
I tried using \W, but it does not give the correct result.
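For reference, here is a minimal sketch of what splitting on \W+ produces for this sentence. The class name WSplitDemo and the helper splitOnNonWord are hypothetical names used only for illustration. Since \W matches only non-word characters and digits count as word characters, a token such as A7A survives and $36,000 is broken into 36 and 000.

```java
import java.util.Arrays;

public class WSplitDemo {
    // Hypothetical helper: splits the sentence on runs of non-word characters.
    static String[] splitOnNonWord(String sentence) {
        return sentence.split("\\W+");
    }

    public static void main(String[] args) {
        String[] parts = splitOnNonWord("I bought a A7A for $36,000");
        // Digits are word characters, so A7A, 36 and 000 are all kept.
        System.out.println(Arrays.toString(parts)); // [I, bought, a, A7A, for, 36, 000]
    }
}
```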
CodePudding user response:
I would instead phrase this as a regex find-all search on the pattern (?i)\b[A-Z]+\b:
import java.util.Arrays;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

String input = "I bought a A7A for $36,000";
// Matcher.results() requires Java 9 or later
String[] matches = Pattern.compile("(?i)\\b[A-Z]+\\b")
    .matcher(input)
    .results()
    .map(MatchResult::group)
    .toArray(String[]::new);
System.out.println(Arrays.toString(matches)); // [I, bought, a, for]
CodePudding user response:
The regex pattern for this purpose is (based on this link):
(?<!\S)[A-Za-z]+(?!\S)|(?<!\S)[A-Za-z]+(?=:(?!\S))
You can then use this code:
public static void main(String[] args) {
    List<String> result = new ArrayList<>();
    // a run of letters surrounded by whitespace (or the input edges), or followed by a colon and then whitespace
    String regex = "(?<!\\S)[A-Za-z]+(?!\\S)|(?<!\\S)[A-Za-z]+(?=:(?!\\S))";
    Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher("I bought a A7A for $36,000");
    while (matcher.find()) {
        result.add(matcher.group());
    }
    System.out.println(result);
}
You can adapt it to your requirements, for example by switching between case-sensitive and case-insensitive matching.
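The two patterns posted so far differ in how they treat punctuation next to a word. The sketch below only illustrates that difference: the class name BoundaryDemo, the helper findAll, and the sample sentence are hypothetical, and only the first alternative of the lookaround pattern is used. \b accepts any non-word character as a boundary, while (?<!\S) and (?!\S) require whitespace or the start or end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Hypothetical helper: returns every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Wait, I paid $36,000 today.";
        // Word boundaries: punctuation next to a word still counts as a boundary.
        System.out.println(findAll("\\b[A-Za-z]+\\b", input)); // [Wait, I, paid, today]
        // Whitespace lookarounds: the word must sit between whitespace or the input edges.
        System.out.println(findAll("(?<!\\S)[A-Za-z]+(?!\\S)", input)); // [I, paid]
    }
}
```

Which behavior is preferable depends on whether words followed by a comma or a period should count as words.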
CodePudding user response:
There are already some answers here, but this is the way I would prefer to do it. Try this code:
List<String> wordsOnlyList = new ArrayList<>(); // collects the words that contain only letters
String sentence = "I bought a A7A for $36,000"; // sample sentence to test the result
String[] words = sentence.split(" "); // split the sentence on single spaces
// matches any character that is NOT a letter or a space; compiled once, outside the loop
Pattern p = Pattern.compile("[^a-z ]", Pattern.CASE_INSENSITIVE);
for (String word : words) {
    Matcher m = p.matcher(word);
    boolean hasNonLetter = m.find(); // true if the word contains a digit or special character
    if (!hasNonLetter) {
        wordsOnlyList.add(word); // only letters: keep the word
    }
}
String[] wordsArray = wordsOnlyList.toArray(new String[0]); // convert to an array for display
System.out.println(Arrays.toString(wordsArray)); // display in console
Result:
[I, bought, a, for]
In the end, my class looks like this:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class MyClass {
    public static void main(String[] args) {
        List<String> wordsOnlyList = new ArrayList<>(); // collects the words that contain only letters
        String sentence = "I bought a A7A for $36,000"; // sample sentence to test the result
        String[] words = sentence.split(" "); // split the sentence on single spaces
        // matches any character that is NOT a letter or a space; compiled once, outside the loop
        Pattern p = Pattern.compile("[^a-z ]", Pattern.CASE_INSENSITIVE);
        for (String word : words) {
            Matcher m = p.matcher(word);
            boolean hasNonLetter = m.find(); // true if the word contains a digit or special character
            if (!hasNonLetter) {
                wordsOnlyList.add(word); // only letters: keep the word
            }
        }
        String[] wordsArray = wordsOnlyList.toArray(new String[0]); // convert to an array for display
        System.out.println(Arrays.toString(wordsArray)); // prints [I, bought, a, for]
    }
}
You can also test the code from here.
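The same split-then-filter idea can also be written more compactly with streams. This is a sketch, not a replacement for the class above; the class name WordFilterDemo and the method lettersOnly are hypothetical. It splits on runs of whitespace (\s+) instead of a single space, which is an assumption made here so that several consecutive spaces do not produce empty tokens.

```java
import java.util.Arrays;

public class WordFilterDemo {
    // Hypothetical helper: keeps only the tokens made up entirely of ASCII letters.
    static String[] lettersOnly(String sentence) {
        return Arrays.stream(sentence.trim().split("\\s+"))
                // String.matches requires the whole token to match the pattern
                .filter(token -> token.matches("[A-Za-z]+"))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] words = lettersOnly("I bought a A7A for $36,000");
        System.out.println(Arrays.toString(words)); // [I, bought, a, for]
    }
}
```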