Trying to delete all non letter parts of a word but this line deletes the whole word '"Hav

Time:01-27

while(tokenizer.hasMoreTokens()){
    currentWord = tokenizer.nextToken();
    String[] parts = currentWord.split(Pattern.quote("."));
    String[] parts2 = parts[0].split(Pattern.quote(","));
    String[] parts3 = parts2[0].split(Pattern.quote("?"));
    String[] parts4 = parts3[0].split(Pattern.quote("\\.| "));
    String[] parts5 = parts4[0].split("\"");
    String[] parts6 = parts5[0].split(Pattern.quote(":"));

    System.out.println(Arrays.toString(parts6));
}

I'm just trying to get this text to split properly. The only issue right now is the word:

"Have

Also, if someone could provide a solution that combines all of this into one line, that would be nice. I couldn't get that to work. Thanks.

CodePudding user response:

Try this.

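The \" escapes the double quote inside the Java string literal, and each \\ escapes a regex special character ("." and "?"). Inside a character class those two escapes are optional, but they are harmless. The call replaces any of the characters . , " : ? with an empty string.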

    while(tokenizer.hasMoreTokens()){
        currentWord = tokenizer.nextToken();
        final String cleanWord = currentWord.replaceAll("[\\.,\":\\?]", "");
        System.out.println(cleanWord);
    }
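
For context on why the original chain loses "Have: String.split keeps a leading empty string when the delimiter matches at the very start of the input, so parts5[0] is empty and the letters end up in parts5[1], which is never read. The sketch below reproduces that and compares it with removing the characters instead. The class name QuoteSplitDemo and the helper clean are made up for illustration, and the character class is just the unescaped equivalent of the one above.

```java
import java.util.Arrays;

class QuoteSplitDemo {

    // Hypothetical helper: removes the same characters as the answer above.
    static String clean(String word) {
        return word.replaceAll("[.,\":?]", "");
    }

    public static void main(String[] args) {
        String word = "\"Have";

        // The quote is the first character, so split() returns a leading
        // empty string followed by the rest of the word.
        String[] parts = word.split("\"");
        System.out.println(Arrays.toString(parts)); // [, Have]
        System.out.println(parts[0].isEmpty());     // true

        // Removing the characters instead of splitting keeps the letters.
        System.out.println(clean(word));            // Have
    }
}
```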

CodePudding user response:

There are specialized classes in the API to parse words out of text. Here is one such:

import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WordCollector {

    public static void main(String[] args) {
        try (Stream<String> lines = Files.lines(Paths.get(args[0]))) {
            // try-with-resources closes the file once the words are collected.
            List<String> words = WordCollector.getWords(lines);
            System.out.println(words);
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }

    public static List<String> getWords(Stream<String> lines) {
        List<String> result = new ArrayList<>();
        BreakIterator boundary = BreakIterator.getWordInstance();
        lines.forEach(line -> {
            boundary.setText(line);

            int start = boundary.first();
            for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary.next()) {
                String candidate = line.substring(start, end).replaceAll("\\p{Punct}", "").trim();
                if (candidate.length() > 0) {
                    result.add(candidate);
                }
            }
        });
        return result;
    }
}

CodePudding user response:

Here is one way if you want to split the line on non-letters.

[^A-Za-z]+ splits on one or more non-letters

String line = "wordA, wordB; wordC;;; wordD, wordE!? - !wordF??, !wordG!, wordH, wordI";
String[] words = line.split("[^A-Za-z]+");
for (String word : words) {
    System.out.println(word);
}

prints

wordA
wordB
wordC
wordD
wordE
wordF
wordG
wordH
wordI

On the other hand, if you want to remove those characters from a word, use a similar pattern. No need to specify the non-letter characters separately.

String word = "C:om!>{}.p*u**te,;rs";
word = word.replaceAll("[^A-Za-z]","");
System.out.println(word);

prints

Computers
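
Since the question also asked for a single expression, here is one possible sketch that tokenizes on whitespace and then strips non-letters from each token with the same pattern. The class name LetterOnlyWords and the method lettersOnly are hypothetical, and it assumes only ASCII letters should be kept.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LetterOnlyWords {

    // Hypothetical helper: split on whitespace, keep only ASCII letters
    // in each token, and drop tokens that end up empty.
    static List<String> lettersOnly(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .map(token -> token.replaceAll("[^A-Za-z]", ""))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("\"Have you seen it?\" he asked."));
        // prints [Have, you, seen, it, he, asked]
    }
}
```

Stripping inside each token keeps a word such as don't together as dont, whereas splitting the whole line on non-letters would break it into don and t. Which behavior is preferable depends on the text.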

CodePudding user response:

The code below shows how you can ignore all non-alphabetic characters.

import java.io.*;
public class Main{
    public static void main(String[] args) throws IOException {
        int c = 0;
        while((c=System.in.read())!=-1)
           if (('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z'))
              System.out.print((char)c);
    }
}
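
The same range check can be applied to a String that is already in memory, such as a token from the tokenizer, rather than to System.in. This is only a sketch: the class name AsciiLetterFilter and the method keepAsciiLetters are made up for illustration.

```java
class AsciiLetterFilter {

    // Hypothetical helper: copies only the characters a-z and A-Z.
    static String keepAsciiLetters(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepAsciiLetters("\"Have")); // Have
    }
}
```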