How to find the most repeated word in a text file

The code:

import java.io.File;
import java.util.Scanner; 

class Main {
    public static void main(String[] args) throws Exception{
        //code
        int max = 0;
        int count = 0;
        String rep_word = "none";
        File myfile = new File("rough.txt");
        Scanner reader = new Scanner(myfile);
        Scanner sub_reader = new Scanner(myfile);
        while (reader.hasNextLine()) {
            String each_word = reader.next();
            while (sub_reader.hasNextLine()){
                    String check = sub_reader.next();
                    if (check == each_word){
                        count += 1;
                    }
            }
            if (max<count){
                max = count;
                rep_word = each_word;
            }
          }
        System.out.println(rep_word);  
        reader.close();
        sub_reader.close();
        
    }
}

The rough.txt file:

I want to return the most repeated word from the text file without using arrays, but I'm not getting the desired output. I found that the if condition is never satisfied, even when the variables 'check' and 'each_word' hold the same word. I don't understand where I went wrong.

CodePudding user response：

You should use a map (for example a HashMap) to count the frequency of each word in a single pass, instead of re-reading the input file with two scanners.

The Map::merge method does the counting. It also returns the updated frequency of the word, so the maximum frequency can be tracked as you go.

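import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

int max = 0;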
int count = 0;
String rep_word = "none";

// use LinkedHashMap to maintain insertion order
Map<String, Integer> freqMap = new LinkedHashMap<>();

// use try-with-resources to automatically close scanner
try (Scanner reader = new Scanner(new File("rough.txt"))) {
    while (reader.hasNext()) {
        String word = reader.next();
        count = freqMap.merge(word, 1, Integer::sum);
        if (count > max) {
            max = count;
            rep_word = word;
        }
    }
}
System.out.println(rep_word + " repeated " + max + " times");

If several words share the maximum frequency, you can find all of them by iterating over the map:

for (Map.Entry<String, Integer> entry : freqMap.entrySet()) {
    if (max == entry.getValue()) {
        System.out.println(entry.getKey() + " repeated " + max + " times");
    }
}
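
As for why the original if condition never fires: in Java, == on two String objects compares references, not contents, and two tokens read separately by a Scanner are generally different objects even when the text is identical. The condition would need to be check.equals(each_word). Note also that sub_reader is never re-created, so it is exhausted after the first pass of the outer loop. Below is a minimal sketch of the comparison difference; the class and method names (StringCompareDemo, sameObject, sameWord) are made up for illustration.

```java
// Hypothetical demo class, not part of the original question.
class StringCompareDemo {
    // Compares by reference, which is what == does for objects.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Compares by content, which is what the word check needs.
    static boolean sameWord(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents,
        // similar to two tokens read separately from a file.
        String check = new String("word");
        String eachWord = new String("word");

        System.out.println(sameObject(check, eachWord)); // false
        System.out.println(sameWord(check, eachWord));   // true
    }
}
```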

CodePudding user response：

This is untested code compiled from a few sources. It finds the most frequent word and how many times it occurs. I added some comments so you can follow it more easily. This solution assumes your file contains only words (no punctuation, etc.).

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;

String line, word = "";
int count = 0, highestCount = 0;
ArrayList<String> words = new ArrayList<String>();

BufferedReader br = new BufferedReader(new FileReader("rough.txt"));

// Read each line
while ((line = br.readLine()) != null) {
    // Lowercase the line and split it into words on whitespace
    String[] string = line.toLowerCase().trim().split("\\s+");
    // Add the words to the list, skipping the empty token a blank line produces
    for (String s : string) {
        if (!s.isEmpty()) {
            words.add(s);
        }
    }
}
// Determine the most repeated word in the file
for (int i = 0; i < words.size(); i++) {
    count = 1;
    // Count the later occurrences of words.get(i)
    for (int j = i + 1; j < words.size(); j++) {
        if (words.get(i).equals(words.get(j))) {
            count++;
        }
    }
    // Update highestCount and the corresponding word (must be inside the outer loop)
    if (count > highestCount) {
        highestCount = count;
        word = words.get(i);
    }
}

System.out.println("Most repetitive word: " + word + " appeared " + highestCount + " times.");
br.close();
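
If the file does contain punctuation, one option is to normalize each token before counting. The following is only a sketch under assumptions: the names NormalizedWordCount, normalize and mostFrequent are hypothetical, a word is assumed to consist of letters, digits and apostrophes, and it counts with a map rather than the nested loops above. It works on an in-memory string so it can be run without a file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration only.
class NormalizedWordCount {
    // Lowercase the token and strip everything except letters, digits and apostrophes.
    static String normalize(String token) {
        return token.toLowerCase().replaceAll("[^\\p{L}\\p{N}']", "");
    }

    // Returns the most frequent normalized word in text, or "" if there is none.
    static String mostFrequent(String text) {
        Map<String, Integer> counts = new HashMap<>();
        String best = "";
        int bestCount = 0;
        for (String token : text.split("\\s+")) {
            String word = normalize(token);
            if (word.isEmpty()) {
                continue; // token was only punctuation or whitespace
            }
            // merge returns the updated count for this word
            int c = counts.merge(word, 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mostFrequent("The cat. the dog, THE bird!")); // the
    }
}
```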

CodePudding user response：

You could use a HashMap to store the text as key-value pairs: the key is a word and the value is its number of occurrences. Then get the key with the maximum value. Something like the following:

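import java.io.File;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class Main {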
    public static void main(String[] args) throws Exception{
        Map<String, Integer> map =  new HashMap<>();
        File myfile = new File("/rough.txt");
        Scanner reader = new Scanner(myfile);
        while (reader.hasNextLine()) {
            Scanner sub_reader = new Scanner(reader.nextLine());
            while (sub_reader.hasNext()){
                String word = sub_reader.next();
                // if the word already exist increment the counter
                if(map.containsKey(word)) map.put(word, map.get(word) + 1);
                else map.put(word, 1);
            }
            sub_reader.close();
        }
        // get the key of the max value in the hashmap (java 8 and higher)
        String mostRepeated = map.entrySet().stream().max(Comparator.comparing(Map.Entry::getValue)).get().getKey();
        System.out.println(mostRepeated);
        reader.close();

    }
}
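
One caveat: calling get() on the result of max throws NoSuchElementException when the file contains no words. Here is a hedged sketch of the same lookup that tolerates an empty map by returning an Optional. The names MaxEntryDemo and mostRepeated are hypothetical, and the text is passed in as a string so the example runs without a file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Scanner;

// Hypothetical demo class for illustration only.
class MaxEntryDemo {
    // Returns the word with the highest count, or an empty Optional if text has no words.
    static Optional<String> mostRepeated(String text) {
        Map<String, Integer> map = new HashMap<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                // insert 1 for a new word, otherwise add 1 to the existing count
                map.merge(scanner.next(), 1, Integer::sum);
            }
        }
        return map.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        System.out.println(mostRepeated("to be or not to be to").orElse("(no words)")); // to
        System.out.println(mostRepeated("").orElse("(no words)"));                      // (no words)
    }
}
```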