How do I remove identical words from two separate text files using HashMaps in Java?

Time:10-12

I am working on taking out connecting words (conjunctions) from a book file in Java, so that only words of substance remain. I am able to read the contents of the book.txt file, split it into tokens, remove punctuation, and sort the words by how many times each one occurs. However, I can't figure out how to compare this book data with a second file that holds the conjunctions. I've thought of putting each connecting word into a HashMap, but pairing each word with a number value just doesn't seem right to me. Does anyone have any recommendations on how to do this efficiently? I am still a beginner programmer :)

Thank you!

CodePudding user response:

If I've understood you correctly, you have a body of text stored in a file from which you'd like to remove conjunction words. The conjunction words are stored in a separate file.

If what you want is to remove all instances of those conjunction words from your text, the following code will do that (assuming you have already loaded the text and the conjunctions into two strings).

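import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

//Load text from file into string 'text'
//Load conjunctions from file into string 'conjs'

//Split the conjunctions on whitespace into separate elements of a list
List<String> conjsArray = Arrays.asList(conjs.trim().split("\\s+"));

//Iterate over each conjunction word, and remove all whole-word instances of it from the text
for (String conjunction : conjsArray){
    //\b stops "and" from matching inside "sand"; Pattern.quote treats the word as literal text
    text = text.replaceAll("\\b" + Pattern.quote(conjunction) + "\\b", "");
}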
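
Since you mentioned you already have the words counted, here is a minimal sketch of an alternative that works on the counts instead of the raw text. It assumes your counts live in a HashMap<String, Integer> keyed by lower-case word, which is a guess about your code. The class name ConjunctionFilter and the methods toStopSet and withoutConjunctions are made up for illustration. The conjunctions go into a HashSet, so no number values are needed, and their keys are then removed from a copy of the map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; the names are illustrative, not from the question.
class ConjunctionFilter {

    // Builds a lookup set from whitespace-separated conjunctions, lower-cased.
    static Set<String> toStopSet(String conjs) {
        Set<String> stop = new HashSet<>();
        for (String word : conjs.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                stop.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return stop;
    }

    // Returns a copy of the counts without any entry whose key is a conjunction.
    // Assumes the keys of wordCounts are already lower-case.
    static Map<String, Integer> withoutConjunctions(Map<String, Integer> wordCounts,
                                                    Set<String> stop) {
        Map<String, Integer> filtered = new HashMap<>(wordCounts);
        filtered.keySet().removeAll(stop); // removing keys also removes their entries
        return filtered;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the counts built from book.txt
        Map<String, Integer> wordCounts = new HashMap<>();
        wordCounts.put("whale", 3);
        wordCounts.put("and", 12);
        wordCounts.put("sand", 1);
        wordCounts.put("but", 4);

        Set<String> stop = toStopSet("and but or");
        System.out.println(withoutConjunctions(wordCounts, stop));
    }
}
```

Because whole keys are compared, "sand" survives even though it contains "and". A set lookup is constant time on average, so this also avoids rescanning the whole book once per conjunction.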