Remove duplicates when adding new data


I'm new to Java and I hope someone can help me solve this problem.

So I have a data set that contains a collection of words, and it keeps growing bigger and bigger. I don't want duplicate words, so I'm using this code to check whether a word has already been added, and if not, it is added to the lists.

for (int i = 0; i < rawWords.size(); i++) {
    String word = rawWords.get(i);
    if (!words.contains(word)) {
        words.add(word);
        wordsToExport.add(word);
    }
}

The problem is that as the word list grows, my program starts to slow down. Is there a solution to this problem, or is there an error in my code?

CodePudding user response:

If your Collection should not contain duplicates, then you should use a Set.

Set<String> wordSet = new HashSet<>(words);
wordSet.addAll(rawWords);
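
If you also need to know which words were newly added (like wordsToExport in the question), note that Set.add returns true only when the element was not already present, so the original loop can drop the contains scan entirely. A minimal sketch, reusing the variable names from the question and assuming the usual java.util imports:

Set<String> words = new HashSet<>();
List<String> wordsToExport = new ArrayList<>();
for (String word : rawWords) {
    // add returns true only if the word was not in the set yet
    if (words.add(word)) {
        wordsToExport.add(word);
    }
}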

The typical choice is a HashSet. This assumes that your data objects implement hashCode and equals, and that those two methods are consistent with each other. Since you are working with String, you do not have to do anything: that class already obeys those requirements.
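
For illustration only, a hypothetical value class would need something like the following before it could be stored safely in a HashSet (again, String already does this for you):

public final class Tag {
    private final String name;

    public Tag(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Tag)) return false;
        return name.equals(((Tag) o).name);
    }

    @Override
    public int hashCode() {
        // must be consistent with equals: equal objects produce equal hash codes
        return name.hashCode();
    }
}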

If you require some sort of ordering in your Collection, then consider TreeSet or LinkedHashSet, depending on your use case. Search for information on the Java Collections Framework for more details.
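
A quick sketch of the difference between the two, assuming the usual java.util imports and some made-up sample words:

Set<String> insertionOrder = new LinkedHashSet<>();
insertionOrder.addAll(Arrays.asList("banana", "apple", "banana", "cherry"));
System.out.println(insertionOrder); // [banana, apple, cherry] - keeps insertion order

Set<String> sorted = new TreeSet<>(insertionOrder);
System.out.println(sorted); // [apple, banana, cherry] - natural (sorted) order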

CodePudding user response:

You are adding each word to two collections, words and wordsToExport, and the words.contains(word) check scans the whole list on every call, which is what slows things down as the list grows. If you want to avoid duplicated words, you can simply use a Set implementation. Here is an example that prints [word1, word3, word2] to the console (the iteration order of a HashSet is not guaranteed):

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public static void main(String[] args) {
    List<String> rawWords = Arrays.asList("word1", "word2", "word3", "word1", "word2");
    Set<String> words = new HashSet<>();
    Set<String> wordsToExport = new HashSet<>();
    for (String word : rawWords) {
        words.add(word);
        wordsToExport.add(word);
    }
    System.out.println(words);
}
Tags: java