Skip duplicate words in counting a string

Time:03-16

I have this code, which counts how many times each word appears in the string:

s = "Python is great but Java is also great"
f_s = s.split()
for word in f_s:
    str_f = s.count(word)
    print('There are', str_f, '[', word, '] from ', s)

And the output is:

There are 1 [ Python ] from  Python is great but Java is also great
There are 2 [ is ] from  Python is great but Java is also great
There are 2 [ great ] from  Python is great but Java is also great
There are 1 [ but ] from  Python is great but Java is also great
There are 1 [ Java ] from  Python is great but Java is also great
There are 2 [ is ] from  Python is great but Java is also great
There are 1 [ also ] from  Python is great but Java is also great
There are 2 [ great ] from  Python is great but Java is also great 

The for loop visits every word. However, I want to skip the duplicates ("is" and "great") so that each word is only counted and printed once, but I can't figure out which if condition to use. Any help is appreciated!

Answer:

It would be better to count all the words first, in a single pass, with a collections.Counter:

>>> from collections import Counter
>>> counter = Counter("Python is great but Java is also great".split())
>>> for word, count in counter.items():
...     print(word, count)
Python 1
is 2
great 2
but 1
Java 1
also 1

Order will be preserved, since Counter is a dict subclass and dicts preserve insertion order (guaranteed since Python 3.7).
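Applied to the question, the Counter can drive the same kind of output line as the original loop. This is a minimal sketch that reuses the question's string and print format; because each key appears only once in the Counter, every word is printed exactly once, in first-seen order.

```python
from collections import Counter

s = "Python is great but Java is also great"
counter = Counter(s.split())  # one pass: word -> number of occurrences

# Counter is a dict subclass, so (on Python 3.7+) items() yields each
# distinct word once, in the order it was first seen
for word, count in counter.items():
    print('There are', count, '[', word, '] from', s)
```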

Using a Counter is better because calling s.count(word) for every word rescans the whole string each time, which makes the loop roughly O(n^2) overall, whereas the Counter builds every count in a single O(n) pass.
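For comparison, here is a minimal sketch of the if condition the question asks about: keep a set of words that have already been reported and skip them. The name `seen` is a hypothetical choice, and the sketch swaps `s.count(word)` for `f_s.count(word)` so that only whole words are counted. Each count call still scans the list, so this version remains quadratic.

```python
s = "Python is great but Java is also great"
f_s = s.split()

seen = set()  # hypothetical name: words that have already been printed
for word in f_s:
    if word in seen:  # duplicate: already reported, so skip it
        continue
    seen.add(word)
    # list.count compares whole list items, unlike str.count,
    # which also matches substrings inside longer words
    str_f = f_s.count(word)
    print('There are', str_f, '[', word, '] from', s)
```

For instance, `"This is".count("is")` returns 2 because `str.count` also matches the "is" inside "This", whereas `"This is".split().count("is")` returns 1.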

Answer:

Without importing anything, using a plain dict:

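s = "Python is great but Java is also great"
f_s = s.split()

counter = {}
for word in f_s:
    counter.setdefault(word, 0)  # first time a word is seen, start its count at 0
    counter[word] += 1
print(counter)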

# Output
{'Python': 1, 'is': 2, 'great': 2, 'but': 1, 'Java': 1, 'also': 1}
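The same plain-dict counting can also be written with dict.get, which returns a default value for a missing key and so replaces the separate setdefault call. This is a minimal sketch of that alternative idiom, reusing the question's string and print format; it is not part of the answer above.

```python
s = "Python is great but Java is also great"

counter = {}
for word in s.split():
    # get() returns 0 for a word not seen yet, so its count starts at 1
    counter[word] = counter.get(word, 0) + 1

# dicts keep insertion order, so words print once each, in first-seen order
for word, count in counter.items():
    print('There are', count, '[', word, '] from', s)
```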