I have this code, which counts how many times each word occurs in a string:
s = "Python is great but Java is also great"
f_s = s.split()
for word in f_s:
    str_f = s.count(word)
    print('There are' , str_f , '[',word,'] from ' , s)
The output is:
There are 1 [ Python ] from Python is great but Java is also great
There are 2 [ is ] from Python is great but Java is also great
There are 2 [ great ] from Python is great but Java is also great
There are 1 [ but ] from Python is great but Java is also great
There are 1 [ Java ] from Python is great but Java is also great
There are 2 [ is ] from Python is great but Java is also great
There are 1 [ also ] from Python is great but Java is also great
There are 2 [ great ] from Python is great but Java is also great
The for loop goes over every word. However, I want to skip the duplicates ("is" and "great") so that each word is reported only once, but I can't figure out which if condition to use. Any help is appreciated!
CodePudding user response:
It would be better to use a single pass over the terms with a Counter first:
>>> from collections import Counter
>>> counter = Counter("Python is great but Java is also great".split())
>>> for word, count in counter.items():
...     print(word, count)
Python 1
is 2
great 2
but 1
Java 1
also 1
Order is preserved because Counter is a dict subclass, and dicts preserve insertion order (guaranteed since Python 3.7).
This is better because calling s.count(word) for each word rescans the whole string every time, which is roughly O(n^2) overall. Counter makes a single pass.
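To get the same output format as in the question, something along these lines should work. This is only a sketch: count_words is a hypothetical helper name, not something from the question. Note also that s.count(word) counts substring matches, so "is" would also match inside a word like "this", whereas counting the split words avoids that.

```python
from collections import Counter


def count_words(text):
    """Count whitespace-separated words in one pass (hypothetical helper)."""
    return Counter(text.split())


s = "Python is great but Java is also great"
counts = count_words(s)

# Each distinct word appears once, in first-seen order.
for word, count in counts.items():
    print('There are', count, '[', word, '] from', s)
```

Each word is printed exactly once because the loop runs over the counter's keys rather than over the original word list.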
CodePudding user response:
Without any imports, using a plain dict:
counter = {}
for word in f_s:  # f_s = s.split() from the question
    counter.setdefault(word, 0)  # start each new word at 0
    counter[word] += 1
print(counter)
# Output
{'Python': 1, 'is': 2, 'great': 2, 'but': 1, 'Java': 1, 'also': 1}
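If you specifically want an if condition that skips duplicates, as the question asks, one possible sketch is to remember the words already handled in a set. The names seen and report are just illustrative choices.

```python
s = "Python is great but Java is also great"
words = s.split()

seen = set()   # words that have already been reported
report = []    # (word, count) pairs in first-seen order
for word in words:
    if word in seen:
        continue  # duplicate: already counted, skip it
    seen.add(word)
    # Count in the list of words, so only whole words match (not substrings).
    report.append((word, words.count(word)))

for word, count in report:
    print('There are', count, '[', word, '] from', s)
```

This still calls words.count once per distinct word, so for long inputs the dict version above is likely the better choice.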