How do I make my code differentiate between words and single characters? (Python)

(Python) My task is to create a program that reads text with input() and stores its words in a dictionary. For each word of the text, it counts how many times that word has already occurred before it. My code:

text = input()

words = {}

for word in text:
    if word not in words:
        words[word] = 0
        print(words[word])

    elif word in words:
        words[word] = words[word] + 1
        print(words[word])

An example input could be:

one two one two three two four three

The correct output should be:

However, my code counts the occurrences of every character instead of every word, which makes the output far too long. How do I make it distinguish between words and characters?

CodePudding user response：

That is because text is a string, and iterating over a string yields its characters one at a time. Use for word in text.split() instead: split() returns a list of substrings, and with no arguments it splits on runs of whitespace, so here it produces a list of words.
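
To make the difference concrete, here is a small sketch showing what each loop would actually iterate over. The sample string is only an assumption for demonstration.

```python
text = "one two one"

# Iterating over the string itself yields single characters (spaces included).
chars = list(text)

# Iterating over text.split() yields whitespace-separated words.
words = text.split()

print(chars)  # ['o', 'n', 'e', ' ', 't', 'w', 'o', ' ', 'o', 'n', 'e']
print(words)  # ['one', 'two', 'one']
```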

CodePudding user response：

Given your example input, you need to split text on whitespace to get words. In general, splitting arbitrary text into words or tokens is non-trivial (punctuation, contractions, and hyphenation all complicate it), and many natural language processing libraries are purpose-built for this.
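
As a rough illustration of why plain whitespace splitting can fall short, the sketch below compares it with a simple regular-expression tokenizer. The tokenize helper and its pattern are hypothetical choices for this example, not a general-purpose solution.

```python
import re

def tokenize(text):
    """Hypothetical helper: lowercase the text and extract runs of
    ASCII letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

sample = "One, two; one!"

# Whitespace splitting keeps punctuation attached to the words.
print(sample.split())    # ['One,', 'two;', 'one!']

# The regex version drops punctuation and normalizes case.
print(tokenize(sample))  # ['one', 'two', 'one']
```

With the whitespace split, 'One,' and 'one!' would be counted as different words; the regex version treats them as the same word.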

Also, for counting things, the Counter class from the standard-library collections module is very useful.

from collections import Counter

text = input()
word_counts = Counter(text.split())
print(word_counts.most_common())

Output

[('two', 3), ('one', 2), ('three', 2), ('four', 1)]
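
A Counter can also be queried like a dictionary. The sketch below hard-codes the example input from the question (instead of calling input()) so that it runs on its own.

```python
from collections import Counter

word_counts = Counter("one two one two three two four three".split())

print(word_counts["two"])          # 3
print(word_counts["five"])         # 0 (missing keys count as zero, no KeyError)
print(word_counts.most_common(1))  # [('two', 3)]
```

Note that these are final totals for the whole text, not the running count of earlier occurrences that the question asks for.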

CodePudding user response：

You are looking for the split method of the str type: https://docs.python.org/3/library/stdtypes.html#str.split

Use it to create a list of words:

splitted_text = text.split()

The full example will look like:

text = 'this is an example and this is nice'

splitted_text = text.split()

words = {}

for word in splitted_text:
    if word not in words:
        # First occurrence: no earlier occurrences yet
        words[word] = 0
    else:
        # Repeat occurrence: one more earlier occurrence
        words[word] = words[word] + 1

print(words)

Which will output:

{'this': 1, 'is': 1, 'an': 0, 'example': 0, 'and': 0, 'nice': 0}
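
The dictionary above only holds one final number per word. If the goal is the per-position count described in the question (how often each word occurred before its current position), one possible sketch is shown below. The occurrences_before helper is a hypothetical name introduced for this example.

```python
def occurrences_before(text):
    """Hypothetical helper: for each word in the text, return how many
    times that word appeared earlier in the text."""
    seen = {}    # word -> number of occurrences so far
    result = []
    for word in text.split():
        count = seen.get(word, 0)  # 0 if the word has not been seen yet
        result.append(count)
        seen[word] = count + 1
    return result

print(occurrences_before("one two one two three two four three"))
# [0, 0, 1, 1, 0, 2, 0, 1]
```

Returning a list instead of printing inside the loop keeps the function easy to test; printing each value on its own line is then a matter of looping over the result.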