The program receives an input string at the start. The string can contain capital letters or any other ASCII characters. We don't distinguish between upper and lower case, so we just use the lower() method. Any character that is not a letter of the alphabet (digits, punctuation, etc.) acts as a separator between words. The function is supposed to analyse the input, sort it, and count the words. Output:
{'idk': 2, 'idc': 1, 'idf': 1}
Input:
print(word_frequency("Idk, Idc, Idk, Idf"))
I tried this, and it's sorting the input, but I can't find a way to separate the strings. This is what I did:
def word_frequency(text):
    f = {}
    pendens = ""
    for s in text:
        if s.isalpha():
            pendens += s
        else:
            pendens = pendens.lower()
            if pendens != " ":
                if f.get(pendens, -1) != -1:
                    f[pendens] += 1
                else:
                    f[pendens] = 1
    pendens = pendens.lower()
    if pendens != " ":
        if f.get(pendens, -1) != -1:
            f[pendens] += 1
        else:
            f[pendens] = 1
    return f

print(word_frequency("Idk, Idc, Idk, Idf"))
print(word_frequency("Idk, Idc,Idk;;-;Idf"))
print(word_frequency("help me please"))
I'm trying to get better at coding so any form of help will be appreciated :)
CodePudding user response:
The easiest solution would involve regex and Counter, which is a type of dictionary specifically tailored to counting occurrences of values like this:
>>> import re
>>> from collections import Counter
>>> words = 'Idk, Idc, Idk, Idf'
>>> re.findall('[a-z]+', words.lower())
['idk', 'idc', 'idk', 'idf']
>>> Counter(re.findall('[a-z]+', words.lower()))
Counter({'idk': 2, 'idc': 1, 'idf': 1})
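Wrapped into a function with the name used in the question, this could look roughly like the sketch below. It assumes that only the ASCII letters a-z count as part of a word, and converting the Counter back with dict() is just one way to get output that prints like a plain dictionary:

```python
import re
from collections import Counter

def word_frequency(text):
    # Lowercase first, then collect every run of the letters a-z;
    # anything else (digits, punctuation, spaces) acts as a separator.
    words = re.findall(r'[a-z]+', text.lower())
    # dict() turns the Counter into a plain dictionary.
    return dict(Counter(words))

print(word_frequency("Idk, Idc,Idk;;-;Idf"))
print(word_frequency("help me please"))
```

An empty string, or one with no letters at all, simply gives an empty dictionary.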
If you cannot use Counter, a plain dictionary also works. Calling dict.get with a default of 0 handles both words that are already in the dict and words that are being seen for the first time:
def count_words(words):
    counts = {}
    for word in re.findall('[a-z]+', words.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts
Results in:
>>> count_words('Idk, Idc, Idk, Idf')
{'idk': 2, 'idc': 1, 'idf': 1}
If you cannot use regex, then the problem becomes more complicated, but still doable. A generator like the following would work:
def split_words(words):
    word = ''
    for c in words.lower():
        if 97 <= ord(c) <= 122:  # ord('a') thru ord('z')
            word += c
        elif word:
            yield word
            word = ''
    if word:
        yield word

def count_words(words):
    counts = {}
    for word in split_words(words):
        counts[word] = counts.get(word, 0) + 1
    return counts
Results in:
>>> count_words('Idk, Idc, Idk, Idf')
{'idk': 2, 'idc': 1, 'idf': 1}
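The loop from the question can also be made to work if the buffer is cleared after each word is recorded. A minimal sketch, keeping the question's names (pendens, f) and its use of str.isalpha(). Note that isalpha() also accepts non-ASCII letters such as 'é', so this assumes those should count as letters too. Appending a trailing space is just a trick to flush the last word without repeating the counting code after the loop:

```python
def word_frequency(text):
    f = {}
    pendens = ""
    # The extra trailing space guarantees the final word is followed
    # by a separator, so it gets counted inside the loop.
    for s in text.lower() + " ":
        if s.isalpha():
            pendens += s  # still inside a word: keep building it
        elif pendens:
            # A separator ended a non-empty word: count it.
            f[pendens] = f.get(pendens, 0) + 1
            pendens = ""  # reset so the next word starts empty
    return f

print(word_frequency("Idk, Idc,Idk;;-;Idf"))
```

The elif pendens check skips runs of consecutive separators, so no empty string ends up as a key.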