Sorting keys in a map

The program gets a string as input. That string can contain capital letters or any other ASCII characters. We don't distinguish between upper and lower case, so we just use the lower() method. Any character that is not a letter of the alphabet (digits, punctuation, etc.) acts as a separator between words. The function is supposed to analyse the input, sort it, and count how often each word occurs. Output:

{'idk': 2, 'idc': 1, 'idf': 1}

Input:

print(word_frequency("Idk, Idc, Idk, Idf"))

I tried this and it's sorting the input, but I can't find a way to separate the strings. This is what I did:


def word_frequency(text):
    f = {}
    pendens = ""

    for s in text:
        if s.isalpha():
            pendens += s
        else:
            pendens = pendens.lower()
            if pendens != " ":
                if f.get(pendens, -1) != -1:
                    f[pendens] += 1
                else:
                    f[pendens] = 1
    pendens = pendens.lower()
    if pendens != " ": 
        if f.get(pendens, -1) != -1:
            f[pendens] += 1
        else:
            f[pendens] = 1
    return f

print(word_frequency("Idk, Idc, Idk, Idf"))       
print(word_frequency("Idk, Idc,Idk;;-;Idf"))     
print(word_frequency("help me please"))

I'm trying to get better at coding so any form of help will be appreciated :)

CodePudding user response:

The easiest solution would involve regex and Counter, which is a type of dictionary specifically tailored to counting occurrences of values like this:

>>> import re
>>> from collections import Counter

>>> words = 'Idk, Idc, Idk, Idf'

>>> re.findall('[a-z]+', words.lower())
['idk', 'idc', 'idk', 'idf']

>>> Counter(re.findall('[a-z]+', words.lower()))
Counter({'idk': 2, 'idc': 1, 'idf': 1})

If you cannot use Counter, then a plain dictionary would also work. We can use dict.get with a default of 0, so that words already in the dict and words not yet in it are handled by the same line:

def count_words(words):
    counts = {}
    for word in re.findall('[a-z]+', words.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts

Results in:

>>> count_words('Idk, Idc, Idk, Idf')
{'idk': 2, 'idc': 1, 'idf': 1}
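
If the keys should also come out ordered by how often they occur, one option is to sort the items by count before building the final dict. This is only a minimal sketch, assuming Python 3.7 or later (where a plain dict preserves insertion order); sort_by_count is a hypothetical helper name used here for illustration:

```python
def sort_by_count(counts):
    # Sort (word, count) pairs by count, highest first. sorted() is stable,
    # so words with equal counts keep the order they had in `counts`.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Assumption: Python 3.7+, where dict keeps insertion order.
    return dict(ordered)


print(sort_by_count({'idc': 1, 'idk': 2, 'idf': 1}))
# {'idk': 2, 'idc': 1, 'idf': 1}
```

A Counter already offers most_common() for the same purpose, so this helper is mainly useful when only a plain dict is allowed.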

If you cannot use regex, then the problem becomes more complicated, but still doable. A generator like the following would work:

def split_words(words):
    word = ''
    for c in words.lower():
        if 97 <= ord(c) <= 122:  # ord('a') thru ord('z')
            word += c
        elif word:
            yield word
            word = ''
    if word:
        yield word


def count_words(words):
    counts = {}
    for word in split_words(words):
        counts[word] = counts.get(word, 0) + 1
    return counts

Results in:

>>> count_words('Idk, Idc, Idk, Idf')
{'idk': 2, 'idc': 1, 'idf': 1}
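
For comparison, the loop from the question can probably be made to work with a few small changes: reset pendens once a word has been recorded, test for an empty string rather than " " (pendens only ever collects letters, so it can never equal a space), and append one trailing separator so the last word is flushed by the same branch. The following is a sketch under those assumptions, keeping the names from the question:

```python
def word_frequency(text):
    f = {}
    pendens = ""
    # Assumption: adding one trailing separator lets the final word be
    # recorded inside the loop instead of by duplicated code after it.
    for s in text + " ":
        if s.isalpha():
            # Note: isalpha() also accepts non-ASCII letters.
            pendens += s.lower()
        elif pendens:
            # A separator ends the current word: count it, then reset.
            f[pendens] = f.get(pendens, 0) + 1
            pendens = ""
    return f


print(word_frequency("Idk, Idc,Idk;;-;Idf"))
# {'idk': 2, 'idc': 1, 'idf': 1}
```

Without the reset, pendens keeps growing across separators, which is why the words were never split apart.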