For loop is stopping

Time:09-24

I am trying to write a function that finds words similar to a given word, but the loop only produces results for the first argument. For example, if I run example.py hello there, the program prints this:

hello is close to: 

held, heel, helpt, hele, Hallo, het, helaas, half, helden, heb, veel, Meld, zelf, heeft, beeld, alle, wel, Rel, Geld, cel, geld, Alle, hoezo, 
 there is close to:

Here is my code:

def create_data():
    data =defaultdict(int)
    value = 0
    for line in sys.stdin:
        [ident, user, text, terms] = line.rstrip().split('\t')
        for word in terms.split():
            data[word] = value

    return data

def find_closest(word):

    data = create_data()
    data_with_distance= defaultdict(int)
    for key in data:
        distance = lev_dist(word, key)
        data_with_distance[key] = distance
    return {k: v for k, v in sorted(data_with_distance.items(), key=lambda item: item[1])}


def main():
    if len(sys.argv) > 1:

        for w in sys.argv[1:]:
            print("\n",w, "is close to:\n")
            closest = find_closest(w)
            closest_words = [k for k, v in closest.items() if v < 4]
            #minimal_distance = list(closest.values())[0]
            for close in closest_words:
                print(close, end=", ")

    else:
        sys.stderr.write("no argument\n")

if __name__ == '__main__':
    main()

CodePudding user response:

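create_data reads from sys.stdin, and a stream like that can only be consumed once. The first call to find_closest reads everything, so on the second call create_data returns an empty dict and there is nothing left to compare against. You need to load the data once and reuse it: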

def find_closest(word, data):  # take data as param here
    data_with_distance= defaultdict(int)
    for key in data:
        distance = lev_dist(word, key)
        data_with_distance[key] = distance
    return {k: v for k, v in sorted(data_with_distance.items(), key=lambda item: item[1])}


def main():
    data = create_data()  # load data from stdin ONCE

    if len(sys.argv) > 1:

        for w in sys.argv[1:]:
            print("\n",w, "is close to:\n")
            closest = find_closest(w, data)  # pass data as param here
            closest_words = [k for k, v in closest.items() if v < 4]
            #minimal_distance = list(closest.values())[0]
            for close in closest_words:
                print(close, end=", ")

    else:
        sys.stderr.write("no argument\n")

if __name__ == '__main__':
    main()
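
To see why loading the data once matters, here is a minimal sketch of what happens when the same stream is read twice. It uses io.StringIO as a stand-in for sys.stdin; the read_terms helper and the sample line are made up for illustration and are not part of the original program.

```python
import io

def read_terms(stream):
    """Collect the words from the fourth tab-separated column of each line."""
    words = set()
    for line in stream:
        ident, user, text, terms = line.rstrip().split("\t")
        words.update(terms.split())
    return words

# Hypothetical input in the same tab-separated layout as the question.
fake_stdin = io.StringIO("1\tsomeuser\thallo daar\thallo daar\n")

first = read_terms(fake_stdin)   # consumes the whole stream
second = read_terms(fake_stdin)  # nothing left to read

print(sorted(first))   # ['daar', 'hallo']
print(sorted(second))  # []
```

The second call does not fail, it just loops zero times, which is why the original program prints the header for the second word and then nothing.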

Another option would be to stick a caching decorator on create_data:

from functools import cache  # Python 3.9+; on older versions use lru_cache(maxsize=None)

@cache
def create_data():
    data = defaultdict(int)
    value = 0
    for line in sys.stdin:
        [ident, user, text, terms] = line.rstrip().split('\t')
        for word in terms.split():
            data[word] = value
    return data

This "fixes" the function by caching the result of the first call: subsequent calls return that same result instead of executing the function body again, so stdin is only read once.

In a function that takes parameters, the cache would be keyed on the arguments; since this function takes none, it simply caches a single return value. If the function had a desirable side effect you would not want to cache it like this, but here the side effect (consuming stdin) is exactly what you want to avoid repeating, so sticking a @cache on it is a very easy solution.
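
A small sketch of that caching behaviour, again with io.StringIO standing in for sys.stdin. The load_words function and the call_log list are hypothetical names used only for this illustration.

```python
import io
from functools import cache  # Python 3.9+

# Stand-in for sys.stdin; the contents are made up for illustration.
fake_stdin = io.StringIO("hallo\ndaar\n")
call_log = []  # records each time the function body actually runs

@cache
def load_words():
    """Read every line from the stream; the body only runs on the first call."""
    call_log.append("ran")
    return [line.strip() for line in fake_stdin]

first = load_words()
second = load_words()  # served from the cache, the stream is not touched again

print(first)            # ['hallo', 'daar']
print(second is first)  # True
print(len(call_log))    # 1
```

Note that every caller gets the very same object back, so if one of them mutates the returned list (or dict, in create_data's case) the others see the change too.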
