I am trying to write a function that finds words similar to each command-line argument, but only the first argument gets any results! For example, if I run example.py hello there, the program prints this:
hello is close to:
held, heel, helpt, hele, Hallo, het, helaas, half, helden, heb, veel, Meld, zelf, heeft, beeld, alle, wel, Rel, Geld, cel, geld, Alle, hoezo,
there is close to:
Here is my code:
import sys
from collections import defaultdict

# lev_dist() is defined elsewhere (not shown here)

def create_data():
    data = defaultdict(int)
    value = 0
    for line in sys.stdin:
        [ident, user, text, terms] = line.rstrip().split('\t')
        for word in terms.split():
            data[word] = value
    return data

def find_closest(word):
    data = create_data()
    data_with_distance = defaultdict(int)
    for key in data:
        distance = lev_dist(word, key)
        data_with_distance[key] = distance
    return {k: v for k, v in sorted(data_with_distance.items(), key=lambda item: item[1])}

def main():
    if len(sys.argv) > 1:
        for w in sys.argv[1:]:
            print("\n", w, "is close to:\n")
            closest = find_closest(w)
            closest_words = [k for k, v in closest.items() if v < 4]
            #minimal_distance = list(closest.values())[0]
            for close in closest_words:
                print(close, end=", ")
    else:
        sys.stderr.write("no argument\n")

if __name__ == '__main__':
    main()
CodePudding user response:
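Your loop does run for every argument, but find_closest calls create_data each time, and sys.stdin can only be read once. The first call consumes all of the input, so every later call gets an empty dict and there is nothing to compare against. You need to cache the result of create_data if you want to reuse it: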
def find_closest(word, data):  # take data as param here
    data_with_distance = defaultdict(int)
    for key in data:
        distance = lev_dist(word, key)
        data_with_distance[key] = distance
    return {k: v for k, v in sorted(data_with_distance.items(), key=lambda item: item[1])}

def main():
    data = create_data()  # load data from stdin ONCE
    if len(sys.argv) > 1:
        for w in sys.argv[1:]:
            print("\n", w, "is close to:\n")
            closest = find_closest(w, data)  # pass data as param here
            closest_words = [k for k, v in closest.items() if v < 4]
            #minimal_distance = list(closest.values())[0]
            for close in closest_words:
                print(close, end=", ")
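If you want to see the underlying behavior in isolation, here is a minimal sketch. It assumes io.StringIO as a stand-in for sys.stdin, and both the read_terms helper and the sample lines are made up for illustration; they are not part of your script.

```python
import io

def read_terms(stream):
    """Collect every word from the last tab-separated column of each line."""
    words = set()
    for line in stream:
        ident, user, text, terms = line.rstrip("\n").split("\t")
        words.update(terms.split())
    return words

# io.StringIO plays the role of sys.stdin here (hypothetical sample data).
fake_stdin = io.StringIO(
    "1\tann\tHallo daar\thallo daar\n"
    "2\tbob\theel goed\theel goed\n"
)

first = read_terms(fake_stdin)   # consumes the whole stream
second = read_terms(fake_stdin)  # the stream is already at its end

print(sorted(first))
print(sorted(second))
```

The second call returns an empty set because iterating a stream continues from the current position, and after the first pass that position is the end of the input. The same thing happens to create_data when it is called once per argument.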
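Another option would be to stick a caching decorator on create_data (functools.cache needs Python 3.9 or newer; on older versions functools.lru_cache(maxsize=None) does the same job):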
from functools import cache

@cache
def create_data():
    data = defaultdict(int)
    value = 0
    for line in sys.stdin:
        [ident, user, text, terms] = line.rstrip().split('\t')
        for word in terms.split():
            data[word] = value
    return data
This "fixes" the function by caching the result of the first call and returning that same result on subsequent calls instead of executing the function body again.
For a function that takes parameters, the cache is keyed on the arguments; since this function takes none, it simply caches a single return value. If the function had a desirable side effect, you would not want to cache it like this, but here the side effect (consuming stdin) is undesirable, so sticking a @cache on it is a very easy solution.
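A quick way to convince yourself that the decorated function body only runs once is a sketch like the following. The load_words function and the calls counter are hypothetical stand-ins for create_data, used only to make the effect visible.

```python
from functools import cache

calls = 0  # counts how often the function body actually runs

@cache
def load_words():
    """Hypothetical stand-in for create_data()."""
    global calls
    calls += 1
    return {"held": 0, "heel": 0}

a = load_words()  # runs the body and stores the result
b = load_words()  # served from the cache, body is skipped

print(calls)   # number of real executions
print(a is b)  # both names refer to the same cached dict
```

One thing to keep in mind: the cache hands back the very same dict object every time, so if any caller modifies it, every other caller sees the change too.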