I have a dictionary where a key is an id and the value is a list of strings.
I want to build a second dictionary from it that stores the frequency of each string. In this new dictionary, the key would be a word, and the value would be the number of lists in the original dictionary that the word appears in.
freq_dict = {}
for key, value in dict.items():
    for word in value:
        if word not in freq_dict:
            freq_dict[word] = 0
            freq_dict[word] += 1
            continue
        else:
            freq_dict[word] += 1
            continue
One issue I am having here is that if a word appears twice in a list, it will be counted twice. To fix this, I tried using break instead of continue, but then I would never count more than one word in each list.
What would be a good and efficient way to achieve what I want? I thought of converting every value in the original dict to a set, but that seems unreasonable for very large dictionaries.
CodePudding user response:
You can do the following:
freq_dict = {}
for value in dct.values():  # don't name a variable dict: it shadows the built-in type
    for word in value:
        freq_dict[word] = freq_dict.get(word, 0) + 1
And if a word that occurs more than once in a list should only be counted once for that list, just change the inner loop to:
    for word in set(value):
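Putting the two pieces above together, here is a minimal runnable sketch. The function name count_lists_containing and the sample data are made up for illustration; they are not from the question. The set is built for one list at a time and discarded afterwards, so the extra memory is bounded by the number of distinct words in the longest list rather than by the size of the whole dictionary.

```python
def count_lists_containing(dct):
    """Map each word to the number of value lists it appears in."""
    freq_dict = {}
    for words in dct.values():
        # set(words) drops repeats within one list, so each list
        # adds at most 1 to a given word's count
        for word in set(words):
            freq_dict[word] = freq_dict.get(word, 0) + 1
    return freq_dict


# hypothetical sample data for illustration
sample = {1: ["a", "b", "a"], 2: ["b"], 3: ["a", "c", "c"]}
result = count_lists_containing(sample)
print(result)
```

Here "a" appears in lists 1 and 3, "b" in lists 1 and 2, and "c" only in list 3, even though "a" and "c" are repeated inside a list.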
Of course there are utils to make this easier:
from collections import Counter
from itertools import chain
freq_dict = Counter(chain(*dct.values()))
# to count each word only once per list, deduplicate each list first:
# freq_dict = Counter(chain(*map(set, dct.values())))
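To see the difference between the two Counter variants side by side, here is a small sketch on made-up sample data. It swaps chain(*...) for chain.from_iterable(...), which should give the same result without unpacking every list into a function argument; the variable names total_counts and list_counts are only for illustration.

```python
from collections import Counter
from itertools import chain

# hypothetical sample data for illustration
dct = {1: ["a", "b", "a"], 2: ["b"], 3: ["a", "c", "c"]}

# Every occurrence counts, including repeats inside one list
total_counts = Counter(chain.from_iterable(dct.values()))

# Each list is deduplicated first, so a word counts once per list
list_counts = Counter(chain.from_iterable(map(set, dct.values())))

print(total_counts)
print(list_counts)
```

The second variant is the one that matches the question, since it counts the number of lists a word appears in rather than its total number of occurrences.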
CodePudding user response:
You could use list and dictionary comprehensions. Here is an example for the following dct1:
dct1 = {1: ["Bla", "Foo", "Foo", "Baz"], 2: ["Bla"], 3: ["Foo", "Baz"], 4: ["Baz"], 5: ["Foo"]}
When using the following code you get the result you want:
# Flatten the dictionary into a one-level list, excluding duplicates within each value list of dct1
values_list = [entry for inner_list in [list(set(ls)) for ls in dct1.values()] for entry in inner_list]
# Then build the result with a dictionary comprehension and the list's count method
dct2 = {string: values_list.count(string) for string in values_list}
dct2 then gives you the following (the key order may vary, since sets are unordered):
{'Foo': 3, 'Bla': 2, 'Baz': 3}
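One thing to keep in mind with this approach: values_list.count(string) rescans the whole list each time it is called, and the comprehension calls it once per entry, repeats included. A possible tweak, sketched below as a variation rather than as the answer's own code, is to iterate over the unique words only, so that count runs once per distinct word:

```python
dct1 = {1: ["Bla", "Foo", "Foo", "Baz"], 2: ["Bla"], 3: ["Foo", "Baz"], 4: ["Baz"], 5: ["Foo"]}

# Flatten, keeping each word at most once per value list
values_list = [entry for ls in dct1.values() for entry in set(ls)]

# Call count() once per distinct word instead of once per entry
dct2 = {string: values_list.count(string) for string in set(values_list)}

print(dct2)
```

This still scans the list once for every distinct word, so for large inputs a single-pass count is likely to scale better.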