Counting number of appearances of term in each value of a dictionary

Time:11-17

I have a dictionary where a key is an id and the value is a list of strings.

I am interested in creating a subsequent dictionary from this, to store the frequency of each string. In this new dictionary, the key would be a word, and the value is the number of lists it appeared in, in the original dictionary.

freq_dict = {}
for key, value in dict.items():
    for word in value:
        if word not in freq_dict:
            freq_dict[word] = 0
            freq_dict[word] += 1
            continue
        else:
            freq_dict[word] += 1
            continue

One issue I am having here is that if a word appears twice in a list, it will be counted twice. To fix this, I tried using break instead of continue, but then I would never count more than one word in each list.

What would be a good and efficient approach to achieve what I want? I thought of converting every value in the original dict to a set, but that seems unreasonable for very large dictionaries.

CodePudding user response:

You can do the following:

freq_dict = {}
for value in dct.values():  # don't call a variable dict
    for word in value:
        freq_dict[word] = freq_dict.get(word, 0) + 1

And if you don't want a list to be counted twice for a word that occurs in it more than once, just change the inner loop to:

for word in set(value):
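
Put together, the set-based loop might look like the sketch below. The sample `dct` and the function name `count_lists_containing` are made up for illustration and do not come from the question.

```python
def count_lists_containing(dct):
    """Map each word to the number of value lists it appears in."""
    freq_dict = {}
    for value in dct.values():
        # set(value) drops repeats inside one list, so each list
        # adds at most 1 to a word's count
        for word in set(value):
            freq_dict[word] = freq_dict.get(word, 0) + 1
    return freq_dict


# hypothetical sample data
dct = {1: ["a", "b", "a"], 2: ["b"], 3: ["a", "c"]}
result = count_lists_containing(dct)
print(result)
```

Each temporary set can be discarded as soon as its list has been processed, so the extra memory depends on the longest single list rather than on the whole dictionary.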

Of course there are utils to make this easier:

from collections import Counter
from itertools import chain

freq_dict = Counter(chain(*dct.values()))
# freq_dict = Counter(chain(*map(set, dct.values())))
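
As a rough illustration of how the two Counter variants differ, here is a small sketch on made-up data (the sample `dct` and the names `occurrences` and `lists_containing` are assumptions for the example). It uses `chain.from_iterable` rather than `chain(*...)`, which should behave the same here but consumes the values lazily instead of unpacking them all as arguments.

```python
from collections import Counter
from itertools import chain

# hypothetical sample data
dct = {1: ["a", "b", "a"], 2: ["b"], 3: ["a", "c"]}

# counts every occurrence, including repeats within one list
occurrences = Counter(chain.from_iterable(dct.values()))

# counts each word at most once per list
lists_containing = Counter(chain.from_iterable(map(set, dct.values())))

print(occurrences["a"], lists_containing["a"])  # prints: 3 2
```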

CodePudding user response:

You could use list and dictionary comprehensions. Here is an example for the following dct1:

dct1 = {1: ["Bla", "Foo", "Foo", "Baz"], 2: ["Bla"], 3: ["Foo", "Baz"], 4: ["Baz"], 5: ["Foo"]}

When using the following code you get the result you want:

# flatten the dictionary into a one-level list, excluding duplicates within each value list of dct1
values_list = [entry for inner_list in [list(set(ls)) for ls in dct1.values()] for entry in inner_list]
# then use a dictionary comprehension with the list's count method
dct2 = {string: values_list.count(string) for string in values_list}

dct2 then gives you the following (the key order may vary, since sets are unordered):

{'Foo': 3, 'Bla': 2, 'Baz': 3}