I have a list of lists and I'm trying to use Counter to get the number of unique words across all of the lists.
[['My', 'name', 'is', 'Joe'],
 ['My', 'name', 'is', 'Sally'],
 ['My', 'name', 'is', 'Mike']]
If it were just the first list I think I could do this:
counter_object = Counter(my_list[0])
keys = counter_object.keys()
num_values = len(keys)
print(num_values)
But I'm unsure how to do this for multiple lists. Any help is much appreciated, thanks.
Edit: The expected output is 6, because the unique words 'My', 'name', 'is', 'Joe', 'Sally', 'Mike' total 6.
CodePudding user response:
If I understand your question correctly, you want to count the unique items across all of the sublists.
# MM is your list of lists
from collections import Counter

def count_unique(M):
    # Flatten the list of lists into a single list of words
    flats = [x for sub in M for x in sub]
    counts = Counter(flats)
    # Each key is a distinct word, so the number of keys is the answer
    return len(counts)

print(count_unique(MM))  # check it
# 6
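As a variation on the flattening approach above, a single Counter can be updated with each sublist in turn, which avoids building the intermediate flat list. This is only a sketch: the names count_unique_incremental and sample are made up for illustration and are not part of the question.

```python
from collections import Counter


def count_unique_incremental(list_of_lists):
    """Return the number of distinct items across all sublists."""
    counts = Counter()
    for sub in list_of_lists:
        # Counter.update adds to the existing counts instead of replacing them
        counts.update(sub)
    return len(counts)


# Hypothetical sample data matching the question
sample = [
    ["My", "name", "is", "Joe"],
    ["My", "name", "is", "Sally"],
    ["My", "name", "is", "Mike"],
]
print(count_unique_incremental(sample))  # 6
```

The per-word counts are still available inside the function if they are needed later; only the final len() reduces them to a single number.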
CodePudding user response:
Use chain.from_iterable to flatten the list, then use Counter:
from collections import Counter
from itertools import chain
data = [
["My", "name", "is", "Joe"],
["My", "name", "is", "Sally"],
["My", "name", "is", "Mike"],
]
counts = Counter(chain.from_iterable(data))
print(counts)
Output
Counter({'My': 3, 'name': 3, 'is': 3, 'Joe': 1, 'Sally': 1, 'Mike': 1})
For more on how to flatten lists of lists, see this.
If you want the number of unique keys in addition to the counts, take the length of the Counter:
res = len(counts)
Note that if you only care about which words are unique, and not how often each one occurs, you can use a set directly:
counts = set(chain.from_iterable(data))
print(counts)
Output
{'Sally', 'Mike', 'My', 'name', 'is', 'Joe'}
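If only the number is needed, taking len() of that set gives it directly. A minimal sketch, assuming the same data as above (the name unique_words is illustrative). Sets are unordered, so the printed order of the set itself can differ between runs; sorted() is used here only to make the display stable.

```python
from itertools import chain

data = [
    ["My", "name", "is", "Joe"],
    ["My", "name", "is", "Sally"],
    ["My", "name", "is", "Mike"],
]

# A set keeps one copy of each word, so its length is the unique count
unique_words = set(chain.from_iterable(data))
print(len(unique_words))  # 6

# Sorting gives a deterministic order (uppercase sorts before lowercase)
print(sorted(unique_words))  # ['Joe', 'Mike', 'My', 'Sally', 'is', 'name']
```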