I'm trying to build an inverted index for some NLP work, to see how many times a word appears in a document. I'm doing this with a dictionary, but my output looks like this (here the word man appears in documents 1 and 11):
{'man': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11],
'upon': [1, 1, 1, 3, 3, 3, 1539, 1539, 1539]}
How do I get rid of these duplicate document IDs in the values, so that I just have
{'man': [1,11], 'upon': [1,3,1539]}
CodePudding user response:
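Just convert the values to sets and then back to lists (a set does not guarantee any order, so use sorted() instead of list() if the order of the document IDs matters):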
my_dict = {k: list(set(v)) for k, v in my_dict.items()}
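If the order of the document IDs matters, or if you would rather not collect the duplicates in the first place, here is a minimal sketch of a few options. The sample data, the `docs` corpus, and the variable names `deduped_sorted`, `deduped_in_order`, and `index` are made up for illustration and are not from the question.

```python
from collections import defaultdict

# Hypothetical sample data, shaped like the output in the question
my_dict = {
    "man": [1, 1, 1, 11, 11],
    "upon": [1, 1, 3, 3, 1539, 1539],
}

# Option 1: sorted() returns a list, so the IDs come back
# deduplicated and in ascending order
deduped_sorted = {word: sorted(set(doc_ids)) for word, doc_ids in my_dict.items()}

# Option 2: dict.fromkeys() keeps the first occurrence of each ID,
# preserving the original order
deduped_in_order = {word: list(dict.fromkeys(doc_ids)) for word, doc_ids in my_dict.items()}

print(deduped_sorted)  # {'man': [1, 11], 'upon': [1, 3, 1539]}

# Option 3: avoid duplicates while building the index by storing sets.
# Hypothetical corpus: document ID -> text
docs = {1: "the man sat upon the man", 3: "once upon a time"}

index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)  # adding an ID twice has no effect
```

With option 3, each value is a set of document IDs; call sorted() on it when you need a list.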