I have a dictionary of tokenised sentences much like this example below:
dict1 = {"key1": [["first", "sentence"], ["second", "sentence"]],
         "key2": [["dog", "in", "a", "park"], ["Batman", "is", "scary"]]}
I want to concatenate every word in the dictionary into one single list.
list1 = ["first", "sentence", "second", "sentence", "dog", "in", "a", "park", "Batman", "is",
         "scary"]
I tried writing this code, but it raised a TypeError:
all_words = ", ".join(((word for word in sentence) for sentence in dict1[key]) for key in dict1.keys())
TypeError: sequence item 0: expected str instance, generator found
How can I improve my code so that it does what I want?
CodePudding user response:
You have a doubly nested list (each dictionary value is a list of sentences, and each sentence is a list of words). One approach is to use itertools.chain:
from itertools import chain
dict1 = {"key1": [["first", "sentence"], ["second", "sentence"]],
"key2": [["dog", "in", "a", "park"], ["Batman", "is", "scary"]]}
res = list(chain.from_iterable(chain(*dict1.values())))
print(res)
Output
['first', 'sentence', 'second', 'sentence', 'dog', 'in', 'a', 'park', 'Batman', 'is', 'scary']
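To see why two levels of chaining are needed, the one-liner can be split into its two steps. This is only an illustrative sketch; the intermediate names `sentences` and `words` are made up here and are not part of the answer's code.

```python
from itertools import chain

dict1 = {"key1": [["first", "sentence"], ["second", "sentence"]],
         "key2": [["dog", "in", "a", "park"], ["Batman", "is", "scary"]]}

# Step 1: chain(*dict1.values()) joins the per-key lists end to end,
# so iterating over it yields one tokenised sentence at a time.
sentences = list(chain(*dict1.values()))

# Step 2: chain.from_iterable flattens the list of sentences into words.
words = list(chain.from_iterable(sentences))

print(sentences)
print(words)
```

Materialising `sentences` with `list` is only for inspection; the one-liner keeps everything lazy until the final `list` call.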
Alternatively, without itertools.chain:
res = [vi for vs in dict1.values() for v in vs for vi in v]
print(res)
CodePudding user response:
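You have the for clauses in the wrong order. In a comprehension or generator expression, the for clauses are written in the same order as the equivalent nested for loops: outermost first, innermost last. Note that join returns a single string rather than a list:
", ".join(word for key in dict1.keys() for sentence in dict1[key] for word in sentence)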
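The ordering rule is easiest to check by writing the nested loops out next to the comprehension. The sketch below assumes the same dict1 as in the question; the names `words_loop`, `words_comp` and `joined` are only illustrative.

```python
dict1 = {"key1": [["first", "sentence"], ["second", "sentence"]],
         "key2": [["dog", "in", "a", "park"], ["Batman", "is", "scary"]]}

# Explicit nested loops: keys, then sentences, then words.
words_loop = []
for key in dict1:
    for sentence in dict1[key]:
        for word in sentence:
            words_loop.append(word)

# The comprehension lists its for clauses in that same order.
words_comp = [word for key in dict1 for sentence in dict1[key] for word in sentence]

# join turns the flat list into one comma-separated string.
joined = ", ".join(words_comp)

print(words_comp == words_loop)
print(joined)
```

If a list is what you need, as in the question, use `words_comp` directly and skip the join.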