I have 2 lists of dictionaries that result from a pymongo extraction.
A list of dicts containing ids (strings) and lemmas (strings):
lemmas = [{'id': 'id1', 'lemma': 'lemma1'}, {'id': 'id2', 'lemma': 'lemma2'}, {'id': 'id3', 'lemma': 'lemma3'}, ...]
A list of dicts containing ids and multiple words per id:
words = [{'id': 'id1', 'word': 'word1.1'}, {'id': 'id1', 'word': 'word1.2'}, {'id': 'id2', 'word': 'word2.1'}, {'id': 'id3', 'word': 'word3.1'}, {'id': 'id3', 'word': 'word3.2'}, ...]
As you can see, the two lists of dictionaries have different lengths: each id has exactly one lemma but can be associated with several words.
My goal is to obtain a dictionary whose key:value pairs are word:lemma, pairing each word with the lemma that has the same id. This way, I can replace every word with its corresponding lemma in a text that I am analyzing. For example:
word_lemma_dict = {'word1.1': 'lemma1', 'word1.2': 'lemma1', 'word2.1': 'lemma2', 'word3.1': 'lemma3', 'word3.2': 'lemma3', ...}
Is there a simple way to do this?
The best I could achieve was to use 2 for loops, but it's not very "Pythonic":
id_lemma_dict = {}
word_lemma_dict = {}
for dico in lemmas:
    id_lemma_dict[dico['id']] = dico['lemma']  # create id:lemma dict from list of dicts
for dico in words:
    word_lemma_dict[dico['word']] = id_lemma_dict[dico['id']]
print(word_lemma_dict)
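One caveat with this loop: `id_lemma_dict[dico['id']]` raises a KeyError if a word's id has no matching lemma. If the extraction can produce such words (an assumption; nothing in the question says it does), a membership check skips them. A minimal sketch, using a hypothetical orphan id `id9` purely for illustration:

```python
lemmas = [{'id': 'id1', 'lemma': 'lemma1'}, {'id': 'id2', 'lemma': 'lemma2'}]
# 'id9' is a hypothetical id with no lemma, included only to show the check
words = [{'id': 'id1', 'word': 'word1.1'}, {'id': 'id2', 'word': 'word2.1'}, {'id': 'id9', 'word': 'word9.1'}]

id_lemma_dict = {}
for dico in lemmas:
    id_lemma_dict[dico['id']] = dico['lemma']  # id:lemma lookup table

word_lemma_dict = {}
for dico in words:
    # Skip words whose id has no lemma instead of raising KeyError
    if dico['id'] in id_lemma_dict:
        word_lemma_dict[dico['word']] = id_lemma_dict[dico['id']]

print(word_lemma_dict)  # {'word1.1': 'lemma1', 'word2.1': 'lemma2'}
```

The same `if` condition can be appended to a dict comprehension. Whether to drop such words or keep them unchanged depends on how the text is processed afterwards.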
Answer:
Here's an option using dict comprehensions:
lemmas = [{"id": "id1", "lemma":"lemma1"}, {"id": "id2", "lemma":"lemma2"}, {"id": "id3", "lemma": "lemma3"}]
words = [{"id": "id1", "word": "word1.1"}, {"id": "id1", "word": "word1.2"}, {"id": "id2", "word": "word2.1"}, {"id": "id3", "word": "word3.1"}, {"id": "id3", "word": "word3.2"}]
lemmas_dict = {item["id"]: item["lemma"] for item in lemmas}
word_to_lemma = {word['word']: lemmas_dict[word['id']] for word in words}
print(word_to_lemma)
Output:
{'word1.1': 'lemma1', 'word1.2': 'lemma1', 'word2.1': 'lemma2', 'word3.1': 'lemma3', 'word3.2': 'lemma3'}
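To apply the mapping to a text, as the question describes, each token can be looked up with `dict.get`, falling back to the token itself when it has no lemma. This is a sketch that assumes simple whitespace tokenization with no punctuation handling; `lemmatize_text` is a hypothetical helper name, not part of any library.

```python
def lemmatize_text(text, word_to_lemma):
    # Assumption: tokens are separated by whitespace; punctuation is not stripped.
    tokens = text.split()
    # Replace each token by its lemma, keeping tokens without an entry unchanged.
    return " ".join(word_to_lemma.get(token, token) for token in tokens)


word_to_lemma = {'word1.1': 'lemma1', 'word1.2': 'lemma1', 'word2.1': 'lemma2',
                 'word3.1': 'lemma3', 'word3.2': 'lemma3'}
print(lemmatize_text("word1.2 foo word3.1", word_to_lemma))  # lemma1 foo lemma3
```

Falling back to the original token keeps words that are not in the mapping (like `foo` here) in the output rather than raising an error.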