How to create a dictionary whose key:value pairs are the values of two different lists of dictionaries

Time:07-08

I have two lists of dictionaries that result from a pymongo extraction.

A list of dicts containing ids (strings) and lemmas (strings):

lemmas = [{'id': 'id1', 'lemma': 'lemma1'}, {'id': 'id2', 'lemma': 'lemma2'}, {'id': 'id3', 'lemma': 'lemma3'}, ...]

A list of dicts containing ids and multiple words per id:

words = [{'id': 'id1', 'word': 'word1.1'}, {'id': 'id1', 'word': 'word1.2'}, {'id': 'id2', 'word': 'word2.1'}, {'id': 'id3', 'word': 'word3.1'}, {'id': 'id3', 'word': 'word3.2'}, ...]

As you can see, the two lists of dictionaries have different lengths, since several words can be associated with each id but only one lemma.

My goal is to obtain a dictionary whose key:value pairs are word:lemma pairs for the words and lemmas that share the same id. That way, I can replace every word with its corresponding lemma in a text that I am analyzing. For example:

word_lemma_dict = {'word1.1': 'lemma1', 'word1.2': 'lemma1', 'word2.1': 'lemma2', 'word3.1': 'lemma3', 'word3.2': 'lemma3', ...}

Is there a simple way to do this?

The best I could come up with was two for loops, but that's not very "Pythonic":

id_lemma_dict = {}
word_lemma_dict = {}

for dico in lemmas:
    id_lemma_dict[dico['id']] = dico['lemma']  # create id:lemma dict from list of dicts

for dico in words:
    word_lemma_dict[dico['word']] = id_lemma_dict[dico['id']]

print(word_lemma_dict)

CodePudding user response:

Here's an option with comprehensions:

lemmas = [{"id": "id1", "lemma":"lemma1"}, {"id": "id2", "lemma":"lemma2"}, {"id": "id3", "lemma": "lemma3"}]
words = [{"id": "id1", "word": "word1.1"}, {"id": "id1", "word": "word1.2"}, {"id": "id2", "word": "word2.1"}, {"id": "id3", "word": "word3.1"}, {"id": "id3", "word": "word3.2"}]

lemmas_dict = {item["id"]: item["lemma"] for item in lemmas}
word_to_lemma = {word["word"]: lemmas_dict[word["id"]] for word in words}

print(word_to_lemma)

Output:

{'word1.1': 'lemma1', 'word1.2': 'lemma1', 'word2.1': 'lemma2', 'word3.1': 'lemma3', 'word3.2': 'lemma3'}
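
If the end goal is to replace words in a text, as the question describes, the mapping could be applied along these lines. This is only a sketch under a few assumptions that are not in the original post: the `lemmatize` helper, the sample text, the extra `id9` entry, and the whitespace tokenization are all made up for illustration. It also skips any word whose id has no lemma, since `lemmas_dict[word["id"]]` would otherwise raise a `KeyError`.

```python
lemmas = [
    {"id": "id1", "lemma": "lemma1"},
    {"id": "id2", "lemma": "lemma2"},
]
words = [
    {"id": "id1", "word": "word1.1"},
    {"id": "id1", "word": "word1.2"},
    {"id": "id2", "word": "word2.1"},
    {"id": "id9", "word": "orphan"},  # hypothetical entry whose id has no lemma
]

lemmas_dict = {item["id"]: item["lemma"] for item in lemmas}

# Keep only words whose id is present in lemmas_dict, so no KeyError is raised.
word_to_lemma = {
    word["word"]: lemmas_dict[word["id"]]
    for word in words
    if word["id"] in lemmas_dict
}


def lemmatize(text, mapping):
    """Replace each whitespace-separated token by its lemma when one is known."""
    return " ".join(mapping.get(token, token) for token in text.split())


text = "word1.2 and word2.1 but not orphan"  # made-up sample text
print(lemmatize(text, word_to_lemma))
# lemma1 and lemma2 but not orphan
```

Tokens that are not keys of the mapping are left unchanged by `mapping.get(token, token)`. Real text with punctuation or mixed case would likely need a proper tokenizer instead of `str.split()`.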