Turning lists into dictionaries for frequencies (Python)

I have been given two lists (wofo and role). One is just words, and the other one tells you if the word has been used as a metaphor in a text or not. The first few entries look like this, if you print them:

wofo: Lichtfiguren nachhängt diese Sterne fallen

role: Metapher Anker Anker Metapher Metapher

(so the words don't have quotation marks - are they still strings?)

I am supposed to create two dictionaries. The first one should give the number of times a word has been used as a "Metapher", the other one how often a word has been used as an "Anker".

My first attempt was this (I basically tried to put both lists into one dictionary and then look up the elements):

lemma_met = {}
lemma_ank = {}

wofo2 = wofo.split("\n")
role2 = role.split("\n")

dicitionary = dict(zip(wofo2, role2))

for element in wofo:
    if dicitionary.get(element) == "Metapher":
        if element in lemma_met:
            lemma_met[element] += 1
        else:
            lemma_met[element] = 1
    elif dicitionary.get(element) == "Anker":
        if element in lemma_ank:
            lemma_ank[element] += 1
        else:
            lemma_ank[element] = 1

I have also tried this, but both attempts only give me empty dictionaries:

lemma_met = {}
lemma_ank = {}

for key in wofo:
    for value in role:
        if value == "Metapher":
            if key in lemma_met:
                lemma_met[key] += 1
            else:
                lemma_met[key] = 1
        elif value == "Anker":
            if key in lemma_ank:
                lemma_ank[key] += 1
            else:
                lemma_ank[key] = 1

Does anyone have an explanation for what I am doing wrong and how I can fix it? Sorry, I am very new to python and writing code in general. Is there an easier way of doing it? Or is there a specific term I can google to find the answer?

Sorry, this question has probably been answered somewhere, but I cannot find it. I have been working on this problem for quite a while now.

Thank you in advance!

CodePudding user response：

This will likely work better for you:

lemma_met = {}
lemma_ank = {}

wofo2 = wofo.split("\n") # this may not split your strings into lists if the words are not newline separated
role2 = role.split("\n")
#wofo2 = wofo.split(" ") # if the words are separated by spaces, as your printed output suggests, split on " " instead
#role2 = role.split(" ")

for wofo_elm, role_elm in zip(wofo2, role2): # allows you to go through both lists at the same time
    if role_elm == "Metapher": # look at the current role element instead
        if wofo_elm in lemma_met: # check if the word is already in the dictionary
            lemma_met[wofo_elm] += 1
        else:
            lemma_met[wofo_elm] = 1
    elif role_elm == "Anker":
        if wofo_elm in lemma_ank:
            lemma_ank[wofo_elm] += 1
        else:
            lemma_ank[wofo_elm] = 1
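
Which separator is right depends on how wofo and role are actually stored, which the question does not show. Here is a minimal sketch, assuming they are single strings with entries separated by spaces or newlines (the sample strings are made up from the question's printed output). Calling split() with no argument splits on any run of whitespace, so it covers both cases:

```python
# Hypothetical sample data; the real strings may be separated differently.
wofo_spaces = "Lichtfiguren nachhängt diese Sterne fallen"
wofo_lines = "Lichtfiguren\nnachhängt\ndiese\nSterne\nfallen"

# Splitting on "\n" leaves a space-separated string as one single item.
wrong = wofo_spaces.split("\n")

# split() without an argument splits on any whitespace (spaces, newlines, tabs).
from_spaces = wofo_spaces.split()
from_lines = wofo_lines.split()

print(len(wrong))                 # 1
print(from_spaces == from_lines)  # True
```

If wofo and role are already lists, no split is needed at all, and calling .split on them would raise an AttributeError.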

CodePudding user response：

The reason your code is completely broken is that you made keys from wofo2 (a list of strings, from what I assume is a newline-separated string of words in wofo), but your loop iterates over wofo itself. It's legal to loop over a string, but that iterates character by character, not word by word, so unless the words are one character long, the loop variable will never match a dict key. You want to loop over wofo2 here, which also guarantees the keys exist. (Using .get() was silencing errors here; if you'd used square bracket lookup, a KeyError would have shown you that the keys you were looking up didn't exist and were one character long.)
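
To see the effect described above, here is a small sketch with made-up two-word data (not the asker's real input). Iterating over the string yields single characters, so .get() returns None for each of them, while square bracket lookup raises a KeyError right away:

```python
wofo = "Sterne\nfallen"   # hypothetical sample, newline separated
role = "Metapher\nAnker"
dicitionary = dict(zip(wofo.split("\n"), role.split("\n")))

# Iterating over a string gives characters, not words.
chars = list(wofo)
print(chars[:3])              # ['S', 't', 'e']

# .get() hides the failed lookup by returning None.
print(dicitionary.get("S"))   # None

# Square brackets make the mistake visible.
missing_key = None
try:
    dicitionary["S"]
except KeyError as err:
    missing_key = err.args[0]
    print("KeyError for key:", missing_key)
```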

So assuming the logic is correct, the fix for your original code is to just change:

for element in wofo:

to:

for element in wofo2:

And as a minor bonus, you can change every use of dicitionary.get(element) to dicitionary[element] (because now every key you're checking is guaranteed to be in there, and .get's feature of returning None when it's missing is unnecessary).

An additional possible problem is that it seems you want to track when the same word from wofo2 is seen multiple times, and track all the corresponding values from role2. The dict(zip(...)) code used just keeps the last mapping seen, not all of them. If you want to keep all of them, you'd need something like:

from collections import Counter, defaultdict  # Better versions of dict for what you're doing

lemma_met = Counter()  # Avoids need to test before each increment
lemma_ank = Counter()

wofo2 = wofo.split("\n")
role2 = role.split("\n")

dicitionary = defaultdict(set)  # Use defaultdict(list) if you want to keep all duplicate mappings of the same pair of values, but if you don't care, set has cheaper lookup
                                # defaultdict(Counter) would be good if you needed to know about duplicate mappings, but not the precise order the duplicates arrived in
for w, r in zip(wofo2, role2):
    dicitionary[w].add(r)  # Change to .append(r) if using defaultdict(list), or to dicitionary[w][r] += 1 if using defaultdict(Counter)

for element in wofo2:
    if "Metapher" in dicitionary[element]:
        lemma_met[element] += 1  # No need for conditional check; Counter treats missing key as initially zero
    elif "Anker" in dicitionary[element]:
        lemma_ank[element] += 1

It's also wholly possible that last loop should really not be using wofo2 at all, especially if you just want to count how many times a word from wofo2 mapped to Metapher or Anker; if that's the case, defaultdict(Counter) is the best solution, and the final loop gets really simple:

from collections import Counter, defaultdict  # Better versions of dict for what you're doing

lemma_met = {}
lemma_ank = {}

wofo2 = wofo.split("\n")
role2 = role.split("\n")

dicitionary = defaultdict(Counter)
for w, r in zip(wofo2, role2):
    dicitionary[w][r] += 1

for element, rolecnts in dicitionary.items():
    if "Metapher" in rolecnts:
        lemma_met[element] = rolecnts["Metapher"]
    if "Anker" in rolecnts:
        lemma_ank[element] = rolecnts["Anker"]

or something to that effect (where you count all role occurrences for each unique word in wofo2 up front, and the final loop is really just copying over the counts for the roles you actually care about).
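
The point about dict(zip(...)) keeping only the last mapping is easy to check with a word that appears in both roles. This is just a sketch with invented data (the question's sample has no repeated words), and the name tallies is made up for the illustration:

```python
from collections import Counter, defaultdict

# Hypothetical data: "Sterne" appears twice, once in each role.
wofo2 = ["Sterne", "fallen", "Sterne"]
role2 = ["Metapher", "Anker", "Anker"]

# A plain dict keeps only the last role seen for each word.
last_only = dict(zip(wofo2, role2))
print(last_only)  # {'Sterne': 'Anker', 'fallen': 'Anker'}

# defaultdict(Counter) keeps a per-word tally of every role instead.
tallies = defaultdict(Counter)
for w, r in zip(wofo2, role2):
    tallies[w][r] += 1

print(tallies["Sterne"]["Metapher"], tallies["Sterne"]["Anker"])  # 1 1
```

With the plain dict, the "Metapher" use of "Sterne" is lost entirely, which is why the per-word Counter is the safer structure when words can repeat.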

CodePudding user response：

You can use a Counter.

from collections import Counter

wofo = "Lichtfiguren\nnachhängt\ndiese\nSterne\nfallen".split("\n")
role =  "Metapher\nAnker\nAnker\nMetapher\nMetapher".split("\n")

dictionary = list(zip(wofo, role))
lemma_met = dict(Counter(z[0] for z in dictionary if z[1] == 'Metapher'))
lemma_ank = dict(Counter(z[0] for z in dictionary if z[1] == 'Anker'))

print(lemma_met)
print(lemma_ank)

Output:

{'Lichtfiguren': 1, 'Sterne': 1, 'fallen': 1}
{'nachhängt': 1, 'diese': 1}

Or you can skip the zip step:

wofo = "Lichtfiguren\nnachhängt\ndiese\nSterne\nfallen".split('\n')
role =  "Metapher\nAnker\nAnker\nMetapher\nMetapher".split('\n')

lemma_met = dict(Counter(word for i,word in enumerate(wofo) if role[i] == 'Metapher'))
lemma_ank = dict(Counter(word for i,word in enumerate(wofo) if role[i] == 'Anker'))

Note that all of these methods loop over the lists more than once, so a single for loop like the one in the original code is a bit more efficient. The advantage of these methods over a loop is that it is easier to handle more values in "role", such as adverb, adjective, etc. Example:

results = {}
for r in set(role):
    results[r] = dict(Counter(word for i,word in enumerate(wofo) if role[i] == r))

Using a for loop:

lemma_met = {}
lemma_ank = {}
for i,word in enumerate(wofo):
    if role[i] == "Metapher":
        lemma_met[word] = lemma_met.get(word, 0) + 1
    else:
        lemma_ank[word] = lemma_ank.get(word, 0) + 1

*Note: all these examples assume the lists are the same size!
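
If the lists might differ in length, it can help to check before counting. Below is a rough sketch with invented data; checked_pairs is a hypothetical helper, not something from the code above. zip() silently stops at the shorter list, whereas the role[i] versions would raise an IndexError if role is the shorter one. On Python 3.10 or newer, zip(wofo, role, strict=True) performs a similar check by itself.

```python
wofo = ["Sterne", "fallen", "diese"]   # hypothetical: one word too many
role = ["Metapher", "Anker"]

# zip() stops at the shorter list, so "diese" is dropped without any warning.
pairs = list(zip(wofo, role))
print(pairs)  # [('Sterne', 'Metapher'), ('fallen', 'Anker')]

def checked_pairs(words, roles):
    """Pair words with roles, refusing lists of different lengths."""
    if len(words) != len(roles):
        raise ValueError(f"{len(words)} words but {len(roles)} roles")
    return list(zip(words, roles))

try:
    checked_pairs(wofo, role)
except ValueError as err:
    print("mismatch:", err)  # mismatch: 3 words but 2 roles
```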