Trying to create a pandas df out of a frequency dict, with one col being the word and the next being

Time:03-01

I have

import pandas as pd
from nltk import FreqDist as fd

# frankenstein freqdist
frank_fd = fd('frank_lemma')
for word, count in frank_fd.items():
    data = {'Word':[word], 'Counts':[count]}
    
df = pd.DataFrame(data)
df.head()

but my printout shows only one word with one count. I put print(word, count) on the first line of the for loop, and it does iterate over every word; the words just aren't all being added to the df I tried to create. Does anyone know why?

Edit: I checked my data, and only the very last word is being added to the df.

CodePudding user response:

for word, count in frank_fd.items():
    data = {'Word':[word], 'Counts':[count]}

Each iteration of this loop assigns a brand-new dict to data, so the previous value is lost. When you create the data frame after the loop, data holds only the last word and count, so that is the only row you get.
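The overwrite is easy to see without pandas or nltk. Here is a minimal sketch, assuming a small plain dict (the name counts and its contents are made up for illustration) in place of the FreqDist:

```python
# Hypothetical stand-in for the FreqDist: a plain dict of word -> count.
counts = {"monster": 2, "creature": 1, "night": 1}

for word, count in counts.items():
    # A new dict is bound to `data` on every pass, replacing the previous one.
    data = {"Word": [word], "Counts": [count]}

# Only the pair from the final iteration is left.
print(data)
```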

You'll need to define data before your loop and add each item to it instead:

data = {'Word':[], 'Counts':[]}    
for word, count in frank_fd.items():
    data['Word'].append(word)
    data['Counts'].append(count)

This one line will accomplish the same thing:

data = {'Word': list(frank_fd.keys()), 'Counts': list(frank_fd.values())}

But the pandas DataFrame has a from_dict() method with an option that builds the frame without the extra code, using the words as the index instead of a 'Word' column:

df = pd.DataFrame.from_dict(frank_fd, orient='index', columns=['Counts'])
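If you want the words as a regular column rather than as row labels, the index can be named and reset afterwards. This is a rough sketch, assuming a collections.Counter as a stand-in for the FreqDist (nltk's FreqDist is a Counter subclass); the sample words are made up:

```python
from collections import Counter

import pandas as pd

# Hypothetical stand-in for frank_fd; the sample words are invented.
word_counts = Counter(["monster", "creature", "monster", "night"])

# orient='index' turns each key into a row label, so the words land in the index.
df = pd.DataFrame.from_dict(word_counts, orient="index", columns=["Counts"])

# Name the index 'Word' and move it into an ordinary column.
df = df.rename_axis("Word").reset_index()
print(df)
```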

CodePudding user response:

You're trying to recreate a dict data structure very similar to the one you already have in the nltk.probability.FreqDist. The DataFrame constructor accepts the FreqDist's (word, count) pairs directly, so you can pass frank_fd.items() straight to it.

This is working for me.

import pandas as pd
from nltk import FreqDist as fd

frank_fd = fd('frank_lemma')

df = pd.DataFrame(frank_fd.items(), columns=['Word', 'Counts'])

Output:

    Word    Counts
0   f       1
1   r       1
2   a       2
3   n       1
4   k       1
5   _       1
6   l       1
7   e       1
8   m       2
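Note that the rows above are single characters because fd('frank_lemma') receives the string literal 'frank_lemma' and counts its letters. If frank_lemma was meant to be a variable holding a list of lemmas (an assumption; the question never shows it), it should be passed without quotes. A small sketch of the difference, using collections.Counter as a stand-in for FreqDist and an invented token list:

```python
from collections import Counter

import pandas as pd

# Hypothetical token list standing in for the asker's frank_lemma variable.
frank_lemma = ["monster", "creature", "monster", "night"]

# Counter stands in for nltk.FreqDist here (FreqDist subclasses Counter).
char_counts = Counter("frank_lemma")  # iterates the string: counts characters
word_counts = Counter(frank_lemma)    # iterates the list: counts words

# Build the frame from (word, count) pairs; list() keeps this safe on older pandas.
df = pd.DataFrame(list(word_counts.items()), columns=["Word", "Counts"])
print(df)
```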